yuanli2333 / Teacher-free-Knowledge-Distillation

Knowledge Distillation: CVPR2020 Oral, Revisiting Knowledge Distillation via Label Smoothing Regularization
MIT License

Have you ever tried a deeper network? #17

Closed JiyueWang closed 4 years ago

JiyueWang commented 4 years ago

In the paper, the reported models are generally shallow: resnet18, googlenet, etc. I found it's easier to get some improvement with shallower models. Have you ever tried deeper networks like pre-resnet164 or resnext164? And does your method get any improvement on cifar10?

yuanli2333 commented 4 years ago

Hi, I haven't tried pre-resnet164 or resnext164, but you can easily try those models with the code.

JiyueWang commented 4 years ago

In your code, I found that the hyperparameters are model dependent. Did the difficulty of hyperparameter tuning stop you from trying more recent networks? Or is it a common issue with KD methods? I found other KD papers also tend to report results on easier models.

yuanli2333 commented 4 years ago

Yes, the hyperparameters are model dependent. You can easily search over them with a python or bash script that passes T and alpha as arguments to main.py. A simple example is here: https://github.com/peterliht/knowledge-distillation-pytorch/blob/master/search_hyperparams.py But make sure to search on a validation set split from the training set.
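
For reference, here is a minimal sketch of such a grid-search script, modeled on the linked search_hyperparams.py. The flag names `--temperature` and `--alpha` and the candidate values are assumptions; match them to the actual argparse arguments in main.py.

```python
# Minimal grid search over T and alpha by launching main.py as a subprocess.
# NOTE: --temperature / --alpha are assumed flag names; adjust to main.py's argparse.
import itertools
import subprocess
import sys

temperatures = [1, 5, 10, 20]    # candidate values for T (assumed grid)
alphas = [0.1, 0.5, 0.9, 0.95]   # candidate values for alpha (assumed grid)

for T, alpha in itertools.product(temperatures, alphas):
    cmd = [
        sys.executable, "main.py",
        "--temperature", str(T),   # assumed flag name
        "--alpha", str(alpha),     # assumed flag name
    ]
    print("Running:", " ".join(cmd))
    # Each run should report accuracy on a validation split held out from
    # the training set, not on the test set.
    subprocess.run(cmd, check=True)
```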