Hi, I haven't tried pre-resnet164 or resnext164, but you can easily try those models with the existing code.
In your code, I found that the hyperparameters are model dependent. Did the difficulty of hyperparameter tuning stop you from trying more recent networks? Or is this a common issue with KD methods? I noticed other KD papers also tend to report results only on easier models.
Yes, hyperparameters are model dependent. You can easily search over them with a Python or bash script that passes T and alpha as args to main.py. A simple example is here: https://github.com/peterliht/knowledge-distillation-pytorch/blob/master/search_hyperparams.py. But make sure to search on a validation set split from the training set, not on the test set.
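For illustration, a grid search over T and alpha could look like the minimal sketch below. The `--temperature` and `--alpha` flag names are assumptions for this example, not necessarily the exact CLI of the linked script; adapt them to whatever main.py actually accepts.

```python
"""Minimal hyperparameter sweep sketch. Assumes main.py accepts
--temperature and --alpha flags (illustrative names, adapt as needed)."""
import itertools
import subprocess

# Candidate values for the distillation temperature T and loss weight alpha.
temperatures = [1, 4, 8, 20]
alphas = [0.1, 0.5, 0.9]

for T, alpha in itertools.product(temperatures, alphas):
    # Each run should report accuracy on a validation split held out
    # from the training set, never on the test set.
    subprocess.run(
        ["python", "main.py",
         "--temperature", str(T),
         "--alpha", str(alpha)],
        check=True,
    )
```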
In the paper, the reported models are fairly shallow (resnet18, googlenet, etc.). I found it's easier to get improvements with shallower models. Have you ever tried deeper networks like pre-resnet164 or resnext164? And does your method give any improvement on CIFAR-10?