xuguodong03 / SSKD

[ECCV2020] Knowledge Distillation Meets Self-Supervision

Problem when training the ssp head #5

Open UcanSee opened 4 years ago

UcanSee commented 4 years ago

First I trained a teacher model and its accuracy is correct. Then I trained the ssp head of the teacher model, but I found that the ssp head loss falls very slowly: it starts at 3.37 at the beginning of training and only drops to 3.25 by the end. Did I do something wrong? The dataset is ImageNet, and the training config is consistent with the one in student.py.

larry10hhobh commented 4 years ago

Maybe the code for training the ssp head is wrong? You can read my issue.

xuguodong03 commented 4 years ago

The training hyper-parameters (e.g. batch size, epochs, LR) for CIFAR and ImageNet are different. For ImageNet, we use the hyper-parameters in pytorch/example.
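
For reference, below is a minimal sketch of those standard ImageNet-example hyper-parameters applied to an ssp-head optimizer. The values (batch size 256, 90 epochs, LR 0.1 decayed 10x every 30 epochs, momentum 0.9, weight decay 1e-4) are the defaults of the PyTorch ImageNet example; whether the ssp head here was trained with exactly these values, and the head dimensions shown, are assumptions, not taken from this repo.

```python
import torch
import torch.nn as nn

# Hyper-parameters from the PyTorch ImageNet example (assumed here, not confirmed
# to be the exact ssp-head settings used in this repo).
hparams = dict(batch_size=256, epochs=90, lr=0.1, momentum=0.9, weight_decay=1e-4)

# Illustrative 2-layer FC ssp head; the dimensions are placeholders.
ssp_head = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 128))

optimizer = torch.optim.SGD(ssp_head.parameters(), lr=hparams["lr"],
                            momentum=hparams["momentum"],
                            weight_decay=hparams["weight_decay"])
# The ImageNet example decays the learning rate by 10x every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
```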

Besides the hyper-parameters, the reason the ssp loss does not fall may be that the teacher backbone is fixed: the only trainable module is a 2-layer FC head. As stated in the paper, the self-supervision may not be accurate, but it still transfers some structured information. So maybe you could continue the experiment and see the results.
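
For clarity, here is a minimal sketch of that setup: the teacher backbone is frozen and only a 2-layer FC head receives gradients. Module shapes and names are illustrative, not taken from this repo.

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained teacher backbone (illustrative architecture).
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
# The only trainable module: a 2-layer FC ssp head on top of fixed features.
ssp_head = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 32))

backbone.eval()                          # also fixes BatchNorm running statistics
for p in backbone.parameters():
    p.requires_grad = False              # backbone stays frozen

x = torch.randn(4, 3, 224, 224)
with torch.no_grad():
    feats = backbone(x)                  # features come from the frozen teacher
out = ssp_head(feats)                    # gradients only reach the ssp head
```

With only a small head training on fixed features, the ssp loss can plateau well above zero, so a slow but steady drop is not necessarily a sign of a bug.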