megvii-research / mdistiller

The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf

Reproduce Issue for WRN-40-2 / WRN40-1 on cifar-100 #51

Open yuqizhu opened 1 year ago

yuqizhu commented 1 year ago

Hi, I tried to reproduce the WRN-40-2/WRN-40-1 result on CIFAR-100, but I could only get up to 73.3, which is 1.5 lower than the result reported in the paper. I used the original yaml file and didn't change any hyper-parameters. The other experiments I tried on CIFAR-100 were always within 0.5 of the reported numbers, which looks fine to me. But a gap of 1.5 on WRN-40-2/WRN-40-1 seems a bit too large.

Zzzzz1 commented 1 year ago

See issue #8. It seems the results of the WRN series are not stable. Are the results of other teacher-student pairs also lower?
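To judge whether a gap like this falls within run-to-run noise, one way is to train the same teacher-student pair several times with different random seeds and compare the mean and standard deviation of the final top-1 accuracy against the reported number. A minimal sketch (the accuracy values below are illustrative placeholders, not measured results):

```python
# Hypothetical top-1 accuracies from repeated runs of the same
# WRN-40-2/WRN-40-1 config with different seeds (placeholder numbers).
from statistics import mean, stdev

runs = [73.3, 73.9, 74.6, 74.2, 73.7]

# Mean +/- std summarizes run-to-run variance; a reported result more
# than ~2 std above the mean would suggest the gap is not just noise.
print(f"mean={mean(runs):.2f}, std={stdev(runs):.2f}")
```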

yuqizhu commented 1 year ago

The others are a little bit lower, by about 0.2~0.5.