megvii-research / mdistiller

The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf

Transfer learning #59

Open jekim5418 opened 5 months ago

jekim5418 commented 5 months ago

Hello, may I ask about the transfer learning experiments?

Among the transfer learning settings presented in the paper, I cannot reproduce the reported results for the ResNet32x4-ShuffleV1 teacher-student pair. After training the student with the baseline and with vanilla KD, I ran transfer learning on Tiny-ImageNet using the trained student. However, the baseline did not exceed 33% accuracy, and vanilla KD did not exceed 31%.

How can I achieve the performance presented in the paper?

Zzzzz1 commented 5 months ago

Follow the setting reported in the paper. In particular, weight decay should be set to 0.0, and the backbone (including its BN modules) should be frozen.
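
For reference, here is a minimal PyTorch sketch of that setting (not the repository's actual transfer-learning script): the distilled backbone is frozen, BatchNorm statistics are kept fixed by leaving the backbone in eval mode, and only a new linear classifier is trained with `weight_decay=0.0`. The `backbone`, `loader`, `feat_dim`, `lr`, and `epochs` arguments are placeholders, and the backbone is assumed to return pooled feature vectors.

```python
import torch
import torch.nn as nn


def linear_probe(backbone: nn.Module, feat_dim: int, num_classes: int,
                 loader, epochs: int = 30, lr: float = 0.1) -> nn.Linear:
    """Train only a new linear head on top of a frozen backbone."""
    # Freeze every backbone parameter so only the classifier receives gradients.
    for p in backbone.parameters():
        p.requires_grad = False
    # eval() also fixes the BatchNorm running statistics.
    backbone.eval()

    classifier = nn.Linear(feat_dim, num_classes)
    # Weight decay is set to 0.0, as recommended above; other hyper-parameters
    # here are placeholders, not the paper's exact values.
    optimizer = torch.optim.SGD(classifier.parameters(), lr=lr,
                                momentum=0.9, weight_decay=0.0)

    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                feats = backbone(images)  # assumes pooled feature output
            loss = nn.functional.cross_entropy(classifier(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier
```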