The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf
Hello, may I ask a question about the transfer learning experiments?
Among the transfer learning settings presented in the paper, I cannot reproduce the reported performance for the ResNet32x4-ShuffleV1 teacher-student pair. After training the student with the baseline and with vanilla KD, I ran transfer learning on Tiny-ImageNet using the trained student. However, the baseline did not exceed 33% accuracy, and vanilla KD did not exceed 31%.
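For reference, this is roughly how I ran the transfer step. It is only a minimal linear-probe sketch, not the repo's own script: the checkpoint path, Tiny-ImageNet directory, and feature dimension below are placeholders I chose, and the backbone is assumed to return penultimate features when called.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

FEAT_DIM = 960      # assumed penultimate feature size of the ShuffleV1 student
NUM_CLASSES = 200   # Tiny-ImageNet classes
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Resize Tiny-ImageNet (64x64) down to the 32x32 input the CIFAR-trained student expects,
# and normalize with CIFAR-100 statistics (matching the student's pretraining).
transform = transforms.Compose([
    transforms.Resize(32),
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])
train_set = datasets.ImageFolder("tiny-imagenet-200/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)

# Frozen backbone (placeholder checkpoint) + trainable linear head.
backbone = torch.load("student_ckpt.pth", map_location=DEVICE)
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(FEAT_DIM, NUM_CLASSES).to(DEVICE)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(30):
    for images, labels in train_loader:
        images, labels = images.to(DEVICE), labels.to(DEVICE)
        with torch.no_grad():
            feats = backbone(images)        # assumed to return penultimate features
        loss = criterion(head(feats), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

If the paper's setting differs from this (e.g. input resolution, which features are probed, or the optimization schedule), please let me know.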
How can I achieve the performance presented in the paper?