The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf
When I loaded my own training set for distillation training, the first run went normally: the loss decreased as expected and the model gradually fit the data. But on the second run, if I change the number of iterations or the stage at which the loss is supposed to decay, the model fails to fit: the validation top-1 stays at 0.98 and the loss no longer decreases normally. Could you help me solve this problem? Thank you so much.
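In case it helps reproduce the issue, here is a minimal sketch (not the repository's actual trainer) of what I mean by changing the schedule. If the total number of training epochs is shortened but the learning-rate decay milestones are left where they were, the milestones never fire and the learning rate stays at its initial value for the whole run, which could explain a loss that stops decreasing. All concrete numbers below (initial LR, milestone fractions, model size) are illustrative assumptions, not the project's defaults.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(512, 100)   # stand-in for the student network
total_epochs = 120            # e.g. reduced from a longer default schedule

optimizer = optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)

# Milestones copied from a longer run (e.g. [150, 180, 210] for 240 epochs)
# would never trigger within 120 epochs; rescale them to the new length.
milestones = [int(total_epochs * r) for r in (0.625, 0.75, 0.875)]  # -> [75, 90, 105]
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)

for epoch in range(total_epochs):
    # ... one epoch of distillation training would go here ...
    scheduler.step()
```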