The official implementation of [CVPR2022] Decoupled Knowledge Distillation and [ICCV2023] DOT: A Distillation-Oriented Trainer
When I ran distillation training on my own training set, the first run went normally: the loss decreased and the model gradually fit. But when I train a second time, if I change the number of iterations or the decline stage of the loss function, the model fails to fit. The validation top-1 stays at 0.98 and the loss does not decrease normally. Can you help me solve this problem? Thank you so much.