megvii-research / mdistiller

The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf

code question #43

Open Aukk123 opened 1 year ago

Aukk123 commented 1 year ago

When I loaded my own training set for distillation training, the first run went normally: the loss decreased as expected and the model gradually fit. But on the second run, if I change the number of iterations or the learning-rate decay stages, the model fails to fit. The validation Top-1 stays at 0.98 and the loss does not decrease normally. Can you help with this problem? Thank you so much.

Zzzzz1 commented 1 year ago

The hyper-parameters may differ across datasets. Tuning the learning rate (lr) and weight decay (wd) may help.
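
For reference, here is a minimal PyTorch sketch (not mdistiller's actual trainer or config format) of how the learning rate, weight decay, and decay stages are typically wired together when adapting training to a new dataset. The network, epoch count, and milestone ratios below are purely illustrative placeholders; the point is that if you change the total number of iterations, the decay milestones usually need to move with it, and the lr/wd values often need re-tuning.

```python
# Minimal sketch, not mdistiller's own trainer: illustrative values only.
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(512, 100)  # placeholder for the student network

optimizer = optim.SGD(
    model.parameters(),
    lr=0.01,            # try lowering this if the loss stops decreasing
    momentum=0.9,
    weight_decay=5e-4,  # tune together with lr for the new dataset
)

# If the total number of epochs changes, shift the decay milestones
# proportionally so the lr is not dropped too early or too late.
num_epochs = 120
milestones = [int(num_epochs * r) for r in (0.5, 0.75, 0.9)]
scheduler = MultiStepLR(optimizer, milestones=milestones, gamma=0.1)

for epoch in range(num_epochs):
    # ... run one epoch of distillation training here ...
    scheduler.step()
```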