megvii-research / mdistiller

The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf

The results cannot be replicated #62

Open Yintel12138 opened 6 months ago

Yintel12138 commented 6 months ago

Unable to Reproduce DKD Experiment Results on Tesla T4 Server Using Repository Code

Dear maintainers,

I recently attempted to replicate the Decoupled Knowledge Distillation (DKD) results reported in the paper, using this repository's code on my Tesla T4 server. Unfortunately, I was not able to match the documented results. Could you please advise whether there are any specific configurations or steps I might have missed? Here is what I have done so far:

- Set up the environment as per the documentation.
- Pulled the latest code from the master branch of the repository.
- Followed the instructions in the README to set up the DKD experiment.
- Ran the experiment with the default settings provided.

Even so, the results were significantly different from those reported in the paper. I would appreciate any guidance or recommendations to address this issue.

Thank you for your time and assistance.

Best regards,
Yintel

Zzzzz1 commented 6 months ago

Did you run the experiment on 8 GPUs? When the global batch is split across 8 devices, the per-GPU batch size becomes very small, which changes the training dynamics. The CIFAR-100 results we report were obtained on a single GPU.
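For reference, here is a minimal sketch of pinning the run to a single GPU so the full batch stays on one device; the config path follows the README's launch pattern and is illustrative, so substitute the one for your experiment:

```python
# Minimal sketch: expose only one GPU before launching training so the
# global batch is not split across devices. The config path is an
# assumption based on the README's command pattern.
import os
import subprocess

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # make only GPU 0 visible
subprocess.run(
    ["python3", "tools/train.py",
     "--cfg", "configs/cifar100/dkd/res32x4_res8x4.yaml"],
    check=True,  # raise if training exits with a non-zero status
)
```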

Yintel12138 commented 6 months ago

Thank you, I will try it on a single GPU. Another question: if I want to use a larger batch size to speed up training, should the learning rate increase or decrease?

aaab8b commented 1 week ago

> Thank you, I will try it on a single GPU. Another question: if I want to use a larger batch size to speed up training, should the learning rate increase or decrease?

Increase the learning rate.
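A common heuristic is the linear scaling rule: scale the learning rate in proportion to the batch size. A minimal sketch, assuming illustrative base values of batch size 64 and learning rate 0.05 (read the real values from the config you train with):

```python
# Minimal sketch of the linear scaling rule: the learning rate grows
# linearly with the global batch size. base_batch=64 and base_lr=0.05
# are illustrative assumptions, not values taken from the repo's configs.
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Return the learning rate scaled linearly with the batch size."""
    return base_lr * new_batch / base_batch

# Quadrupling the batch from 64 to 256 quadruples the learning rate.
print(scaled_lr(base_lr=0.05, base_batch=64, new_batch=256))  # 0.2
```

In practice a short warmup period is often added when the scaled learning rate is large, to avoid instability in the first epochs.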