megvii-research / mdistiller

The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf

The training loss #53

Open Vickeyhw opened 1 year ago

Vickeyhw commented 1 year ago

Thanks for your great work! When I run the code with `python3 tools/train.py --cfg configs/imagenet/r34_r18/dot.yaml`, the training loss is much larger than with the KD method in the first few epochs, and the test accuracy is also low. Is this normal? [screenshot of the training log]

Zzzzz1 commented 1 year ago

The loss scale is too large. Did you change the batch size or the number of GPUs?
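
If the effective batch size does differ from the reference setting, a common heuristic is the linear scaling rule: scale the learning rate proportionally with the batch size. Below is a minimal sketch of that rule; it is a general heuristic, not something prescribed by mdistiller, and the base LR value used in the example is only an assumption.

```python
# Hypothetical helper (not part of mdistiller): linear learning-rate scaling
# when the effective batch size changes. The reference batch size of 512 comes
# from this thread; the base LR value below is only an assumed example.

def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Scale the learning rate linearly with the batch size."""
    return base_lr * new_batch_size / base_batch_size

# Example: a recipe tuned for batch size 512 at LR 0.2 (assumed) would suggest
# roughly LR 0.1 when the batch size is halved to 256.
print(scale_lr(0.2, 512, 256))  # -> 0.1
```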

Vickeyhw commented 1 year ago

@Zzzzz1 I used the original batch size of 512 on 8 2080 Ti GPUs. After re-running the code, I got the following results: [screenshot of the training log] It still looks unstable and is much worse than vanilla KD.
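
For context, DOT's main departure from vanilla KD training is in the optimizer rather than the loss: per the paper, the gradients of the task (CE) loss and the distillation (KD) loss get separate momentum coefficients, a larger one for the KD gradient and a smaller one for the task gradient. The sketch below is a simplified illustration of that idea on top of a plain SGD-with-momentum update; it is not the repository's actual trainer code, and the delta value is a placeholder.

```python
# Simplified illustration of DOT's dual-momentum idea (an assumption based on
# the paper, not the code in this repository).

import torch

def dot_sgd_step(param, grad_task, grad_kd, buf_task, buf_kd,
                 lr=0.1, mu=0.9, delta=0.075):
    """One hand-rolled SGD-with-momentum step in the spirit of DOT.

    delta here is a placeholder value, not a recommended setting.
    """
    buf_task.mul_(mu - delta).add_(grad_task)  # smaller momentum for the task loss
    buf_kd.mul_(mu + delta).add_(grad_kd)      # larger momentum for the KD loss
    param.data.add_(buf_task + buf_kd, alpha=-lr)
    return param

# Toy usage with a single scalar parameter.
p = torch.zeros(1)
bt, bk = torch.zeros(1), torch.zeros(1)
dot_sgd_step(p, grad_task=torch.tensor([0.5]), grad_kd=torch.tensor([1.0]),
             buf_task=bt, buf_kd=bk)
print(p)
```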

JinYu1998 commented 12 months ago

@Vickeyhw How long does one epoch take for you? I find it very strange that it takes me 100 minutes to run a quarter of an epoch on 8*3090.

Vickeyhw commented 12 months ago

@JinYu1998 23min/epoch.

JinYu1998 commented 12 months ago

> @JinYu1998 23min/epoch.

Thanks for your response; I think I've found the problem. Since my data is not on an SSD, I/O is what's slowing down training...
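
For anyone hitting the same slowdown, one way to confirm a data-loading bottleneck is to time how long each iteration waits on the DataLoader versus how long the compute step takes. Below is a minimal sketch, assuming a CUDA device and a hypothetical ImageNet path; it is not the repo's training loop.

```python
import time
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Placeholder dataset and loader settings (hypothetical path, not the repo's pipeline).
dataset = datasets.ImageFolder(
    "/path/to/imagenet/train",
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
    ]),
)
loader = DataLoader(dataset, batch_size=64, num_workers=8, pin_memory=True)

data_time, compute_time = 0.0, 0.0
end = time.time()
for i, (images, _) in enumerate(loader):
    data_time += time.time() - end           # time waiting on the loader (disk I/O + decode)
    start = time.time()
    images = images.cuda(non_blocking=True)  # assumes a CUDA device is available
    _ = images.float().mean().item()         # stand-in for the real forward/backward step
    compute_time += time.time() - start
    end = time.time()
    if i == 100:
        break

print(f"data: {data_time:.1f}s, compute: {compute_time:.1f}s over 100 batches")
# If data time dominates compute time, the bottleneck is the input pipeline
# (slow disk, too few workers), not the GPU.
```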