Vickeyhw opened this issue 1 year ago
The loss scale is too large. Did you change the batch-size or num-gpus?
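For context, a minimal sketch of the usual rule of thumb behind this question: if the effective (global) batch size is changed, the base learning rate is typically scaled linearly with it (Goyal et al., 2017), otherwise loss magnitudes and training stability can shift. The function name and the example values below are illustrative assumptions, not mdistiller's API or recipe.

```python
# Hypothetical helper illustrating the linear LR scaling rule.
def scaled_lr(base_lr: float, ref_batch_size: int, batch_size: int) -> float:
    """Scale the learning rate linearly with the effective (global) batch size."""
    return base_lr * batch_size / ref_batch_size

# Example: if the reference recipe uses batch size 512 with base LR 0.2 (assumed values),
# training with a global batch of 256 would use LR 0.1.
print(scaled_lr(0.2, ref_batch_size=512, batch_size=256))  # 0.1
```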
@Zzzzz1 I use the original batch size 512 on 8 2080Ti GPUs. After re-running the code, I got the following results: it still seems unstable and much worse than vanilla KD.
@Vickeyhw How long does it take you to run one epoch? I find it very strange that it takes me 100 minutes to run a quarter of an epoch on 8×3090 GPUs.
@JinYu1998 23min/epoch.
Thanks for your response. I think I've identified the problem: since my data is not on an SSD, disk I/O is the bottleneck causing the slow training...
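A quick way to confirm an I/O bottleneck like this is to time the data pipeline alone, without any model forward/backward. The sketch below is not part of mdistiller; the dataset path, batch size, and worker count are illustrative assumptions.

```python
# Measure raw data-loading throughput by iterating the DataLoader with no model step.
import time
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

transform = T.Compose([T.RandomResizedCrop(224), T.ToTensor()])
dataset = ImageFolder("/path/to/imagenet/train", transform=transform)  # illustrative path

n_batches, batch_size = 200, 64
loader = DataLoader(dataset, batch_size=batch_size, num_workers=8,
                    pin_memory=True, persistent_workers=True)

start = time.time()
for i, (images, targets) in enumerate(loader):
    if i + 1 == n_batches:
        break
elapsed = time.time() - start
print(f"{n_batches * batch_size / elapsed:.1f} images/sec from the data pipeline alone")
```

If this number is far below what the GPUs can consume, the fix is on the storage/loader side (SSD, more workers, caching), not in the training recipe.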
Thanks for your great work! When I run the code with:
python3 tools/train.py --cfg configs/imagenet/r34_r18/dot.yaml
the training loss is much larger than with the vanilla KD method in the first few epochs, and the test accuracy is also low. Is this normal?
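For reference when comparing loss magnitudes, here is a minimal sketch of the vanilla KD loss (Hinton et al., 2015) that the DOT run is being compared against. This is a generic implementation, not necessarily mdistiller's exact code; the temperature and weighting values are assumptions.

```python
# Standard knowledge-distillation loss: cross-entropy plus temperature-scaled KL divergence.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-label KL divergence."""
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # scale by T^2 so gradient magnitudes stay comparable across temperatures
    return alpha * ce + (1 - alpha) * kl
```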