guozhiyao opened 3 years ago
I found that the average grad-norm of my model is about 0.7, which is much smaller than in your situation; this makes the parameter updates very slow and the model cannot converge. Do you know how to fix it?
Emmm, I also ran into this problem! The loss does not go down during training. Can we discuss over QQ? My account is 2667004002.
I train swin_tiny_patch4_window7_224 with one million classes and 100 million images using softmax loss and AdamW; the batch size is 600 and I train for 400,000 iterations, but the model cannot converge.
You may first run the same code on a smaller-scale dataset, to rule out potential bugs.
Same here... I also hit this problem. The loss does not go down even with lr 1e-7. I replaced ResNet with Swin-S as the new backbone in my network, but the loss still does not decrease, and I do not know how to solve it.
My model can converge now. I train it with softmax loss; setting the warm-up iterations and the batch size large enough makes it converge normally.
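For reference, warm-up here means ramping the learning rate up from a tiny value before the main schedule kicks in (in the official Swin config this is controlled by `TRAIN.WARMUP_EPOCHS` / `TRAIN.WARMUP_LR`, if I remember correctly). A minimal sketch of the usual linear-warm-up-plus-cosine-decay recipe; the function name and all the default values below are illustrative, not the repo's exact settings:

```python
import math

def lr_at_step(step, total_steps, base_lr=5e-4,
               warmup_steps=10_000, warmup_lr=5e-7, min_lr=5e-6):
    """Hypothetical LR schedule: linear warm-up, then cosine decay."""
    if step < warmup_steps:
        # Linear ramp from warmup_lr up to base_lr.
        return warmup_lr + (base_lr - warmup_lr) * step / warmup_steps
    # Cosine decay from base_lr down to min_lr for the rest of training.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With too few warm-up steps the LR jumps to its full value while the network is still randomly initialized, which with AdamW on a huge softmax head can blow up training before it starts; a longer warm-up is often enough to fix "loss never goes down" at this scale.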
What is "warm up iters"? I cannot find it in the config. By the way, I have the same problem; the loss does not go down, as below:
[2021-12-05 14:30:04 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][80/1895] eta 0:42:20 lr 0.000000194 time 1.4120 (1.3996) loss 9.5960 (9.6270) grad_norm 4.6469 (5.3799) mem 17084MB
[2021-12-05 14:30:18 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][90/1895] eta 0:42:04 lr 0.000000211 time 1.3961 (1.3988) loss 9.5745 (9.6272) grad_norm 5.0378 (5.4116) mem 17084MB
[2021-12-05 14:30:32 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][100/1895] eta 0:41:48 lr 0.000000227 time 1.3841 (1.3972) loss 9.6274 (9.6305) grad_norm 5.9137 (5.6181) mem 17084MB
[2021-12-05 14:30:46 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][110/1895] eta 0:41:41 lr 0.000000244 time 1.3985 (1.4014) loss 9.5727 (9.6291) grad_norm 5.7255 (5.6733) mem 17084MB
@guozhiyao My batch_size is 64 and my dataset has 14000 classes.
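For what it's worth, a quick sanity check on that log, assuming standard softmax cross-entropy: with K classes, a model that predicts uniformly at random has expected loss ln(K).

```python
import math

# Expected cross-entropy loss for uniform random predictions over K classes.
K = 14000
expected_initial_loss = math.log(K)
print(round(expected_initial_loss, 4))  # → 9.5468
```

The logged loss hovering around 9.6 is essentially the random-guess value for a 14000-class head, i.e. the model has not started learning at all; note the log is only ~100 iterations into epoch 0 with lr still around 2e-7, so it may simply still be deep inside warm-up.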
Thanks @guozhiyao
@hdmjdp How did you solve this issue?
Mine does not go down either, on CIFAR-10 with my own framework.
@guozhiyao Hey, I'm wondering what grad_norm actually tells us. I have seen several people bring up this metric in convergence issues with Swin. Could you give a hint? Thanks.
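In case it helps: the logged grad_norm is the global L2 norm over all parameter gradients, i.e. the square root of the sum of every squared gradient entry (in PyTorch training loops it typically comes back as the return value of `torch.nn.utils.clip_grad_norm_`). A minimal pure-Python sketch, where `grads` stands in for the per-parameter gradient tensors as flat lists of floats:

```python
import math

def global_grad_norm(grads):
    """Global L2 norm over all gradients: sqrt(sum of squared entries)."""
    total = 0.0
    for g in grads:                      # one flat list of floats per parameter
        total += sum(x * x for x in g)
    return math.sqrt(total)

print(global_grad_norm([[3.0], [4.0]]))  # → 5.0
```

A very small grad_norm (like the 0.7 mentioned above) means each AdamW step barely moves the weights, while a large or exploding one hints at instability; people watch it because either extreme correlates with "loss never goes down".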