TfeiSong opened this issue 10 months ago
What batch size do you use? It seems to have failed to converge at all, and I suspect that's caused by a very small batch size. Can you share your training config file?
https://github.com/yuantianyuan01/StreamMapNet/blob/main/plugin/configs/nusc_newsplit_480_60x30_24e.py Almost the same as the above, but only one GPU is used and the total number of training samples is 19291.
So your effective batch size is 1/8 of what it is with 8 GPUs, which may cause instability. You can try using a lower learning rate or a larger batch size to solve the problem.
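For example, following the linear scaling rule, you could override the base config for single-GPU training along these lines. This is a minimal sketch: the field names (`optimizer`, `lr`) follow common mmdet/mmcv config conventions, and the base learning rate shown is a placeholder, so check both against the actual config.

```python
# single_gpu_override.py — hypothetical override config for 1-GPU training.
# Inherits from the repo's config; field names are assumptions based on
# typical mmdet-style configs, not verified against this repo.
_base_ = ['./nusc_newsplit_480_60x30_24e.py']

# Linear scaling rule: with 1 GPU instead of 8, the effective batch size
# drops to 1/8, so scale the learning rate down by the same factor.
base_lr = 5e-4  # placeholder: substitute the lr from the base config
optimizer = dict(lr=base_lr / 8)
```

With mmcv's config system, a partial `optimizer` dict like this merges with the base config, so only the `lr` field is changed.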
Thank you very much. I'll try it right away.
During training, the gradient gradually becomes large. Do you know what causes this? Looking forward to your reply.
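A common mitigation for growing gradients in mmdet-style training is gradient clipping, enabled through the `optimizer_config` hook. A minimal sketch follows; the `max_norm` value is a commonly used placeholder rather than a value from this repo, and this is standard mmcv usage, not a confirmed fix for this particular issue.

```python
# Enable gradient clipping via mmcv's OptimizerHook: gradients are
# rescaled whenever their global L2 norm exceeds max_norm.
# max_norm=35 is a placeholder; tune it for your setup.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```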