Closed. tzt101 closed this issue 3 years ago.
Hi @tzt101,
Could you paste the printed configuration for your job?
This is the configuration; I just kept the default settings. AMP: ENABLED: true MEMORY_FORMAT: nchw AUG: COLOR_JITTER:
It seems that you are using too large an LR. If you set BATCH_SIZE_PER_GPU to 128, you should set LR to 0.000125. The LR in our config is defined with respect to BATCH_SIZE_PER_GPU, so when you change the batch size, the LR has to change with it. Your LR is much larger than the one in our original config, and I suspect that is why you got the NaN loss.
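For illustration, here is a minimal sketch of how the LR could be adjusted when BATCH_SIZE_PER_GPU changes. It assumes a plain linear scaling rule, which the thread does not state explicitly. The helper `scale_lr` and the reference pair (256, 0.00025) are hypothetical, chosen only because they reproduce the 0.000125 figure quoted above for a batch size of 128. Check your own config file for the actual reference values.

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Scale a learning rate linearly with the per-GPU batch size.

    Hypothetical helper for illustration; not part of the CvT code base.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Assumed reference values (not confirmed in this thread): under linear
# scaling they yield the LR of 0.000125 recommended for a batch size of 128.
REFERENCE_BATCH_SIZE_PER_GPU = 256
REFERENCE_LR = 0.00025

lr_for_128 = scale_lr(REFERENCE_LR, REFERENCE_BATCH_SIZE_PER_GPU, 128)
print(f"LR for BATCH_SIZE_PER_GPU=128: {lr_for_128:.6f}")
```

If the loss still becomes NaN after lowering the LR, the cause is probably something other than the batch-size mismatch.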
Thank you very much! I will try a smaller LR later.
Hi, I just trained the cvt13-224 model with the default settings, but got a NaN loss after several epochs. Has anyone trained this model successfully? ![image](https://user-images.githubusercontent.com/26423710/120740299-89b90d00-c525-11eb-9c88-43a3327557bb.png)