microsoft / CvT

This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.
MIT License

NAN loss #4

Closed: tzt101 closed this issue 3 years ago

tzt101 commented 3 years ago

Hi, I just trained the cvt13-224 model with the default settings, but got a NaN loss after several epochs. Has anyone trained this model successfully? [image]

leoxiaobin commented 3 years ago

Hi @tzt101,

Could you paste the printed configuration for your job?

tzt101 commented 3 years ago

This is the configuration. I just kept the default settings.

    AMP:
      ENABLED: true
      MEMORY_FORMAT: nchw
    AUG:
      COLOR_JITTER:

leoxiaobin commented 3 years ago

It seems that your LR is too large. The LR in our config is specified with respect to BATCH_SIZE_PER_GPU, so if you set BATCH_SIZE_PER_GPU to 128, you should set LR to 0.000125. You are using a much larger LR than our original config, and I guess that is why you got the NaN loss.
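A rough sketch of how the LR could be adjusted for other batch sizes. It assumes the LR scales linearly with BATCH_SIZE_PER_GPU, which this thread does not state explicitly. The only data point given here is BATCH_SIZE_PER_GPU = 128 with LR = 0.000125. The helper name `scale_lr` is hypothetical and is not part of the CvT code.

```python
# Reference point quoted in this thread: BATCH_SIZE_PER_GPU=128 -> LR=0.000125.
REF_BATCH_SIZE_PER_GPU = 128
REF_LR = 0.000125


def scale_lr(batch_size_per_gpu: int) -> float:
    """Hypothetical helper: scale the LR linearly with the per-GPU batch size.

    Linear scaling is an assumption made for illustration; check it against
    the config you are actually training with.
    """
    if batch_size_per_gpu <= 0:
        raise ValueError("batch_size_per_gpu must be positive")
    return REF_LR * batch_size_per_gpu / REF_BATCH_SIZE_PER_GPU


if __name__ == "__main__":
    # Print the suggested LR for a few per-GPU batch sizes.
    for bs in (64, 128, 256):
        print(f"BATCH_SIZE_PER_GPU={bs}: LR={scale_lr(bs):.6g}")
```

If you change BATCH_SIZE_PER_GPU, compute the matching LR this way and set both values in the config together, rather than changing only one of them.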

tzt101 commented 3 years ago

Thank you very much! I will try a smaller LR later.