microsoft / Cream

This is a collection of our NAS and Vision Transformer work.
MIT License

Loss NaN for AutoFormer Base Model #143

Closed: rehulisw closed this issue 1 year ago

rehulisw commented 1 year ago

Thanks for your work! I am trying to reproduce the AutoFormer base model, but the loss sometimes becomes NaN somewhere between epochs 200 and 300. Do you have any idea how to solve this problem?

wkcn commented 1 year ago

Hi @rehulisw, thanks for your interest in our work!

You can try disabling AMP (automatic mixed precision) by adding the argument --no-amp, then resuming training from the checkpoint saved around epoch 200.
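One common reason disabling AMP can help: under mixed precision, many activations are computed in fp16, whose largest finite value is 65504. A value that grows late in training can overflow to inf, and arithmetic such as inf - inf then yields NaN, which propagates into the loss. The sketch below illustrates this under that assumption using only the standard library. It simulates fp16 with Python's struct "e" (half-precision) format rather than running AutoFormer or PyTorch, and check_loss is a hypothetical helper, not part of the Cream repository.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision (fp16) value


def to_fp16(x: float) -> float:
    """Round x to half precision; values too large for fp16 become +/-inf.

    Assumption: this stands in for an overflow inside an fp16 (AMP) region.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; fp16 hardware gives inf.
        return math.copysign(math.inf, x)


def fp16_square(x: float) -> float:
    """Square x with both the input and the result held in fp16."""
    h = to_fp16(x)
    return to_fp16(h * h)


# An activation of 300 is harmless in full precision, but its square (90000)
# exceeds FP16_MAX, so in fp16 it overflows to inf.
big = fp16_square(300.0)
print(big)            # inf
# Once an inf appears, ordinary arithmetic such as inf - inf produces NaN.
print(big - big)      # nan
print(300.0 * 300.0)  # 90000.0 in full (fp64) precision


def check_loss(loss: float, epoch: int) -> None:
    """Hypothetical guard: stop as soon as the loss is not finite, so training
    can be resumed (e.g. with --no-amp) from the last good checkpoint."""
    if not math.isfinite(loss):
        raise RuntimeError(f"non-finite loss {loss} at epoch {epoch}")
```

In a real training loop, a guard like check_loss makes the run fail at the first NaN instead of continuing with corrupted weights, so the last checkpoint saved before that point (around epoch 200 in this report) remains a clean place to resume from with --no-amp.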