lxtGH / CAE

This is a PyTorch implementation of "Context AutoEncoder for Self-Supervised Representation Learning".

Nan loss #9

Open LUOBO123LUOBO123 opened 1 year ago

LUOBO123LUOBO123 commented 1 year ago

I changed the input resolution to 416×416 to train on a custom dataset. After the network has trained for 49 epochs, the printed loss becomes NaN. What could be the reason for this?

SelfSup-MIM commented 1 year ago

Hi, there are several tips that may help alleviate the issue.

  1. Decrease the learning rate.
  2. Increase the drop-path ratio.
  3. Decrease the gradient-clipping norm.
  4. Use more warm-up epochs.
  5. Use FP32 for the Attention blocks and LayerNorm instead of FP16.
  6. Adjust the betas of the AdamW optimizer. The default is betas=(0.9, 0.999), while MAE uses betas=(0.9, 0.95); this may help improve training stability.
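A minimal PyTorch sketch of how tips 1, 3, 4, 5, and 6 translate into code. The toy model, learning rate, warm-up length, and clipping norm below are illustrative assumptions, not the repository's actual training configuration (tip 2, drop path, depends on the specific ViT implementation and is omitted):

```python
import torch
from torch import nn

# Toy model standing in for the CAE encoder (illustrative only).
model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 16))
norm = nn.LayerNorm(16)

# Tips 1 and 6: lower learning rate and MAE-style AdamW betas.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=5e-5, betas=(0.9, 0.95), weight_decay=0.05)

# Tip 4: longer linear warm-up (epoch count here is an assumption).
warmup_epochs = 40
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda e: min(1.0, (e + 1) / warmup_epochs))

x = torch.randn(4, 16)

# Tip 5: keep numerically sensitive ops (LayerNorm here) in FP32 even
# when the rest of the forward pass runs in reduced precision.
with torch.autocast("cpu", dtype=torch.bfloat16):
    h = model(x)
    with torch.autocast("cpu", enabled=False):
        h = norm(h.float())

loss = h.pow(2).mean()
loss.backward()

# Tip 3: clip gradients to a smaller norm before the optimizer step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()
```

In a real training loop the same pattern applies per iteration; the key point is that the clipping norm, warm-up schedule, and FP32 casts all act before or around the optimizer step, so a single spiking batch cannot blow up the weights.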