I changed the input resolution to 416×416 when training on a custom dataset. After the network trains for 49 epochs, the printed loss becomes NaN. What could be the reason for this?
Hi, there are several tips that may help alleviate the issue.
Decrease the learning rate.
Increase the drop path rate.
Decrease the gradient-clipping value (clip_grad).
Increase the number of warmup epochs.
Use FP32 for the attention blocks and LayerNorm instead of FP16 (see the second sketch below).
Adjust the betas of the AdamW optimizer: the default is betas=(0.9, 0.999), while MAE uses betas=(0.9, 0.95), which may help improve training stability (see the first sketch below).
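For the learning rate, gradient clipping, and AdamW betas suggestions, a minimal PyTorch sketch might look like the following. The model, learning rate, weight decay, and max_norm values here are placeholders, not your repo's actual config; adjust them to your setup.

```python
import torch
from torch import nn

model = nn.Linear(768, 1000)  # placeholder for your actual network

# AdamW with MAE-style betas; a lower lr and betas=(0.9, 0.95)
# can help stabilize training compared to the (0.9, 0.999) default
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,                # reduced from e.g. 1e-4
    betas=(0.9, 0.95),      # default is (0.9, 0.999)
    weight_decay=0.05,
)

def training_step(inputs, targets, criterion):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    # clip gradients to a smaller max norm to suppress loss spikes
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```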
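For keeping the attention block and LayerNorm in FP32 under mixed-precision training, one common pattern is to disable autocast locally and cast the activations up to float32. This is only a sketch assuming PyTorch AMP; the module below is a hypothetical stand-in, not the repo's actual attention implementation.

```python
import torch
from torch import nn

class FP32Attention(nn.Module):
    """Wraps LayerNorm + attention so they always run in float32 under AMP."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # disable autocast so everything inside runs in full precision
        with torch.cuda.amp.autocast(enabled=False):
            x32 = x.float()          # cast FP16 activations up to FP32
            y = self.norm(x32)
            y, _ = self.attn(y, y, y)
        return y.to(x.dtype)         # cast back for the rest of the network
```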