The training loss is always nan.

milesial / Pytorch-UNet

PyTorch implementation of the U-Net for image semantic segmentation with high quality images

GNU General Public License v3.0

9.3k stars 2.51k forks

The training loss is always nan. #479

Open LuoXubo opened 9 months ago

LuoXubo commented 9 months ago

Hi milesial, thanks for your nice work! However, when I trained the U-Net following the instructions in the README, the training loss was always "nan" and the validation Dice score was a very small number, like 8.114e-12. Could you help me solve this problem? Thanks a lot!

yuhanc0205 commented 8 months ago

I am experiencing the same issue. I used all the default settings and the Carvana dataset, but my loss is always nan and the Dice score does not change during training. Did you find a solution?

binbin395 commented 8 months ago

I found that if I revert to the tag v4.0, it works. Maybe someone can find which commit after that version introduced the problem.

benlin1211 commented 7 months ago

I managed to solve this problem by turning off the mixed precision flag. That is, instead of running python train.py --amp, run python train.py. Although training takes more time and memory this way, the model trains successfully.
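
The thread does not establish why --amp leads to nan here. One common mechanism with half precision in general is overflow: float16 cannot represent values above 65504, so a large intermediate becomes inf, and a later inf / inf yields nan. The sketch below illustrates only that general mechanism, using the Python standard library to emulate float16 rounding. The helper names to_fp16 and naive_softmax are hypothetical and are not taken from the repository's code.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value.

    struct's 'e' format raises OverflowError outside the float16 range,
    so that case is mapped to +/-inf, as float16 arithmetic would do.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def naive_softmax(logits, cast=float):
    """Softmax without max-subtraction, rounding every step through `cast`."""
    exps = [cast(math.exp(v)) for v in logits]
    total = 0.0
    for e in exps:
        total = cast(total + e)  # accumulate in the emulated precision
    return [cast(e / total) for e in exps]


logits = [12.0, 0.0]  # exp(12) is about 162755, above the float16 maximum

full = naive_softmax(logits)                # full precision: all values finite
half = naive_softmax(logits, cast=to_fp16)  # emulated float16: inf / inf

print("full precision:", full)
print("emulated fp16: ", half)  # first entry is nan, second is 0.0
```

If something like this is what happens in training, running without --amp avoids it because every intermediate stays in float32, which matches the workaround above. That is an assumption, not something confirmed in this thread.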

erikbwu commented 2 months ago

Thank you @benlin1211!!