milesial / Pytorch-UNet

PyTorch implementation of the U-Net for image semantic segmentation with high quality images
GNU General Public License v3.0

The training loss is always nan. #479

Open LuoXubo opened 4 months ago

LuoXubo commented 4 months ago

Hi milesial, thanks for your nice work! However, when I trained the U-Net following the README instructions, the training loss was always "nan" and the validation dice score was a very small number, like 8.114e-12. Could you help me solve this problem? Thanks a lot!

yuhanc0205 commented 3 months ago

I am experiencing the same issue. I used all the default settings and the Carvana dataset, but my loss is always nan and the dice score does not change during training. Did you find a solution?

binbin395 commented 3 months ago

I found that reverting to the tag v4.0 fixes it. Maybe someone can find which commit after that version introduced the problem.

benlin1211 commented 2 months ago

I managed to solve this problem by turning off the mixed precision flag. That is, instead of running python train.py --amp, run python train.py. Training takes more time and memory this way, but it completes successfully.

Similar issue: https://github.com/pytorch/pytorch/issues/40497
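
For readers wondering why disabling mixed precision could matter here: one plausible mechanism, not confirmed anywhere in this thread, is that an intermediate value overflows the half-precision range and a later operation turns the resulting infinity into nan. The sketch below illustrates only that arithmetic, using the standard library and no PyTorch. The helper `to_fp16` and the activation value 300.0 are made up for illustration and are not taken from the repository.

```python
import math
import struct

# Largest finite value in IEEE 754 half precision (fp16).
FP16_MAX = 65504.0


def to_fp16(x: float) -> float:
    """Round x to half precision; values too large for fp16 become +/-inf.

    Hypothetical helper for illustration. struct's "e" format is IEEE 754
    binary16 and raises OverflowError when the value does not fit.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


# An assumed activation value; 300 itself fits comfortably in fp16.
activation = to_fp16(300.0)

# Its square (90000) exceeds FP16_MAX, so storing it in fp16 gives inf.
squared = to_fp16(activation * activation)

# Any later step that subtracts or divides two infinities yields nan,
# and nan then propagates through the loss and every gradient.
difference = squared - squared
ratio = squared / squared

print(f"squared in fp16: {squared}")
print(f"inf - inf: {difference}")
print(f"inf / inf: {ratio}")
```

The same product in single precision stays finite, which would be consistent with the report that training works without --amp. Whether this is what actually happens in this repository is an open question; a check such as torch.isfinite(loss) before the backward pass might help locate the first step at which the loss goes bad.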