Open akshay9396 opened 3 months ago
Have you solved it? I'm having the exact same problem
I'm having the exact same problem, too.
I trained the model on the Carvana dataset. Below is a screenshot of the training run. You can clearly see that the validation Dice score stays constant for 5 epochs and the loss is "nan". I used one checkpoint.pth for prediction and got an all-black output. Could you please help me resolve this issue?
The likely cause is that autocast() uses torch.float16 when AMP is enabled. float16 has a narrow numeric range, so values can overflow and the loss becomes NaN. Try changing torch.float16 to torch.bfloat16. If you want to keep torch.float16, set init_scale=4096 when creating grad_scaler. That solved the issue for me. I referred to the following pages. This is my first time commenting on GitHub, so sorry if I got something wrong, and sorry for my poor English. https://qiita.com/takeuchiseijin/items/909c48b57127a37fbd12 https://qiita.com/bowdbeg/items/71c62cf8ef891d164ecd
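In case it helps, here is a minimal sketch of the two options. It is not the repository's actual train.py: the tiny Conv2d model, the random tensors, and the helper names pick_amp_dtype, make_scaler, and train_step are placeholders I made up for illustration. It assumes a PyTorch version that provides torch.autocast and torch.cuda.amp.GradScaler (newer releases may print a deprecation warning for the latter), and bfloat16 on GPU generally needs recent hardware, so the sketch falls back to float16 with init_scale=4096 when bfloat16 is not supported.

```python
import torch
import torch.nn as nn


def pick_amp_dtype(device_type):
    # Prefer bfloat16: it has the same exponent range as float32,
    # so it is far less likely to overflow to inf/NaN than float16.
    if device_type == "cuda" and not torch.cuda.is_bf16_supported():
        return torch.float16
    return torch.bfloat16


def make_scaler(amp_dtype, device_type):
    # Loss scaling is only needed for float16 on CUDA.
    # init_scale=4096 is the value suggested in this thread
    # (the PyTorch default is 65536).
    use_scaler = amp_dtype == torch.float16 and device_type == "cuda"
    return torch.cuda.amp.GradScaler(init_scale=4096.0, enabled=use_scaler)


def train_step(model, optimizer, scaler, images, masks, device_type, amp_dtype):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device_type, dtype=amp_dtype):
        logits = model(images)
        # Compute the loss in float32 for extra numerical safety.
        loss = nn.functional.binary_cross_entropy_with_logits(logits.float(), masks)
    # With a disabled scaler these calls reduce to a plain backward() and step().
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


# Placeholder model and data, only to make the sketch runnable.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = pick_amp_dtype(device_type)
model = nn.Conv2d(3, 1, kernel_size=3, padding=1).to(device_type)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = make_scaler(amp_dtype, device_type)

images = torch.randn(2, 3, 16, 16, device=device_type)
masks = torch.randint(0, 2, (2, 1, 16, 16), device=device_type).float()

loss_value = train_step(model, optimizer, scaler, images, masks, device_type, amp_dtype)
print(f"dtype={amp_dtype}, loss={loss_value}")
```

In a real training script you would apply the same two changes where autocast() and the GradScaler are created, then check that the loss stays finite over the first few epochs.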