say4n / pytorch-segnet

SegNet implementation in the PyTorch framework
MIT License
88 stars, 35 forks

Does your code train well? #1

Closed mmreza79 closed 5 years ago

mmreza79 commented 5 years ago

Thank you @Sayan98 for your simple code. I started running it as is. It has now finished 1600 of the 6000 epochs you defined, but I am still getting "loss = NaN". From which epoch should I expect a better loss? By the way, I skipped "--checkpoint /home/SharedData/intern_sayan/PascalVOC2012/model_best.pth" because I don't have any saved model.

say4n commented 5 years ago

A loss turning NaN may be an indication of exploding gradients; you may try gradient checking. When I was working on this, as far as I can recall, the model converged by ~1000 epochs (give or take a few hundred epochs).

Check this link for a list of possible solutions to NaN loss.
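
For anyone hitting the same problem, here is a minimal sketch of one way to watch for exploding gradients in a PyTorch training loop. It is not taken from this repository's training script: `grad_norm`, `checked_step`, and the `max_norm=1.0` threshold are hypothetical names and values chosen for illustration, and gradient clipping is swapped in here as one common mitigation rather than something the author prescribed. The only library calls used are `torch.isfinite` and `torch.nn.utils.clip_grad_norm_`.

```python
import math

import torch
import torch.nn as nn


def grad_norm(model: nn.Module) -> float:
    """Return the global L2 norm over all parameter gradients (0.0 if none)."""
    norms = [p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None]
    if not norms:
        return 0.0
    return torch.stack(norms).norm(2).item()


def checked_step(model, optimizer, loss, max_norm=1.0):
    """Run one optimizer step, skipping it if the loss or gradients are non-finite.

    Returns the gradient norm measured before clipping, or None if skipped.
    max_norm is an illustrative value, not a recommendation for SegNet.
    """
    optimizer.zero_grad()
    if not torch.isfinite(loss):
        # Loss is already NaN/inf: backpropagating would poison the weights.
        return None
    loss.backward()
    total = grad_norm(model)
    if not math.isfinite(total):
        # Gradients overflowed: drop them and skip the update.
        optimizer.zero_grad()
        return None
    # Rescale gradients in place so their global norm is at most max_norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return total


if __name__ == "__main__":
    # Toy stand-in for the real network and data, purely for demonstration.
    model = nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(8, 4)
    y = torch.randint(0, 2, (8,))
    loss = nn.CrossEntropyLoss()(model(x), y)
    print("grad norm before clipping:", checked_step(model, optimizer, loss))
```

Logging the returned norm every few iterations should show whether it grows steadily before the loss becomes NaN. If it does, lowering the learning rate is usually the first thing to try. If the loss is NaN from the very first iteration, the inputs or labels are a more likely culprit than the gradients.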