Closed mmreza79 closed 5 years ago
A loss turning NaN may be an indication of exploding gradients; you may try gradient checking. When I was working on this, as far as I can recall, the model converged by ~1000 epochs (give or take a few hundred epochs).
Check this link for a list of possible solutions to NaN loss.
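To make the gradient-checking suggestion concrete, here is a minimal, generic sketch of a numerical gradient check using central finite differences. It is not taken from this repo: the quadratic `loss` and the helper names are stand-ins for illustration. If the analytic gradient diverges from the numerical one, the backward pass is suspect and can be a source of NaN losses.

```python
# Generic numerical gradient check (illustrative; names are placeholders).

def loss(w):
    # simple quadratic loss as a stand-in for the model's loss
    return sum(x * x for x in w)

def analytic_grad(w):
    # hand-derived gradient of the quadratic loss: d/dw_i = 2 * w_i
    return [2.0 * x for x in w]

def numerical_grad(w, eps=1e-6):
    # central finite differences: (f(w + eps) - f(w - eps)) / (2 * eps)
    grads = []
    for i in range(len(w)):
        wp = list(w); wp[i] += eps
        wm = list(w); wm[i] -= eps
        grads.append((loss(wp) - loss(wm)) / (2 * eps))
    return grads

def max_rel_error(w):
    # worst-case relative disagreement between the two gradients
    a, n = analytic_grad(w), numerical_grad(w)
    return max(abs(x - y) / max(1e-8, abs(x) + abs(y)) for x, y in zip(a, n))
```

A small relative error (e.g. below 1e-4) suggests the analytic gradient is consistent with the loss; a large one points at a bug in the backward computation.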
Thank you @Sayan98 for your simple code. I started running your code as-is. It has now finished 1600 of your defined 6000 epochs, but I am still getting "loss = NaN". From which epoch might I see a better loss? By the way, I skipped "--checkpoint /home/SharedData/intern_sayan/PascalVOC2012/model_best.pth" because I don't have any saved model.