Gets into the NaN loss value after some epochs!!

wasidennis / AdaptSegNet

Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR 2018 (spotlight)


Gets into the NaN loss value after some epochs!! #54

Closed: imtiazziko closed this issue 5 years ago

imtiazziko commented 5 years ago

Hello @wasidennis

Training runs into a NaN loss value starting at epoch 12396. The cause could be instability of the adversarial training, the learning rate, or something else, but I am not sure which. I used the same settings for the single-level DeepLab model as in your code. What could be the problem here?

[Screenshot: issue_adaptseg]

Also, how did you decide on the early stopping epoch number 149999?
I cannot reproduce the result of 41.4% reported in the paper (GTA5-Cityscapes). Can you help me with that?

Thank you very much. Looking forward to your answers.

wasidennis commented 5 years ago

We have not experienced any issue like your NaN case. Based on your screenshot, it seems that our code has been modified. Could you double-check whether there are any differences from our original code, or provide more information about this issue? Thanks!

imtiazziko commented 5 years ago

Thanks for replying. But I did not change your code. I think you might be able to see this issue by printing the logs like I did.

wasidennis commented 5 years ago

@imtiazziko In the original implementation, we also print all the losses, but we have never had this issue in any of our experiments. It also looks like the segmentation loss is really high before going to NaN, which is not normal. Could you provide more details, e.g., the training hyper-parameters and which pre-trained weights are used?
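
One way to narrow this down is to flag the first step where a loss spikes or becomes non-finite, rather than reading the printed logs by eye. Below is a minimal sketch, assuming each loss is available as a plain Python float (for example via `loss.item()` in PyTorch). `LossMonitor`, `window`, and `spike_factor` are hypothetical names for illustration and are not part of the AdaptSegNet code; the loss values are made up.

```python
import math
from collections import deque


class LossMonitor:
    """Hypothetical helper: flags non-finite or suddenly spiking loss values."""

    def __init__(self, window=100, spike_factor=10.0):
        # Keep only the most recent finite losses for the running mean.
        self.history = deque(maxlen=window)
        self.spike_factor = spike_factor

    def check(self, value):
        """Return 'nan' for NaN/inf, 'spike' for a sudden jump, else 'ok'."""
        if not math.isfinite(value):
            return "nan"
        status = "ok"
        if self.history:
            mean = sum(self.history) / len(self.history)
            # A value far above the recent average often precedes divergence.
            if mean > 0 and value > self.spike_factor * mean:
                status = "spike"
        self.history.append(value)
        return status


# Made-up loss values for illustration, not taken from the issue.
monitor = LossMonitor(window=3, spike_factor=10.0)
losses = [0.9, 0.8, 0.7, 25.0, float("nan")]
statuses = [monitor.check(v) for v in losses]
print(statuses)
```

In a real training loop, a "spike" status would be a reasonable point to save a checkpoint and dump the current batch and learning rate, so the state just before the NaN can be inspected.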