why it shows ''loss_total_val=NaN''

jiangsutx / SRN-Deblur

Repository for Scale-recurrent Network for Deep Image Deblurring

http://www.xtao.website/projects/srndeblur/srndeblur_cvpr18.pdf

MIT License

714 stars, 184 forks

why it shows ''loss_total_val=NaN'' #7

Akumio closed this issue 6 years ago

Akumio commented 6 years ago

I want to know why "AssertionError: Model diverged with loss = NaN" shows up when I run the code. I have checked the code and tried to revise it, but it still shows "loss_total_val=NaN". Could you please tell me how to handle it?

jiangsutx commented 6 years ago

I did not come across this problem. NaN is usually caused by division by zero or by values going out of range. Please check that the images are loaded correctly.
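To make the two causes named above concrete, here is a minimal sketch in plain Python floats of how a NaN can appear. It is not taken from the SRN-Deblur code; `safe_divide` and its `eps` value are hypothetical names chosen only for illustration.

```python
import math


def safe_divide(numerator, denominator, eps=1e-8):
    """Divide while keeping the denominator away from zero.

    eps is an assumed value for illustration, not a setting from the repository.
    """
    return numerator / (denominator + eps)


# Out of range: a float product that overflows becomes inf,
# and subtracting inf from inf yields NaN.
overflowed = 1e308 * 10.0
out_of_range = overflowed - overflowed

# An undefined ratio such as inf / inf is also NaN. Tensor libraries
# typically return NaN for 0.0 / 0.0 as well, whereas plain Python
# raises ZeroDivisionError for that expression.
ratio = overflowed / overflowed

print(math.isnan(out_of_range), math.isnan(ratio))

# With the small eps in the denominator, a 0 / 0 case stays finite.
guarded = safe_divide(0.0, 0.0)
print(guarded)
```

Whether a guard like `eps` is appropriate depends on where the division happens; it only hides the symptom if the inputs themselves are already invalid.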

Akumio commented 6 years ago

Thank you for your reply! I was using Python 3.6, but the repository requires 2.7, so maybe it is a Python version problem? My operating system is Windows 10; is yours Ubuntu? As you mentioned, I prepared the dataset following your GoPro list, but I can't clearly understand the image-loading code. When I revise the loss, it runs for several steps and then still shows the NaN error, and I am confused about it. In any case, thank you very much! The deblurring result is the best I have come across. I will try again.

jiangsutx commented 6 years ago

I believe it should work well on both Python 2 and Python 3, and also on Windows 10. Once the loss goes to NaN, it can never go back to normal.

You can check the input and output of the first iteration, by using sess.run() to get their values.
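The reason the first iteration matters can be seen in a toy update loop: any arithmetic with NaN returns NaN, so one bad gradient poisons the weight for every later step. This is a sketch only; `sgd_step`, the learning rate, and the gradient values are hypothetical and are not taken from the repository.

```python
import math


def sgd_step(weight, grad, lr=0.1):
    """One plain gradient-descent update on a single scalar weight."""
    return weight - lr * grad


weight = 1.0
# Hypothetical gradients: the second step produces a NaN gradient.
grads = [0.5, float("nan"), 0.5, 0.5]

history = []
for g in grads:
    weight = sgd_step(weight, g)
    history.append(weight)

# The weight is finite after the first step and NaN from the bad step onward,
# even though the later gradients are perfectly valid.
print([math.isnan(w) for w in history])
```

This is why inspecting values at the first iteration, before any NaN has had a chance to spread, is more informative than looking at the loss later in training.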

Akumio commented 6 years ago

Sorry to trouble you again. I tried my best to find the error, but it still doesn't work. I want to know whether the pictures are successfully imported, but I don't know how to check. Could you please help me see where the problem is? Thank you very much! (attached images: qr3, 1o8hg, 4v9qe, vdvlm0)

jiangsutx commented 6 years ago

We have tested again and could not reproduce the NaN. You can use sess.run() on the input tensor to get its value and see whether the images are loaded.
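A minimal sketch of what such a check could look like once the values have been fetched. The helper `summarize` is hypothetical, and the tensor name `img_in` in the comment is only an assumption about what the input tensor might be called; the helper itself works on any flat sequence of numbers.

```python
import math


def summarize(values):
    """Return the finite min/max and the NaN/inf counts of a flat iterable of numbers."""
    finite = []
    nan_count = 0
    inf_count = 0
    for v in values:
        v = float(v)
        if math.isnan(v):
            nan_count += 1
        elif math.isinf(v):
            inf_count += 1
        else:
            finite.append(v)
    return {
        "min": min(finite) if finite else None,
        "max": max(finite) if finite else None,
        "nan": nan_count,
        "inf": inf_count,
    }


# In a TensorFlow 1.x session the values would come from something like:
#   batch = sess.run(img_in)   # img_in: assumed name of the input tensor
#   print(summarize(batch.ravel().tolist()))
good = summarize([0.0, 0.25, 1.0])
bad = summarize([0.0, float("nan"), float("inf")])
print(good)
print(bad)
```

A correctly loaded image batch should report zero NaN and inf entries and a min/max range that matches the expected pixel scaling. A batch where min and max are both 0 would suggest the files were not actually read.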

Akumio commented 6 years ago

I imported the files into the dataset again, but it still shows NaN. I have got the shapes of image_in and image_gt. Forgive me for not knowing which input tensor to pass to sess.run(). As I showed above, is my dataset imported correctly? Please help me. This has been bothering me for days.

Akumio commented 6 years ago

I found that in another case, when I ran on the CPU, the loss value was displayed normally, but training was too slow. So I think the input is imported correctly. I really don't know how to solve the NaN loss when running on the GPU.

firenxygao commented 6 years ago

@Akumio We also cannot figure out why the NaN happens. Really sorry about that.

lookway commented 5 years ago

@Akumio Hi, I want to know how to make the file named "datalist_gopro.txt", which is very important to me. Thank you very much in advance for your reply.

Texaser commented 2 years ago

"@Akumio We also cannot figure out why NaN can happen. Really sorry about that." (quoting firenxygao)

Could you please tell me how to solve this problem? I am also stuck on this issue. Thanks for your kind reply!