Loss Nan - Githubissues

0merjavaid commented 6 years ago

I am getting a train loss of NaN at every step. I'm training on my own dataset, and this happens from the very start of training.

sagieppel commented 6 years ago

1) Have you changed the number-of-classes parameter (NUM_CLASSES) to the number of classes in your dataset? 2) Did you check that you don't have NaN or inf values in your input?
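Both checks can be run on a batch before it reaches the network. This is only a rough sketch with NumPy: `check_batch` and its arguments are hypothetical names, not part of this repository, and it assumes the labels are integer class ids.

```python
import numpy as np

def check_batch(images, labels, num_classes):
    """Return a list of problems found in one batch (empty list if none).

    Hypothetical helper for illustration; not part of the repository.
    """
    problems = []
    images = np.asarray(images, dtype=np.float64)
    labels = np.asarray(labels)

    # Check 2: any NaN or inf in the input poisons the loss.
    if not np.isfinite(images).all():
        problems.append("images contain NaN or inf")

    # Check 1: integer class ids must lie in [0, num_classes - 1].
    lo, hi = int(labels.min()), int(labels.max())
    if lo < 0 or hi >= num_classes:
        problems.append(
            f"label values span [{lo}, {hi}] but must lie in [0, {num_classes - 1}]"
        )
    return problems
```

Calling it on the first few batches and printing the result is usually enough to tell whether the data or the model is at fault.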

0merjavaid commented 6 years ago

Okay, so in my case I have retina images and I am trying to segment the vessels, so I changed the number of classes to 1. Furthermore, my input is just 7 images that I am trying to overfit for now. Sample attached.

[Attached images: image_01l, image_01l_1stho]

0merjavaid commented 6 years ago

I changed the loss function from sparse softmax to plain softmax: Loss = tf.reduce_mean((tf.nn.softmax_cross_entropy_with_logits(labels=tf.squeeze(GTLabel, squeeze_dims=[3]),logits=tf.squeeze(Net.Prob,squeeze_dims=[3]),name="loss")))

So it started giving me a loss value, but the loss increases instead of decreasing. What am I doing wrong?

sagieppel commented 6 years ago

These values are very high, and sparse softmax should have worked fine, so I suspect you have some problem with the data. This might help: https://stackoverflow.com/questions/40050397/tensorflow-nan-loss-reasons Also, training such a net with 7 examples is a complete waste of time; you need at least several hundred examples.
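For reference, here is a rough NumPy sketch of what a sparse softmax cross-entropy computes per pixel, to show where the data can break it. It is not the TensorFlow implementation, and the function name is made up for illustration. The TensorFlow documentation for tf.nn.sparse_softmax_cross_entropy_with_logits states that labels must lie in [0, num_classes), and that out-of-range values raise an exception on CPU and give NaN loss rows on GPU.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-element loss: -log(softmax(logits)[label]).

    logits: float array of shape (..., num_classes)
    labels: integer array of shape (...), values in [0, num_classes)
    Illustrative sketch only, not the TensorFlow op itself.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels).astype(np.int64)
    num_classes = logits.shape[-1]

    # Out-of-range class ids have no matching logit to pick.
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError(f"labels must lie in [0, {num_classes - 1}]")

    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    # Pick the log-probability of the true class for every element.
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return -picked.squeeze(-1)
```

With labels inside the valid range the result is always finite, so a NaN from the first step usually points at the labels or the input rather than at the loss itself.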

0merjavaid commented 6 years ago

Well, the reason I am using 7 examples is to test the net: to make it overfit on these examples before I train on all the data.

0merjavaid commented 6 years ago

Since you suspect the images: my images are RGB JPEGs and the ground truth masks are PNGs, with vessels = 255 (white) and background = 0 (black).

sagieppel commented 6 years ago

That's the problem. If you have two classes, your label values should be 0 and 1. A label value of 255 implies at least 256 classes (0 through 255).
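A 0/255 mask can be converted to 0/1 class ids with a simple threshold. This is a minimal sketch, assuming the mask is already loaded as a NumPy array; `mask_to_class_ids` and the threshold of 127 are illustrative choices, not something from this repository.

```python
import numpy as np

def mask_to_class_ids(mask, threshold=127):
    """Map a black/white mask (0 = background, 255 = vessel) to ids 0 and 1.

    Hypothetical helper for illustration. Thresholding rather than dividing
    by 255 also handles masks with intermediate gray values at the edges.
    """
    mask = np.asarray(mask)
    if mask.ndim == 3:
        # Assumes every channel carries the same mask, so keep the first.
        mask = mask[..., 0]
    return (mask > threshold).astype(np.uint8)
```

With labels in this form, the number of classes would be 2 (background and vessel).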

0merjavaid commented 6 years ago

Yeah, I downloaded your dataset and now I get it. I have changed the labels to 0 and 1; let's see if it works now.

0merjavaid commented 6 years ago

Hats off, your code rocks. It worked, thanks!

Only one concern left: don't you think that using TensorFlow or OpenCV for augmentation instead of SciPy would have been much faster? I have noticed it myself, and the results here also show that OpenCV and TensorFlow are faster than SciPy: https://www.kaggle.com/zfturbo/test-speed-cv2-vs-scipy-vs-tensorflow

If you would like that change, I can work on it and open a pull request to speed up these operations.

sagieppel commented 6 years ago

Cool, glad to hear it. The image loading/augmentation time is usually negligible compared to the backprop time, but feel free to add any improvement; that's what Git is for.

leishi2018 commented 5 years ago

Can you tell me how to solve this problem? My loss is NaN even though I have changed white to 1 and black to 0. When I use the loss function you provided, my loss is also very large and keeps increasing.

sagieppel commented 5 years ago

Have you changed num_classes to 2? You can send me a single image and label map and I will have a look.

leishi2018 commented 5 years ago

Yes, I tried changing num_class to 2, but it doesn't work. This is the test photo:

This is the label photo:

------------------ Original Message ------------------ From: "sagieppel" notifications@github.com; Sent: Tuesday, December 18, 2018, 9:13 PM; To: "sagieppel/Fully-convolutional-neural-network-FCN-for-semantic-segmentation-Tensorflow-implementation" Fully-convolutional-neural-network-FCN-for-semantic-segmentation-Tensorflow-implementation@noreply.github.com; Cc: "火焰舞者" 523997174@qq.com; "Comment" comment@noreply.github.com; Subject: Re: [sagieppel/Fully-convolutional-neural-network-FCN-for-semantic-segmentation-Tensorflow-implementation] Loss Nan (#1)

Have you changed num_classes to 2? You can send me a single image and label map and I will have a look.


sagieppel / Fully-convolutional-neural-network-FCN-for-semantic-segmentation-Tensorflow-implementation

Loss Nan #1