thlstsul / tensorflow-examples-tutorials-mnist


[Numerical Bug] Loss may be NaN during training #1

Open Justobe opened 4 years ago

Justobe commented 4 years ago

Hi~

Thank you very much for sharing this code!

However, I found that the loss may become NaN after some iterations (67,241 iterations on my laptop). After carefully checking the code, I found that the following line in mnist_softmax.py may trigger the NaN loss:

cross_entropy = -tf.reduce_sum(y_ * tf.log(y))

If y (the output of softmax) contains an exact 0, tf.log(y) evaluates to -inf (log(0) is undefined), and whenever the corresponding entry of y_ is 0, the product 0 * -inf is NaN under IEEE 754, so the reduced loss becomes NaN.
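For illustration, here is a minimal sketch of the failure mode in plain NumPy (not TensorFlow code, but it follows the same IEEE 754 arithmetic):

import numpy as np

y_ = np.array([1.0, 0.0])  # one-hot label: true class is index 0
y = np.array([1.0, 0.0])   # softmax output containing an exact 0
# np.log(0.0) is -inf, and 0.0 * -inf is NaN, so the sum is NaN:
with np.errstate(divide='ignore', invalid='ignore'):
    print(-np.sum(y_ * np.log(y)))  # prints: nan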

It could be fixed by making either of the following changes:

cross_entropy = -tf.reduce_sum(y_ * tf.log(y + 1e-10))

or

cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
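A more robust alternative (just a sketch, assuming the pre-softmax tensor in mnist_softmax.py is accessible; I'm calling it logits here, which is a hypothetical name) is to let TensorFlow fuse the softmax and the log, so log(0) is never materialized:

import tensorflow as tf

# y_ and logits are assumed to come from mnist_softmax.py, where
# logits is the pre-softmax output that y = tf.nn.softmax(logits)
# is computed from. reduce_sum keeps the same scale as the original loss.
cross_entropy = tf.reduce_sum(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))

If I remember correctly, the upstream TensorFlow MNIST tutorial switched to this formulation for exactly this numerical-stability reason.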

Hope to hear from you. Thanks in advance! : )

Justobe commented 3 years ago

@thelostsoul5 :)