hardmaru / WorldModelsExperiments

World Models Experiments

Train RNN loss goes to NaN #3

Closed zmonoid closed 6 years ago

zmonoid commented 6 years ago

Hi, I am trying to replicate your result in PyTorch. I am running into a problem where the loss goes to NaN while training the RNN: the loss first drops from 2.x to 1.0x and then suddenly becomes NaN. I wonder if you faced this problem before as well and solved it, since I noticed there is an unused epsilon in your doomrnn.py file (although it is in the VAE part). I tried setting a larger epsilon in the Adam optimizer, which solves the problem, but then the loss drops much more slowly. Do you have any suggestions? Bin.
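For reference, a minimal sketch of the Adam-epsilon workaround described above (the lr and eps values here are illustrative, not taken from the original code):

import torch

# A single dummy parameter stands in for the RNN weights.
params = [torch.nn.Parameter(torch.zeros(10))]

# eps defaults to 1e-8; raising it bounds the update when the second-moment
# estimate is tiny, which avoids the NaN but slows convergence, as noted above.
optimizer = torch.optim.Adam(params, lr=1e-3, eps=1e-4)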

zmonoid commented 6 years ago

Noticed you used gradient clipping. Thanks.

zmonoid commented 6 years ago

Still the same problem after adding gradient clipping.
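For completeness, a minimal sketch of gradient clipping in PyTorch (the max_norm value is illustrative; as noted, clipping alone did not cure the NaN here):

import torch

# Dummy parameter with a large gradient, standing in for the RNN weights.
w = torch.nn.Parameter(torch.randn(10))
loss = (w * 1e6).pow(2).sum()
loss.backward()

# Rescale gradients in place so their global norm does not exceed max_norm.
torch.nn.utils.clip_grad_norm_([w], max_norm=1.0)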

hardmaru commented 6 years ago

Hi Zhou Bin

Apologies, I’m not too familiar with RNNs in PyTorch.

There’s an existing reimplementation of the World Models car racing experiment in PyTorch, though:

https://ctallec.github.io/world-models/

Can you check out their code to see whether anything differs from what you are doing, or whether it gives you any hints?

If you can’t get it to work later on after studying the pytorch repo, I can try to take a look.

zmonoid commented 6 years ago

Hi David,

Thanks very much for your reply. I have figured it out~~

The reason is that when the log posterior takes a very negative value, computing exp().sum().log() in PyTorch makes exp() underflow to zero, so the log() returns negative infinity.

TensorFlow applies this numerical-stability trick natively in tf.reduce_logsumexp, as seen in https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/ops/math_ops.py

Following https://en.wikipedia.org/wiki/LogSumExp, here is a stable implementation for PyTorch:

def reduce_logsumexp(x, dim):
    # Subtract the per-dim maximum before exponentiating so exp() cannot underflow to zero.
    max_x, _ = x.max(dim=dim, keepdim=True)
    y = (x - max_x).exp().sum(dim=dim, keepdim=True).log()
    # Add the maximum back and drop the reduced dimension.
    return (max_x + y).squeeze(dim)
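A quick check of the difference, assuming the helper above is in scope (the numbers are illustrative):

import torch

logp = torch.full((2, 5), -1000.0)       # very negative log posteriors
naive = logp.exp().sum(dim=1).log()      # exp() underflows to 0: tensor([-inf, -inf])
stable = reduce_logsumexp(logp, dim=1)   # tensor([-998.3906, -998.3906])
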
hardmaru commented 6 years ago

Hi @zmonoid

That makes sense. LogSumExp makes convergence a lot easier, and in fact this issue came up when someone was trying to use World Models for the Sonic environments:

https://old.reddit.com/r/MachineLearning/comments/8poc3z/r_blog_post_on_world_models_for_sonic/e0de9z0/?context=3

I hope PyTorch will support logsumexp natively soon, as mentioned in that comment thread. If not, please submit your implementation as a PR to PyTorch!
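Later PyTorch releases did add a native, numerically stable torch.logsumexp; a minimal sketch:

import torch

logp = torch.full((2, 5), -1000.0)
stable = torch.logsumexp(logp, dim=1)    # tensor([-998.3906, -998.3906])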

zmonoid commented 6 years ago

Hi @hardmaru, thanks for your reply~