MISTCARRYYOU opened this issue 2 years ago
To be specific, this issue arises when the baseline model (the rollout) performs best while the trained model keeps getting worse. I don't understand why backpropagation increases this gap instead of reducing it.
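For context, the loss being minimized has roughly this form (a minimal sketch, not the repo's exact code; the costs and log-likelihoods below are made-up numbers): the advantage (model cost minus rollout-baseline cost) weighted by the log-likelihood of the sampled tour.

```python
import torch

# Sketch of a REINFORCE loss with a rollout baseline (not the repo's exact code).
# `cost` is the tour length produced by the current model, `bl_cost` is the cost
# of the greedy rollout baseline on the same instances, and `log_likelihood` is
# the summed log-probability of the sampled tour. All values are hypothetical.
cost = torch.tensor([5.2, 6.1, 4.8])
bl_cost = torch.tensor([5.0, 5.0, 5.0])
log_likelihood = torch.tensor([-12.3, -15.7, -11.0], requires_grad=True)

advantage = cost - bl_cost                  # > 0 means the model is worse than the baseline
loss = (advantage * log_likelihood).mean()  # advantage-weighted log-prob: not bounded below,
                                            # so a negative value is not an error by itself
loss.backward()                             # in the real model this pushes probability mass
                                            # away from tours that are worse than the baseline
print(loss.item())
```

So the sign or magnitude of the loss alone does not say whether training is working; what matters is whether the model's tour cost actually drops relative to the baseline's.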
@MISTCARRYYOU Have you solved this problem? In epoch 0 the model trains correctly. However, once epoch 1 starts, after the baseline has been evaluated and updated, the grad_norm becomes 0.0 and the model becomes much worse. Does anyone have an idea about this?
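In case it helps with debugging, this is roughly how one can check the gradient norm right after the backward pass (a sketch assuming a standard PyTorch model; `model` is a placeholder, not a name from this repo):

```python
import torch

def total_grad_norm(model: torch.nn.Module) -> float:
    """Overall L2 norm of all parameter gradients (debugging sketch)."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.data.norm(2).item() ** 2
    return total ** 0.5

# Hypothetical usage inside the training loop:
# loss.backward()
# print(f"grad norm: {total_grad_norm(model):.6f}")
```

A norm of exactly 0.0 would suggest the loss no longer depends on the parameters for that batch, for example if the advantage (cost minus baseline cost) is zero for every instance.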
Hello Kool:
I am using your code for my paper's experiments, but I encountered a curious result when training the AM.
The loss decreases continuously, which made me happy at first; however, after about 100 epochs it passed zero and kept declining toward negative infinity. (By the way, I also ran into this issue before when training a GAN model.)
So I wonder what I can do to improve the training process. (P.S. I did not change the loss function or the training code.)
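For concreteness, this is the kind of check I would run instead of watching the loss value: track the tour cost on a fixed validation set each epoch (a sketch assuming a standard PyTorch setup; `model` and `val_dataset` are placeholders, and I assume the model returns a (cost, log_likelihood) pair).

```python
import torch

def validation_cost(model, val_dataset, batch_size=256):
    """Average tour cost on a fixed validation set (monitoring sketch)."""
    model.eval()
    costs = []
    with torch.no_grad():
        for batch in torch.utils.data.DataLoader(val_dataset, batch_size=batch_size):
            cost, _ = model(batch)           # assumption: model returns (cost, log_likelihood)
            costs.append(cost)
    return torch.cat(costs).mean().item()    # lower is better; this should decrease over epochs
```

Since the REINFORCE loss is not bounded below, this validation cost seems like the more reliable signal of whether training is improving.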
Thank you for your consideration!