myungsub / CAIN

Source code for AAAI 2020 paper "Channel Attention Is All You Need for Video Frame Interpolation"
MIT License
323 stars 43 forks

loss exploded #2

Closed chen-san closed 4 years ago

chen-san commented 4 years ago

Hey, buddy, this is amazing work! I was training the model on the Vimeo90k dataset by running ./run.sh. The loss gradually declined as the epochs increased, but after about 10 epochs it suddenly exploded without any warning. It printed the output shown in the attached screenshot.

The printed value comes from this check:

```python
if loss.data.item() > 10.0 * LOSS_0:
    print(max(p.grad.data.abs().max() for p in model.parameters()))
    continue
```

The generated test image is attached as well.

Why did the loss explode suddenly? How can I avoid it?

Thx!

myungsub commented 4 years ago

Hi, thanks for your interest in our work.

The current code already does gradient clipping, and I also added the code snippet you mentioned to skip iterations when the loss looks like it is about to explode. Still, this problem happens once in a while, and I'm having a hard time analyzing it since it is not reproducible.
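For reference, the clipping itself is just the standard PyTorch call; here is a minimal sketch (the max_norm value below is only a placeholder, not necessarily what the training script uses):

```python
import torch

# ... inside the training loop ...
loss.backward()
# Clip the global gradient norm before the optimizer step to limit the damage
# from a single bad batch. max_norm=0.1 is a placeholder value.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
optimizer.step()
```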

For now, I think it should be fine (most of the time) if you just resume training from model_best.pth. Also, training becomes more stable if you start from a smaller learning rate.
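Roughly, resuming could look like the sketch below; the checkpoint path and the key names ('state_dict', 'optimizer') are assumptions, so check the save/load utilities for the exact format used here:

```python
import torch

# Load the best checkpoint saved so far (path and keys are assumed; adjust to the repo's format).
checkpoint = torch.load('checkpoint/cain/model_best.pth')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])

# Restart with a smaller learning rate for more stable training.
for param_group in optimizer.param_groups:
    param_group['lr'] *= 0.5  # e.g. halve the previous rate
```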

Hope this answer helps you train the model successfully. I'll add to this issue if I find the exact reason for sudden loss explosions.

chen-san commented 4 years ago

Thanks for your answer!