kevinzakka / recurrent-visual-attention

A PyTorch Implementation of "Recurrent Models of Visual Attention"
MIT License

Restoring optimizer state in load_checkpoint? #8

Closed · ipod825 closed this 6 years ago

ipod825 commented 6 years ago

In the load_checkpoint function, shouldn't you also load the optimizer state?

kevinzakka commented 6 years ago

@ipod825 I'm using plain SGD with momentum, and I'm tracking the learning rate, which gets reduced when the validation loss plateaus. I wasn't aware I still needed to save the optimizer state. Any reason why?

ipod825 commented 6 years ago

Oh, never mind. I wasn't sure how momentum works, and you certainly don't need to store any other info. That said, it is still better to store the full state, or at least add a comment, in case others want to play with your code and try another optimizer.

kevinzakka commented 6 years ago

@ipod825 Totally agree, I'll update the code and README to reflect that :) Thank you!

kevinzakka commented 6 years ago

@ipod825 so it turns out the momentum update relies on the previous velocity value, and I'd completely forgotten about that. So yes, you're absolutely right that I should save the optimizer state dict, and I've updated the code accordingly. Thanks for pointing it out 😃
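
For reference, a minimal sketch of what saving and restoring the optimizer state can look like in PyTorch. The function names, arguments, and checkpoint keys here are illustrative, not the repo's actual ones:

```python
import torch

def save_checkpoint(model, optimizer, epoch, ckpt_path="ckpt.pth.tar"):
    # Illustrative sketch: names and keys are hypothetical, not the repo's.
    state = {
        "epoch": epoch,
        "model_state": model.state_dict(),
        # SGD with momentum keeps per-parameter velocity buffers in the
        # optimizer state, so this dict is needed to resume training exactly.
        "optim_state": optimizer.state_dict(),
    }
    torch.save(state, ckpt_path)

def load_checkpoint(model, optimizer, ckpt_path="ckpt.pth.tar"):
    ckpt = torch.load(ckpt_path)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optim_state"])
    return ckpt["epoch"]
```

Restoring `optimizer.state_dict()` also covers optimizers like Adam, whose running moment estimates would otherwise be reset on resume.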