mila-iqia / blocks

A Theano framework for building and training neural networks
Other
1.16k stars 351 forks source link

Adam moment parameterization is backwards #1159

Closed dwf closed 7 years ago

dwf commented 7 years ago

beta1 and beta2 are 1-beta1 and 1-beta2, with respect to how the final version of the paper uses them.

I spent way too much time trying to figure out why shit was breaking for me, and it was because I'd put in the customary default of 0.999 for beta2, which of course meant that, once you flipped it around, my second moment estimates decayed faster than an anxious squirrel on speed.

We should probably change this after the ICLR deadline along with a big mailing list announcement.

rizar commented 7 years ago

I have been always wondering what is the reason behind this reason invertion. Too bad it caused trouble to you. We should totally change it after the deadline.

On Sun, 23 Oct 2016 at 22:54 David Warde-Farley notifications@github.com wrote:

beta1 and beta2 are 1-beta1 and 1-beta2, with respect to how the final version of the paper uses them.

I spent way too much time trying to figure out why shit was breaking for me, and it was because I'd put in the customary default of 0.999 for beta2, which of course meant that, once you flipped it around, my second moment estimates decayed faster than an anxious squirrel on speed.

We should probably change this after the ICLR deadline along with a big mailing list announcement.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mila-udem/blocks/issues/1159, or mute the thread https://github.com/notifications/unsubscribe-auth/AAn8YoBqywtKvvIHEqEXL99KX_RvUKEMks5q3B3jgaJpZM4KeWXU .

dwf commented 7 years ago

The reason for it, I realized, is that they changed it in a version of the manuscript subsequent to the first. The first version specifies the betas the way we do, some subsequent version switches and specifies it the opposite way.

On Tue, Oct 25, 2016 at 2:17 PM, Dzmitry Bahdanau notifications@github.com wrote:

I have been always wondering what is the reason behind this reason invertion. Too bad it caused trouble to you. We should totally change it after the deadline.

On Sun, 23 Oct 2016 at 22:54 David Warde-Farley notifications@github.com wrote:

beta1 and beta2 are 1-beta1 and 1-beta2, with respect to how the final version of the paper uses them.

I spent way too much time trying to figure out why shit was breaking for me, and it was because I'd put in the customary default of 0.999 for beta2, which of course meant that, once you flipped it around, my second moment estimates decayed faster than an anxious squirrel on speed.

We should probably change this after the ICLR deadline along with a big mailing list announcement.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mila-udem/blocks/issues/1159, or mute the thread https://github.com/notifications/unsubscribe-auth/AAn8YoBqy wtKvvIHEqEXL99KX_RvUKEMks5q3B3jgaJpZM4KeWXU .

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mila-udem/blocks/issues/1159#issuecomment-256118846, or mute the thread https://github.com/notifications/unsubscribe-auth/AADrLk1VvlERi0-FliezM8hgKeJtzBBCks5q3kedgaJpZM4KeWXU .