vikram-gupta closed this issue 8 years ago
I was not familiar w/ Adam before. So I uncommented the LR decay code that was left there. Perhaps that was an error...
And maybe this explains the weird results we've been getting lately...
Thx for pointing this out.
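For illustration, here is a minimal sketch of the combination being described: a manual LR decay applied on top of Adam. This is a PyTorch sketch with placeholder names and values, not the actual code from this repo:

```python
import torch

model = torch.nn.Linear(10, 1)  # stand-in model, for illustration only
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
decay = 0.95  # hypothetical per-epoch decay factor

for epoch in range(35):
    # ... forward/backward passes and optimizer.step() would go here ...
    for group in optimizer.param_groups:
        group["lr"] *= decay  # shrink Adam's global step size each epoch
    print(f"epoch {epoch}: lr = {optimizer.param_groups[0]['lr']:.6f}")
```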
@macournoyer
It may explain the issue. However, I am not seeing any decay in the learning rate while training. Even after 35 epochs, the learning rate printed to the console is still 0.001.
I think there was a bug w/ this too, fixed in #47.
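For what it's worth, one common way this symptom arises (purely illustrative, not necessarily the actual bug fixed in #47) is decaying a local copy of the learning rate while the optimizer keeps using its own unchanged value:

```python
import torch

model = torch.nn.Linear(10, 1)  # stand-in model
lr = 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

for epoch in range(3):
    lr *= 0.95  # BUG: only this local variable decays
    # The optimizer still steps with its original lr, so this prints 0.001:
    print(f"epoch {epoch}: lr = {optimizer.param_groups[0]['lr']}")
```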
Hi @macournoyer
Do we need to use LR decay along with Adam? I'd guess Adam already scales the learning rate, since it accumulates moment estimates of the gradients?
This is just my intuition... I was wondering if you tried both cases and came to some conclusion?
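For reference, Adam's update rule (Kingma & Ba, 2015) shows both sides of this question: the $\sqrt{\hat v_t}$ denominator adapts the step size per parameter, but the global step size $\alpha$ stays fixed unless it is decayed explicitly, so a decay schedule is not strictly redundant (though it often matters less with Adam):

$$
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2
$$

$$
\hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad
\hat v_t = \frac{v_t}{1-\beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
$$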