macournoyer / neuralconvo

Neural conversational model in Torch

Using LR decay with Adam #40

Closed vikram-gupta closed 8 years ago

vikram-gupta commented 8 years ago

Hi @macournoyer

Do we need to use LR decay along with Adam? I would guess Adam already scales the learning rate per parameter, since it accumulates running estimates of the gradients?

This is just my intuition. I was wondering if you tried both cases and came to some conclusion?
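(For context, here is a minimal sketch of a single Adam update on a flat parameter tensor, written in plain Torch. It is illustrative only, not neuralconvo's training code, and all names and hyperparameter values are just the usual defaults. The point is that the per-parameter step is already divided by a running estimate of the squared gradients, which is the intuition behind the question above.)

```lua
-- Illustrative Adam step on a flat parameter tensor (not the repo's code).
require 'torch'

local lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8
local params = torch.randn(10)   -- flattened parameters
local m = torch.zeros(10)        -- running mean of gradients (1st moment)
local v = torch.zeros(10)        -- running mean of squared gradients (2nd moment)
local t = 0

local function adamStep(grad)
   t = t + 1
   m:mul(beta1):add(1 - beta1, grad)            -- m <- b1*m + (1-b1)*g
   v:mul(beta2):addcmul(1 - beta2, grad, grad)  -- v <- b2*v + (1-b2)*g.*g
   local mHat = m / (1 - beta1 ^ t)             -- bias-corrected 1st moment
   local vHat = v / (1 - beta2 ^ t)             -- bias-corrected 2nd moment
   -- Per-parameter step: lr * mHat / (sqrt(vHat) + eps).
   -- The division by sqrt(vHat) already adapts the step size per parameter,
   -- independently of any external learning-rate decay schedule.
   params:addcdiv(-lr, mHat, vHat:sqrt():add(eps))
end

adamStep(torch.randn(10))  -- one update with a dummy gradient
```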

macournoyer commented 8 years ago

I was not familiar w/ Adam before, so I uncommented the LR decay code that was left there. Perhaps that was an error...

And maybe this explains the weird results we've been getting lately...

Thx for pointing this out.

vikram-gupta commented 8 years ago

@macournoyer

It may explain the issue. However, I am not seeing any decay in the learning rate while training: even after 35 epochs, the learning rate printed to the console is still 0.001.
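(One common reason a printed learning rate never moves is that the decay is applied to a local copy, or only internally by the optimizer, so the `learningRate` field of the shared `optimState` table is never updated. The sketch below shows an explicit per-epoch decay that would show up in the printed value. It is a hedged illustration only: the model, the decay factor, and the loop sizes are made up, and this is not necessarily what #47 changed.)

```lua
-- Illustrative only: explicit per-epoch decay applied to the shared
-- optimState table, so printing optimState.learningRate reflects the decay.
require 'torch'
require 'nn'
require 'optim'

local model = nn.Linear(10, 1)
local criterion = nn.MSECriterion()
local params, gradParams = model:getParameters()

local optimState = { learningRate = 0.001 }  -- same table reused every step
local decay = 0.95                           -- hypothetical per-epoch decay factor

local function feval(x)
   if x ~= params then params:copy(x) end
   gradParams:zero()
   local input, target = torch.randn(10), torch.randn(1)  -- dummy batch
   local output = model:forward(input)
   local loss = criterion:forward(output, target)
   model:backward(input, criterion:backward(output, target))
   return loss, gradParams
end

for epoch = 1, 35 do
   for step = 1, 100 do
      optim.adam(feval, params, optimState)
   end
   -- Mutate the shared table so the next epoch (and the log line) sees the
   -- decayed rate; decaying a local copy instead would leave this at 0.001.
   optimState.learningRate = optimState.learningRate * decay
   print(('epoch %d, learning rate = %f'):format(epoch, optimState.learningRate))
end
```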

macournoyer commented 8 years ago

I think there was a bug w/ this too, fixed in #47.