koraykv / optim

Some optimization packages for torch7

In optim.sgd, weight decay is not subject to learning rate decay #2

Closed rolfe closed 11 years ago

rolfe commented 11 years ago

Weight decay uses the undecayed learning rate. As a result, as training progresses and the learning rate decays, the weight decay term is given more and more emphasis relative to the gradient step. This does not seem correct.
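The effect can be sketched numerically. This is not the actual `optim.sgd` Lua code, just an illustration assuming the standard torch7 decay schedule `lr_t = base_lr / (1 + step * lr_decay)`: if the decay term is scaled by the undecayed `base_lr` while the gradient is scaled by `lr_t`, their ratio grows without bound.

```python
def effective_terms(base_lr, lr_decay, wd, step):
    # Decayed learning rate, following the torch7 schedule:
    # lr_t = base_lr / (1 + step * lr_decay)
    lr_t = base_lr / (1 + step * lr_decay)
    gradient_scale = lr_t            # scale applied to the loss gradient
    buggy_decay_scale = base_lr * wd # reported bug: decay uses undecayed lr
    fixed_decay_scale = lr_t * wd    # expected: decay uses decayed lr
    return gradient_scale, buggy_decay_scale, fixed_decay_scale

for step in (0, 1000, 100000):
    g, b, f = effective_terms(base_lr=0.1, lr_decay=1e-3, wd=1e-4, step=step)
    # Buggy ratio grows as wd * (1 + step * lr_decay); fixed ratio stays wd.
    print(step, b / g, f / g)
```

With these (hypothetical) hyperparameters, the buggy decay-to-gradient ratio grows from 1e-4 at step 0 to about 1e-2 by step 100000, while the fixed ratio stays constant at `wd`.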

clementfarabet commented 11 years ago

Good point. Do you want me to change that? It makes sense to me.

clementfarabet commented 11 years ago

It was actually still incorrect. I've just submitted a new change, it should be correct now.
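For reference, the intended behavior amounts to folding weight decay into the gradient so that both are scaled by the same decayed learning rate. A minimal sketch (not the library's code; `sgd_step` and its parameters are illustrative):

```python
def sgd_step(x, grad, base_lr, lr_decay, wd, step):
    # Decayed learning rate, following the torch7 schedule.
    lr_t = base_lr / (1 + step * lr_decay)
    # Weight decay is added to the gradient before scaling, so it is
    # multiplied by the decayed rate lr_t along with the gradient.
    return [xi - lr_t * (gi + wd * xi) for xi, gi in zip(x, grad)]

# With zero gradient, the update shrinks each weight by lr_t * wd.
print(sgd_step([1.0], [0.0], base_lr=0.1, lr_decay=0.0, wd=0.1, step=0))
```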

For luarocks users, I've created a new version so that older experiments can still be reproduced.