Closed · AjayTalati closed this 9 years ago
Update - the adam optimizer was fixed yesterday - it works now, with default parameters on the Rosenbrock test problem.
I'm testing it now to see if it gives an improvement.
Great! Curious to see your result.
Hi Joost,
unfortunately I haven't managed to get any convergence yet with adam, over a range of different config parameters.
Looking at figure 4 a) of the adam paper, it seems that
beta_1 = 0.1, beta_2 = 0.0001, alpha = 0.002, epsilon = 1e-8, lambda = 1 - 1e-8
with the model they state,
dim_hidden = 50 hidden_units_encoder = 500 hidden_units_decoder = 500
should get convergence after 10 epochs, but I can't reproduce their results.
Clearly the result is much better after 100 epochs (4 b) so those figures do not indicate convergence. It shows the value of the negLL after a set amount of epochs for different learning rates (x-axis) and illustrates the necessity of the bias-correction factor.
I am not sure exactly how many epochs are necessary with Adam, I never tested that.
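For reference, the bias-correction factors being discussed are the `1 / (1 - beta^t)` terms in the Adam update. Here is a minimal NumPy sketch of that update, with a hand-written Rosenbrock gradient for testing; the parameter names follow the paper, but this is an illustrative re-implementation, not the `optim` package's `adam.lua`:

```python
import numpy as np

def adam(grad_fn, x, alpha=0.002, beta1=0.9, beta2=0.999,
         epsilon=1e-8, num_steps=5000):
    """Minimize a function via Adam, given its gradient `grad_fn`."""
    m = np.zeros_like(x)  # first-moment (mean) estimate
    v = np.zeros_like(x)  # second-moment (uncentered variance) estimate
    for t in range(1, num_steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)  # bias correction for m
        v_hat = v / (1 - beta2**t)  # bias correction for v
        x = x - alpha * m_hat / (np.sqrt(v_hat) + epsilon)
    return x

def rosenbrock_grad(p):
    # gradient of f(p) = (1 - p0)^2 + 100 * (p1 - p0^2)^2
    d0 = -2 * (1 - p[0]) - 400 * p[0] * (p[1] - p[0]**2)
    d1 = 200 * (p[1] - p[0]**2)
    return np.array([d0, d1])

x_opt = adam(rosenbrock_grad, np.array([-1.0, 1.0]))
```

Without the two `_hat` lines, the early steps are strongly biased toward zero (since m and v start at zero), which is exactly what figure 4 in the paper illustrates.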
Hi, yes, sorry, I was sloppy with my language, apologies.
Basically I've tried all the grid points they mention in their paper, i.e. the decay terms beta_1 and beta_2 and the learning rate, but I still get the negLL after 10 epochs to be about -4e+155, i.e. basically -infinity. Maybe you want to give it a try? If you pull the latest optim module (luarocks install optim), I think it's then just a task of
i) changing your while loop to a for loop, running for 10 epochs, and writing the last negLL to a table
ii) constructing a small grid and iterating your above code over the grid points.
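The two steps above can be sketched in a few lines. `train_one_epoch` below is a hypothetical stand-in for the actual training loop (it just returns a dummy value so the sketch runs); the grid values mirror the ones discussed above:

```python
import itertools

def train_one_epoch(config):
    # Hypothetical placeholder: substitute one epoch of your real
    # training here and return the negLL after that epoch.
    return sum(config.values())  # dummy value for illustration

def grid_search(num_epochs=10):
    alphas = [1e-4, 1e-3, 1e-2]
    beta1s = [0.1, 0.9]
    beta2s = [1e-4, 1e-3]
    results = {}
    for alpha, beta1, beta2 in itertools.product(alphas, beta1s, beta2s):
        config = {'learningRate': alpha, 'beta1': beta1, 'beta2': beta2}
        negLL = None
        for _ in range(num_epochs):   # (i) a for loop over a fixed epoch count
            negLL = train_one_epoch(config)
        results[(alpha, beta1, beta2)] = negLL  # write the last negLL to a table
    return results

table = grid_search(num_epochs=10)
```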
If you really want to be fancy, there's a Bayesian optimization package on GitHub called Spearmint, which uses an iterative Gaussian process scheme for continuous hyper-parameter optimization; it's coded in Python. I'm trying it now.
I'm guessing that if we both can't get it to work, it might be a problem with the adam.lua code?
I'm not sure the adam.lua code in the optim package works.
Maybe it's best to try to use Durk Kingma's adam implementation here,
https://github.com/dpkingma/nips14-ssl/blob/master/adam.py
with your Theano implementation?
I have tried adam before (in Theano) and it works nicely, so this does indeed sound like a bug, either in my code or in the adam implementation.
I will take a look at it tomorrow.
Fixed by setting a negative learning rate (gradient ascent).
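The fix makes sense because the training code maximizes a lower bound while the optimizer minimizes, so negating the learning rate turns the descent update into ascent. A tiny self-contained sketch with plain gradient steps (not the optim package) shows the idea:

```python
def ascend(grad_fn, x, lr=-0.1, num_steps=200):
    """Run descent-style updates; a negative lr makes them ascend."""
    for _ in range(num_steps):
        x = x - lr * grad_fn(x)   # lr < 0, so this moves *up* the gradient
    return x

# Maximize f(x) = -(x - 2)^2, whose gradient is -2 * (x - 2);
# the updates climb to the maximum at x = 2.
x_max = ascend(lambda x: -2 * (x - 2), 0.0)
```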
I'm trying to use the implementation of the adam optimizer available in the optim package with the lines
-- not used -- x, batchlowerbound = optim.adagrad(opfunc, parameters, config, state)
x, batchlowerbound = optim.adam(opfunc, parameters, config, state)
It does not seem to converge or even change much with any of the configurations I've tried.
Could you suggest a config which works, please? Thank you.