y0ast / VAE-Torch

Implementation of Variational Auto-Encoder in Torch7
MIT License

config for Adam #2

Closed AjayTalati closed 9 years ago

AjayTalati commented 9 years ago

I'm trying to use the implementation of the adam optimizer available in the optim package with the line

```lua
-- not used: x, batchlowerbound = optim.adagrad(opfunc, parameters, config, state)
x, batchlowerbound = optim.adam(opfunc, parameters, config, state)
```
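For context, this is roughly the calling convention optim.adam expects, shown as a stripped-down, self-contained sketch rather than the actual VAE training loop (the nn.Linear model and MSE loss are placeholders just so the snippet runs on its own):

```lua
require 'torch'
require 'nn'
require 'optim'

-- placeholder model and data, not the VAE from this repo
local model = nn.Linear(10, 10)
local criterion = nn.MSECriterion()
local parameters, gradients = model:getParameters()

local input  = torch.randn(10)
local target = torch.randn(10)

local config = { learningRate = 1e-3, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8 }
local state  = {}

-- opfunc must return the objective value and the gradient w.r.t. the flat
-- parameter vector for the current (mini)batch
local function opfunc(x)
   if x ~= parameters then parameters:copy(x) end
   gradients:zero()
   local output = model:forward(input)
   local loss = criterion:forward(output, target)
   model:backward(input, criterion:backward(output, target))
   return loss, gradients
end

local x, fx = optim.adam(opfunc, parameters, config, state)
print('objective value at the evaluated point: ' .. fx[1])
```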

It does not seem to converge or even change much with any of the configurations I've tried.

Could you suggest a config which works please?

Thank you.

AjayTalati commented 9 years ago

Update - the adam optimizer was fixed yesterday - it works now, with default parameters on the Rosenbrock test problem.

I'm testing it now to see if it gives an improvement.
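For anyone who wants to repeat the check, something along these lines is what I mean by the Rosenbrock test; it's a standalone 2-D Rosenbrock with the gradient written out by hand, not code from this repo:

```lua
require 'torch'
require 'optim'

-- 2-D Rosenbrock: f(a, b) = (1 - a)^2 + 100 * (b - a^2)^2, minimum at (1, 1)
local function rosenbrock(x)
   local a, b = x[1], x[2]
   local f = (1 - a)^2 + 100 * (b - a^2)^2
   local g = torch.Tensor(2)
   g[1] = -2 * (1 - a) - 400 * a * (b - a^2)
   g[2] = 200 * (b - a^2)
   return f, g
end

local x = torch.Tensor({-1.5, 2.0})
local config, state = { learningRate = 0.02 }, {}

for i = 1, 5000 do
   x = optim.adam(rosenbrock, x, config, state)
end
print(x)  -- should end up close to (1, 1) if the optimizer is behaving
```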

y0ast commented 9 years ago

Great! Curious to see your result.

AjayTalati commented 9 years ago

Hi Joost,

unfortunately I haven't managed to get any convergence yet with adam, over a range of different config parameters.

Looking at figure 4 a) of the adam paper, it seems that

```
beta_1  = 0.1
beta_2  = 0.0001
alpha   = 0.002
epsilon = 1e-8
lambda  = 1 - 1e-8
```

with the model they describe,

```
dim_hidden           = 50
hidden_units_encoder = 500
hidden_units_decoder = 500
```

should get convergence after 10 epochs, but I can't reproduce their results.
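In optim terms, my best guess at translating those settings is the config below, though I'm not sure about the beta convention (optim.adam's defaults are beta1 = 0.9 and beta2 = 0.999, so the paper's 0.1 and 0.0001 may really be 1 - beta values), and I don't see a config field for the paper's lambda decay of beta_1:

```lua
-- Guessed mapping of the paper's settings onto optim.adam's config table.
-- The beta fields assume the paper's beta_1 / beta_2 are "1 - decay" values;
-- if they are the decay rates themselves, use them directly instead.
local config = {
   learningRate = 0.002,   -- alpha
   beta1 = 1 - 0.1,        -- assuming beta_1 above is a "1 - decay" value
   beta2 = 1 - 0.0001,     -- likewise for beta_2
   epsilon = 1e-8,
}
local state = {}
```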

y0ast commented 9 years ago

Clearly the result is much better after 100 epochs (figure 4 b), so those figures do not indicate convergence. They show the value of the negLL after a fixed number of epochs for different learning rates (x-axis) and illustrate the necessity of the bias-correction factor.

I am not sure exactly how many epochs are necessary with Adam, I never tested that.

AjayTalati commented 9 years ago

Hi, yes, sorry, I was sloppy with my language, apologies.

Basically I've tried all the grid points they mention in their paper, i.e. the bias-correction terms beta_1 and beta_2 and the learning rate, but the negLL after 10 epochs still comes out at about

-4e+155

i.e. basically -infinity. Maybe you want to give it a try? If you pull the latest optim module

luarocks install optim

I think it's then just a task of:

i) changing your while loop to a for loop, running for 10 epochs, and writing the last negLL to a table

ii) constructing a small grid and iterating your above code over the grid points (rough sketch below).
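Roughly what I have in mind, where train_one_epoch is a hypothetical stand-in for one epoch of your training loop and is assumed to return that epoch's negLL, and the grid values are only illustrative:

```lua
-- Rough sketch of the grid search in i) and ii). `train_one_epoch` is a
-- hypothetical helper standing in for one epoch of the actual training loop.
require 'optim'

local results = {}
for _, lr in ipairs({2e-4, 5e-4, 1e-3, 2e-3}) do
   for _, b1 in ipairs({0.9, 0.99}) do
      local config = { learningRate = lr, beta1 = b1, beta2 = 0.999, epsilon = 1e-8 }
      local state  = {}
      local last_negLL
      for epoch = 1, 10 do              -- i) a fixed 10 epochs instead of the while loop
         last_negLL = train_one_epoch(config, state)
      end
      table.insert(results, { lr = lr, beta1 = b1, negLL = last_negLL })
   end
end

for _, r in ipairs(results) do          -- ii) one row per grid point
   print(r.lr, r.beta1, r.negLL)
end
```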

If you really want to be fancy, there's a Bayesian optimization package on GitHub called spearmint, which uses an iterative Gaussian-process scheme for continuous hyper-parameter optimization; it's coded in Python. I'm trying it now.

I'm guessing that if neither of us can get it to work, it might be a problem with the adam.lua code.

AjayTalati commented 9 years ago

I'm not sure the adam.lua code in the optim package works.

Maybe it's best to try using Durk Kingma's adam implementation here,

https://github.com/dpkingma/nips14-ssl/blob/master/adam.py

with your Theano implementation?

y0ast commented 9 years ago

I have tried adam before (in Theano) and it works nicely, so this does indeed sound like a bug, either in my code or in the adam implementation.

I will take a look at it tomorrow.

y0ast commented 9 years ago

Fixed by setting a negative learning rate: the code maximizes the variational lower bound, so it needs gradient ascent, while optim.adam does gradient descent by default.
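A sketch of the idea, assuming opfunc returns the lower bound to be maximized (as the batchlowerbound name above suggests); the magnitude of the learning rate here is illustrative:

```lua
-- optim.adam minimises by default; if opfunc returns the lower bound
-- (to be maximised) and its gradient, a negative learning rate turns the
-- update into gradient ascent. The value -0.001 is only illustrative.
local config = { learningRate = -0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8 }
local state  = {}
x, batchlowerbound = optim.adam(opfunc, parameters, config, state)
```

An equivalent alternative would be to return the negated lower bound and negated gradient from opfunc and keep a positive learning rate.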