y0ast / VAE-Torch

Implementation of Variational Auto-Encoder in Torch7
MIT License

config for Adam #2

Closed AjayTalati closed 9 years ago

AjayTalati commented 9 years ago

I'm trying to use the implementation of the adam optimizer available in the optim package with the line

-- not used -- x, batchlowerbound = optim.adagrad(opfunc, parameters, config, state)
x, batchlowerbound = optim.adam(opfunc, parameters, config, state)

It does not seem to converge or even change much with any of the configurations I've tried.

Could you suggest a config that works, please?

Thank you.

AjayTalati commented 9 years ago

Update - the adam optimizer was fixed yesterday - it works now, with default parameters on the Rosenbrock test problem.

I'm testing it now to see if it gives an improvement.

y0ast commented 9 years ago

Great! Curious to see your result.

AjayTalati commented 9 years ago

Hi Joost,

Unfortunately, I haven't managed to get any convergence yet with adam over a range of different config parameters.

Looking at figure 4 a) of the adam paper, it seems that

beta_1 = 0.1
beta_2 = 0.0001
alpha = 0.002
epsilon = 1e-8
lambda = 1 - 1e-8

with the model they state,

dim_hidden = 50
hidden_units_encoder = 500
hidden_units_decoder = 500

should get convergence after 10 epochs, but I can't reproduce their results.
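For reference, here is a minimal sketch of a single Adam update in plain Python (not the optim package's adam.lua), using the values listed above as defaults. It assumes that beta_1 and beta_2 weight the new gradient, so beta_1 = 0.1 would correspond to a decay of 0.9 in the other common parameterization. That reading is an assumption, and so are the names adam_step and minimize_toy and the toy quadratic objective. The lambda decay of beta_1 is left out for brevity.

```python
import math

def adam_step(x, grad, m, v, t,
              alpha=0.002, beta1=0.1, beta2=0.0001, eps=1e-8):
    """One Adam update for a scalar parameter (hypothetical helper).

    ASSUMPTION: beta1 and beta2 weight the new gradient, so the running
    averages decay by (1 - beta1) and (1 - beta2).
    """
    m = beta1 * grad + (1.0 - beta1) * m          # first-moment estimate
    v = beta2 * grad * grad + (1.0 - beta2) * v   # second-moment estimate
    m_hat = m / (1.0 - (1.0 - beta1) ** t)        # bias correction, t starts at 1
    v_hat = v / (1.0 - (1.0 - beta2) ** t)
    x = x - alpha * m_hat / (math.sqrt(v_hat) + eps)  # descent step (minimizes)
    return x, m, v

def minimize_toy(steps=5000):
    """Minimize the toy objective f(x) = (x - 3)^2, starting from x = 0."""
    x, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (x - 3.0)
        x, m, v = adam_step(x, grad, m, v, t)
    return x
```

Note that the step subtracts the update, so it minimizes the objective. On the very first step the bias correction makes the move roughly alpha in size, whatever the scale of the gradient.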

y0ast commented 9 years ago

Clearly the result is much better after 100 epochs (figure 4 b), so those figures do not indicate convergence. They show the value of the negLL after a set number of epochs for different learning rates (x-axis) and illustrate the necessity of the bias-correction factor.

I am not sure exactly how many epochs are necessary with Adam; I never tested that.

AjayTalati commented 9 years ago

Hi, yes, sorry, I was sloppy with my language, apologies.

Basically I've tried all the grid points they mention in their paper, i.e. the bias correction terms beta_1 and beta_2 and learning rate - but I still get the negLL after 10 epochs to be about

-4e+155

i.e. basically -infinity. Maybe you want to give it a try? If you pull the latest optim module

luarocks install optim

I think it's then just a task of

i) changing your while loop to a for loop, running for 10 epochs, and writing the last negLL to a table

ii) constructing a small grid and iterating your above code over the grid points.
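The two steps above can be sketched roughly as follows, in plain Python rather than Torch. Everything here is illustrative: grid_search and toy_run are hypothetical names, toy_run is a stand-in for the real training loop (it is not the VAE), and the grid values are placeholders rather than the grid from the paper.

```python
import itertools

def grid_search(run, grid, epochs=10):
    """Call run(config, epochs) at every grid point and tabulate the results.

    Returns the sorted parameter names and a dict that maps each tuple of
    values (in that name order) to whatever run() returned.
    """
    names = sorted(grid)
    table = {}
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        table[values] = run(config, epochs)
    return names, table

def toy_run(config, epochs):
    """Hypothetical stand-in for the training loop; NOT the VAE.

    Runs `epochs` momentum steps on f(x) = (x - 3)^2 and returns the final
    loss, which plays the role of the last negLL.
    """
    x, m = 0.0, 0.0
    for _ in range(epochs):  # fixed-length for loop instead of a while loop
        grad = 2.0 * (x - 3.0)
        m = config["beta1"] * grad + (1.0 - config["beta1"]) * m
        x -= config["alpha"] * m
    return (x - 3.0) ** 2

# Placeholder grid values, chosen for illustration only.
grid = {"alpha": [0.001, 0.01, 0.1], "beta1": [0.1, 0.5]}
names, table = grid_search(toy_run, grid, epochs=10)
best = min(table, key=table.get)  # grid point with the lowest final value
print(names, best, table[best])
```

Passing the training loop in as a function keeps the grid iteration separate from the model code, so the same loop could wrap the real training function once it returns the last negLL.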

If you really want to be fancy, there's a Bayesian optimization package on git called spearmint, which uses an iterative Gaussian process scheme for continuous hyper-parameter optimization. It's coded in Python. I'm trying it now.

I'm guessing if we both can't get it to work, it might be a problem with the adam.lua code?

AjayTalati commented 9 years ago

I'm not sure the adam.lua code in the optim package works.

Maybe it's best to try to use Dirk Kingma's adam implementation here,


with your theano implementation?

y0ast commented 9 years ago

I have tried adam before (in Theano) and it works nicely, so this does indeed sound like a bug, either in my code or in the adam implementation.

I will take a look at it tomorrow.

y0ast commented 9 years ago

Fixed by setting a negative learning rate (gradient ascent).
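A toy illustration of why the sign matters, assuming the optimizer step has the usual minimizing form x - lr * grad while the objective is a lower bound that should be maximized. A plain gradient step is used here in place of Adam to keep it short, and the function names and the quadratic objective are made up for the example.

```python
def descent_step(x, grad, lr):
    """Minimizer-style update: moves against the gradient when lr > 0."""
    return x - lr * grad

def lower_bound(x):
    """Toy objective to MAXIMIZE (stand-in for the lower bound); peak at x = 3."""
    return -(x - 3.0) ** 2

def lower_bound_grad(x):
    """Gradient of the toy lower bound with respect to x."""
    return -2.0 * (x - 3.0)

def optimize(lr, steps=100):
    """Apply the minimizer-style step `steps` times, starting from x = 0."""
    x = 0.0
    for _ in range(steps):
        x = descent_step(x, lower_bound_grad(x), lr)
    return x, lower_bound(x)

x_pos, bound_pos = optimize(lr=0.1)    # descends: the bound runs off toward -infinity
x_neg, bound_neg = optimize(lr=-0.1)   # ascends: x approaches the maximizer at 3
```

With a positive learning rate the toy bound keeps decreasing without limit, which is at least consistent with the huge negative values reported earlier in the thread. With a negative learning rate the same update climbs to the maximum.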