wellecks / vaes

Variational Autoencoders & Normalizing Flows Project

ELBO function #9

Open erlebach opened 7 years ago

erlebach commented 7 years ago

Hi,

The ELBO function returns monitor_functions, a dictionary of quantities to monitor.

In train.py and evaluation.py, you call elbo_loss, which returns monitor_functions. So far so good. In train.py, you call optimizer.minimize(loss_op), where loss_op is the return value of the ELBO function (line 259 in train.py). But minimize() should take the loss to be minimized as its argument, not a dictionary.
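For reference, here is the pattern I would expect, as a minimal sketch with a toy ELBO (the names elbo_loss and monitor_functions follow the repo; everything else here is hypothetical, not your actual code):

```python
import tensorflow as tf

# Hypothetical stand-in for the repo's elbo_loss: it returns a dictionary
# of scalar tensors to monitor, one of which is the training loss.
def elbo_loss(x, x_logits, mu, logvar):
    # Bernoulli reconstruction term, summed over pixels
    recon = tf.reduce_sum(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=x, logits=x_logits),
        axis=1)
    # analytic KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior
    kl = -0.5 * tf.reduce_sum(1.0 + logvar - tf.square(mu) - tf.exp(logvar),
                              axis=1)
    loss = tf.reduce_mean(recon + kl)
    return {'loss': loss,
            'recon': tf.reduce_mean(recon),
            'kl': tf.reduce_mean(kl)}

# minimize() needs the scalar loss tensor, not the whole dictionary:
#   monitor_functions = elbo_loss(x, x_logits, mu, logvar)
#   train_op = tf.train.AdamOptimizer(1e-3).minimize(monitor_functions['loss'])
```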

Perhaps there is a better explanation for how the code is written, since it seems unlikely the code would work if this were an error.

I just realized that the code calls train() and not train_simple(); the issue I mention above is in train_simple(). I assume this is an error?

Thank you.

wellecks commented 7 years ago

We ended up using the train() function for the experiments, and train_simple wasn't updated as the code evolved, so train_simple probably doesn't work right now. In train(), the training loss is extracted from monitor_functions, then turned into train_op and passed into sess.run on line 150. Does that help?
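Roughly, the pattern looks like this. This is a self-contained sketch with a toy graph, not the exact code; everything except the monitor_functions dictionary is illustrative:

```python
import numpy as np
import tensorflow as tf

# Toy graph standing in for the VAE, so the sketch runs on its own.
x = tf.placeholder(tf.float32, [None, 784])
z = tf.layers.dense(x, 2)
loss = tf.reduce_mean(tf.reduce_sum(tf.square(z), axis=1))
monitor_functions = {'loss': loss}

# train() pulls the scalar training loss out of the monitor dictionary
# and builds the update op from it, then runs both together.
train_op = tf.train.AdamOptimizer(1e-3).minimize(monitor_functions['loss'])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # one update step, fetching the monitored scalars in the same call
    _, monitored = sess.run([train_op, monitor_functions],
                            feed_dict={x: np.zeros((8, 784), np.float32)})
```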

erlebach commented 7 years ago

Yes, it helps. Thank you for replying. Just so you know, I am analyzing your code. I converted it to run under Anaconda3 (Python 3.6 with TensorFlow 1.1 or 1.2; I don't recall which). The conversion was very straightforward.

One additional question and a comment. Question: you have several encoders, but only a basic decoder; why is that? Comment: regarding reproducing Rezende's results, Rezende uses a maxout nonlinearity, whereas you use tanh. Might this account for the differences between your results and his? You could also use ELU or ReLU (why don't you?). And haven't some authors used convolutional networks to improve on the results?
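For context, maxout takes an elementwise max over several learned affine pieces instead of applying a fixed nonlinearity like tanh. A minimal sketch (names and shapes are purely illustrative, not from your repo):

```python
import tensorflow as tf

def maxout(x, num_units, num_pieces=2):
    """Maxout: elementwise max over `num_pieces` affine transforms of x."""
    # one dense layer producing all pieces at once: [batch, units * pieces]
    h = tf.layers.dense(x, num_units * num_pieces)
    h = tf.reshape(h, [-1, num_units, num_pieces])
    return tf.reduce_max(h, axis=2)  # max over the pieces dimension
```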

Thanks again. Gordon.

wellecks commented 7 years ago

Regarding the multiple encoders, we were trying to measure the performance difference between using a simple encoder and three types of normalizing flows ('residual', Householder, inverse autoregressive), while keeping everything else constant. Introducing the flows only affects the encoder, so we just keep the decoder the same for all four cases.
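For intuition, a single Householder flow step just reflects the posterior sample with a learned Householder matrix, so the decoder is untouched. A minimal sketch, not the repo's exact code (here the vector v would be produced by the encoder):

```python
import tensorflow as tf

def householder_flow_step(z, v):
    """Apply one Householder reflection z' = (I - 2 v v^T / ||v||^2) z.

    z: [batch, dim] posterior samples; v: [batch, dim] learned vectors.
    A reflection is volume-preserving (|det H| = 1), so the flow adds
    no log-det-Jacobian term to the ELBO.
    """
    v_norm_sq = tf.reduce_sum(tf.square(v), axis=1, keep_dims=True) + 1e-8
    vz = tf.reduce_sum(v * z, axis=1, keep_dims=True)  # v^T z, per example
    return z - 2.0 * v * vz / v_norm_sq
```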

Regarding tanh, we followed the settings found in version 1 of the Householder flows paper (https://arxiv.org/pdf/1611.09630v1.pdf), since it was fairly comprehensive (they used tanh there). The paper has since been updated, but we weren't able to get the results reported in version 1.

erlebach commented 7 years ago

Thanks for the update. It is very sad that so many papers do not provide enough information to reproduce their results. I am interested in your code because we are also working on comparing these methods (IAF and NF), but with a view to applying them to the prior as opposed to the posterior.

I noticed that you are not using conv_net. Has this module been debugged?


wellecks commented 7 years ago

I see, thanks. Right, we didn't use the conv net since we were just testing on MNIST, but in general it could be good to use. However, I don't think we tested the conv_net code, so it might need some minor changes.