Thanks for commenting. We ran exactly the sample code to produce the results and were aiming for complete transparency. The ELBO is a bit inflated because we did not binarize the images across all our examples, but the discrepancy you describe is far too large to be acceptable. I will investigate; this should be 100% replicable.
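To illustrate the binarization point, here is a minimal sketch (the names and setup are illustrative, not the tutorial's actual code) of binarizing MNIST pixels before evaluating a Bernoulli likelihood:

```python
import torch

# Minimal sketch, assuming a Bernoulli image likelihood; `images` is a
# stand-in for a batch of MNIST pixels scaled to [0, 1].
images = torch.rand(128, 784)        # illustrative batch, values in [0, 1)
binarized = torch.bernoulli(images)  # sample each pixel, treating its value as P(pixel = 1)

# Feeding `images` (continuous values) to a Bernoulli likelihood inflates
# the ELBO relative to published numbers computed on binarized data.
```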
On Nov 9, 2017 1:40 AM, "shami-nisimov" notifications@github.com wrote:
In the documentation there is a figure showing the ELBO for the test dataset; after 50 epochs, the ELBO reaches below -80.
When running the VAE sample, after 200 epochs the ELBO only reaches -100.
Is this a bug in the sample code?
Small update: I ran our code a few times to test what happens and reproduced your observation. Interestingly, the model code has not changed since the plots were made for the tutorial, but the numbers you observed, which I verified by running it again a few times, are consistent with the original VAE paper by Durk Kingma and Max Welling (ICLR 2014).
I will search through the many moving parts of Pyro, the plotting scripts, and my own system installation that may have changed since release to find what caused this discrepancy, and once I have more clarity I will also update the plots in the tutorial to reflect the correct numbers. I currently do not see any bug in the VAE example code; I think it is all fine, so I am curious to find out what happened here.
Thank you again for catching this. The intention is for all our models to be 100% reproducible and to match the published state of the art, and comments such as yours help us keep them honest.
We can replicate this independently with the vae_comparison.py example. Bisecting the commits, the results for commit d2252b241e63e3f79716aadeeb8361e7855ca547 are:
```
Running Pyro VAE implementation
====> Epoch: 0
Training loss: 0.0016
Test set loss: 0.0013
====> Epoch: 1
Training loss: 0.0012
Test set loss: 0.0012
====> Epoch: 2
Training loss: 0.0011
Test set loss: 0.0011
====> Epoch: 3
Training loss: 0.0011
Test set loss: 0.0011
====> Epoch: 4
Training loss: 0.0011
Test set loss: 0.0011
====> Epoch: 5
Training loss: 0.0011
Test set loss: 0.0011
```
and for the next commit, 6f01edb5743a393b47deeca879545bab60366f27, the results are quite different, and repeatable:
```
Running Pyro VAE implementation
====> Epoch: 0
Training loss: 0.0015
Test set loss: 0.0011
====> Epoch: 1
Training loss: 0.0010
Test set loss: 0.0010
====> Epoch: 2
Training loss: 0.0010
Test set loss: 0.0009
====> Epoch: 3
Training loss: 0.0009
Test set loss: 0.0009
====> Epoch: 4
Training loss: 0.0009
Test set loss: 0.0009
====> Epoch: 5
Training loss: 0.0009
Test set loss: 0.0009
```
I am still not sure why the changes in #494 would result in this difference, and I haven't debugged any further.
OK, this is actually it. Thank you @neerajprad. In short: the VAE example did not change, but the Bernoulli distribution used in this example changed post-release in that PR, which explains the difference.
We can now recast the core of this issue as understanding why the old Bernoulli behaved so differently from the new one. My guess is that the new one is correct and the previous one was buggy, but we will review our tests to validate this.
I can confirm that our current implementation is correct, or at least consistent with other libraries such as TensorFlow. Both return the same result for discrete 0/1 values, but different results when the data is continuous. Our prior implementation (while it worked on continuous values) would actually calculate the following (assume only a single datum):
log_pdf(x) = log(p*x + (1-p)*(1-x))
This does not blow up at 0 or 1, and I suppose that is why we had it. The fact that it did not error on continuous values was accidental.
The current implementation calculates binary cross entropy and is consistent with the TensorFlow implementation:
log_pdf(x) = log(p^x * (1-p)^(1-x)) = x*log(p) + (1-x)*log(1-p)
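To make the difference concrete, here is a small standalone sketch in plain Python (these helpers are illustrative, not the actual Pyro or TensorFlow functions):

```python
import math

def old_log_pdf(x, p):
    # Old behavior: log(p*x + (1-p)*(1-x)).
    # Agrees with the correct density only when x is exactly 0 or 1.
    return math.log(p * x + (1 - p) * (1 - x))

def new_log_pdf(x, p):
    # Binary cross entropy: x*log(p) + (1-x)*log(1-p).
    return x * math.log(p) + (1 - x) * math.log(1 - p)

p = 0.7
for x in (0.0, 1.0, 0.5):
    print(f"x={x}: old={old_log_pdf(x, p):.4f}, new={new_log_pdf(x, p):.4f}")

# x=0.0 and x=1.0 agree; x=0.5 differs:
#   old: log(0.5*0.7 + 0.5*0.3) = log(0.5) ≈ -0.6931
#   new: 0.5*log(0.7) + 0.5*log(0.3) ≈ -0.7803
```

On non-binarized MNIST pixels (continuous values in [0, 1]), the two densities therefore assign systematically different log-probabilities, which is consistent with the shifted ELBO numbers above.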
I reopened this, as we should update some of our tutorials. We can track that as part of this task.
@neerajprad @eb8680 are tutorial updates still pending?
I reran this and have new plots with the updated likelihood; I will push them this weekend.
@karalets please add me to the PR so I know what to push to the site.
@karalets did you update the plots?
@jpchen what's the status of this after #930?
@neerajprad has been working on the VAE, and if I'm not mistaken the plots and numbers in the tutorials have been freshly generated (#1050 (comment)). Feel free to reopen if I am mistaken.