ratschlab / SVGP-VAE

Tensorflow implementation for the SVGP-VAE model.
MIT License

Typos in the code #1

Open ttgump opened 2 years ago

ttgump commented 2 years ago

Hi, in the code file "SVGPVAE_model.py", should line 913 be elbo = KL_term + lagrange_mult * (recon_loss/b + tf.stop_gradient(C_ma - recon_loss/b))?

Also, should line 925 be elbo = recon_loss + (beta / L) * KL_term?

Thanks.

metodmove commented 2 years ago

Regarding line 925: note that recon_loss is equivalent, up to a constant, to the log-likelihood part of the ELBO (since we assume a Gaussian likelihood with constant variance). The minus in -recon_loss is there because of the minus in the exponent of the Gaussian pdf. To be fully mathematically exact, one would need -recon_loss / (2 * sigma**2); however, since we use beta in front of the KL term, the exact scalar in front of the recon_loss term is not really needed. The correct sign (-) is needed, though.
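To make that relationship explicit, here is a minimal sketch assuming a Gaussian likelihood with fixed variance and recon_loss equal to the summed squared reconstruction error; the names below are illustrative and this is not a copy of line 925:

```python
import tensorflow as tf

def gaussian_log_lik_vs_recon_loss(y, y_recon, sigma=1.0):
    """Illustrative only: relates recon_loss to the Gaussian log-likelihood term."""
    # Summed squared error between data and reconstruction.
    recon_loss = tf.reduce_sum(tf.square(y - y_recon))

    # Exact log-likelihood term of the ELBO, dropping the additive constant
    # -0.5 * log(2 * pi * sigma**2) per dimension:
    #   log N(y; y_recon, sigma**2 I) = -recon_loss / (2 * sigma**2) + const
    log_lik = -recon_loss / (2.0 * sigma**2)

    # In the code, the 1/(2 * sigma**2) factor is absorbed into the beta weight
    # on the KL term, so only the sign of recon_loss matters in the objective.
    return recon_loss, log_lik
```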

Regarding line 913: I would need to check the Taming VAEs paper again to give a conclusive answer here. Another way to verify this is to run, e.g., the MNIST experiment once with KL_term in line 913 and once with -KL_term; see the README for instructions on how to run the experiments directly from the terminal. Make sure to include the --GECO flag.
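For reference, here is a generic sketch of the GECO-style constrained objective from the Taming VAEs paper (Rezende & Viola, 2018), which is what line 913 is implementing; the function and names kappa, alpha, geco_objective are my own and not taken from SVGPVAE_model.py, so treat this as a reference point rather than the repo's exact code:

```python
import tensorflow as tf

def geco_objective(recon_loss, KL_term, C_ma, lagrange_mult, b, kappa, alpha=0.99):
    """Generic GECO-style loss sketch (not the repo's exact formulation).

    recon_loss    : summed reconstruction error over the batch
    KL_term       : KL part of the ELBO
    C_ma          : moving average of the constraint from previous steps
    lagrange_mult : current Lagrange multiplier
    b             : batch size
    kappa         : reconstruction tolerance (the GECO target)
    """
    # Constraint for the current batch: average reconstruction error minus tolerance.
    constraint = recon_loss / b - kappa

    # Moving average of the constraint; no gradient flows through the update.
    C_ma_new = alpha * C_ma + (1.0 - alpha) * constraint

    # Stop-gradient trick: the value of this term equals C_ma_new, but its
    # gradient equals the gradient of the current-batch constraint.
    constraint_ma = constraint + tf.stop_gradient(C_ma_new - constraint)

    # GECO minimizes KL + lambda * constraint; the multiplier itself is updated
    # separately, e.g. lagrange_mult *= tf.exp(C_ma_new) after each step.
    loss = KL_term + lagrange_mult * constraint_ma
    return loss, C_ma_new
```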

ttgump commented 2 years ago

Thanks for the explanation. I have another question about the SVGP-VAE model. In Section C.4 of the supplementary material, you study the impact of the number of inducing points. The results and the theoretical analysis show that a large number of inducing points leads to poor performance. If I have a large dataset with a lot of dependency information (e.g., many time points or data types), more inducing points are needed to capture the complex dependency (covariance) matrix. Do you have any suggestions for this situation? Would a larger batch size help to fix the poor performance with many inducing points? Or should I explore the MC estimator of Evans and Nair (2020), as mentioned in the paper? Thanks. Best, Tian