ttgump opened this issue 2 years ago
Regarding line 925: note that `recon_loss` is, up to an additive constant, equivalent to the log-likelihood part of the ELBO (since we assume a Gaussian likelihood with constant variance). The minus in `-recon_loss` is hence there because of the minus in the exponent of the Gaussian pdf. To be fully mathematically exact, one would need `-recon_loss / (2 * sigma**2)`; however, since we use `beta` in front of the KL term, the exact scalar in front of the `recon_loss` term is not really needed. The correct sign (`-`) is needed, though.
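A quick numerical sketch of that equivalence (plain NumPy, not the repo's code; `sigma` here is a hypothetical fixed likelihood standard deviation): the summed Gaussian log-density splits into a constant plus `-recon_loss / (2 * sigma**2)`.

```python
import numpy as np

# Hedged sketch: check that sum_i log N(x_i | mu_i, sigma^2)
# equals  const - recon_loss / (2 * sigma**2)
# with recon_loss = sum_i (x_i - mu_i)^2.
rng = np.random.default_rng(0)
x = rng.normal(size=100)       # "data"
mu = rng.normal(size=100)      # "reconstructions"
sigma = 1.5                    # assumed constant likelihood std

recon_loss = np.sum((x - mu) ** 2)
const = -0.5 * x.size * np.log(2 * np.pi * sigma**2)
log_lik_via_recon = const - recon_loss / (2 * sigma**2)

# Direct evaluation of the Gaussian log-density, summed over points.
log_lik_direct = np.sum(
    -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
)
assert np.allclose(log_lik_via_recon, log_lik_direct)
```

With `sigma = 1` the scalar in front of `recon_loss` is `1/2`, which is exactly the kind of constant that `beta` on the KL term absorbs.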
Regarding line 913: I would need to check the Taming VAEs paper again to give a conclusive answer here. Another way to verify this is to run e.g. the MNIST experiment, once with `KL_term` in line 913 and once with `-KL_term`; see the README for instructions on how to run experiments directly from the terminal. Make sure to include the `--GECO` flag.
Thanks for the explanation. I have another question about the SVGP-VAE model. In supplementary section C.4, you ran an experiment on the impact of the number of inducing points. The results and the theoretical analysis show that a large number of inducing points leads to poor performance. If I have a large dataset with a lot of dependency information, e.g. time points or data types, it needs more inducing points to capture the complex dependency (covariance) matrix. Are there any suggestions for this situation? Would a larger batch size help fix the poor performance with a large number of inducing points? Or should I explore the MC estimator of Evans and Nair (2020), as the paper mentions? Thanks. Best, Tian
Hi, in the code file `SVGPVAE_model.py`, should line 913 be
`elbo = KL_term + lagrange_mult * (recon_loss/b + tf.stop_gradient(C_ma - recon_loss/b))`?
Also, should line 925 be
`elbo = recon_loss + (beta / L) * KL_term`?
Thanks.
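One way to read the `tf.stop_gradient` construction in the line-913 proposal (my own sketch of the GECO-style objective, not taken from the paper): writing $R$ for `recon_loss`, $\lambda$ for `lagrange_mult`, and $\mathrm{sg}$ for `tf.stop_gradient`, the objective

$$\mathrm{KL} + \lambda\left(\frac{R}{b} + \mathrm{sg}\!\left(C_{\mathrm{ma}} - \frac{R}{b}\right)\right)$$

has forward value $\mathrm{KL} + \lambda\, C_{\mathrm{ma}}$ (so the logged loss tracks the moving-average constraint $C_{\mathrm{ma}}$), while its gradient with respect to the model parameters is $\partial\,\mathrm{KL} + \frac{\lambda}{b}\,\partial R$, the same as for $\mathrm{KL} + \lambda R / b$. The sign in front of `KL_term` therefore determines whether the KL is minimized or maximized against the reconstruction constraint, which is exactly what running the MNIST experiment with both signs would make visible.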