jxhe / vae-lagging-encoder

PyTorch implementation of "Lagging Inference Networks and Posterior Collapse in Variational Autoencoders" (ICLR 2019)
MIT License

Is the reconstruction procedure indispensable for latent models? #6

Open · zjcerwin opened this issue 5 years ago

zjcerwin commented 5 years ago

Hi, thanks for your code -- it has helped me a lot.

I have been learning about latent variable models for a few weeks and still feel puzzled by this field. Since the reconstruction loss (i.e., the decoder) is expensive to compute and prone to collapse, I'm wondering: is the reconstruction procedure indispensable for a latent model to capture useful information?

For example, in a supervised multi-task setting, if I want the latent space to capture a domain-specific signal, how can I achieve this using only the classification label and the domain label, without a reconstruction loss? Is there any relevant literature?

I am stuck on this question and hope you can point me in the right direction.

jxhe commented 5 years ago

Hi,

In a supervised setting, reconstruction is usually not needed to learn a useful code, since the latent code already has to capture data information in order to make correct predictions -- but that is not a VAE anymore.

Learning a latent code is NOT the only goal of a VAE -- first and foremost, a VAE is a generative model of the data distribution. The decoder and the prior together form the VAE model, while the encoder is only introduced to help learn that generative model; the encoder is NOT part of the generative model. If there is no decoder, there is no generative model -- the VAE doesn't exist anymore.

If you only want to learn representations, the decoder is certainly not required -- as you said, you can use a classification task -- but those approaches do not model the data distribution p(x) and are not generative models.

The decoder is indispensable for a VAE because the decoder itself (along with the prior) is the VAE model; it is the most important part.
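
For concreteness, here is a minimal PyTorch sketch of the standard VAE ELBO (not code from this repo; the `encoder`/`decoder` interfaces are assumed). The reconstruction term E_q[log p(x|z)] is exactly where the decoder enters; drop it and only the KL term remains, with no model of p(x) at all:

```python
import torch
import torch.nn.functional as F

def vae_elbo(x, encoder, decoder):
    """One-sample Monte Carlo estimate of the VAE ELBO for binary data.

    encoder(x) -> (mu, logvar) of the Gaussian q(z|x)   [hypothetical interface]
    decoder(z) -> logits of the Bernoulli p(x|z)        [hypothetical interface]
    """
    mu, logvar = encoder(x)

    # reparameterization: z = mu + sigma * eps, eps ~ N(0, I)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    # reconstruction term E_q[log p(x|z)] -- this is where the decoder is needed
    logits = decoder(z)
    recon = -F.binary_cross_entropy_with_logits(logits, x, reduction="none").sum(-1)

    # KL(q(z|x) || p(z)) against a standard normal prior, in closed form
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1)

    return (recon - kl).mean()  # maximize this (or minimize its negative)
```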

zjcerwin commented 5 years ago

Thanks for your quick reply.

So the point of a VAE is to learn the latent distribution of the observed data, which is then used to generate new data.

As far as I know, a VAE uses variational inference to maximize the ELBO, in which p(x|z) is the reconstruction term (i.e., the decoder).

What I'm looking for is a way to regularize the posterior over the latent code, p(z|x), with some prior knowledge y (e.g., labels, a knowledge base, etc.), instead of just modeling p(x).

I'm still not clear whether, if the decoder is absent, the latent space can still be meaningful by directly modeling p(z|x, y). Is it correct to write the ELBO as ELBO = E_{q(z|x,y)}[ log p(y|z, x) ] - KL( q(z|x, y) || p(z|x) )?

jxhe commented 5 years ago

What you wrote as the ELBO is a conditional VAE that models p(y|x) instead of p(x). Let me ask you two questions: how would you sample x from the model without a decoder? And how would you approximate log p(x) for a given x?

zjcerwin commented 5 years ago

A CVAE is probably closest to what I have in mind. For the first question: my goal is to improve the performance of supervised tasks with the help of a regularized latent code, so sampling new x is relatively unimportant. p(y|x) is treated as a discriminative network rather than a generative decoder.

For the second: the conditional log p(y|x) expands as log \sum_z p(y|z, x) p(z|x), and the ELBO is then derived as ELBO = E_{q(z|x,y)}[ log p(y|z, x) ] - KL( q(z|x, y) || p(z|x) ), in which the reconstruction term for x, p(x|z), is absent.
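
As a sketch of the objective above (all three networks and their interfaces are hypothetical, just for illustration), a one-sample estimate of this bound on log p(y|x) would look like the following; the label-prediction term p(y|z, x) takes the place of the reconstruction term, and the KL is taken against a learned conditional prior p(z|x) rather than a fixed N(0, I):

```python
import torch
import torch.nn.functional as F

def conditional_elbo(x, y, posterior_net, prior_net, classifier):
    """One-sample lower bound on log p(y|x) for the latent-variable classifier
    described above; all three networks are hypothetical.

    posterior_net(x, y) -> (mu_q, logvar_q) of q(z|x, y)
    prior_net(x)        -> (mu_p, logvar_p) of p(z|x)
    classifier(z, x)    -> logits of p(y|z, x)
    """
    mu_q, logvar_q = posterior_net(x, y)
    mu_p, logvar_p = prior_net(x)

    # sample z ~ q(z|x, y) via reparameterization
    z = mu_q + torch.exp(0.5 * logvar_q) * torch.randn_like(mu_q)

    # "reconstruction" of the label: E_q[log p(y|z, x)] -- no p(x|z) term anywhere
    log_py = -F.cross_entropy(classifier(z, x), y, reduction="none")

    # KL( N(mu_q, var_q) || N(mu_p, var_p) ), closed form for diagonal Gaussians
    kl = 0.5 * ((logvar_p - logvar_q)
                + (logvar_q.exp() + (mu_q - mu_p).pow(2)) / logvar_p.exp()
                - 1).sum(-1)

    return (log_py - kl).mean()
```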

I'm not sure this derivation makes sense. Thanks again for your patience.