tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

In the implementation of the VAE (the vanilla version), shouldn't the input to the decoder be the z-priors? Why do you use the approx_posteriors? #546

Closed MaisaDaoud closed 4 years ago

colemakdvorak commented 5 years ago

I find the question somewhat vague, but I will give answering it a go. First, I assume you are talking about this implementation within the repo: tensorflow_probability/examples/vae.py.

You are correct in stating that the input to the decoder should be the latent samples z (what you call z-priors) drawn from the posterior q(Z | X). I think you are just confused about the terminology. The distribution returned by the encoder is referred to as approx_posterior because it is itself an approximation: computing the exact posterior p(Z | X) is intractable, so it is replaced by q(Z | X). I think this is a good reference that explains the general setting of Variational Inference. Section 2.1 illustrates why the problem is intractable, and Section 2.2 gives you a sense of how the problem becomes one of optimization/approximation.
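For concreteness, the objective both distributions appear in is the evidence lower bound (ELBO):

\log p(x) \;\ge\; \mathbb{E}_{q(Z \mid X)}\big[\log p(X \mid Z)\big] \;-\; \mathrm{KL}\big(q(Z \mid X) \,\|\, p(Z)\big)

The first term is an expectation under q(Z | X), which is why the decoder is fed samples from the approximate posterior during training; the prior p(Z) only enters through the KL term.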

Anyways,

approx_posterior = encoder(features)
approx_posterior_sample = approx_posterior.sample(params["n_samples"])
decoder_likelihood = decoder(approx_posterior_sample)

shows that approx_posterior is the distribution yielding the samples, which are the "z-priors" you are talking about. So the implementation is in agreement with your understanding of the VAE.
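For a fuller picture, here is a minimal, self-contained sketch of the same pattern. This is not the repo's vae.py: the layer sizes, the 4-dimensional toy features, and the encoder/decoder/make_prior helpers below are made up for illustration, assuming TF2 and a diagonal-Gaussian posterior.

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

latent_size = 2   # illustrative value, not the example's default

# Encoder network: maps features to the parameters of q(Z | X).
encoder_net = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2 * latent_size),   # loc and raw scale
])

# Decoder network: maps a latent sample z to the parameters of p(X | Z).
decoder_net = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4),                  # 4-dimensional toy features
])

def encoder(x):
    loc, raw_scale = tf.split(encoder_net(x), 2, axis=-1)
    return tfd.MultivariateNormalDiag(loc=loc,
                                      scale_diag=tf.nn.softplus(raw_scale))

def decoder(z):
    return tfd.Independent(tfd.Normal(loc=decoder_net(z), scale=1.0),
                           reinterpreted_batch_ndims=1)

def make_prior():
    return tfd.MultivariateNormalDiag(loc=tf.zeros(latent_size),
                                      scale_diag=tf.ones(latent_size))

features = tf.random.normal([8, 4])            # dummy batch of "features"

approx_posterior = encoder(features)           # q(Z | X)
z = approx_posterior.sample()                  # posterior sample fed to the decoder
decoder_likelihood = decoder(z)                # p(X | Z = z)
prior = make_prior()                           # p(Z), used only in the KL term

elbo = tf.reduce_mean(decoder_likelihood.log_prob(features)
                      - tfd.kl_divergence(approx_posterior, prior))
loss = -elbo

The point to notice is that the prior is never fed to the decoder during training; it only appears inside the KL term.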

MaisaDaoud commented 5 years ago

Thank you for your answer! But based on this implementation, what is the role of the z-priors? I would guess the implementation should look like:

approx_posterior = encoder(features)
approx_posterior_sample = approx_posterior.sample(params["n_samples"])
code = make_prior()
decoder_likelihood = decoder(code)

So the KL-divergence function will minimize the difference between the code (P(z)) and the posterior Q(z|x) samples and, in this way, the decoder will be able to generate new samples from the random distribution defined in the make_prior method. Sorry for the inconvenience, but reading and trying different implementations made it a bit confusing.

colemakdvorak commented 5 years ago

No worries; even reading about the same thing in different conventions can be a hassle, and things only get weirder as you throw in multiple implementations. I suggest taking a look at the following function from the beginning:

https://github.com/tensorflow/probability/blob/c4ee9efca41edee203e9e547b7d5eb2d969e0595/tensorflow_probability/examples/vae.py#L325

It deals with the latent prior that you seem to be expecting but not finding. I don't know which other implementations you are referring to, but taking the time to work through that function and reconciling it with whatever other implementation/presentation you have in hand will probably demystify things.
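To make the role of that latent prior concrete: during training it only enters the ELBO through the KL term, and at generation time it is what you sample from and decode. A short sketch, reusing the hypothetical make_prior/decoder names from the snippet above (not the repo's actual code):

# The prior only shows up in the KL penalty during training (see the
# sketch above); at generation time you sample from it and decode.
z_new = make_prior().sample(16)     # 16 draws from the latent prior p(Z)
generated = decoder(z_new).mean()   # or .sample() for stochastic decodes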

Edit: I guess other common implementations make the reparameterization-trick transformation explicit, whereas the implementation in TFP doesn't make it as "explicit"?
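For example, the "explicit" style writes the reparameterization out by hand, whereas with TFP the same thing happens inside the distribution's sample(). A rough sketch, assuming a diagonal-Gaussian posterior:

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

loc = tf.Variable(tf.zeros([2]))
raw_scale = tf.Variable(tf.zeros([2]))
scale = tf.nn.softplus(raw_scale)

# "Explicit" style, common in hand-rolled implementations:
eps = tf.random.normal(tf.shape(loc))
z_explicit = loc + scale * eps

# TFP style: Normal.sample() is already reparameterized, so gradients
# flow back to loc and scale without spelling out eps yourself.
z_implicit = tfd.Normal(loc=loc, scale=scale).sample()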

MaisaDaoud commented 5 years ago

One other implementation is this one, for example: https://danijar.com/building-variational-auto-encoders-in-tensorflow/. There are many others as well.

Anyways, thank you very much for your time and answers

colemakdvorak commented 5 years ago

The example implementation is similar to what's implemented in this repo (i.e. what that link calls code is really what approx_posterior.sample(params["n_samples"]) is doing). Sorry if I ended up confusing you more, but I am sure you will be able to resolve it.

MaisaDaoud commented 5 years ago

Ok, thanks, I will have another look! Thanks for your time.