junyanz / BicycleGAN

Toward Multimodal Image-to-Image Translation
https://junyanz.github.io/BicycleGAN/

Not clear in the difference between the two latent spaces predicted #85

Open arnabsinha99 opened 4 years ago

arnabsinha99 commented 4 years ago

Note that the encoder E here is producing a point estimate for z', whereas the encoder in the previous section was predicting a Gaussian distribution.

I could not clearly understand the difference between the two latent z's being predicted. Both are trying to be close to a normal distribution, so both of them should give a point estimate as output. @junyanz Sir, kindly correct me if my understanding is wrong. It would help me understand this wonderful paper even better. Thanks in advance :-)

junyanz commented 4 years ago

There are two choices: (1) predict a vector z directly, or (2) predict the mean and variance of a multivariate Gaussian and sample the latent vector using the predicted mean/variance, as done in Eqn 10 of the VAE paper.
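As a rough illustration of the two choices (not code from this repository), here is a dependency-free sketch. The function names `encode_point` and `encode_gaussian` and the toy values of `mu` and `logvar` are made up for this example, and choice (2) is assumed to use the usual reparameterized form z = mu + sigma * eps with one mean and one variance per latent dimension.

```python
import math
import random


def encode_point(encoder_output):
    """Choice (1): use the encoder output directly as the latent vector."""
    return list(encoder_output)


def encode_gaussian(mu, logvar, rng):
    """Choice (2): sample z = mu + sigma * eps, with eps ~ N(0, I).

    mu and logvar each hold one entry per latent dimension (assumed
    diagonal Gaussian), and sigma = exp(0.5 * logvar).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
mu = [0.5, -1.0, 2.0]           # hypothetical encoder outputs, |z| = 3
logvar = [0.0, -2.0, 1.0]

z_point = encode_point(mu)              # deterministic: always the same vector
z_a = encode_gaussian(mu, logvar, rng)  # stochastic: a new draw on every call
z_b = encode_gaussian(mu, logvar, rng)

print("point estimate:", z_point)
print("sample 1:      ", z_a)
print("sample 2:      ", z_b)
```

With choice (1) the same input always maps to the same z; with choice (2) the same input maps to a distribution, and each call draws a different z from it.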

arnabsinha99 commented 4 years ago

Thank you for the paper reference. I shall refer to it. Further, I have a few questions.

  1. So the first choice pertains to cVAE-GAN and the second choice pertains to cLR-GAN, right?
  2. Implicitly, both methods try to fit the predicted latent space to a Gaussian distribution, so what is the difference between them? Could you illustrate with a small example?
  3. In cLR-GAN, when does the sampling from the predicted z take place? I ask because after predicting the means and variances, it seems that we directly compare N(0,1) and z with an L1 loss.

arnabsinha99 commented 4 years ago

There are two choices: (1) predict a vector z directly, or (2) predict the mean and variance of a multivariate Gaussian and sample the latent vector using the predicted mean/variance, as done in Eqn 10 of the VAE paper.

Does this mean that the cLR-GAN is trying to predict only one value of mean and variance for the entire multivariate latent space of |z| dimensions?

junyanz commented 4 years ago

For your questions 1 and 3: choice (1), predicting a single vector, is used for cLR-GAN, and choice (2), predicting the Gaussian mean/variance, is used for cVAE-GAN. In cLR-GAN, we calculate the L1 loss between the sampled z and the predicted z.

For your question 2: the difference is that in (1) we try to match a particular vector, while in (2) we try to match the Gaussian distribution.
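To make "match a particular vector" versus "match the Gaussian distribution" concrete, here is a small sketch of the two objectives. It is an illustration under assumptions, not the repository's code: the helper names, the mean reduction in the L1 term, and the toy numbers are chosen for this example. The KL term is the standard closed form for a diagonal Gaussian against N(0, I).

```python
import math
import random


def latent_l1(z_sampled, z_predicted):
    """cLR-GAN style: mean absolute error between the z that was sampled
    from N(0, I) and the z the encoder recovers from the generated image.
    (Mean reduction is an assumption of this sketch.)"""
    return sum(abs(a - b) for a, b in zip(z_sampled, z_predicted)) / len(z_sampled)


def kl_to_standard_normal(mu, logvar):
    """cVAE-GAN style: KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian,
    0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, logvar))


rng = random.Random(0)

# (1) Match a particular vector: the target is one specific draw of z.
z_sampled = [rng.gauss(0.0, 1.0) for _ in range(3)]
z_predicted = [z + 0.1 for z in z_sampled]   # hypothetical encoder output
print("latent L1:", latent_l1(z_sampled, z_predicted))

# (2) Match the distribution: the target is N(0, I) as a whole, so the
# loss depends only on the predicted mean and variance, not on any one draw.
print("KL at mu=0, var=1:", kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
print("KL at mu=1, var=1:", kl_to_standard_normal([1.0], [0.0]))
```

In (1) the loss is zero only when the encoder returns exactly the vector that was sampled; in (2) it is zero whenever the predicted distribution equals N(0, I), regardless of which z is later drawn from it.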

yangxiufengsia commented 3 years ago

For your question 2: the difference is that in (1) we try to match a particular vector, while in (2) we try to match the Gaussian distribution.

Hi, I have the same question as @arnabsinha99. You use an L1 loss to minimize the distance between the encoded z (which is actually the mean) and z_real sampled from the standard Gaussian. Why does optimizing this help improve performance? Did you compare BicycleGAN to cVAE-GAN with the same L1 loss added (the loss between the encoded mean and real_z)? And what is the difference?