Hi, maybe we can first start with the plain autoencoder. An autoencoder encodes images to feature points and then uses a decoder for reconstruction. However, the latent space may have "holes", and not every point sampled in the latent space can be decoded to a valid signal.
Therefore, a variational autoencoder introduces some disturbance into the latent space: the encoded features are perturbed with Gaussian noise, and the decoder is trained so that images can still be reconstructed from features within a small epsilon of the original feature point. Trained in this manner, the network learns to produce a continuous latent space rather than one that contains holes. This is an intuitive explanation; you can refer to https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73 for further understanding.
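A minimal sketch of that disturbance in PyTorch (illustrative only; the tensor shape and noise scale are made up and not taken from this repo):

```python
import torch

# `feat` stands in for an encoder output (a single encoded feature point).
feat = torch.randn(1, 64)
# Perturb it with Gaussian noise before decoding; the decoder is then
# trained to reconstruct the image from any point near `feat`.
noisy = feat + 0.1 * torch.randn_like(feat)
```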
Thank you for your reply.
I am confused by your comment that the "variational autoencoder actually introduces some disturbance to the latent space." A VAE, in general, encodes a single point as a distribution over the latent space. Simply adding Gaussian noise to the encoded features does not guarantee that the input of the decoder follows a Gaussian distribution with mean g(x) and covariance H(x) = h(x)·h^t(x), as in your reference.
Have you designed the input of the mapping network or the decoder to be represented as z = σ·ζ + μ, ζ ~ N(0, 1)? That is, where can I find the re-parameterization trick in your inference code?
```python
def inference(self, label, inst):
    input_concat = label.data.cuda()
    # Encode the input with the encoder half of netG_A
    label_feat = self.netG_A.forward(input_concat, flow="enc")
    if self.opt.NL_use_mask:
        label_feat_map = self.mapping_net(label_feat.detach(), inst.cuda())
    else:
        label_feat_map = self.mapping_net(label_feat.detach())
    # Decode the mapped features with the decoder half of netG_B
    fake_image = self.netG_B.forward(label_feat_map, flow="dec")
    return fake_image
```
Since this is the test code, there is no need for the re-parameterization trick; we can directly use the mean from the encoder and feed it to the decoder.
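In other words (a hedged sketch of this train/test distinction, not the repo's actual code; the helper name is hypothetical):

```python
import torch

def sample_latent(mu, logvar, training):
    """Hypothetical helper: stochastic at training time, deterministic at test time."""
    if training:
        std = torch.exp(0.5 * logvar)   # sigma = exp(log(sigma^2) / 2)
        eps = torch.randn_like(std)
        return mu + std * eps           # re-parameterization: z = mu + sigma * eps
    return mu                           # test time: feed the mean to the decoder
```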
I have been struggling to understand the approach in your paper, and I could not find any sampling step performing the VAE at inference time, as in the code above. Also, if you only used the sampling function to define the KL divergence during training, I find it hard to assume that the input of G follows a normal distribution, because a collection of means and standard deviations does not itself have the characteristics of a Gaussian distribution. Can you clarify my misunderstanding or point out what I am missing? I can only guess that the input of G is a feature map containing just the mean and std., not samples from the Gaussian prior. Is that right?
From the paper: "The VAEs assumes Gaussian prior for the distribution of latent codes, so that images can be reconstructed by sampling from the latent space. We use the re-parameterization trick to enable differentiable stochastic sampling."
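For reference, the KL term used to enforce that Gaussian prior has a standard closed form against N(0, I); a generic sketch of it (the textbook formula, not necessarily this repo's exact implementation):

```python
import torch

# KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions:
# 0.5 * sum( mu^2 + sigma^2 - log(sigma^2) - 1 )
def kl_divergence(mu, logvar):
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1, dim=1)
```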