Hi, could you tell me why the KLD calculation only involves what seems to be the encoder, i.e. the approximate posterior? Isn't the KL divergence computed between two distributions?
mu = self.context_to_mu(context)          # mean of the approximate posterior q(z|x)
logvar = self.context_to_logvar(context)  # log-variance of q(z|x); z is sampled from this distribution
std = t.exp(0.5 * logvar)
z = Variable(t.randn([batch_size, self.params.latent_variable_size]))  # eps ~ N(0, I)
if use_cuda:
    z = z.cuda()
z = z * std + mu                          # reparameterization trick: z ~ q(z|x)
kld = (-0.5 * t.sum(logvar - t.pow(mu, 2) - t.exp(logvar) + 1, 1)).mean().squeeze()
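For context, here is a minimal sketch of what I understand so far (assuming, as the t.randn sampling above suggests, a standard normal prior p(z) = N(0, I)): the closed-form line above should equal KL(q(z|x) || p(z)) between the two distributions, with the prior's zero mean and unit variance folded into the formula, which would explain why only the encoder's mu and logvar appear. The tensor shapes and the torch.distributions check are illustrative, not from this repo.

import torch

# Stand-ins for the outputs of context_to_mu / context_to_logvar.
mu = torch.randn(4, 8)
logvar = torch.randn(4, 8)
std = torch.exp(0.5 * logvar)

q = torch.distributions.Normal(mu, std)                                     # encoder / approximate posterior q(z|x)
p = torch.distributions.Normal(torch.zeros_like(mu), torch.ones_like(std))  # prior p(z) = N(0, I)

kld_two_dists = torch.distributions.kl_divergence(q, p).sum(1).mean()       # explicit KL between the two distributions
kld_closed_form = (-0.5 * torch.sum(logvar - mu.pow(2) - logvar.exp() + 1, 1)).mean()

print(torch.allclose(kld_two_dists, kld_closed_form))  # True: the prior is implicit in the closed form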