stefanknegt / Probabilistic-Unet-Pytorch

A Probabilistic U-Net for segmentation of ambiguous images implemented in PyTorch
Apache License 2.0

KL Divergence for Independent #13

Closed schenock closed 3 years ago

schenock commented 3 years ago

For parametrizing an axis-aligned Gaussian you are using a Normal wrapped in an Independent, and you add a patch for the otherwise undefined KL divergence.
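For context, here is a minimal sketch of that construction (illustrative names and sizes, not the repo's exact code). Wrapping a factorized Normal in Independent moves the last dimension from the batch shape into the event shape, so log_prob and kl_divergence reduce over the latent dimension; on older PyTorch versions the KL between two Independent instances was not registered, hence the patch:

```python
import torch
from torch.distributions import Normal, Independent

batch_size, latent_dim = 4, 6  # illustrative sizes
mu = torch.zeros(batch_size, latent_dim)
log_sigma = torch.zeros(batch_size, latent_dim)

# Reinterpret the last batch dim as an event dim -> axis-aligned Gaussian
dist = Independent(Normal(loc=mu, scale=torch.exp(log_sigma)), 1)
dist.batch_shape, dist.event_shape  # (torch.Size([4]), torch.Size([6]))
```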

I was wondering: isn't it possible to achieve the same thing (an axis-aligned multivariate Gaussian) using a MultivariateNormal instance? For example:

```python
import torch
from torch.distributions import MultivariateNormal

batch_size, latent_dim = 4, 6  # example sizes
mu = torch.zeros(batch_size, latent_dim)
log_sigma = torch.ones(batch_size, latent_dim)

# Build a batch of diagonal covariance matrices from the per-dim values
cov = torch.stack([torch.diag(sigma) for sigma in torch.exp(log_sigma)])

mvn = MultivariateNormal(mu, cov)

mvn.batch_shape, mvn.event_shape  # (torch.Size([batch_size]), torch.Size([latent_dim]))
```

considering that the KL divergence is defined for a (MultivariateNormal, MultivariateNormal) pair.
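For reference, a minimal sketch (illustrative names and sizes, not from the repo) comparing the two constructions. On recent PyTorch versions both yield the same per-sample KL; note that MultivariateNormal takes a covariance matrix, i.e. the squared scale of the underlying Normal:

```python
import torch
from torch.distributions import Normal, Independent, MultivariateNormal, kl_divergence

batch_size, latent_dim = 4, 6  # illustrative sizes
mu_p = torch.zeros(batch_size, latent_dim)
mu_q = torch.randn(batch_size, latent_dim)
sigma = torch.rand(batch_size, latent_dim) + 0.5  # per-dim standard deviations

# Axis-aligned Gaussian as an Independent-wrapped Normal (scale = sigma)
p_ind = Independent(Normal(mu_p, sigma), 1)
q_ind = Independent(Normal(mu_q, sigma), 1)

# The same distribution as a MultivariateNormal with diagonal covariance;
# the covariance diagonal is sigma ** 2, not sigma
p_mvn = MultivariateNormal(mu_p, covariance_matrix=torch.diag_embed(sigma ** 2))
q_mvn = MultivariateNormal(mu_q, covariance_matrix=torch.diag_embed(sigma ** 2))

kl_ind = kl_divergence(p_ind, q_ind)   # shape: (batch_size,)
kl_mvn = kl_divergence(p_mvn, q_mvn)   # shape: (batch_size,)
print(torch.allclose(kl_ind, kl_mvn))  # True (up to numerical precision)
```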

schenock commented 3 years ago

Closing this; see the discussion here: https://github.com/stefanknegt/Probabilistic-Unet-Pytorch/issues/1