Closed josifgrabocka closed 5 months ago
The KL divergence between the two Gaussians should read "KL(N(\mu, \sigma) || N(0,1)) = -\log \sigma + 1/2 \sigma^2 + 1/2 \mu^2 - 1/2", i.e. 1/2 (-\log \sigma^2 + \sigma^2 + \mu^2 - 1), instead of 1/2 (-\log \sigma^2 + \sigma^2 + \mu^2)
We can double-check that "1 = \argmin_\sigma (-\log \sigma + 1/2 \sigma^2)" (setting the derivative -1/\sigma + \sigma to zero) and "0 = \argmin_\mu 1/2 \mu^2". Plugging in \mu = 0, \sigma = 1, i.e. N(\mu, \sigma) = N(0,1), gives KL = 0, as it should.
Equation 5.78 is correct, and since 10.50 follows from 5.78, 10.50 can be corrected accordingly.
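The closed form above can also be sanity-checked numerically by integrating p(x) log(p(x)/q(x)) directly. A minimal sketch (function names are mine, not from the book):

```python
import math

def kl_closed_form(mu, sigma):
    # KL(N(mu, sigma^2) || N(0, 1)) in closed form:
    # -log(sigma) + sigma^2/2 + mu^2/2 - 1/2
    return -math.log(sigma) + 0.5 * sigma**2 + 0.5 * mu**2 - 0.5

def kl_numeric(mu, sigma, n=100_000, lo=-20.0, hi=20.0):
    # Midpoint-rule integral of p(x) * log(p(x) / q(x)),
    # with p = N(mu, sigma^2) and q = N(0, 1).
    def p(x):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    def q(x):
        return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        px = p(x)
        if px > 0.0:
            total += px * math.log(px / q(x)) * h
    return total

# At mu=0, sigma=1 the two distributions coincide, so KL must be 0;
# the -1/2 constant is what makes that hold.
print(kl_closed_form(0.0, 1.0))
print(kl_closed_form(0.7, 1.3), kl_numeric(0.7, 1.3))
```

Without the -1/2 term the closed form returns 1/2 at (mu=0, sigma=1) instead of 0, which is exactly the constant discrepancy discussed below.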
I think these are equivalent, since 0.5 \log(\sigma^2) = \log \sigma?
Oh, it's the constant term that differs. Fixed.