wzn0828 opened this issue 1 year ago
Hello! Empirically, we found that putting no constraint on the first distribution (rather than, e.g., forcing it to match a standard normal distribution) slightly improves results. That's why there is no loss on the first posterior distribution.
In `mile.losses.ProbabilisticLoss`, I understand that the first posterior distribution should be close to N(0, I), but the code is confusing to me. As the screenshot shows, the first element of `posterior_log_sigma` has been cut off, so why is the first element still selected in the line computing `first_kl`?
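For what it's worth, the likely resolution is an indexing shift: once the first timestep is sliced off, index 0 of the slice refers to the *original second* timestep, not the dropped one. Below is a minimal sketch of this, using NumPy and invented names/shapes (this is not the actual `mile.losses` code, and the KL line is only an illustrative closed form for KL(N(mu, sigma^2) || N(0, I)) with an assumed zero mean):

```python
import numpy as np

# Toy tensor of per-timestep log-sigmas, shape (batch, time, channels).
# Names and shapes are illustrative assumptions, not the real MILE code.
posterior_log_sigma = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)

# Slicing off the first timestep shifts all indices down by one:
# element 0 of the slice is the ORIGINAL timestep 1.
sliced = posterior_log_sigma[:, 1:]
assert np.array_equal(sliced[:, 0], posterior_log_sigma[:, 1])

# A hypothetical "first_kl" on the sliced tensor therefore constrains the
# original second posterior, leaving the very first one unconstrained.
# Closed-form KL(N(mu, sigma^2) || N(0, I)) per batch element, with mu = 0:
mu = np.zeros_like(sliced[:, 0])
log_sigma = sliced[:, 0]
first_kl = 0.5 * (mu**2 + np.exp(2 * log_sigma) - 2 * log_sigma - 1).sum(-1)
```

So selecting element 0 after the slice is consistent with the answer above: the loss skips the first posterior and starts penalizing from the second timestep onward.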