Thanks for your questions!
Each SBP layer adds its KL divergence to the final loss, since the KL divergence depends only on the parameters of that layer's approximate posterior distribution. The final loss (the negative ELBO) is evaluated after the forward pass through the entire network, in the sgvlb function, so it also includes the cross-entropy term (the log-loss). Note that there is an l2 loss there as well, but it is legacy code and is turned off in the scripts for SBP model training.
Informally speaking, you can view the objective as consisting of two parts: a data term, which is the log-loss, and a KL term, which acts as a kind of regularizer: negative ELBO = data term (log-loss) + KL term.
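For intuition, here is a minimal PyTorch-style sketch of how such a loss could be assembled. The `sgvlb_loss` signature, the per-layer `kl_divergence()` accessor, and the rescaling by the training-set size are assumptions made for illustration; this is not the repo's actual `sgvlb` implementation.

```python
import torch.nn.functional as F


def sgvlb_loss(model, logits, targets, num_train_examples):
    """Negative ELBO = data term (scaled cross-entropy) + summed per-layer KL terms."""
    # Data term: SGVB estimate of the expected log-likelihood. Cross-entropy
    # is averaged over the minibatch, so it is rescaled by the number of
    # training examples to stay on the same scale as the KL term.
    data_term = num_train_examples * F.cross_entropy(logits, targets)

    # KL term: each SBP layer contributes the KL divergence of its own
    # approximate posterior (hypothetical kl_divergence() method); layers
    # without one contribute nothing.
    kl_term = sum(m.kl_divergence() for m in model.modules()
                  if hasattr(m, 'kl_divergence'))

    # Minimizing this sum is equivalent to maximizing the ELBO.
    return data_term + kl_term
```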
In the paper, the final loss function is presented in equation (12): the expected log-likelihood, estimated via SGVB, and the KL divergence.
It seems that the SBP layer only takes the KL divergence into account; why don't we need to deal with the expected log-likelihood term?
Is the log-likelihood included in the objective function?