Thanks for your questions!
Each SBP layer adds its KL divergence to the final loss, since the KL divergence depends only on the parameters of that layer's approximate posterior distribution. The final loss (the negative ELBO) is evaluated after the forward pass through the entire network, in the sgvlb function, so it also includes the cross-entropy term (the log-loss). Note that there is an l2 loss there as well, but it is legacy code and is turned off in the scripts for SBP model training.
Informally speaking, you can view the objective as consisting of two parts: a data term, which is the log-loss, and a KL term, which acts as a kind of regularizer: negative ELBO = data term (log-loss) + KL term.
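For intuition, here is a minimal PyTorch-style sketch of how such a loss could be assembled. The `sgvlb_loss` signature, the per-layer `kl_divergence()` accessor, and the rescaling by the training-set size are assumptions made for illustration; this is not the repo's actual `sgvlb` implementation.

```python
import torch.nn.functional as F


def sgvlb_loss(model, logits, targets, num_train_examples):
    """Negative ELBO = data term (scaled cross-entropy) + summed per-layer KL terms."""
    # Data term: SGVB estimate of the expected log-likelihood. Cross-entropy
    # is averaged over the minibatch, so it is rescaled by the number of
    # training examples to stay on the same scale as the KL term.
    data_term = num_train_examples * F.cross_entropy(logits, targets)

    # KL term: each SBP layer contributes the KL divergence of its own
    # approximate posterior (hypothetical kl_divergence() method); layers
    # without one contribute nothing.
    kl_term = sum(m.kl_divergence() for m in model.modules()
                  if hasattr(m, 'kl_divergence'))

    # Minimizing this sum is equivalent to maximizing the ELBO.
    return data_term + kl_term
```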
In the paper, the final loss function is presented in equation (12): the expected log-likelihood, estimated via SGVB, and the KL divergence.
It seems that the SBP layer only takes the KL divergence into account; why don't we need to deal with the expected log-likelihood term?
Is the log-likelihood included in the objective function?