Closed · danielkelshaw closed this issue 4 years ago
I ran the tests and examples and they were not passing. It turns out that the loss is meant to be summed at the end: `self.nn_kl_divergence()` returns the KL divergence so that it can be added to the NLL or MSE.
We cannot apply the criterion only once at the end, because variational inference gives a different value on each iteration, so the loss must be accumulated at each iteration.
I've already fixed that. Anyway, thank you for engaging with the project.
Apologies for the tests not passing!
In the current method: https://github.com/piEsposito/blitz-bayesian-deep-learning/blob/566eaa495f28fe9a8c2b076a7388548aa4f792f0/blitz/utils/variational_estimator.py#L60-L65

Each time you run through the for loop with a new sample, you reset the loss to the value of the criterion and then add the kl_divergence on top - this means the error doesn't accumulate for each sample, I believe. I think if you want to use this approach you may have to change `loss = criterion(outputs, labels)` to `loss += criterion(outputs, labels)`.
Just a small point!
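To make the difference concrete, here is a minimal sketch - plain floats stand in for tensor losses, and the function names are hypothetical, not the library's API - contrasting the resetting loop with an accumulating one:

```python
def sample_elbo_reset(sample_losses, kl):
    """Mimics the current code: loss is overwritten on each iteration."""
    loss = 0.0
    for nll in sample_losses:
        loss = nll + kl          # reset: only the last sample survives
    return loss

def sample_elbo_accumulate(sample_losses, kl):
    """Proposed fix: accumulate the criterion across samples."""
    loss = 0.0
    for nll in sample_losses:
        loss += nll + kl         # every sampled loss contributes
    return loss

losses = [1.0, 2.0, 3.0]         # per-sample criterion values
kl = 0.5
print(sample_elbo_reset(losses, kl))       # 3.5 - only the last sample
print(sample_elbo_accumulate(losses, kl))  # 7.5 - sum over all samples
```

With the reset version, increasing the number of Monte Carlo samples does nothing except change which single sample the gradient is taken from, which defeats the point of sampling.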
In the code below you set the loss equal to `criterion(outputs, labels)` on each iteration of the loop - as such, I don't believe this increases the value of the loss for each sample: https://github.com/piEsposito/blitz-bayesian-deep-learning/blob/c2482f2a30a343c65f0fbc65d915a0951458c88f/blitz/utils/variational_estimator.py#L60-L65
In addition, you normalise the entire loss at the end - might it be more appropriate to normalise just the sampled losses?
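As a hedged illustration of the two normalisation choices (the helper names are hypothetical, and the per-sample KL values reflect that each Monte Carlo pass samples new weights):

```python
def normalise_everything(nlls, kls, n):
    """Divide the full accumulated loss (criterion + KL per sample) by n."""
    return sum(nll + kl for nll, kl in zip(nlls, kls)) / n

def normalise_sampled_only(nlls, kls, n):
    """Alternative: average only the data-fit terms; keep the KL sum whole."""
    return sum(nlls) / n + sum(kls)

nlls = [1.0, 2.0]   # sampled criterion values
kls = [0.5, 0.5]    # KL for each sampled set of weights
print(normalise_everything(nlls, kls, 2))    # 2.0
print(normalise_sampled_only(nlls, kls, 2))  # 2.5
```

The two choices only coincide when the KL term is divided by the sample count as well, so the decision effectively sets the relative weight of the complexity term against the data-fit term.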
Finally, in the paper the authors subtract the data-dependent part from the KL divergence, whereas in your code you seem to add them. Would it be correct to change the sign here, or would it depend entirely on the `criterion` defined?

The changes I'm proposing are summarised in the code below:
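As a sketch of what such a change might look like - hypothetical names and plain-Python stand-ins for the tensors, not the repository's actual `sample_elbo` implementation:

```python
def sample_elbo_sketch(forward, criterion, kl_fn, inputs, labels, sample_nbr):
    """Accumulate the per-sample criterion and KL, then normalise.

    forward:   stochastic forward pass (samples new weights each call)
    criterion: data-fit loss, e.g. NLL or MSE
    kl_fn:     KL divergence for the currently sampled weights
    """
    loss = 0.0
    for _ in range(sample_nbr):
        outputs = forward(inputs)
        loss += criterion(outputs, labels)  # accumulate, don't reset
        loss += kl_fn()                     # KL for this weight sample
    return loss / sample_nbr                # average over the samples

# Toy usage with deterministic stand-ins:
elbo = sample_elbo_sketch(
    forward=lambda x: x,
    criterion=lambda o, l: abs(o - l),
    kl_fn=lambda: 0.5,
    inputs=3.0, labels=2.0, sample_nbr=4,
)
print(elbo)  # 1.5
```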