piEsposito / blitz-bayesian-deep-learning

A simple and extensible library to create Bayesian Neural Network layers on PyTorch.
GNU General Public License v3.0

Monte Carlo Sampling #10

Closed danielkelshaw closed 4 years ago

danielkelshaw commented 4 years ago

In the code below you set the loss to be equal to `criterion(outputs, labels)` on each iteration of the loop; as such, I don't believe the loss actually accumulates over the samples?

https://github.com/piEsposito/blitz-bayesian-deep-learning/blob/c2482f2a30a343c65f0fbc65d915a0951458c88f/blitz/utils/variational_estimator.py#L60-L65

In addition, you normalise the entire loss at the end; might it be more appropriate to normalise just the sampled losses?

Finally, in the paper the authors subtract the data-dependent part from the KL divergence, whereas in your code you seem to sum them. Would it be correct to change the sign here, or would it depend entirely on the criterion defined?

The changes I'm proposing are summarised in the code below:

```python
loss = 0
for _ in range(sample_nbr):
    outputs = self(inputs)
    loss += self.nn_kl_divergence() * complexity_cost_weight
loss /= sample_nbr
return loss - criterion(outputs, labels)
```
piEsposito commented 4 years ago

I've run the tests and examples with the proposed change and they were not passing. It turns out the loss really is meant to be a sum: `self.nn_kl_divergence()` returns the KL divergence in a form that can be added directly to the NLL or MSE.

Also, we cannot apply the criterion only once at the end: variational inference gives a different output on each iteration, so the criterion must be added at each iteration.

I've already fixed that. Anyway, thank you for engaging with the project.
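
For context, a minimal sketch of how this averaged Monte Carlo ELBO estimate is typically consumed in a training step, assuming the module is decorated with `@variational_estimator` so that it exposes `sample_elbo()`; the toy regressor, random data and optimizer settings below are placeholders for illustration only:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator

# Toy Bayesian regressor, used only to illustrate the call; the layer sizes
# and the random data below are made up for this example.
@variational_estimator
class ToyRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.blinear = BayesianLinear(10, 1)

    def forward(self, x):
        return self.blinear(x)

model = ToyRegressor()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

inputs, labels = torch.randn(32, 10), torch.randn(32, 1)

# sample_elbo draws sample_nbr stochastic forward passes and returns the
# averaged (data-fit term + weighted KL) loss discussed above.
optimizer.zero_grad()
loss = model.sample_elbo(inputs=inputs,
                         labels=labels,
                         criterion=criterion,
                         sample_nbr=3)
loss.backward()
optimizer.step()
```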

danielkelshaw commented 4 years ago

Apologies for the tests not passing!

In the current method: https://github.com/piEsposito/blitz-bayesian-deep-learning/blob/566eaa495f28fe9a8c2b076a7388548aa4f792f0/blitz/utils/variational_estimator.py#L60-L65

Each time you run through the for loop with a new sample, you reset the loss to be equal to the criterion and then add on the KL divergence; this means the error doesn't accumulate for each sample, I believe. I think if you want to use this approach you may have to change:

https://github.com/piEsposito/blitz-bayesian-deep-learning/blob/566eaa495f28fe9a8c2b076a7388548aa4f792f0/blitz/utils/variational_estimator.py#L63

to `loss += criterion(outputs, labels)`.

Just a small point!
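
For clarity, a sketch of the accumulation being suggested here, with the data-fit term added on every Monte Carlo sample rather than assigned once; this is only an illustration of the change, not the library's exact code:

```python
# Sketch of the suggested accumulation (illustrative, not the library's
# exact implementation): both terms accumulate on every sample.
loss = 0
for _ in range(sample_nbr):
    outputs = self(inputs)              # fresh stochastic forward pass
    loss += criterion(outputs, labels)  # data-fit term, accumulated per sample
    loss += self.nn_kl_divergence() * complexity_cost_weight  # complexity cost
return loss / sample_nbr                # average over the Monte Carlo samples
```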