Hi Max, I am a physics student trying to use a Bayesian neural network to fit a dataset. I am not an expert in the field of deep learning, so forgive me if this question sounds stupid. I was trying to use your technique, but I don't have much data, so I was thinking of using the training set itself to obtain the scaling factor instead of a separate calibration set. I read in Levi's article that a separate set is recommended, but he uses a non-Bayesian probabilistic network with deterministic weights and perhaps feared overfitting. Since you have used the technique with a Bayesian network, I wanted to ask whether a separate set is really necessary, or whether the training set can be used. Thank you, and sorry for the trouble and for the wall of text.
Hi Pasquale, nice to see that you use our methods! Well, if a network overfits the training data and becomes overconfident on unseen test data, recalibration on the training data won't do much (on the overfitted training data, the network is already well-calibrated). You can certainly try it, but if you see that this worsens your calibration on the test set and then switch back to σ = 1, you are basically optimizing σ on the test set (which you should not do). Maybe you can train your network in a cross-validation manner, optimize the scaling factor on each hold-out set, and compute the final value by averaging over all hold-out sets. After that, you train your final model on all data (without CV) and use the scaling factor you obtained from CV.
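A rough sketch of that procedure could look like this, assuming a Gaussian predictive distribution N(μ, σ²) per point; `train_and_predict` is just a placeholder for whatever retrains your BNN on a fold and returns predictive means and standard deviations on the hold-out part:

```python
import numpy as np
from sklearn.model_selection import KFold

def optimal_scale(y, mu, sigma):
    # Closed-form minimizer of the Gaussian negative log-likelihood with
    # respect to a single factor s applied to sigma (variance becomes s^2 * sigma^2).
    return np.sqrt(np.mean(((y - mu) / sigma) ** 2))

def cv_sigma_scale(x, y, train_and_predict, n_splits=5, seed=0):
    # Optimize the scale on each hold-out fold, then average over folds.
    scales = []
    for train_idx, val_idx in KFold(n_splits, shuffle=True, random_state=seed).split(x):
        mu, sigma = train_and_predict(x[train_idx], y[train_idx], x[val_idx])
        scales.append(optimal_scale(y[val_idx], mu, sigma))
    return float(np.mean(scales))

# s = cv_sigma_scale(x, y, train_and_predict)
# At test time: sigma_calibrated = s * sigma_predicted
```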
Thank you for answering. I've been meaning to ask for a while, but I was afraid I wouldn't get an answer. Since the BNN showed no overfitting in the loss plot, I tried using the training dataset for calibration and then evaluated the level of miscalibration on the test dataset (I don't have the time to use cross-validation, but I think/hope it's more than the project requires anyway).

[miscalibration plots before and after scaling omitted]

The miscalibration visibly decreases after scaling, so I guess it works.
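Concretely, the check I have in mind is something like this coverage-based sketch (assuming Gaussian predictive distributions; the names are only illustrative):

```python
import numpy as np
from scipy.stats import norm

def coverage_curve(y, mu, sigma, probs=np.linspace(0.05, 0.95, 19)):
    # For each nominal central-interval probability p, compute the observed
    # fraction of targets that fall inside mu +/- z_p * sigma.
    observed = []
    for p in probs:
        half_width = norm.ppf(0.5 + p / 2.0) * sigma
        observed.append(np.mean(np.abs(y - mu) <= half_width))
    return probs, np.array(observed)  # perfect calibration: observed == probs

# probs, obs = coverage_curve(y_test, mu_test, sigma_test)
# print(np.mean(np.abs(obs - probs)))  # one simple miscalibration summary
```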
Can I ask you one last question? I tried searching the internet, but calibration of predictive uncertainty in the context of regression is discussed much less than its classification counterpart. I need to compare the quality of the results given by the BNN with those given by a physics-informed statistical model whose parameters are fitted through an MCMC ensemble sampler, using the same log-likelihood as the BNN (a product of Gaussians). Do you think the sigma-scaling technique you used for the BNN can also be applied to this other model? I think there should be no problem, but I didn't find any similar example online.
Hey, no worries! It's always nice to see that my work helped someone. Looks like your approach worked very well! In my experiments, calibrating on the training set never really worked out, but maybe my networks showed more overfitting.
Regarding your second question: There's very little work on regression uncertainty in general, and calibration in particular. That's why we wrote the paper. I don't see why re-calibration of MCMC uncertainty shouldn't work in practice or should be invalid in some sense. Give it a try. However, I would expect that the MCMC approach is much less miscalibrated, because one major source of miscalibration is the fact that the approximate posterior from variational inference is much sharper (overconfident) than the true posterior that MCMC yields. See last figure in my gist: https://gist.github.com/mlaves/607d5252325d44fcea02d42179811d2e
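A minimal sketch of what that could look like for your MCMC model, assuming you can draw posterior-predictive samples for each point of a held-out set (the array names are placeholders):

```python
import numpy as np

def predictive_mean_std(posterior_predictive):
    # posterior_predictive: shape (n_mcmc_draws, n_points), one row of
    # predicted y-values per MCMC draw.
    return posterior_predictive.mean(axis=0), posterior_predictive.std(axis=0)

def fit_scale(y_holdout, mu, sigma):
    # Same closed-form Gaussian-NLL minimizer as for the BNN:
    # the predictive variance is rescaled to s^2 * sigma^2.
    return np.sqrt(np.mean(((y_holdout - mu) / sigma) ** 2))

# mu, sigma = predictive_mean_std(pp_holdout)   # samples on a held-out set
# s = fit_scale(y_holdout, mu, sigma)
# sigma_test_calibrated = s * predictive_mean_std(pp_test)[1]
```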
In my case, strangely, the neural network looks underconfident. I have tried different ways to do Bayesian inference: deep ensembles, variational inference with TensorFlow Probability, and concrete dropout layers. In the end, maybe due to the simplicity of my training dataset (just 1550 (x, y) tuples), there was not much difference in the quality of the results. In terms of time and computational resources, however, concrete_dropout was the fastest choice.
> However, I would expect that the MCMC approach is much less miscalibrated

I know, and that's what the plots show, although once I apply sigma-scaling they both reach the same level. Sadly, when I accepted the task I didn't fully know the theory behind BNNs...
Hi @mlaves, I just wanted to thank you for the help. Everything went well, and in the end I got my degree. Thank you again!