izmailovpavel / understandingbdl


Bug in VI regression example? #8

Closed doldd closed 3 years ago

doldd commented 3 years ago

Hi,

I have a question about your implementation in the VI regression notebook (experiments/deep_ensembles/1d regression_svi.ipynb). Might there be a bug in the gradient computation for the VI parameters? In my investigation I saw that the VI parameter \mu does not change if I compute only the NLL loss term instead of the complete ELBO, which also includes the KL term. In my opinion, however, even the NLL term alone should update \mu. After some further investigation, I believe the cause is the VI architecture, which breaks the gradient flow through the VI parameters; the image below should confirm this behavior. As a result, the VI model is optimized based only on the KL term of the ELBO. Is this behavior known? If so, is this why you first train the model normally, then copy the \mu parameters and retrain the model in a VI fashion?

[image]
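For reference, a minimal sketch (my own, not from the notebook) of how one can check whether \mu receives a gradient from the NLL alone: with the reparameterization w = \mu + \sigma * \epsilon, \mu stays in the autograd graph, so `mu.grad` should be nonzero after backpropagating the NLL.

```python
import torch

# Hypothetical diagnostic: does the NLL alone produce a gradient on mu?
# Reparameterization w = mu + sigma * eps keeps mu in the graph.
torch.manual_seed(0)
mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)

x = torch.randn(16, 1)
y = 2.0 * x.squeeze() + 0.1 * torch.randn(16)

eps = torch.randn(1)
w = mu + log_sigma.exp() * eps        # sampled weight, differentiable w.r.t. mu
pred = (x * w).squeeze()
nll = 0.5 * ((pred - y) ** 2).mean()  # Gaussian NLL up to constants

nll.backward()
# If the architecture breaks the gradient flow, mu.grad stays None/zero.
assert mu.grad is not None and mu.grad.abs().item() > 0
```

If the parameterization detaches the sampled weight from \mu (e.g. by sampling outside the graph), the assertion above fails and only the KL term would move \mu.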

izmailovpavel commented 3 years ago

Hey @doldd! I agree with you. I actually noticed this bug earlier and fixed the experiment, but forgot to push it to the public repo. I have now pushed the updated VI notebook here: https://github.com/izmailovpavel/understandingbdl/blob/master/experiments/deep_ensembles/1d%20regression_svi.ipynb. There, I re-implemented the linear layer for VI manually to ensure that the gradients are computed correctly. Please let me know if you find issues with the updated notebook :)
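A manually re-implemented mean-field VI linear layer might look like the sketch below (an illustration of the general technique, not the notebook's exact code): both \mu and the scale parameter are `nn.Parameter`s, and the weights are drawn with the reparameterization trick inside `forward`, so gradients from the NLL reach \mu.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VILinear(nn.Module):
    """Sketch of a mean-field variational linear layer.

    Weights are sampled as w = mu + softplus(rho) * eps in forward(),
    so autograd propagates the data-fit (NLL) gradient into mu and rho.
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_rho = nn.Parameter(torch.full((out_features,), -3.0))

    def forward(self, x):
        w_sigma = F.softplus(self.w_rho)   # positive std. deviations
        b_sigma = F.softplus(self.b_rho)
        w = self.w_mu + w_sigma * torch.randn_like(w_sigma)  # reparameterized sample
        b = self.b_mu + b_sigma * torch.randn_like(b_sigma)
        return x @ w.t() + b


# Sanity check: a pure data-fit loss (no KL term) still updates w_mu.
layer = VILinear(1, 1)
x = torch.randn(8, 1)
y = torch.randn(8, 1)
loss = ((layer(x) - y) ** 2).mean()
loss.backward()
assert layer.w_mu.grad is not None
```

The key design point is that the sampling happens inside `forward` on tensors built from the parameters, rather than on detached copies of them.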

izmailovpavel commented 3 years ago

Btw, I have also updated the HMC notebook to use better hyperparameters than the original ones. I pushed the updated notebook here: https://github.com/izmailovpavel/understandingbdl/blob/master/experiments/deep_ensembles/1d%20regression_hmc.ipynb.