Closed mohitsharma29 closed 3 years ago
Hey @mohitsharma29! We use a Gaussian mixture with equal weights, where each component is obtained from an independent run of SWAG. In other words, you can train several SWAG models and then ensemble the predictions of each of them.
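To make the equal-weight mixture concrete, here is a minimal 1-D sketch (not code from the repo; the means and variances are made up) showing how K independent SWAG Gaussians combine into a single mixture posterior with weight 1/K each:

```python
import numpy as np

# Hypothetical 1-D illustration: each SWAG run yields a Gaussian
# approximation N(mean_k, var_k); the mixture weights them all equally.
means = np.array([0.0, 2.0, -1.0])   # per-run SWAG means (made up)
vars_ = np.array([0.5, 0.3, 0.8])    # per-run SWAG variances (made up)

def mixture_pdf(theta):
    """Equal-weight mixture density: (1/K) * sum_k N(theta; mean_k, var_k)."""
    comps = np.exp(-0.5 * (theta - means) ** 2 / vars_) / np.sqrt(2 * np.pi * vars_)
    return comps.mean()  # .mean() over K components = equal 1/K weights

# Sanity check: the mixture integrates to 1 because each component does.
grid = np.linspace(-10, 10, 20001)
print(np.trapz([mixture_pdf(t) for t in grid], grid))  # ≈ 1.0
```

No weight estimation happens anywhere; the 1/K weighting is just the `.mean()` over components.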
Oh okay. Have you tried tuning the mixture weights, and if so, how did it compare to the equal-weight ensemble?
We didn't try to tune the weights. My expectation is that tuning them might give a tiny improvement, but I don't think you'd see a big one.
Hey, I understand this. However, if I were to create a mixture of these models, what would be the best way? The problem is slightly non-trivial in that I already have my Gaussian distributions, so now I need to combine them into a mixture.
I think the simplest way is to just sample n networks from each of the Gaussians (SWAG models) and compute their predictions. Then, you should ensemble the models (average the predictions) of all of the samples, across all Gaussians.
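The procedure above can be sketched as follows. This is a toy stand-in, not the repo's actual code: `swag_means`/`swag_stds` are hypothetical placeholders for the mean and (diagonal) covariance each SWAG run produces, and `predict_proba` is a toy logistic model in place of a real network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: 3 SWAG Gaussians over a 2-parameter linear classifier.
swag_means = [rng.normal(size=2) for _ in range(3)]
swag_stds = [0.1 * np.ones(2) for _ in range(3)]

def predict_proba(theta, x):
    """Toy 'network': logistic regression with parameters theta."""
    return 1.0 / (1.0 + np.exp(-(x @ theta)))

def multiswag_predict(x, n_samples_per_gaussian=20):
    """Sample n networks from each SWAG Gaussian, then average the
    predicted probabilities across all samples from all Gaussians."""
    preds = []
    for mean, std in zip(swag_means, swag_stds):
        for _ in range(n_samples_per_gaussian):
            theta = rng.normal(mean, std)   # draw one network's weights
            preds.append(predict_proba(theta, x))
    return np.mean(preds, axis=0)           # equal-weight ensemble

x = rng.normal(size=(5, 2))                 # 5 test inputs
p = multiswag_predict(x)
print(p.shape)  # (5,)
```

Averaging over all samples from all Gaussians is exactly the equal-weight mixture predictive; no per-component weights are fit.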
There is an example script that trains and evaluates a MultiSWAG model: https://github.com/izmailovpavel/understandingbdl/blob/master/experiments/train/run_multiswag.sh
Hey,
Thank you for releasing the code. Your paper mentions that you combine multiple SWAG distributions using a Gaussian mixture approximation. I wanted to know which script/section of the code does that part and estimates the weights for each Gaussian component.