izmailovpavel / understandingbdl


Gaussian Mixture Models in MultiSWAG #6

Closed mohitsharma29 closed 3 years ago

mohitsharma29 commented 3 years ago

Hey

Thank you for releasing the code. Your paper mentions that you combine multiple SWAG distributions using a Gaussian mixture approximation. I wanted to know which script or section of the code does that part and estimates the weights of each Gaussian component.

izmailovpavel commented 3 years ago

Hey @mohitsharma29! We use a Gaussian mixture with equal weights, where each of the components is obtained from an independent run of SWAG. In other words, you can train several SWAG models and then ensemble the predictions of each of them.
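
Concretely, writing the Gaussian from the m-th SWAG run as q_m (this notation is just for illustration, it doesn't appear in the repo), the equal-weight mixture predictive is

```latex
p(y \mid x, \mathcal{D})
  \approx \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}_{\theta \sim q_m}\big[\, p(y \mid x, \theta) \,\big]
  \approx \frac{1}{M S} \sum_{m=1}^{M} \sum_{s=1}^{S} p(y \mid x, \theta_{m,s}),
  \qquad \theta_{m,s} \sim q_m ,
```

with M SWAG runs and S Monte Carlo weight samples per run. Tuning the mixture weights would amount to replacing the uniform 1/M factors with learned coefficients.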

mohitsharma29 commented 3 years ago

Oh okay. Have you tried tuning the mixture weights, and if so, how did it compare to the equal-weight ensemble?

izmailovpavel commented 3 years ago

We didn't try to tune the weights. My expectation is that tuning them might give a tiny improvement, but I don't think you'd see a big one.

mohitsharma29 commented 3 years ago

Hey, I understand that. However, if I wanted to create a mixture of these models, what would be the best way? The problem is slightly non-trivial in that I already have the individual Gaussian distributions, so now I need to combine them into a mixture.

izmailovpavel commented 3 years ago

I think the simplest way is to just sample n networks from each of the Gaussians (SWAG models) and compute their predictions. Then, you ensemble (average) the predictions of all of the samples, across all of the Gaussians. A minimal sketch of that procedure is below.
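
A rough sketch of the sampling-and-averaging loop, assuming each trained SWAG posterior is a module with a `sample()` method (roughly as in this repo's SWAG class); the names `swag_models`, `n_samples`, and `multiswag_predict` are just illustrative, not the exact API used by the script below:

```python
import torch

@torch.no_grad()
def multiswag_predict(swag_models, loader, n_samples=20, device="cuda"):
    """Average predictive probabilities over Monte Carlo samples
    from each SWAG Gaussian (equal-weight mixture)."""
    all_probs = []
    for swag in swag_models:                # one Gaussian per independent SWAG run
        for _ in range(n_samples):          # Monte Carlo samples from that Gaussian
            swag.sample()                   # draw a weight sample in place
            # (in practice, batch-norm statistics should also be refreshed
            #  on training data after each sample before predicting)
            swag.eval()
            batch_probs = []
            for x, _ in loader:
                logits = swag(x.to(device))
                batch_probs.append(torch.softmax(logits, dim=-1).cpu())
            all_probs.append(torch.cat(batch_probs))
    # Equal-weight mixture: average over all samples from all Gaussians.
    return torch.stack(all_probs).mean(dim=0)
```

Note that after each weight sample you should also recompute batch-norm statistics on the training data before evaluating (the repo provides a utility for this), which the sketch above only notes in a comment.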

There is an example script that trains and evaluates a MultiSWAG model: https://github.com/izmailovpavel/understandingbdl/blob/master/experiments/train/run_multiswag.sh