deep ensembles being simple ensembles?

psteinb commented 4 months ago

Hi,

thanks for sharing your code with the paper. I was checking the deep-ensemble experiments (e.g. 1d-regression_deep_ensembles.ipynb). It feels like your implementation diverges from the original Deep Ensemble paper considerably. Any reason why that is?

Best, Peter

izmailovpavel commented 4 months ago

Hey @psteinb, thanks for checking out the code. In what way does it diverge from the deep ensembles paper?

psteinb commented 4 months ago

So, there are 3 distinct aspects that many people overlook when implementing Deep Ensembles. The paper is very brief on these points, so I do understand why they get overlooked.

1. The network to assess UQ for is designed with 2 output heads: one for the mean prediction, one for the predicted variance (see section 2.2.1, "Following [47], we use a network that outputs two values in the final layer, corresponding to the predicted mean μ(𝐱) and variance σ²(𝐱) > 0."). From what I can tell, the models in your notebooks have a single output head.
2. The loss function (equation (1)) honors this by having 2 terms, a scaled MSE and a pure variance term. From what I can tell, the notebook mentioned above lists a loss function that contains an MSE term and a second term that takes the prior into account, scaled by a fixed/constant variance.
3. Prediction by the ensemble is performed in a Gaussian mixture model way (see the last paragraph of section 2.4 of the paper). Your notebook computes the mean/variance of the ensemble members' point predictions.
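
To make the three points concrete, here is a minimal sketch in plain Python of what I mean. The function names are my own and do not come from the paper or from this repository. `gaussian_nll` follows the form of equation (1), up to an additive constant, and `combine_gaussian_mixture` moment-matches a uniform mixture of the members' Gaussians as in section 2.4. It assumes each ensemble member already returns a mean and a positive variance per input.

```python
import math


def gaussian_nll(y, mean, variance):
    """Per-point Gaussian negative log-likelihood, up to an additive constant.

    Two terms: a pure variance term and a squared error scaled by the variance.
    """
    return 0.5 * math.log(variance) + (y - mean) ** 2 / (2.0 * variance)


def combine_gaussian_mixture(means, variances):
    """Approximate a uniform mixture of Gaussians by a single Gaussian.

    means, variances: one (mean, variance) pair per ensemble member,
    all evaluated at the same input x.
    """
    m = len(means)
    mix_mean = sum(means) / m
    # Second moment of the mixture minus the squared mixture mean.
    second_moment = sum(v + mu ** 2 for mu, v in zip(means, variances)) / m
    mix_var = second_moment - mix_mean ** 2
    return mix_mean, mix_var


if __name__ == "__main__":
    # Two hypothetical members that disagree on the mean but agree on the noise.
    print(combine_gaussian_mixture([0.0, 2.0], [1.0, 1.0]))
```

Note that the mixture variance contains both the members' predicted variances and the spread of their means, which is why it differs from the plain variance of the point predictions.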

izmailovpavel commented 4 months ago

So this is just a difference in the underlying model that we apply the deep ensembles methodology to. You can use a homoscedastic noise model (single head) or a heteroscedastic model. In our case, we consider a homoscedastic fixed observation noise model, which also gives rise to the loss function that we use. The mean / variance of ensemble predictions is calculated for the purposes of visualization.

So this is a difference in the underlying model that we use in this particular notebook and not in the approximate inference methodology (deep ensembles).
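
As a rough illustration of the homoscedastic case, the sketch below writes out a negative log posterior with a fixed observation noise variance. It is not the notebook's code: the function name, the zero-mean Gaussian prior on the weights, and the default values of `noise_var` and `prior_var` are assumptions made only for this example.

```python
def neg_log_posterior(ys, preds, weights, noise_var=1.0, prior_var=1.0):
    """Negative log posterior, up to a constant, for fixed observation noise.

    ys, preds: targets and single-head network predictions.
    weights:   flat list of network parameters (hypothetical layout).
    noise_var: fixed observation noise variance (assumed value).
    prior_var: variance of an assumed zero-mean Gaussian prior on the weights.
    """
    # Gaussian likelihood with constant variance: a scaled sum of squared errors.
    data_term = sum((y - p) ** 2 for y, p in zip(ys, preds)) / (2.0 * noise_var)
    # Gaussian prior on the weights: a scaled L2 penalty.
    prior_term = sum(w ** 2 for w in weights) / (2.0 * prior_var)
    return data_term + prior_term


if __name__ == "__main__":
    print(neg_log_posterior([1.0, 2.0], [1.0, 0.0], [1.0, 1.0], noise_var=2.0))
```

With a fixed noise variance shared by all members, the variance term of the heteroscedastic loss is constant and drops out, and the Gaussian-mixture predictive variance reduces to the variance of the members' means plus that constant.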

izmailovpavel / understandingbdl

deep ensembles being simple ensembles? #26