nikitadurasov / masksembles

Official repository for the paper "Masksembles for Uncertainty Estimation" (CVPR 2021).
https://www.norange.io/projects/masksembles/
MIT License

Masksembles - a couple of questions #6

Open dkoguciuk opened 3 years ago

dkoguciuk commented 3 years ago

Hi @nikitadurasov ,

I have a couple of questions about Masksembles:

  1. What is the difference between Masksembles and BatchEnsemble? Is there a particular reason you don't discuss it in the paper?
  2. As far as I understand, the general ideas behind Masksembles and BatchEnsemble are quite similar. BatchEnsemble doesn't have your scale property for moving between dropout and a naive ensemble, but its masks are learnable, which raises the question: could the two ideas be combined?
  3. I've been developing a somewhat similar approach: I was feeding the same data to all the models (in contrast to your approach) and forcing diversity between the models' predictions by maximizing the L1 difference between them (a sketch of what I mean is below). Have you tried enforcing diversity in any way?
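To make item 3 concrete, here is a minimal PyTorch sketch of the diversity term I mean; the function name and the loss weight are just illustrative, not from any published code:

```python
import torch

def l1_diversity_penalty(preds: torch.Tensor) -> torch.Tensor:
    """Negative mean pairwise L1 distance between member predictions.

    preds: [n_models, batch, n_classes] -- all members see the same batch.
    Minimizing this penalty maximizes the pairwise L1 distances, so adding
    it (weighted) to the task loss pushes the members' predictions apart.
    """
    n = preds.shape[0]
    total, pairs = preds.new_zeros(()), 0
    for i in range(n):
        for j in range(i + 1, n):
            total = total + (preds[i] - preds[j]).abs().mean()
            pairs += 1
    return -total / pairs

# usage (the 0.1 weight is a hyperparameter):
# loss = task_loss + 0.1 * l1_diversity_penalty(preds)
```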

Best, Daniel

nikitadurasov commented 3 years ago

Hey @dkoguciuk,

So,

1) Basically, we used their implementation idea (since it's very convenient), but our major contribution was a method that lets you change the correlation between submodels (and provides the Ensembles / MC-Dropout transition). BatchEnsemble has no such mechanism. There's a toy sketch of this idea after point 3 below.

2) I think combining the two approaches is an interesting idea. I've tried making our masks learnable too (so every value in a mask lies in [0, 1]), but this way we lose control over the correlation of the generated submodels (for example, the model could decide to make all of the masks identical or very similar). I do have an idea for how to combine the two, though, and would be happy to share :)

3) In general, our correlation parameter does exactly this: by reducing it you increase the diversity of the predictions. I don't think we've tried enforcing diversity through extra loss terms, though.
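To make point 1 a bit more concrete, here is a toy numpy sketch (not the mask generator we actually ship in the repo) of how a single overlap parameter can move the masks between the MC-Dropout-like and ensemble-like regimes:

```python
import numpy as np

def toy_masks(n_masks: int, channels: int, overlap: float) -> np.ndarray:
    """Toy binary-mask generator controlled by a single overlap parameter.

    overlap -> 1: all masks share (almost) all channels, so the submodels
                  are nearly identical -- the MC-Dropout-like limit.
    overlap -> 0: masks are (almost) disjoint, so the submodels
                  decorrelate -- the deep-ensemble-like limit.
    """
    masks = np.zeros((n_masks, channels), dtype=np.float32)
    n_shared = int(round(overlap * channels))
    masks[:, :n_shared] = 1.0  # channels active in every mask
    # split the remaining channels disjointly between the masks
    for i, idx in enumerate(np.array_split(np.arange(n_shared, channels), n_masks)):
        masks[i, idx] = 1.0
    return masks

print(toy_masks(n_masks=4, channels=8, overlap=0.5))
```

In the actual layers this role is played by the scale parameter, which controls how much the generated masks overlap.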

Best, Nikita

ZhouCX117 commented 3 years ago

@nikitadurasov Hi, could you please tell me how Masksembles changes the correlation between submodels? I don't understand this from the paper and the code. Does dropping the features that are not used in any mask help with this?

nikitadurasov commented 2 years ago

Hey @ToBeNormal, one of the key properties of the Masksembles approach is the "correlation" of its submodels. Each submodel is defined by a corresponding binary mask in the Masksembles layers: the fewer ones the binary masks share, the less correlated their predictions are. You can check the last section of the supplementary material for more details.
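A quick way to see this numerically (just a sketch, not code from the repo): compute, for each pair of masks, the fraction of active positions they share.

```python
import numpy as np

def pairwise_overlap(masks: np.ndarray) -> np.ndarray:
    """For binary masks of shape [n_masks, channels], return an
    [n_masks, n_masks] matrix where entry (i, j) is the fraction of
    mask i's ones that are also ones in mask j.

    Off-diagonal values near 1 -> heavily shared masks -> highly
    correlated submodels; values near 0 -> nearly disjoint masks ->
    ensemble-like, diverse submodels.
    """
    masks = np.asarray(masks, dtype=np.float32)
    shared = masks @ masks.T                   # ones shared by each pair
    active = masks.sum(axis=1, keepdims=True)  # ones per mask
    return shared / active
```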