yaringal / ConcreteDropout

Code for Concrete Dropout as presented in https://arxiv.org/abs/1705.07832
MIT License

Biases in weight_regularizer? #15

Open joemrt opened 3 years ago

joemrt commented 3 years ago

First of all, great work! @yaringal, in your thesis and in the "Dropout as a Bayesian Approximation..." and "Concrete Dropout" papers, you seem to apply the dropout distribution only to the weights and not to the biases, which leads to a p-dependent regularization term that includes only the weight matrices.
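For reference, here is the per-layer term as I read it in the Concrete Dropout paper (l is the prior length-scale, K the layer's input dimension, p the dropout probability, M the weight matrix). The bias line is my assumption of how a p-free penalty would look if the bias is treated as an ordinary parameter with a Gaussian prior, as in the thesis:

```latex
% Weight matrix M: p-dependent (Concrete Dropout paper)
\mathrm{KL}\big(q_M(\mathbf{W}) \,\|\, p(\mathbf{W})\big) \propto \frac{l^2 (1-p)}{2}\,\lVert \mathbf{M} \rVert^2 - K\,\mathcal{H}(p),
\qquad \mathcal{H}(p) = -p\log p - (1-p)\log(1-p)

% Bias mean m (assumed): plain L2 term, no dependence on p
\frac{l^2}{2}\,\lVert \mathbf{m} \rVert^2
```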

However, in the PyTorch implementation of the regularization term (I didn't check the other ones), you sum the squares of layer.parameters(), which also collects the biases. This gives the biases a p-dependent regularization term too, which is probably not what you want once you start optimizing p. Is this a bug, or am I missing something?
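A minimal sketch of the difference, assuming a single torch.nn.Linear layer and the 1/(1 - p) scaling that an inverted-dropout parameterization implies (if the stored weight is W = (1 - p)·M, the paper's (1 - p)·||M||² becomes ||W||²/(1 - p)). The names reg_all_params, reg_weights_only and bias_regularizer are hypothetical, not taken from the repo. The check at the end shows that when every parameter is summed, the gradient with respect to the dropout logit grows with the bias magnitude, whereas with a weights-only term it does not.

```python
import torch
import torch.nn as nn


def reg_all_params(layer: nn.Module, p: torch.Tensor,
                   weight_regularizer: float) -> torch.Tensor:
    # Behaviour described in the issue: every tensor yielded by
    # layer.parameters() (weight AND bias) is scaled by 1/(1 - p).
    sum_of_square = sum(torch.sum(param ** 2) for param in layer.parameters())
    return weight_regularizer * sum_of_square / (1 - p)


def reg_weights_only(layer: nn.Linear, p: torch.Tensor,
                     weight_regularizer: float,
                     bias_regularizer: float = 0.0) -> torch.Tensor:
    # Hypothetical alternative: only the weight matrix carries the
    # p-dependent factor; the bias gets a plain L2 penalty with no p.
    weight_term = weight_regularizer * torch.sum(layer.weight ** 2) / (1 - p)
    bias_term = torch.zeros(())
    if layer.bias is not None:
        bias_term = bias_regularizer * torch.sum(layer.bias ** 2)
    return weight_term + bias_term


torch.manual_seed(0)
layer = nn.Linear(4, 3)
p_logit = torch.zeros(1, requires_grad=True)  # p = sigmoid(0) = 0.5


def grad_wrt_p_logit(reg_fn) -> float:
    # d(regularizer)/d(p_logit) for the current layer parameters.
    p_logit.grad = None
    p = torch.sigmoid(p_logit)
    reg_fn(layer, p, 1e-2).sum().backward()
    return p_logit.grad.item()


g_all_before = grad_wrt_p_logit(reg_all_params)
g_w_before = grad_wrt_p_logit(reg_weights_only)

with torch.no_grad():
    layer.bias.add_(10.0)  # make the biases much larger

g_all_after = grad_wrt_p_logit(reg_all_params)
g_w_after = grad_wrt_p_logit(reg_weights_only)

print(f"all params:   {g_all_before:.4f} -> {g_all_after:.4f}")  # grows with the bias
print(f"weights only: {g_w_before:.4f} -> {g_w_after:.4f}")     # unchanged
```

Keeping the bias penalty p-independent matches the reading of the papers above; whether the bias should be penalized at all (bias_regularizer=0.0 in the sketch) is a separate choice.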