yaringal / ConcreteDropout

Code for Concrete Dropout as presented in https://arxiv.org/abs/1705.07832
MIT License

dropout_regularizer #3

Open XinDongol opened 6 years ago

XinDongol commented 6 years ago

In the paper, the entropy of a Bernoulli random variable is H(p) := -p * log(p) - (1-p) * log(1-p).

But in the code, dropout_regularizer is computed as

dropout_regularizer = self.p * K.log(self.p)
dropout_regularizer += (1. - self.p) * K.log(1. - self.p)
dropout_regularizer *= self.dropout_regularizer * input_dim

Could you please explain the meaning of dropout_regularizer *= self.dropout_regularizer * input_dim? I cannot find the corresponding equation for this line in your paper.

Thanks for your kind help in advance.
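
(For anyone reading along: the two expressions agree up to sign and scale. The code computes the negative entropy -H(p) and then multiplies it by a user-chosen weight and by input_dim, which plays the role of K in the paper's KL approximation, roughly l^2 (1-p)/2 * ||M||^2 - K * H(p). A minimal standalone sketch; the function name and plain-TensorFlow calls are illustrative, not the repo's API:)

```python
import tensorflow as tf

def dropout_regularizer(p, input_dim, reg_scale):
    """Scaled negative Bernoulli entropy, reg_scale * K * (-H(p)).

    -H(p) = p*log(p) + (1-p)*log(1-p); input_dim stands in for K,
    the number of units the dropout mask is applied to.
    """
    neg_entropy = p * tf.math.log(p) + (1. - p) * tf.math.log(1. - p)
    return reg_scale * input_dim * neg_entropy

# -H(p) is most negative at p = 0.5, so minimising this term
# pushes p towards 0.5, i.e. it rewards high dropout entropy:
print(dropout_regularizer(tf.constant(0.1), input_dim=512, reg_scale=1e-5).numpy())
```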

joeyearsley commented 6 years ago

I believe this is for scaling the lambda by the output shape, before applying it.

However I think @yaringal is the only one who can properly answer this.

Whilst re-reading the code I noticed that the comment above input_dim mentions ignoring the final dimension, yet the code slices off the first dimension. Is the comment or the code wrong?
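
(For context on the slicing question: in Keras, input_shape carries the batch axis first, so a [1:] slice drops the batch dimension, not the last one. A toy illustration with made-up shapes:)

```python
import numpy as np

input_shape = (None, 32, 32, 3)        # Keras convention: batch axis first
input_dim = np.prod(input_shape[1:])   # drops the *first* (batch) dim -> 3072
```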

yaringal commented 6 years ago

Normal dropout upscales the feature vector by 1/(1-p) after dropping out units. We do the same, substituting W' = W/(1-p) into the model and the KL calculations. Y

Edit (2019): see lines

kernel_regularizer = self.weight_regularizer * tf.reduce_sum(tf.square(weight)) / (1. - self.p)

and

retain_prob = 1. - self.p
x /= retain_prob
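
(Putting the two pieces together: substituting W' = W/(1-p) into the paper's (1-p) * ||M||^2 weight term gives (1-p) * ||W||^2 / (1-p)^2 = ||W||^2 / (1-p), which is the kernel_regularizer line above. A self-contained sketch of the full per-layer penalty, assuming plain TensorFlow and illustrative names; the repo itself wires this through a Keras wrapper layer:)

```python
import tensorflow as tf

def concrete_dropout_penalty(weight, p, input_dim,
                             weight_reg_scale, dropout_reg_scale):
    # Weight term: W' = W/(1-p) substituted into the (1-p) * ||M||^2
    # KL term leaves ||W||^2 / (1-p).
    kernel_reg = weight_reg_scale * tf.reduce_sum(tf.square(weight)) / (1. - p)

    # Dropout term: scaled -K * H(p), which rewards high mask entropy.
    dropout_reg = p * tf.math.log(p) + (1. - p) * tf.math.log(1. - p)
    dropout_reg *= dropout_reg_scale * input_dim

    return kernel_reg + dropout_reg
```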