mdenil / dropout

A Theano implementation of Hinton's dropout.
MIT License

Why set W with the formula W = layer.W / (1 - dropout_rates[layer_counter]) at test time? #16

Closed BayronP closed 6 years ago

BayronP commented 6 years ago

Hi, thanks for this code. My question is similar to @droid666's: why is W set with the formula W = layer.W / (1 - dropout_rates[layer_counter]) at test time, rather than simply W = layer.W?

mdenil commented 6 years ago

The weights are scaled in the non-dropout MLP because the original paper (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf) says:

If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time as shown in Figure 2.

Note that the code scales the weights with a multiplication, not a division (https://github.com/mdenil/dropout/blob/master/mlp.py#L130), and that p in the code is the probability of dropping a unit, so 1 - p is the probability that it is retained.
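
For anyone still puzzled, here is a minimal NumPy sketch (not from this repository; the variable names are my own) illustrating why multiplying the outgoing weights by the retention probability 1 - p at test time matches, in expectation, the training-time average over random dropout masks:

```python
import numpy as np

rng = np.random.default_rng(0)

p_drop = 0.5                # probability of dropping a unit (the code's convention)
p_retain = 1.0 - p_drop     # probability that a unit is retained

x = rng.normal(size=1000)   # activations of a hidden layer (one unit per entry)
W = rng.normal(size=1000)   # outgoing weights of those units

# Training-time behaviour: average the pre-activation over many random dropout masks.
n_samples = 10_000
masked_avg = np.mean(
    [(x * (rng.random(x.shape) > p_drop)) @ W for _ in range(n_samples)]
)

# Test-time behaviour: no mask, but outgoing weights scaled by the retention probability.
scaled = x @ (W * p_retain)

print(masked_avg, scaled)   # the two values should be close
```

Since each unit survives a mask with probability 1 - p, the expected masked pre-activation is exactly the unmasked pre-activation with weights multiplied by 1 - p, which is what the test-time MLP computes.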