Closed: BayronP closed this issue 6 years ago
The weights are scaled in the non-dropout MLP because the original paper (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf) says:

"If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time as shown in Figure 2."

Note that the code scales the weights with a multiplication, not a division (https://github.com/mdenil/dropout/blob/master/mlp.py#L130), and that p in the code is the probability of dropping a unit, so 1 - p is the probability that it is retained.
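For anyone skimming the thread, here is a minimal NumPy sketch of that convention (this is not the repo's Theano code; the shapes, names, and seed are made up for illustration). A unit is dropped with probability `p_drop` during training, and at test time the weights are multiplied by the retention probability `1 - p_drop`:

```python
import numpy as np

rng = np.random.default_rng(0)
p_drop = 0.5                        # probability of dropping a unit
W = rng.standard_normal((4, 3))     # weights of one hidden layer
x = rng.standard_normal(4)          # activations feeding into that layer

# Training: zero out each input unit with probability p_drop.
mask = rng.random(x.shape) > p_drop
train_out = (x * mask) @ W

# Testing: keep every unit, but multiply the weights by (1 - p_drop)
# so the expected input to the next layer matches training.
test_out = x @ (W * (1 - p_drop))

# The scaled test-time output equals the expectation of the masked input times W.
print(np.allclose((1 - p_drop) * x @ W, test_out))  # True
```

The same expectation argument is why some other implementations instead divide by the retention probability during training ("inverted dropout") and leave the weights untouched at test time; this repo follows the paper's test-time multiplication instead.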
Hi, thanks for this code. My question is similar to @droid666's: why is W set with the formula W = layer.W / (1 - dropout_rates[layer_counter]) at test time, rather than W = layer.W?