Closed: BayronP closed this issue 6 years ago
The weights are scaled in the non-dropout MLP because the original paper (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf) says:

"If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time as shown in Figure 2."

Note that the code scales the weights with a multiplication, not a division (https://github.com/mdenil/dropout/blob/master/mlp.py#L130), and that p in the code is the probability of dropping a unit, so 1 - p is the probability that it is retained.
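For anyone skimming the thread, here is a minimal NumPy sketch of that convention (this is not the repo's Theano code; the shapes, names, and seed are made up for illustration). A unit is dropped with probability `p_drop` during training, and at test time the weights are multiplied by the retention probability `1 - p_drop`:

```python
import numpy as np

rng = np.random.default_rng(0)
p_drop = 0.5                        # probability of dropping a unit
W = rng.standard_normal((4, 3))     # weights of one hidden layer
x = rng.standard_normal(4)          # activations feeding into that layer

# Training: zero out each input unit with probability p_drop.
mask = rng.random(x.shape) > p_drop
train_out = (x * mask) @ W

# Testing: keep every unit, but multiply the weights by (1 - p_drop)
# so the expected input to the next layer matches training.
test_out = x @ (W * (1 - p_drop))

# The scaled test-time output equals the expectation of the masked input times W.
print(np.allclose((1 - p_drop) * x @ W, test_out))  # True
```

The same expectation argument is why some other implementations instead divide by the retention probability during training ("inverted dropout") and leave the weights untouched at test time; this repo follows the paper's test-time multiplication instead.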
Hi, thanks for this code. My question is similar to @droid666's: why is W set with the formula W = layer.W / (1 - dropout_rates[layer_counter]) at test time, rather than W = layer.W?