The test activations should be scaled by (1 - drop_prob), not drop_prob. For example, if drop_prob is 0, this layer should have no effect and we should scale activations by 1.

From the dropout paper (http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf): "If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time." Here p is the retention probability, i.e. 1 - drop_prob, which matches the scaling above.
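As a sanity check, here is a minimal sketch of standard (non-inverted) dropout in NumPy. It assumes the layer's convention from this discussion; the names dropout_forward, drop_prob, and train are illustrative, not the actual API of the code under review:

```python
import numpy as np

def dropout_forward(x, drop_prob, train=True, rng=None):
    """Standard (non-inverted) dropout sketch.

    Train: zero each unit independently with probability drop_prob.
    Test: scale activations by the retention probability (1 - drop_prob),
    so drop_prob == 0 leaves the input unchanged.
    """
    rng = rng or np.random.default_rng()
    if train:
        # Keep each unit with probability 1 - drop_prob.
        mask = rng.random(x.shape) >= drop_prob
        return x * mask
    # Test time: scale by the retention probability, not drop_prob.
    return x * (1.0 - drop_prob)
```

With drop_prob = 0 the test branch returns x * 1.0, i.e. the identity, which is the behavior the fix above requires.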