Hi @dgoldman-ebay, thanks for your interest!
I think the code is correct without bias scaling for the following reasons:
Caffe: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/dropout_layer.cpp#L40
Pylearn2: https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/models/mlp.py#L829 and https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/models/mlp.py#L987
(they actually do the inverse scaling at train time but it's still equivalent).
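For concreteness, here is a small numpy sketch of the two conventions (toy values, not code from either library):

```python
import numpy as np

rng = np.random.RandomState(0)
p_retain = 0.5             # probability that a unit is kept
a = rng.randn(20)          # activations of one layer for one example (toy values)

# Convention 1: drop units at train time, multiply the weights
# (equivalently, these activations) by p_retain at test time.
test_v1 = p_retain * a

# Convention 2 (Caffe / Pylearn2): drop *and* divide by p_retain at
# train time, then use the weights unchanged at test time.
test_v2 = a

# Averaging the train-time output over many sampled masks shows each
# convention's train-time expectation matches its own test-time output;
# the two schemes differ only in where the factor of p_retain lives.
n_samples = 100000
masks = rng.binomial(1, p_retain, size=(n_samples, a.size))
train_mean_v1 = (masks * a).mean(axis=0)              # ~= p_retain * a
train_mean_v2 = (masks * a / p_retain).mean(axis=0)   # ~= a

print(np.allclose(train_mean_v1, test_v1, atol=0.05))  # True, up to Monte Carlo noise
print(np.allclose(train_mean_v2, test_v2, atol=0.05))  # True, up to Monte Carlo noise
```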
@mdenil you may indeed be right.
Looking at a newer Hinton dropout summary paper (http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf), he still uses the phrase "outgoing weights". But Figure 2 helps explain this. Rather than decrease the outgoing activation value of each node (equivalent to decreasing both W and b for the node), he instead decreases the weight of each outgoing connection between the node and the nodes of the next layer.
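Writing that convention out (my own unpacking of Figure 2, not a quote from the paper): if a unit of the current layer is kept with probability $p$, scaling each outgoing weight by $p$ touches only the term that unit contributes to the next layer's pre-activation, so the next layer's bias -- and the unit's own W and b -- are left alone:

$$
z_i \;=\; \sum_j (p\,W_{ij})\,a_j + b_i \;=\; \sum_j W_{ij}\,(p\,a_j) + b_i \;=\; \mathbb{E}_{m}\!\left[\sum_j W_{ij}\,(m_j a_j) + b_i\right], \qquad m_j \sim \mathrm{Bernoulli}(p).
$$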
As I read your code, it looks to me like you're instead decreasing the weight of each incoming connection from the previous layer. Have I misread?
The dropout layer takes the dropout rate to be applied to its output (`dropout_rate[layer_counter+1]`), but the scaling applied to W uses `dropout_rate[layer_counter]`, which corresponds to the dropout rate applied to the output of the previous layer.
So yes, you are correct that the code scales each incoming connection from the previous layer, but it scales them using the dropout rate that was applied to the input of that layer.
This is kind of confusing because the code bundles weights with the activations above them, and Hinton's papers talk about weights and the activations below them, but I think they agree once we unpack the indexing.
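To make the indexing concrete, here is a schematic mean-network forward pass under that convention -- names like `mean_network_forward` and `p_keep` are illustrative, not the actual code in this repo:

```python
import numpy as np

def mean_network_forward(X, weights, biases, p_keep):
    """Test-time ("mean network") forward pass.

    p_keep[i] is the probability that layer i's *input* units were kept at
    train time (if the code stores drop probabilities, this is 1 - rate).
    Layer i's incoming weights are scaled by p_keep[i]; p_keep[i + 1]
    applies to layer i's own output and is handled by the next layer.
    """
    a = X
    n_layers = len(weights)
    for layer_counter, (W, b) in enumerate(zip(weights, biases)):
        # Scale W by the retain probability of this layer's *input*
        # (index layer_counter), not of its output (layer_counter + 1).
        z = a.dot(p_keep[layer_counter] * W) + b
        # ReLU on hidden layers; leave the output layer linear here.
        a = np.maximum(0.0, z) if layer_counter < n_layers - 1 else z
    return a

# Illustrative usage: a 784 -> 100 -> 10 network.
rng = np.random.RandomState(0)
weights = [rng.randn(784, 100) * 0.01, rng.randn(100, 10) * 0.01]
biases = [np.zeros(100), np.zeros(10)]
p_keep = [0.8, 0.5, 1.0]   # input layer, hidden layer, (output unused)
print(mean_network_forward(rng.randn(5, 784), weights, biases, p_keep).shape)  # (5, 10)
```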
Thanks, @mdenil. That all makes sense. I was misreading your code.
So I'll just take this opportunity to thank you for your work here ... and now I'll move along... :)
In the "mean network", each unit's output must be decreased to compensate. That implies that both W and b, rather than only W, must be decreased -- right?