lmjohns3 / theanets

Neural network toolkit for Python
http://theanets.rtfd.org
MIT License

Fix bug in BernoulliDropout #123

Closed kaikaun closed 8 years ago

kaikaun commented 8 years ago

Divide activations by (1-weight) rather than weight

lmjohns3 commented 8 years ago

Can you explain why this is the correct scaling factor instead of just weight?

kaikaun commented 8 years ago

Consider the case where weight = 0.1. This means that 0.9 of the activations remain, while 0.1 are zeroed out. For the total activation to stay the same on average as before dropout, the remaining activations must be scaled up. The current code scales them by 1 / weight = 10, which is wrong: since only 0.9 of the activations remain, the total activation would then average 9 times the original. The correct factor is 1 / 0.9, i.e. 1 / (1 - weight), so you should be dividing by (1 - weight), not weight.
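
To make the arithmetic concrete, here is a minimal NumPy sketch of Bernoulli dropout with this scaling. It follows the thread's convention that `weight` is the probability of zeroing an activation; the names and structure are illustrative only, not the actual theanets code.

```python
import numpy as np

def bernoulli_dropout(activations, weight, rng=None):
    """Zero each activation with probability `weight`; scale survivors by 1 / (1 - weight)."""
    rng = rng if rng is not None else np.random.default_rng()
    # True where a unit survives; each unit survives with probability (1 - weight).
    mask = rng.random(activations.shape) >= weight
    # Dividing by (1 - weight) keeps the expected total activation unchanged:
    # E[mask] = 1 - weight, so E[x * mask / (1 - weight)] = x.
    return activations * mask / (1 - weight)
```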

In addition, I have been testing with the new (1 - weight) scaling in my own application using my own fork, and I get much better performance both in and out of sample.

Regards, David Khoo


lmjohns3 commented 8 years ago

Ok, fantastic, thanks for the explanation!