vivin / DigitRecognizingNeuralNetwork

A neural network that recognizes digits

Wrong derivative at Linear- and ThresholdActivationStrategy #1

Closed Fokko closed 9 years ago

Fokko commented 9 years ago

The derivative in LinearActivationStrategy and ThresholdActivationStrategy should be 1 instead of 0. Otherwise the back-propagation-of-error algorithm will always cancel out the error.

vivin commented 9 years ago

@Fokko You're right; the derivative for LinearActivationStrategy should be 1. But I don't see how the derivative for ThresholdActivationStrategy would be 1? Wouldn't it be 0 at all points except at the threshold, where it is not differentiable?
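To make the cancellation concrete: in backpropagation the delta for a neuron is the error multiplied by the activation derivative, so a derivative of 0 zeroes every weight update. A minimal sketch (hypothetical class and method names, not the repository's actual code):

```java
public class DerivativeCancellation {
    // Backprop delta for an output neuron: delta = error * f'(netInput).
    static double delta(double error, double derivative) {
        return error * derivative;
    }

    public static void main(String[] args) {
        double error = 0.75; // some nonzero output error

        // LinearActivationStrategy: f(x) = x, so f'(x) = 1 everywhere.
        // With the correct derivative of 1 the error propagates unchanged.
        System.out.println(delta(error, 1.0)); // 0.75

        // With a derivative of 0 the delta is always 0, so every
        // weight update is 0 and the network never learns.
        System.out.println(delta(error, 0.0)); // 0.0
    }
}
```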

Fokko commented 9 years ago

Well, first of all, you should never use non-differentiable functions with the back-propagation-of-error algorithm. We could say that at the threshold value the derivative is +inf and otherwise 0. Personally I would not use this type of activation, or I would set the derivative to one so it does not cancel out the error. As for the example where it is used at the input layer, that is fine and will not cause any trouble.

I found this repository via Stack Overflow, where it is given as an example, so it should of course be correct :)

vivin commented 9 years ago

Right, you don't want to use non-differentiable functions. I think I'll take the ThresholdActivationStrategy out. I think I threw it in there as I was learning about Neural Nets. This was a learning exercise for me and so it is littered with stuff I experimented with as I was learning. I agree that it should be correct. :)

vivin commented 9 years ago

Ah, I remember now. I was using it for the XOR neural network, which seemed to train successfully using this strategy. But it should be easy to change it to use the logistic function instead, which should also work.

Actually, it wasn't that. I was using it to build AND and OR neural networks, but I wasn't actually training them. I think this was when I was just trying to figure out how they worked.
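For reference, the logistic function mentioned above is differentiable everywhere and has a convenient derivative in terms of its own output, which is what makes it a drop-in replacement for the threshold here. A minimal sketch (hypothetical class name, assuming an activate/derivative pair like the strategy interface):

```java
public class LogisticActivation {
    // Logistic (sigmoid) function: smooth and differentiable everywhere.
    static double activate(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Derivative expressed via the output: f'(x) = f(x) * (1 - f(x)).
    static double derivative(double x) {
        double y = activate(x);
        return y * (1.0 - y);
    }

    public static void main(String[] args) {
        // Unlike the hard threshold, the derivative is nonzero near 0,
        // so backpropagated error is not cancelled out.
        System.out.println(activate(0.0));   // 0.5
        System.out.println(derivative(0.0)); // 0.25
    }
}
```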

vivin commented 9 years ago

I've added a comment on the file that says that it shouldn't be used with the backpropagation algorithm.

The threshold stuff works for perceptron networks, which is what the AND and OR networks in the code are, and that's why I had it in there (i.e., from when I was learning about NNs).
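The reason the threshold works for perceptrons is that the perceptron learning rule uses the thresholded output directly and never needs a derivative. A minimal sketch for the AND case (hypothetical class, not the repository's implementation):

```java
public class PerceptronAnd {
    static final double[][] INPUTS = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};

    // Trains a single perceptron on AND with the perceptron learning rule
    // (w += lr * (target - output) * input) and returns its predictions.
    static int[] trainAndPredict() {
        double[] targets = {0, 0, 0, 1}; // AND truth table
        double[] w = {0, 0};
        double bias = 0, lr = 0.1;

        for (int epoch = 0; epoch < 20; epoch++) {
            for (int i = 0; i < INPUTS.length; i++) {
                double net = w[0] * INPUTS[i][0] + w[1] * INPUTS[i][1] + bias;
                int out = net > 0 ? 1 : 0; // hard threshold, no derivative needed
                double err = targets[i] - out;
                w[0] += lr * err * INPUTS[i][0];
                w[1] += lr * err * INPUTS[i][1];
                bias += lr * err;
            }
        }

        int[] predictions = new int[INPUTS.length];
        for (int i = 0; i < INPUTS.length; i++) {
            double net = w[0] * INPUTS[i][0] + w[1] * INPUTS[i][1] + bias;
            predictions[i] = net > 0 ? 1 : 0;
        }
        return predictions;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(trainAndPredict())); // [0, 0, 0, 1]
    }
}
```

AND is linearly separable, so the perceptron convergence theorem guarantees this converges; XOR is not, which is why a multi-layer network with a differentiable activation is needed there.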

Fokko commented 9 years ago

Good, fine by me. The error will asymptotically decrease to zero, as long as your network is powerful enough, of course. The final rounding of the output value should be done outside the network. For the rest, nice work!

vivin commented 9 years ago

Thank you!