Moadab-AI closed this issue 10 years ago.
I don't believe the derivative of the sigmoid is equivalent to the derivative of the softmax. Have you performed a gradient check using softmax output?
Well, I did the trivial hand calculation and ended up with exactly the same result as the sigmoid's, "an (1 - an)".
Also, when I looked it up for confirmation, I found the same answer, e.g. in http://www.cs.bham.ac.uk/~jxb/INC/l7.pdf, page 13.
But above all, even if all of this is wrong, I can by no means see the derivative being "1", as it is in the code. Can it be?
And no, I didn't do the gradient check, since the math didn't seem right to me in the first place.
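For what it's worth, a quick NumPy sketch (my own, not DeepLearnToolbox code) shows why the hand calculation above only tells part of the story: the softmax derivative is a full Jacobian, and the sigmoid-style term an (1 - an) is just its diagonal.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

z = np.array([0.5, -1.2, 2.0])  # arbitrary example pre-activations
a = softmax(z)

# Full softmax Jacobian: J[i, j] = a_i * (delta_ij - a_j)
J = np.diag(a) - np.outer(a, a)

# The diagonal does match the sigmoid-style formula a_n * (1 - a_n) ...
assert np.allclose(np.diag(J), a * (1 - a))

# ... but the off-diagonal entries -a_i * a_j are nonzero, so the derivative
# is NOT simply "a .* (1 - a)" applied elementwise, as it is for the sigmoid.
assert not np.allclose(J, np.diag(a * (1 - a)))
```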
I'm not a user of DeepLearnToolbox, but I'm interested ;-) The derivations of the cross-entropy loss and the softmax activation are coupled: you get better numerical stability if you compute the delta at the output layer taking both functions into account. For this reason, several NN tools implement the cross-entropy loss derivative as the combined cross-entropy + softmax derivative, so the delta computation at the softmax layer reduces to a linear transformation. You can read more in the following paper:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.6403
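To illustrate the point (a sketch of my own, assuming the usual one-hot cross-entropy setup, not code from the paper or the toolbox): pushing dL/da through the full softmax Jacobian collapses algebraically to the linear expression delta = a - t, which is exactly what toolkits hard-code.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

z = np.array([0.3, -0.7, 1.5])   # example pre-activations
t = np.array([0.0, 1.0, 0.0])    # one-hot target
a = softmax(z)

# Gradient of cross-entropy L = -sum(t * log(a)) with respect to a
dL_da = -t / a

# Softmax Jacobian: da_i/dz_j = a_i * (delta_ij - a_j)
J = np.diag(a) - np.outer(a, a)

# Chain rule through the full Jacobian ...
delta_chain = J.T @ dL_da

# ... equals the simple linear form used at the output layer
delta_simple = a - t
assert np.allclose(delta_chain, delta_simple)
```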
Thank you for the derivation, mabdollahi. I see now that it is quite easy.
Thank you for the link pakozm as this has been confusing me.
Oh, thank you very much, pakozm! You're a life saver. The article you referred to answered precisely my question, and based on it, it seems the code is fine as long as you pair the CE cost function with softmax activations.
You're welcome :-) I was also a bit confused about this a while ago ;-)
Thanks a lot @pakozm :)
There seems to be a bug in the backpropagation algorithm in the NN folder. The bug is in the calculation of the deltas and the derivative of the activation function at the output layer (lines 7-12):
Shouldn't the softmax be bundled with 'sigm' rather than 'linear', since the derivative of the softmax is identical to 'sigm', i.e. (an (1 - an))? Or am I missing something?
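A gradient check of the kind mentioned above settles this empirically. Here is a minimal finite-difference sketch (my own, not DeepLearnToolbox code): the numerical gradient of the cross-entropy + softmax loss matches delta = a - t, which corresponds to the "derivative = 1" shortcut in the code when the two are paired.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def ce_loss(z, t):
    # Cross-entropy loss of softmax(z) against one-hot target t
    return -np.sum(t * np.log(softmax(z)))

z = np.array([0.1, 0.9, -0.4])   # example pre-activations
t = np.array([1.0, 0.0, 0.0])    # one-hot target

# Central finite differences on each component of z
eps = 1e-6
num_grad = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    num_grad[i] = (ce_loss(zp, t) - ce_loss(zm, t)) / (2 * eps)

# Analytic delta when CE and softmax are combined
analytic = softmax(z) - t
assert np.allclose(num_grad, analytic, atol=1e-6)
```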