Hi,

I have a question about the eesen code, lines 66 to 75 of net/ctc_loss.cc.

When the errors are back-propagated through the softmax layer, the formula I read from the code is ctc_error * yk - Row_mul(yk * ColSum(ctc_error * yk)). But the derivative of softmax, as I understand it, is yk * (1 - yk).

So the code differs from the softmax derivative formula in the ColSum and Row_mul terms. Why is that? Is there something I missed?
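To make the comparison concrete, here is a minimal single-frame sketch in plain Python. It is not the eesen code: the names `softmax`, `grad_code_formula`, `grad_diagonal_only`, and `grad_finite_difference` and all numbers are made up for illustration. It assumes that ColSum sums ctc_error * yk over the labels of one frame and that Row_mul scales that frame's yk by the resulting sum. Both expressions are evaluated next to a finite-difference gradient so they can be checked against it.

```python
import math


def softmax(a):
    """Numerically stable softmax over one frame of activations."""
    m = max(a)
    exps = [math.exp(x - m) for x in a]
    s = sum(exps)
    return [x / s for x in exps]


def grad_code_formula(err, y):
    """err * y - y * sum(err * y): the expression read from the code."""
    weighted = [e * p for e, p in zip(err, y)]
    total = sum(weighted)  # assumed role of ColSum for a single frame
    return [w - p * total for w, p in zip(weighted, y)]


def grad_diagonal_only(err, y):
    """err * y * (1 - y): uses only the diagonal term dy_k/da_k."""
    return [e * p * (1.0 - p) for e, p in zip(err, y)]


def grad_finite_difference(err, a, eps=1e-6):
    """Central-difference gradient of sum_k err[k] * softmax(a)[k] w.r.t. a."""
    def loss(acts):
        return sum(e * p for e, p in zip(err, softmax(acts)))

    grads = []
    for j in range(len(a)):
        plus = list(a)
        minus = list(a)
        plus[j] += eps
        minus[j] -= eps
        grads.append((loss(plus) - loss(minus)) / (2 * eps))
    return grads


# Made-up activations and upstream error (dL/dy) for one frame.
activations = [0.5, -1.0, 2.0]
err = [0.3, -0.7, 1.1]
y = softmax(activations)

grad_code = grad_code_formula(err, y)
grad_diag = grad_diagonal_only(err, y)
grad_num = grad_finite_difference(err, activations)

print("code formula     :", grad_code)
print("diagonal only    :", grad_diag)
print("finite difference:", grad_num)
```

In this sketch the error is held fixed and each activation is perturbed in turn. Since every softmax output depends on every activation, the off-diagonal derivatives dy_k/da_j = -yk * yj (for k != j) contribute as well as the diagonal term yk * (1 - yk); summing them over k is what produces the extra sum(err * y) term.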
Looking forward to your reply!