mikerabat opened this issue 7 months ago
The forward pass of categorical cross-entropy is implemented via `TNNetSoftMax`. There is an example at:
https://github.com/joaopauloschuler/neural-api/blob/master/examples/SimpleImageClassifier/SimpleImageClassifier.lpr
Regarding the backpropagation, I have changed my mind about the best approach a number of times. In some APIs, the derivative at the last softmax layer is simply not calculated and the errors are passed back assuming a derivative of 1. You can get this behaviour via the parameter `{SkipBackpropDerivative=}1`. In most cases, we get faster convergence with `{SkipBackpropDerivative=}1`.
You can use cross-entropy right now with `TNNetSoftMax` as per the example above.
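For reference, a minimal sketch of a network ending in the softmax layer with that parameter. This is a hypothetical cut-down variant, not the linked example verbatim: the layer classes and the `neuralnetwork` unit are the library's usual ones, but the exact constructor arguments here are illustrative.

```pascal
program SoftMaxSketch;

uses
  neuralnetwork; // neural-api core unit

var
  NN: TNNet;
begin
  NN := TNNet.Create();
  NN.AddLayer([
    TNNetInput.Create(32, 32, 3),       // e.g. a 32x32 RGB input
    TNNetFullConnectReLU.Create(64),    // hidden layer
    TNNetFullConnectLinear.Create(10),  // one linear output per class
    // Softmax output; {SkipBackpropDerivative=}1 passes the errors back
    // assuming a derivative of 1 instead of applying the softmax derivative.
    TNNetSoftMax.Create({SkipBackpropDerivative=}1)
  ]);
  NN.DebugStructure();
  NN.Free;
end.
```

Trained against one-hot targets, a network ending like this optimizes categorical cross-entropy as described above.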
@mikerabat, you are certainly not annoying. Glad to help.
Thank you for the clarification. I guess my misunderstanding was that I thought the cross-entropy type of error propagation would also be applied in the inner layers... I always struggled to wrap my head around that.
Is there actually a way to implement some kind of weighting in the softmax handling/backpropagation? The reason is that I deal with a dataset that is heavily skewed towards one class (a ratio of around 100:1 is a realistic setup for ECG classification...), so there is a heavy bias towards that one class. One way I dealt with that problem was to reduce the number of elements in the majority class to bring the ratio down to about 5:1... Other approaches would be a weighted loss function, or a weighting in the last softmax layer to emphasize the error in the classification step, right?
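For illustration, here is one way to formalize that weighting idea as standard class-weighted cross-entropy (a sketch of the math only; nothing here is claimed to be exposed by the library). Each class c gets a weight w_c, e.g. inversely proportional to its frequency, and the gradient through the softmax is simply rescaled:

```latex
% Class-weighted categorical cross-entropy (sketch).
% For a sample with true class t, softmax outputs p, and class weights w:
\[
  \mathcal{L} = -\,w_t \log p_t,
  \qquad
  \frac{\partial \mathcal{L}}{\partial z_k}
    = w_t\left(p_k - \mathbb{1}[k = t]\right)
\]
% With a 100:1 imbalance, choosing w_c inversely proportional to the class
% frequency makes errors on the rare class contribute roughly 100 times more
% to each weight update, without discarding any training samples.
```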
I hope I'm not too annoying, but you guys are the experts in this area, so I hope I can discuss another neat feature with you...
While browsing through "Neural Networks for Pattern Recognition" by C. M. Bishop, I noticed that there is more than the standard error propagation method with mean squared error: there is also one called the cross-entropy loss function. A few sources claim that this error/loss function indeed allows faster learning progress.
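For context on the faster-learning claim, the standard argument (which Bishop's treatment also leads to) is that cross-entropy cancels the derivative of the output activation, so saturated output units still receive a strong error signal:

```latex
% Why cross-entropy can learn faster than MSE at the output layer (sketch).
% For a sigmoid output y = \sigma(z) and target t:
\[
  \frac{\partial \mathcal{L}_{\mathrm{MSE}}}{\partial z} = (y - t)\,\sigma'(z),
  \qquad
  \frac{\partial \mathcal{L}_{\mathrm{CE}}}{\partial z} = y - t
\]
% \sigma'(z) tends to 0 when the unit saturates, which slows MSE learning even
% when the prediction is badly wrong; cross-entropy removes that factor. The
% same cancellation occurs for softmax with categorical cross-entropy, which
% is what passing errors back with derivative 1 ({SkipBackpropDerivative=}1
% above) corresponds to.
```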
What do you think? Would that be a viable feature for the library?