joaopauloschuler / neural-api

CAI NEURAL API - Pascal based deep learning neural network API optimized for AVX, AVX2 and AVX512 instruction sets plus OpenCL capable devices including AMD, Intel and NVIDIA.
GNU Lesser General Public License v2.1

Cross entropy learning #135

Open mikerabat opened 5 months ago

mikerabat commented 5 months ago

I hope I'm not being too annoying, but you are the experts in this area, so I'd like to discuss another neat feature with you...

While browsing through "Neural Networks for Pattern Recognition" by C. M. Bishop, I noticed that there is more than the standard error propagation method based on mean squared error: there is also one called the cross-entropy loss function... A few sources claim that this error/loss function allows faster learning progress...
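For reference, with network outputs $p_k$ and one-hot targets $y_k$, the two loss functions are roughly (my paraphrase of Bishop's notation):

$$E_{\mathrm{MSE}} = \frac{1}{2}\sum_k (p_k - y_k)^2 \qquad\qquad E_{\mathrm{CE}} = -\sum_k y_k \ln p_k$$

As far as I understand, the faster-learning claim comes from the gradients: with a sigmoid or softmax output, the MSE gradient carries an extra output-derivative factor that vanishes when a unit saturates, while the cross-entropy gradient does not.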

What do you think? Would that be a viable feature for the library?

joaopauloschuler commented 5 months ago

The forward pass of categorical cross-entropy is implemented via TNNetSoftMax. There is an example at:
https://github.com/joaopauloschuler/neural-api/blob/master/examples/SimpleImageClassifier/SimpleImageClassifier.lpr

Regarding the backpropagation, I have changed my mind about the best approach a number of times. In some APIs, the derivative of the last softmax layer is simply not calculated and the errors are passed back assuming a derivative of 1. You can get this behaviour via the parameter {SkipBackpropDerivative=}1. In most cases, we get faster convergence with {SkipBackpropDerivative=}1.
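To sketch why skipping the derivative works: when softmax is combined with categorical cross-entropy, the gradient of the loss with respect to the pre-softmax activations $z_k$ collapses to

$$\frac{\partial E_{\mathrm{CE}}}{\partial z_k} = p_k - y_k$$

so passing the error through the softmax layer with derivative 1 is exactly the combined softmax plus cross-entropy gradient.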

You can use cross-entropy right now with TNNetSoftMax as in the example.
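A minimal sketch of such a network (the layer sizes and convolution parameters here are illustrative, not a copy of the linked example):

```pascal
uses
  neuralnetwork;

procedure BuildClassifier;
var
  NN: TNNet;
begin
  NN := TNNet.Create();
  NN.AddLayer([
    TNNetInput.Create(32, 32, 3),                 // 32x32 RGB input
    TNNetConvolutionReLU.Create(64, 5, 0, 1, 1),  // 64 features, 5x5 kernel
    TNNetMaxPool.Create(2),
    TNNetFullConnectLinear.Create(10),            // one linear output per class
    // cross-entropy forward pass; errors passed back with derivative 1:
    TNNetSoftMax.Create({SkipBackpropDerivative=}1)
  ]);
  // ... train, e.g. with TNeuralImageFit as in the linked example ...
  NN.Free;
end;
```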

joaopauloschuler commented 5 months ago

@mikerabat , you are certainly not annoying. Glad to help.

mikerabat commented 5 months ago

Thank you for the clarification. I guess my misunderstanding was that I thought the cross-entropy type of error propagation would also be applied in the inner layers... I have always struggled to wrap my head around that...

mikerabat commented 4 months ago

Is there actually a way to implement some kind of weighting in the softmax handling/backpropagation? The reason is that I deal with a dataset that is heavily skewed towards one class (a ratio of around 100:1 is a realistic setup for ECG classification...), so there is a heavy bias towards that one class. One way I have dealt with that problem was to reduce the number of elements in the majority class to reach at most a 5:1 ratio... Other approaches could be a weighted loss function, or weighting in the last softmax layer to emphasize the error in the classification step, right? A sketch of the kind of weighting I mean follows below.
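Just to make the weighting idea concrete, here is a hypothetical helper (the function name and its use are made up, not an existing neural-api API) that derives inverse-frequency class weights from the label counts; each weight would then scale the output error of its class before backpropagation:

```pascal
type
  TSingleArray = array of Single;

// Hypothetical sketch, not part of neural-api: inverse-frequency
// class weights. A class that occurs 100x more often receives a
// 100x smaller weight, so each class contributes roughly equally
// to the accumulated error.
function ComputeClassWeights(const ClassCounts: array of Integer): TSingleArray;
var
  i, Total: Integer;
begin
  Total := 0;
  for i := 0 to High(ClassCounts) do
    Inc(Total, ClassCounts[i]);
  SetLength(Result, Length(ClassCounts));
  for i := 0 to High(ClassCounts) do
    // weight[k] = Total / (NumClasses * Count[k])
    Result[i] := Total / (Length(ClassCounts) * ClassCounts[i]);
end;
```

For a 100:1 two-class split this gives weights of roughly 0.505 and 50.5, which is the same effect I was approximating by undersampling to 5:1.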