lukas / ml-class

Machine learning lessons and teaching projects designed for engineers
https://www.youtube.com/channel/UCBp3w4DCEC64FZr4k9ROxig
GNU General Public License v2.0

Loss Function: categorical_crossentropy and binary_crossentropy #34

Closed: TheWindRider closed this issue 4 years ago

TheWindRider commented 6 years ago

I was in the 05/09/2018 class before the TrainAI conference. A fellow student reported better accuracy after replacing categorical_crossentropy with binary_crossentropy, and I saw the same improvement on two architectures (perceptron, MLP) and possibly more.

I'd like to ask/discuss here: what does the mathematics look like when the binary cross-entropy loss is applied to multi-class classification? I'm speculating that the way it is computed (even though binary cross-entropy is meant for two labels) happens to benefit accuracy on this problem.

Toy example and my guess:

label = [0, 0, 1, 0, 0], predict = [0.1, 0.1, 0.6, 0.1, 0.1]

categorical_crossentropy(label, predict) = -log(0.6)

binary_crossentropy(label, predict) = -log(0.6) - 4*log(0.9)
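For reference, here is a minimal NumPy sketch of that guess, assuming Keras's usual definitions (categorical_crossentropy sums -y*log(p) over the classes; binary_crossentropy treats each output as an independent yes/no prediction and averages the per-class terms rather than summing them):

```python
import numpy as np

label = np.array([0., 0., 1., 0., 0.])
predict = np.array([0.1, 0.1, 0.6, 0.1, 0.1])

# Categorical cross-entropy: only the true class contributes.
# -sum(y * log(p)) = -log(0.6) ~= 0.511
cce = -np.sum(label * np.log(predict))

# Binary cross-entropy: every class contributes, and the terms are
# averaged over the 5 outputs (assumed Keras behavior):
# -(log(0.6) + 4*log(0.9)) / 5 ~= 0.186
bce = -np.mean(label * np.log(predict) + (1 - label) * np.log(1 - predict))

print(f"categorical_crossentropy: {cce:.4f}")
print(f"binary_crossentropy:      {bce:.4f}")
```

Under those assumptions, the formula in the comment above differs from the framework's value only by the 1/5 averaging factor.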

charlesfrye commented 4 years ago

Interesting observation! If you're still interested in this question, I recommend you ask it on our Slack forum for ML engineers and enthusiasts: bit.ly/slack-forum.