Loss function
categorical_crossentropy
Because we use softmax $\to$ the outputs sum to 1, so if the prediction is correct (i.e. argmax of yhat coincides with argmax of y), the remaining positions are driven to 0 (or approximately 0) $\to$ ok
Target vector
one-hot encoding
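A small numeric sketch (values are illustrative, not from the article) of how a softmax output and a one-hot target combine under categorical cross-entropy:

```python
import numpy as np

def softmax(scores):
    # Exponentiate and normalize, so the outputs are positive and sum to 1.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

y_true = np.array([0., 1., 0., 0.])           # one-hot target: class 1 is the correct label
y_hat = softmax(np.array([5., 7., 4., 6.]))   # final-layer scores -> probabilities

# Categorical cross-entropy only looks at the probability of the true class;
# since softmax outputs sum to 1, pushing that probability toward 1
# automatically pushes all other positions toward 0.
loss = -np.sum(y_true * np.log(y_hat))
print(y_hat, y_hat.sum(), loss)
```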
Multi-label classification
Activation function
Sigmoid
Sigmoid maps each score of the final layer to a value between 0 and 1, independently of what the other scores are.
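A minimal sketch of that independence (scores are illustrative): the sigmoid outputs do not sum to 1, unlike softmax.

```python
import numpy as np

def sigmoid(scores):
    # Each score is squashed to (0, 1) on its own; no normalization across classes.
    return 1. / (1. + np.exp(-scores))

probs = sigmoid(np.array([2., -1., .15, 3.]))
print(probs, probs.sum())   # approx [0.88, 0.27, 0.54, 0.95]; the sum is not 1
```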
Because we use sigmoid $\to$ the matching loss is binary_crossentropy
Binary Cross-Entropy Loss is also called Sigmoid Cross-Entropy loss. It is a Sigmoid activation plus a Cross-Entropy loss. Unlike Softmax loss, it is independent for each vector component (class), meaning that the loss computed for every vector component is not affected by the other component values.
If we used categorical cross-entropy here, we would only penalize missing labels (true classes given low probability), but not residual labels (absent classes wrongly predicted as present).
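A sketch of binary cross-entropy computed per class on a multi-hot target (values are illustrative, not from the article); note that it penalizes both missing and residual labels:

```python
import numpy as np

def binary_crossentropy(y_true, y_hat, eps=1e-7):
    # One binary cross-entropy term per class, averaged; each class is penalized
    # both when a true label is missed and when an absent label is predicted.
    y_hat = np.clip(y_hat, eps, 1. - eps)
    return -np.mean(y_true * np.log(y_hat) + (1. - y_true) * np.log(1. - y_hat))

y_true = np.array([1., 0., 0., 1.])                         # multi-hot target: classes 0 and 3 present
y_hat = 1. / (1. + np.exp(-np.array([2., -1., .15, 3.])))   # sigmoid outputs
print(binary_crossentropy(y_true, y_hat))
```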
Target vector
Like one-hot encoding, but it may have multiple ones (a multi-hot vector)
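To make the two setups concrete, a hedged Keras sketch (layer sizes, input shape, and optimizer are placeholders, not taken from the article):

```python
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 4  # placeholder

# Multi-class: softmax head + categorical_crossentropy, trained on one-hot targets.
multi_class_model = keras.Sequential([
    keras.Input(shape=(128,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
multi_class_model.compile(optimizer="adam", loss="categorical_crossentropy")

# Multi-label: sigmoid head + binary_crossentropy, trained on multi-hot targets.
multi_label_model = keras.Sequential([
    keras.Input(shape=(128,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="sigmoid"),
])
multi_label_model.compile(optimizer="adam", loss="binary_crossentropy")
```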
Handling data imbalance
Upsampling
But we cannot simply drop the data samples with majority labels, because these samples could be associated with other (minority) labels as well; dropping them would lose those labels too.
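A rough sketch of upsampling instead (assuming the labels live as 0/1 columns in a pandas DataFrame; the column name and factor are hypothetical, not from the article):

```python
import pandas as pd

def upsample_rare_label(df, label_col, factor=3, seed=42):
    # Duplicate (with replacement) rows that carry the rare label rather than
    # dropping majority rows, so labels that co-occur on those rows are preserved.
    rare = df[df[label_col] == 1]
    extra = rare.sample(n=len(rare) * (factor - 1), replace=True, random_state=seed)
    return pd.concat([df, extra]).sample(frac=1, random_state=seed).reset_index(drop=True)

# Hypothetical usage, with one binary column per label:
# df = upsample_rare_label(df, "rare_label", factor=3)
```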
TL;DR
Some differences between multi-class and multi-label classification
Article link
https://towardsdatascience.com/multi-label-image-classification-with-neural-network-keras-ddc1ab1afede
Key Takeaways
Multi-class classification
softmax([5, 7, 4, 6])
Loss function
categorical_crossentropy
Because we use softmax $\to$ we make sure that, if the prediction is correct (i.e. argmax of yhat coincides with argmax of y), then the remaining positions are 0 (or approximately 0) $\to$ ok
Target vector
one-hot encoding
Multi-label classification
sigmoid([2, -1, .15, 3])
Loss function
binary_crossentropy
Target vector
Like one-hot encoding but it may have multiple ones
Handling data imbalance