keras-team / keras

Deep Learning for humans
http://keras.io/

Which loss function works for a multi-label classification task? #10371

Closed: buaasky closed this issue 3 years ago

buaasky commented 6 years ago

I need to train a multi-label classifier for a text topic classification task. Having searched around the internet, I followed the suggestion to use sigmoid + binary_crossentropy. But I can't get good results (i.e. subset accuracy) on the validation set even though the loss is very small. After reading the Keras source code, I found that the binary_crossentropy loss is implemented like this:

def binary_crossentropy(y_true, y_pred): 
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

My doubt is whether it makes sense to average over the label dimension in a multi-label classification task. Suppose the label set has 30 dimensions and each training sample carries only two or three of the labels. Since most of the labels are zero in most of the samples, I guess this loss will encourage the classifier to predict a tiny probability in every output dimension.
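To make this concern concrete, here is a rough illustration with made-up numbers (30 labels, 2 of them positive, and a classifier that predicts 0.1 everywhere):

import numpy as np

# Made-up sample: 30 labels, only the first 2 are positive.
y_true = np.zeros(30)
y_true[:2] = 1.0
# A classifier that predicts a tiny probability for every label.
y_pred = np.full(30, 0.1)

# Per-label binary cross-entropy, then the mean over labels (as Keras does).
bce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(bce.mean())  # ~0.25: small, even though neither positive label is detected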

Following the idea here, https://github.com/keras-team/keras/issues/2826, I also gave categorical_crossentropy a try, but still had no luck.

Any tips on choosing a loss function for a multi-label classification task are more than welcome. Thanks in advance.

ismaeIfm commented 6 years ago

The standard way to train a multi-label classifier is with sigmoid + binary_crossentropy, but you can also train one with tanh + hinge; the targets just have to be in {-1, 1}. I don't think your issue has to do with the loss or the output activation; I think it is more related to the complexity of your model. Also, I'm curious: how are you evaluating your model?
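For reference, a minimal sketch of the two setups (the feature dimension, n_labels, and the architecture are placeholders, not taken from this issue):

from tensorflow import keras
from tensorflow.keras import layers

n_labels = 30  # placeholder

# Option 1: sigmoid outputs + binary_crossentropy, targets in {0, 1}
model_a = keras.Sequential([keras.Input(shape=(100,)),
                            layers.Dense(64, activation='relu'),
                            layers.Dense(n_labels, activation='sigmoid')])
model_a.compile(optimizer='adam', loss='binary_crossentropy')

# Option 2: tanh outputs + hinge loss, targets in {-1, 1} (e.g. y_hinge = 2 * y - 1)
model_b = keras.Sequential([keras.Input(shape=(100,)),
                            layers.Dense(64, activation='relu'),
                            layers.Dense(n_labels, activation='tanh')])
model_b.compile(optimizer='adam', loss='hinge')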

buaasky commented 6 years ago

@ismaeIfm Thanks for your answer. Maybe I did not state my question clearly. The model I use is a BLSTM with an attention mechanism. I used it for a text topic multi-class classification task with categorical_crossentropy and it worked well. So, when I encountered a text topic multi-label classification task, I simply switched from softmax + categorical_crossentropy to sigmoid + binary_crossentropy, but the results are not that good. So I am wondering if there is something wrong with my loss function. I evaluate my model with subset accuracy, the analogue of accuracy in a multi-class problem: a prediction counts as correct only when the output matches the true label set exactly.

ismaeIfm commented 6 years ago

As far as I understand, subset accuracy needs explicit classes in {0, 1}, but your model outputs probabilities. How did you choose the threshold to binarize the labels? Have you tried using LRAP to evaluate your model?
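For what it's worth, scikit-learn has LRAP (label ranking average precision) built in; a minimal sketch, assuming model, x_val, and y_val stand for your model and validation data:

from sklearn.metrics import label_ranking_average_precision_score

# y_val: (n_samples, n_labels) binary matrix; the raw sigmoid outputs are used
# directly as scores, so no threshold is needed.
y_score = model.predict(x_val)
lrap = label_ranking_average_precision_score(y_val, y_score)
print(lrap)  # 1.0 means the true labels are always ranked above the others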

daniel410 commented 6 years ago

For multi-label classification, you can try tanh + hinge with {-1, 1} values in the labels, like (1, -1, -1, 1), or sigmoid + Hamming loss with {0, 1} values in the labels, like (1, 0, 0, 1). In my case, sigmoid + focal loss with {0, 1} values in the labels, like (1, 0, 0, 1), worked well. You can check this paper: https://arxiv.org/abs/1708.02002.

buaasky commented 6 years ago

@ismaeIfm I've chosen 0.5 as the threshold to get binary outputs, because each output represents the probability of the corresponding label. I will try LRAP to evaluate my model and see how it performs. Thanks a lot.
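In code, that evaluation is roughly the following (x_val and y_val are placeholders for the validation data):

import numpy as np

# Binarize the per-label probabilities at 0.5.
y_bin = (model.predict(x_val) > 0.5).astype(int)
# Subset accuracy: a sample counts only if every label matches exactly.
subset_acc = np.mean(np.all(y_bin == y_val, axis=1))
print(subset_acc)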

buaasky commented 6 years ago

@daniel410 Thanks for your answer. It helps me a lot, and I will try the methods you suggested. I think focal loss will be useful because it can alleviate the issue of imbalanced labels.

BovineEnthusiast commented 5 years ago

I found an implementation of multi-label focal loss here:

https://github.com/Umi-you/FocalLoss

EDIT: Seems like his implementation doesn't work.

dberma15 commented 5 years ago

The multi-label focal loss equation doesn't seem to work.

Abhijit-2592 commented 5 years ago

@dberma15 When you say focal loss doesn't work, do you mean it doesn't converge, or that the implementation is wrong? I suspect the latter, because of two major issues: it shouldn't use NumPy, and its implementation of the cross-entropy loss is flawed.

Abhijit-2592 commented 5 years ago
import tensorflow as tf

K = tf.keras.backend

class FocalLoss(object):
    def __init__(self, gamma=2, alpha=0.25):
        self._gamma = gamma
        self._alpha = alpha

    def compute_loss(self, y_true, y_pred):
        # Per-label binary cross-entropy; y_pred are probabilities (sigmoid outputs).
        cross_entropy_loss = K.binary_crossentropy(y_true, y_pred, from_logits=False)
        # p_t: the probability the model assigns to the true class of each label.
        p_t = ((y_true * y_pred) +
               ((1 - y_true) * (1 - y_pred)))
        modulating_factor = 1.0
        if self._gamma:
            # (1 - p_t)^gamma down-weights labels that are already well classified.
            modulating_factor = tf.pow(1.0 - p_t, self._gamma)
        alpha_weight_factor = 1.0
        if self._alpha is not None:
            # alpha balances the contribution of positive vs. negative labels.
            alpha_weight_factor = (y_true * self._alpha +
                                   (1 - y_true) * (1 - self._alpha))
        focal_cross_entropy_loss = (modulating_factor * alpha_weight_factor *
                                    cross_entropy_loss)
        # Average over the label dimension, as Keras' binary_crossentropy does.
        return K.mean(focal_cross_entropy_loss, axis=-1)

@MrSnappingTurtle and @dberma15 The above is my implementation of focal loss for Keras.
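A minimal usage sketch (model stands for any Keras model with a sigmoid output layer; the bound compute_loss method, not the class itself, is what gets passed to compile):

focal_loss = FocalLoss(gamma=2, alpha=0.25)
model.compile(optimizer='adam', loss=focal_loss.compute_loss, metrics=['accuracy'])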

randomwalker42 commented 5 years ago

> For multi-label classification, you can try tanh + hinge with {-1, 1} values in the labels, like (1, -1, -1, 1), or sigmoid + Hamming loss with {0, 1} values in the labels, like (1, 0, 0, 1). In my case, sigmoid + focal loss with {0, 1} values in the labels, like (1, 0, 0, 1), worked well. You can check this paper: https://arxiv.org/abs/1708.02002.

@daniel410 Hi, would you mind sharing how you implemented your focal loss for the multi-label task, if it's not too much trouble?

talhaanwarch commented 4 years ago
    def compute_loss(self, y_true, y_pred):
        cross_entropy_loss = K.binary_crossentropy(y_true, y_pred, from_logits=False)
        p_t = ((y_true * y_pred) +
               ((1 - y_true) * (1 - y_pred)))
        modulating_factor = 1.0
        if self._gamma:
            modulating_factor = tf.pow(1.0 - p_t, self._gamma)
        alpha_weight_factor = 1.0
        if self._alpha is not None:
            alpha_weight_factor = (y_true * self._alpha +
                                   (1 - y_true) * (1 - self._alpha))
        focal_cross_entropy_loss = (modulating_factor * alpha_weight_factor *
                                    cross_entropy_loss)
        return K.mean(focal_cross_entropy_loss, axis=-1)

@Abhijit-2592 Is this for multi-label classification?

It gives me the error AttributeError: 'FocalLoss' object has no attribute 'get_shape'. I used it as follows: model.compile('adam', loss=FocalLoss, metrics=['accuracy'])

sushanttripathy commented 4 years ago

You can try my implementation and let me know if it works. https://github.com/sushanttripathy/Keras_loss_functions/blob/master/focal_loss.py

Vishnux0pa commented 4 years ago

@sushanttripathy I tried your code and it works, but the output focal_loss_tensor is a 2-D tensor. Should I take a mean to arrive at the final loss?

sushanttripathy commented 4 years ago

@Vishnux0pa I am not sure if the auto-differentiation requires me to provide the loss per sample (instead of per batch). I looked at categorical_crossentropy, and it seemed like that's what it was doing.

I did not get convergence with the earlier version of the loss (the one that yielded a scalar). It does converge with this one though.
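For reference, the built-in losses do behave that way; a quick check with made-up tensors:

import tensorflow as tf

y_true = tf.constant([[1., 0., 0., 1.], [0., 1., 0., 0.]])
y_pred = tf.constant([[0.9, 0.1, 0.2, 0.8], [0.2, 0.7, 0.1, 0.1]])

# Built-in losses reduce over the last axis only, returning one value per sample;
# Keras then averages over the batch (applying any sample weights) during training.
per_sample = tf.keras.losses.binary_crossentropy(y_true, y_pred)
print(per_sample.shape)  # (2,)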