umbertogriffo / focal-loss-keras

Binary and Categorical Focal loss implementation in Keras.

Multi-label Classification task #2

Open · abdullahshafin opened this issue 5 years ago

abdullahshafin commented 5 years ago

I just wanted to know if this can be applied to a multi-label classification problem with a sigmoid output activation unit, i.e. where multiple labels can be 1 at the same time and hence the sum of the probabilities is not necessarily equal to 1 (as it is with softmax).

I came to your repo from this issue. Please let me know which loss function I can use in this scenario. I did look at the code, but I wasn't entirely sure that the binary_focal_loss function is suitable for this problem. It looked to me as if it's only for binary classification and not for the multi-label classification task.

umbertogriffo commented 5 years ago

Hi @abdullahshafin, you can't apply this to a multi-label classification problem with a sigmoid output activation unit. Only the softmax case is supported.

abdullahshafin commented 5 years ago

Hi @umbertogriffo

Thanks for the reply!

Do you mean using softmax for multi-label classification (like the Facebook paper)? It's still a bit unclear; normally, softmax is not used for multi-label classification. Can you explain what inputs you expect for your two functions binary_focal_loss and categorical_focal_loss? Do you expect only 2 classes (binary), or do they work for more than 2 classes?

From my understanding, when there are multiple target classes, Keras uses binary_crossentropy (keras.losses.binary_crossentropy) for multi-label classification tasks, where the output activation unit should be sigmoid, and categorical_crossentropy (keras.losses.categorical_crossentropy) for multi-class classification tasks with a softmax output activation unit.

Just to be sure that we are both on the same page, I will explain below what I mean by the multi-label and multi-class classification terminologies.

In multi-label classification with 3 classes and 5 examples, the target matrix would look like:

[0 0 1
 1 0 1
 0 0 0
 0 1 1
 0 1 0]

The target matrix for multi-class classification in the same configuration would look like:

[0 0 1
 1 0 0
 0 1 0
 0 1 0
 1 0 0]
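
To make the two setups concrete, here is a minimal Keras sketch of the convention described above (layer sizes and variable names are illustrative assumptions, not taken from this repo):

```python
from keras.models import Sequential
from keras.layers import Dense

n_features, n_classes = 10, 3

# Multi-label: one independent sigmoid per class + binary cross-entropy,
# so several outputs can be close to 1 at the same time.
multi_label_model = Sequential([
    Dense(16, activation="relu", input_shape=(n_features,)),
    Dense(n_classes, activation="sigmoid"),
])
multi_label_model.compile(optimizer="adam", loss="binary_crossentropy")

# Multi-class: softmax over classes + categorical cross-entropy,
# so the outputs form a distribution that sums to 1.
multi_class_model = Sequential([
    Dense(16, activation="relu", input_shape=(n_features,)),
    Dense(n_classes, activation="softmax"),
])
multi_class_model.compile(optimizer="adam", loss="categorical_crossentropy")
```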
umbertogriffo commented 5 years ago

@abdullahshafin you're absolutely right. I meant that you can apply this to a multi-class classification problem with a softmax output activation unit. Multi-label classification isn't supported yet.

abdullahshafin commented 5 years ago

@umbertogriffo thanks a lot for your reply and for the clarification. I tried to use your code with a few modifications for multi-label classification. After looking at the code in detail, I strongly believe it should work. However, since I am not getting good results on my classification task, I cannot verify that yet. My task is already quite difficult to learn, and I haven't had success with a weighted binary CE loss either.

Once I try it on some other task and I can verify that it works/does not work for multi-label classification, I will update here.

umbertogriffo commented 5 years ago

@abdullahshafin thanks for opening this issue. It would be absolutely great if you could help me adapt the code for multi-label classification. I'll try to find a task that you can use for your experiments.

abdullahshafin commented 5 years ago

@umbertogriffo Sorry, I've been busy verifying whether my approach was right or not. As of now, it seems my loss function is not correct. Once I have a correct focal loss implementation for multi-label classification, I will definitely share it. For now, I am approaching the problem with other methods: 1) a weighted binary CE loss, 2) under-/over-sampling the dataset.

umbertogriffo commented 5 years ago

@abdullahshafin don't worry, let me know if I can help you somehow.

oleksandrlazariev commented 5 years ago

@abdullahshafin you could just remove K.sum from the final return statement. That should work for the multi-label classification task.
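
To make the suggestion concrete, here is a hedged sketch of what a multi-label variant could look like: the binary focal term is applied element-wise to independent sigmoid outputs and averaged over classes instead of summed. The function name and defaults are illustrative, not this repo's API:

```python
from keras import backend as K

def multi_label_focal_loss(gamma=2., alpha=.25):
    """Sketch of a focal loss for multi-label targets (one sigmoid per label)."""
    def focal_loss_fixed(y_true, y_pred):
        eps = K.epsilon()
        y_pred = K.clip(y_pred, eps, 1. - eps)
        # Element-wise binary focal term (Lin et al., 2017), computed per label.
        pos = -alpha * K.pow(1. - y_pred, gamma) * y_true * K.log(y_pred)
        neg = -(1. - alpha) * K.pow(y_pred, gamma) * (1. - y_true) * K.log(1. - y_pred)
        # Mean over the class axis instead of K.sum, so each label is treated
        # as an independent binary problem.
        return K.mean(pos + neg, axis=-1)
    return focal_loss_fixed
```

You would compile a model with a sigmoid output layer using model.compile(loss=multi_label_focal_loss(), ...); whether this behaves well in practice is exactly what this thread is trying to establish.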

xingyi-li commented 5 years ago

@oleksandrlazariev Hi, I'm interested in your suggestion, but I don't quite understand what you mean. Would you explain your idea in more detail? Thanks a lot!

bryanmooremd commented 5 years ago

@umbertogriffo My understanding is that with alpha = 1 and gamma = 0, the focal loss should produce results identical to cross-entropy. However, when I compile with loss=[categorical_focal_loss(alpha=.25, gamma=2)] vs loss=sparse_categorical_crossentropy, I get very different results. Have you directly compared the two, and can you comment? I have 0/1 labels that are not one-hot encoded.

jizhang02 commented 4 years ago

Hello, regarding the multi-class loss function: do we need to do one-hot encoding?

talhaanwarch commented 4 years ago

I am just checking whether focal loss for multi-label classification has been implemented or not.

Sandeep418 commented 4 years ago

> @umbertogriffo thanks a lot for your reply and for the clarification. I tried to use your code with a few modifications for multi-label classification. After looking at the code in detail, I strongly believe it should work. However, since I am not getting good results on my classification task, I cannot verify that yet. My task is already quite difficult to learn, and I haven't had success with a weighted binary CE loss either.
>
> Once I try it on some other task and I can verify that it works/does not work for multi-label classification, I will update here.

Hi @abdullahshafin, have you succeeded in adapting this to multi-label classification?

gnai commented 4 years ago

> Hello, regarding the multi-class loss function: do we need to do one-hot encoding?

As far as I know, yes. Check this link; it might be useful: https://www.depends-on-the-definition.com/guide-to-multi-label-classification-with-neural-networks/
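
For example, keras.utils.to_categorical converts integer class indices into exactly the kind of one-hot target matrix shown earlier in this thread:

```python
from keras.utils import to_categorical

integer_labels = [2, 0, 1, 1, 0]  # class indices for 5 examples, 3 classes
one_hot = to_categorical(integer_labels, num_classes=3)
print(one_hot)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]
#  [0. 1. 0.]
#  [1. 0. 0.]]
```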

umbertogriffo commented 4 years ago

> @umbertogriffo My understanding is that with alpha = 1 and gamma = 0, the focal loss should produce results identical to cross-entropy. However, when I compile with loss=[categorical_focal_loss(alpha=.25, gamma=2)] vs loss=sparse_categorical_crossentropy, I get very different results. Have you directly compared the two, and can you comment? I have 0/1 labels that are not one-hot encoded.

There was a bug that has been fixed.
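
Note that the equivalence only holds at alpha = 1 and gamma = 0 (with alpha = .25 and gamma = 2 the two losses are not expected to match), and categorical cross-entropy expects one-hot targets, unlike the sparse variant. A minimal numeric sanity check could look like this (using a hand-rolled focal term rather than the repo's exact signature, so treat it as a sketch):

```python
import numpy as np
from keras import backend as K

y_true = K.constant([[0., 0., 1.], [1., 0., 0.]])
y_pred = K.constant([[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]])

def categorical_focal(y_true, y_pred, alpha=1., gamma=0.):
    y_pred = K.clip(y_pred, K.epsilon(), 1. - K.epsilon())
    # With gamma = 0 the modulating factor is 1, leaving alpha * cross-entropy.
    return K.sum(alpha * K.pow(1. - y_pred, gamma) * (-y_true * K.log(y_pred)), axis=-1)

fl = K.eval(categorical_focal(y_true, y_pred))
ce = K.eval(K.categorical_crossentropy(y_true, y_pred))
print(np.allclose(fl, ce))  # True: the focal loss reduces to cross-entropy
```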

thusinh1969 commented 3 years ago

Try this: https://www.programmersought.com/article/60001511310/ It covers the binary, multi-class, and multi-label cases. It seems to work for me.

Steve

longsc2603 commented 1 year ago

Hi, I'm sorry to bump this old thread, but I came across this repo of yours and wonder if I can apply it to my case. I have a BiLSTM + CRF model whose output shape is (None, sequence_length, num_class). Since it is a CRF-extended model (I'm using keras-contrib for the CRF layer, by the way), the output is one-hot encoded, so I can't really use class_weight in model.fit. Is there any way I can use this loss for my case?