mp2893 / doctorai

Repository for Doctor AI project
BSD 3-Clause "New" or "Revised" License

Softmax for multi-label classification ? #10

Open aparnapai7 opened 4 years ago

aparnapai7 commented 4 years ago

Hi, since the RNN is used to perform a multi-label classification task, shouldn't a Sigmoid layer be used instead of a Softmax layer at the end to calculate the probabilities? Isn't Softmax better suited to a multi-class (single-label) classification problem? Please let me know your thoughts!

mp2893 commented 4 years ago

As you said, diagnosis prediction is inherently a multi-label, multi-class classification task, so, as you suggest, it seems natural to train a set of sigmoid classifiers instead of a single softmax classifier. In practice, however, I found that a single softmax works better than a set of sigmoid classifiers. I think this is because the constraint built into softmax (all class probabilities must sum to one) acts as a regularizer. With independent sigmoid classifiers, nothing stops the model from predicting every disease with probability 1.0. That would be very rare, but it is certainly possible, whereas it can never happen with softmax. I think that is why softmax works better.

I also think it is worthwhile to try a ranking loss instead of the log-likelihood loss (http://cs-people.bu.edu/hekun/papers/CVPR2019FastAP.pdf). If the true diagnosis codes are A, B, and C out of 1000 possible codes, then you want A, B, and C to be ranked higher than the remaining 997 codes. If you happen to try this, please let me know whether it works better than softmax 🙂
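To make the contrast concrete, here is a minimal numpy sketch (not from the Doctor AI repository; the logits and targets are made up for illustration) showing the two options: a single softmax whose probabilities are forced to sum to one, versus independent per-code sigmoids whose probabilities are unconstrained as a group.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy setup: 5 possible codes, true codes are indices 0 and 2 (multi-hot target).
logits = np.array([2.0, -1.0, 1.5, 0.0, -2.0])
target = np.array([1.0, 0.0, 1.0, 0.0, 0.0])

# (a) Single softmax over all codes: probabilities sum to exactly 1, so the
# model can never assign high probability to every code at once. One common
# multi-label variant normalizes the multi-hot target to a distribution.
p_soft = softmax(logits)
loss_soft = -np.sum((target / target.sum()) * np.log(p_soft))

# (b) Independent sigmoid per code: each probability lives in (0, 1) on its
# own, so nothing prevents all codes from approaching probability 1.0.
p_sig = sigmoid(logits)
loss_sig = -np.sum(target * np.log(p_sig) + (1 - target) * np.log(1 - p_sig))

print(p_soft.sum())  # 1.0 by construction
print(p_sig.sum())   # unconstrained; can exceed 1.0
```

The sum-to-one constraint in (a) is the "regularizer" effect described above: raising one code's probability necessarily lowers the others'.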

annamarianau commented 3 years ago

Hi, I was going to ask the same question so decided to just respond here with a follow-up question. How are you doing the classification? Are you using some threshold or for example, if the target has three labels (codes) then the input is classified as the three labels with the highest probability? Please let me know your thoughts and thank you!

mp2893 commented 3 years ago

In this code, there is no specific threshold for deciding the final set of output codes. Using the softmax probabilities, you can choose the top-N codes with the highest probabilities.
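For reference, a top-N selection like the one described can be sketched in a couple of lines of numpy (the probability vector below is illustrative, not from the repository):

```python
import numpy as np

def top_n_codes(probs, n):
    """Return the indices of the n codes with the highest probability, best first."""
    return np.argsort(probs)[::-1][:n]

# Example: 5 candidate codes, pick the 3 most probable.
probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
print(top_n_codes(probs, 3))  # -> [1 3 4]
```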

annamarianau commented 3 years ago

> In this code, there is no specific threshold for deciding the final set of output codes. Using the softmax probabilities, you can choose the top-N codes with the highest probabilities.

Thank you for the clarification!