sergioburdisso / pyss3

A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI :octocat:)
https://pyss3.readthedocs.io
MIT License

Multilabel Classification Evaluation #5

Closed: angrymeir closed this issue 4 years ago

angrymeir commented 4 years ago

Hey @sergioburdisso,

Thank you for this awesome project! Currently, the Evaluation class only supports single-label classification, even though SS3 inherently supports multilabel classification. These are the steps (as I see them) needed to support multilabel classification evaluation:

angrymeir commented 4 years ago

Edit: Since multilabel stratified k-fold cross-validation is not implemented in sklearn, this repository might help with the implementation of the multilabel grid search.
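
For reference, a multilabel stratified k-fold split would look roughly like this (a sketch using the iterative-stratification package, which implements exactly this idea; whether that's the repository linked above is an assumption):

```python
import numpy as np
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
# One binarized row of labels per sample (2 labels here).
y = np.array([[0, 0], [0, 0], [0, 1], [0, 1],
              [1, 1], [1, 1], [1, 0], [1, 0]])

# Stratifies the folds so the label distribution is preserved per fold.
mskf = MultilabelStratifiedKFold(n_splits=2, shuffle=True, random_state=0)
for train_idx, test_idx in mskf.split(X, y):
    print("TRAIN:", train_idx, "TEST:", test_idx)
```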

sergioburdisso commented 4 years ago

Thank you, @angrymeir! You're helping to make this humble project better!

That is totally right: the current implementation of the Evaluation class does not provide support for multilabel classification.

What do you think of adding an extra argument to classify_multilabel called, for instance, indicator_function_output, which could be either True or False? This argument could be used to make the output a binarized vector holding the value 1 for every $c_i$ such that $doc \in c_i$ (according to the trained model), and 0 otherwise. Do you think the name indicator_function_output is OK?
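
Just to make the idea concrete, a rough sketch of what I have in mind (`indicator_function_output` is only a proposed name here, not an existing argument):

```python
from pyss3 import SS3

# Toy model trained on three aspect categories (illustrative only).
clf = SS3()
clf.fit(["tasty dishes and drinks", "way too expensive", "rude waiters"],
        ["food", "price", "service"])

doc = "the food was great but expensive"

# Current behavior: a list with the predicted labels.
clf.classify_multilabel(doc)  # e.g. ["food", "price"]

# Proposed behavior: a binarized vector with 1 for every c_i such that
# doc ∈ c_i, and 0 otherwise (argument name still under discussion).
clf.classify_multilabel(doc, indicator_function_output=True)
# e.g. [1, 1, 0]  (assuming category order ["food", "price", "service"])
```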

I'm currently working on the train method, which should now make the training procedure much easier and clearer by allowing the y_train list to be composed of lists of labels (not single labels). One interesting thing I realized is that some datasets provide no labels at all for some documents (e.g. this one); thus, the empty list [] is a valid "label". Internally, I create a special "other" category as a workaround. The good thing is that train/fit will now be much more flexible.
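
So, under the new scheme, a training call could look something like this (a sketch; the "other" category is an internal workaround, nothing the user passes in):

```python
from pyss3 import SS3

x_train = ["plain text that fits no category",
           "the food was amazing",
           "great food but awful service"]
# Each y_train item is now a *list* of labels; the empty list [] is a
# valid "label" for unlabeled documents (internally mapped to a
# special "other" category).
y_train = [[], ["food"], ["food", "service"]]

clf = SS3()
clf.fit(x_train, y_train)
```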

Thanks for suggesting that repository implementing multilabel stratified k-fold cross-validation! It seems quite straightforward to use.

BTW, taking into account your great ideas, suggestions, and feedback, do you mind being added to the README file as a contributor?

sergioburdisso commented 4 years ago

BTW, just in case you're wondering about being added as a contributor: PySS3 follows the all-contributors specification, "Recognize all contributors, not just the ones who push code" :sunglasses:

Now that I'm done with the other Issue, I'll continue with this one :alien: :coffee:

angrymeir commented 4 years ago

Sounds like a plan! I'll also read further into stratification.

I would be honored to be listed as a contributor! However, the ideas are not only mine but also my colleague @Vaiyani's!

sergioburdisso commented 4 years ago

@all-contributors could you add @Vaiyani and @angrymeir as contributors for ideas, suggestions, and feedback?

allcontributors[bot] commented 4 years ago

@sergioburdisso

I've put up a pull request to add @angrymeir and @Vaiyani! :tada:

sergioburdisso commented 4 years ago

@angrymeir and @Vaiyani, you were both added to the README file! :sunglasses: Thanks, guys. I've also added you as contributors not only for ideas but also for data (since I'll probably be using your SemEval 2016 Task 5 dataset for the tutorials and live demo, as suggested in Issue #6).

Vaiyani commented 4 years ago

@sergioburdisso Thanks for this great project as well :)

angrymeir commented 4 years ago

Hey @sergioburdisso, I just tried out clf.fit() and Evaluation.test() on our multilabel dataset and it works like a charm! Wuhu 🥳 Thank you for implementing this!
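
For anyone landing on this thread later, the calls in question look roughly like this (a sketch with toy data):

```python
from pyss3 import SS3
from pyss3.util import Evaluation

x_train = ["the food was amazing", "great food but awful service", "rude staff"]
y_train = [["food"], ["food", "service"], ["service"]]
x_test = ["awful food and service"]
y_test = [["food", "service"]]

clf = SS3()
clf.fit(x_train, y_train)             # y_train holds lists of labels
Evaluation.test(clf, x_test, y_test)  # y_test is multilabel as well
```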

Regarding the grid search: should I create a separate issue for that?

sergioburdisso commented 4 years ago

@angrymeir Cool!!! I've just finished with kfold_cross_validation; now I'll start with grid_search, which shouldn't be too difficult since it mostly calls test and kfold_cross_validation. I've been making the changes in such a way that things get easier, not only for grid_search but also for the interactive evaluation 3D plot (Evaluation.plot()), which now shouldn't take me too much time to adapt to support multilabel classification.
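
Once done, the evaluation calls should look something like this (a sketch; the hyperparameter lists and the `k`/`k_fold` argument names follow the current single-label API, so treat them as assumptions):

```python
from pyss3 import SS3
from pyss3.util import Evaluation

clf = SS3()
x_train = ["tasty dishes", "rude waiters", "cheap and tasty",
           "expensive but great service", "awful food", "friendly staff"]
y_train = [["food"], ["service"], ["food", "price"],
           ["price", "service"], ["food"], ["service"]]
clf.fit(x_train, y_train)

# k-fold cross-validation on the training set.
Evaluation.kfold_cross_validation(clf, x_train, y_train, k=3)

# Grid search over SS3's s, l, and p hyperparameters; with k_fold set,
# each combination is evaluated via cross-validation.
best_s, best_l, best_p, best_a = Evaluation.grid_search(
    clf, x_train, y_train,
    s=[0.3, 0.4, 0.5],
    l=[0.5, 1.0, 1.5],
    p=[0.5, 1.0, 2.0],
    k_fold=3
)
```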

sergioburdisso commented 4 years ago

@angrymeir @Vaiyani Guys! I've finally finished adding full multi-label classification support to the Evaluation class! Yay!!! :partying_face::partying_face::partying_face:

Thanks, guys, for creating this issue :) these changes were necessary. Issue #9 is also part of this overall process of adding full multi-label classification support to PySS3, so as soon as I finish with the other two issues, I'll finally release the new version (0.6.0). Do you guys think we should also add a new tutorial showing the new features? And do you think your dataset would be well suited for that, or should I use a simpler, more "proof-of-concept" dataset? What do you think?

Vaiyani commented 4 years ago

@sergioburdisso Thank you for the quick and effective response on this issue.

I believe a tutorial would be a good idea for new people as well, because tutorials are the first point of learning (in my experience). It would be really helpful.

As for the data, I'm not quite sure. Our dataset (SemEval) is also well suited for this, but in the end, whichever delivers the message most clearly should be the aim.

angrymeir commented 4 years ago

I guess a tutorial highlighting the differences would be great! However, we can't use SemEval for that, since we're not allowed to redistribute it publicly. I think the Toxic Comment Dataset should also be well suited for it :)

In case you need help with the notebook or don't have time to implement it, let me know and I'll create one!

sergioburdisso commented 4 years ago

Guys! I've just finished implementing the multi-label support for the Live Test tool (issue #9).

Now, in the left panel, each test document is shown with a percentage corresponding to its label-based accuracy (a.k.a. the Hamming score). Besides, when a document is selected, the true labels are shown along with the predicted labels, and misclassified labels are highlighted in red, like "drama" in the screenshot below:

[screenshot: Live Test tool showing true vs. predicted labels, with the misclassified label "drama" in red]
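For reference, the Hamming score of a single document is the size of the intersection of its true and predicted label sets divided by the size of their union; a minimal sketch:

```python
def hamming_score(y_true, y_pred):
    """Label-based accuracy for one document: |T ∩ P| / |T ∪ P|."""
    t, p = set(y_true), set(y_pred)
    if not t and not p:
        return 1.0  # both empty: a perfect match
    return len(t & p) / len(t | p)

# True labels {"comedy", "drama"} vs. predicted {"comedy", "action"}:
hamming_score(["comedy", "drama"], ["comedy", "action"])  # -> 1/3 ≈ 0.33
```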
I'm about to release the new version soon; I'm just performing the final checks. Regarding the dataset for the tutorial, I've finally decided to use a subset of the CMU Movie Summary Corpus with only 10 categories (and 32,985 documents/plot summaries). I've already uploaded the zipped dataset to the repo (5f5c055); it uses the same format you suggested in Issue #6 (one file for the (semicolon-separated) labels, another for the docs), so I'll probably start working on (a very basic version of) the tutorial soon :blush:
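
Loading that two-file format takes just a few lines of plain Python (a sketch; the file names are made up):

```python
# Two parallel plain-text files: one document per line, and the labels
# for each document semicolon-separated on the corresponding line.
with open("train_docs.txt", encoding="utf-8") as f:
    x_train = f.read().splitlines()

with open("train_labels.txt", encoding="utf-8") as f:
    y_train = [line.split(";") if line else []  # empty line -> no labels
               for line in f.read().splitlines()]
```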