scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

Multioutput-multiclass estimators have broken score method. #9414

Open amueller opened 7 years ago

amueller commented 7 years ago

It looks to me like the decision trees use accuracy_score for their score method, but accuracy_score doesn't document that it supports multiclass-multioutput targets, which the trees do support.

~~I guess the y_true == y_pred works in this case, but it should be documented. There's no list of scores supporting multiclass-multioutput in the docs, and that should be fixed, too.~~

Update: Calling score in this case errors :-/

amueller commented 7 years ago

It looks like there is no test for multiclass-multioutput in metrics/tests/test_common.py, and it looks like accuracy is not tested with multiclass-multioutput at all. I'm not a big fan of the multiclass-multioutput functionality, but if we support it we should at least document and test it (or deprecate it, so we don't have another type of label to support).

amueller commented 7 years ago

[my preferred solution to this would be to deprecate the multioutput multiclass behavior but I'm not sure people like that]

amueller commented 7 years ago

Oh and this page even says

> Warning: At present, no metric in sklearn.metrics supports the multioutput-multiclass classification task.

thismlguy commented 7 years ago

The score function of a decision tree classifier, which supports multiclass-multioutput targets, doesn't work. The scoring code itself might be right, but score calls "_check_targets" first, which raises "ValueError: multiclass-multioutput is not supported".
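A minimal sketch of how this can be reproduced, assuming a scikit-learn version in which the behavior described above still applies. The data is synthetic and purely illustrative, and the `try`/`except` is there because the outcome of `score` may differ between versions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.multiclass import type_of_target

rng = np.random.RandomState(0)
X = rng.rand(60, 4)

# Two target columns with three classes each (illustrative data only).
y = np.column_stack([np.arange(60) % 3, (np.arange(60) // 3) % 3])

# scikit-learn classifies this kind of target as multiclass-multioutput.
target_type = type_of_target(y)

# Fitting and predicting work for this target type.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred_shape = clf.predict(X).shape

# Scoring is the step reported as broken in this thread.
try:
    score_result = clf.score(X, y)
except ValueError as exc:
    score_result = exc

print(target_type, pred_shape, repr(score_result))
```

If `score_result` ends up holding a `ValueError`, training and prediction succeeded while scoring failed, which is the inconsistency discussed here.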

thismlguy commented 7 years ago

The warning is right. I guess the API allows you to make predictions with the classifiers that support this target type, but users will have to define their own functions to compute accuracy or other measures.
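A user-defined measure of that kind could be sketched as follows. Both helper names are hypothetical, and "a sample counts as correct only if every output matches" is one possible definition of accuracy for this target type, not something the library prescribes:

```python
import numpy as np


def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples for which every output column is predicted correctly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))


def per_output_accuracy(y_true, y_pred):
    """Accuracy computed separately for each output column."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.mean(y_true == y_pred, axis=0)


# Small hand-made example with two outputs and three classes per output.
y_true = np.array([[0, 1], [2, 0], [1, 1], [0, 2]])
y_pred = np.array([[0, 1], [2, 1], [1, 1], [1, 2]])

exact = exact_match_accuracy(y_true, y_pred)      # rows 0 and 2 match fully
per_output = per_output_accuracy(y_true, y_pred)  # one value per column

print(exact, per_output)
```

The exact-match variant is strict, since one wrong output makes the whole sample count as wrong. The per-output variant shows which target column is responsible for the errors.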

amueller commented 7 years ago

Wow, that's pretty bad. We allow training, but the score method is broken? That seems like one more reason to kick this out.

thismlguy commented 7 years ago

Yeah, looks like it. But it must have been merged at some point with some reasoning. Also, some people might be using it already, so I'm not sure.

UT1 commented 6 years ago

I tried to use a multi-output classifier with the decision tree classifier as its estimator. To get the score, I then used accuracy_score from sklearn.metrics. It did not raise any error; instead it gave me a value (0.81, to be exact). Is it giving me false scores? Also, how do I compute the scores manually, as you discussed above?

SundarRengarajan commented 5 years ago

> I tried to use a multi-output classifier with the decision tree classifier as its estimator. To get the score, I then used accuracy_score from sklearn.metrics. It did not raise any error; instead it gave me a value (0.81, to be exact). Is it giving me false scores? Also, how do I compute the scores manually, as you discussed above?

@UT1 I'm surprised that you did not get an error from accuracy_score. I just want to check that yours is not a case where the original targets are categorical and you one-hot encoded them (OHE), so that your revised targets are actually multilabel rather than multiclass-multioutput. Thanks.
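One way to check which case applies is to inspect the target type directly. The sketch below uses made-up labels and assumes `type_of_target` and `accuracy_score` behave as in the scikit-learn versions discussed in this thread:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.utils.multiclass import type_of_target

# A single categorical target, one-hot encoded by hand (illustrative labels).
y_labels = np.array(["cat", "dog", "bird", "dog"])
classes = np.unique(y_labels)
y_one_hot = (y_labels[:, None] == classes[None, :]).astype(int)

# Two separate targets with three classes each.
y_two_outputs = np.array([[0, 2], [1, 0], [2, 1], [1, 2]])

one_hot_type = type_of_target(y_one_hot)
two_output_type = type_of_target(y_two_outputs)

# accuracy_score accepts the one-hot (multilabel indicator) form.
y_pred_one_hot = y_one_hot.copy()
y_pred_one_hot[0] = y_pred_one_hot[1]  # make the first sample wrong
acc = accuracy_score(y_one_hot, y_pred_one_hot)

print(one_hot_type, two_output_type, acc)
```

If the targets turn out to be a 0/1 indicator matrix, a value returned by accuracy_score is a multilabel subset accuracy, which would explain getting a number instead of an error.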

RabeyaMuna commented 3 years ago

> I tried to use a multi-output classifier with the decision tree classifier as its estimator. To get the score, I then used accuracy_score from sklearn.metrics. It did not raise any error; instead it gave me a value (0.81, to be exact). Is it giving me false scores? Also, how do I compute the scores manually, as you discussed above?

How did you do that?