rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/

ValueError: multiclass format is not supported #1038

Closed · arilwan closed this 1 year ago

arilwan commented 1 year ago

I know this is sklearn-related, but it may be worth reporting here as well.

I'm working on a multiclass (5-class) problem with a random forest estimator.

from sklearn.ensemble import RandomForestClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

# initialise model
model = RandomForestClassifier(n_jobs=-1, verbose=0)

# initialise SFS object (sequential forward floating selection)
sffs = SFS(model, k_features="best",
           forward=True, floating=True, n_jobs=-1,
           verbose=2, scoring="roc_auc", cv=5)

# X, y: feature matrix and 5-class target, defined elsewhere
sffs.fit(X, y)

Error:

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
packages/sklearn/metrics/_scorer.py", line 106, in __call__
    score = scorer._score(cached_call, estimator, *args, **kwargs)
  File "~/venv/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 352, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Package info:

>>> import sklearn, mlxtend

>>> print(sklearn.__version__)
1.0.2
>>> print(mlxtend.__version__)
0.22.0
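
For context, the underlying metric itself rejects multiclass targets unless a one-vs-rest or one-vs-one strategy is chosen. A minimal sketch with made-up data (toy labels and uniform probabilities, not from the report above):

from sklearn.metrics import roc_auc_score
import numpy as np

y_true = np.array([0, 1, 2, 3, 4, 0])  # toy 5-class labels
proba = np.full((6, 5), 0.2)           # dummy predicted probabilities, rows sum to 1

# roc_auc_score(y_true, proba) raises:
#   ValueError: multi_class must be in ('ovo', 'ovr')
print(roc_auc_score(y_true, proba, multi_class="ovr"))  # 0.5 for uniform proba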
rasbt commented 1 year ago

Thanks for the note. It would be nice to modify the implementation at some point to support this. I just checked, and GridSearchCV in sklearn supports it via gs = GridSearchCV(estimator=clf, scoring="roc_auc", ...), too.

In the meantime, I think you can use

from sklearn.metrics import roc_auc_score
sffs = SFS(..., scoring=roc_auc_score, ...)
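
If the bare function is not accepted (SFS expects either a scoring string or an sklearn scorer, i.e. a callable with the signature scorer(estimator, X, y)), wrapping the metric with make_scorer might work. A minimal sketch, assuming the estimator exposes predict_proba and that one-vs-rest averaging is acceptable:

from sklearn.metrics import make_scorer, roc_auc_score

# Sketch, not the official fix: build a scorer object so predict_proba
# is used (needs_proba=True) and a multiclass strategy is fixed up front
# (multi_class="ovr").
multiclass_auc = make_scorer(roc_auc_score, needs_proba=True,
                             multi_class="ovr")

sffs = SFS(model, k_features="best", forward=True, floating=True,
           n_jobs=-1, verbose=2, scoring=multiclass_auc, cv=5)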
arilwan commented 1 year ago

Not a valid scoring parameter:

File "~/venv/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 432, in get_scorer
    raise ValueError(
ValueError: 'roc_auc_score' is not a valid scoring value. Use sklearn.metrics.get_scorer_names() to get valid options.
>>> 
arilwan commented 1 year ago

Quick fix.

Using scoring="roc_auc_ovr_weighted" works, according to the scoring table in the scikit-learn docs.
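
For completeness, the full call with the working scoring string, same setup as in the original report:

# multiclass-aware ROC AUC: one-vs-rest, weighted by class support
sffs = SFS(model, k_features="best", forward=True, floating=True,
           n_jobs=-1, verbose=2, scoring="roc_auc_ovr_weighted", cv=5)
sffs.fit(X, y)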

rasbt commented 1 year ago

Thanks! This should perhaps be added to the mlxtend docs in the meantime, until we find time to make "roc_auc" work out of the box.

I wonder what sklearn uses when gs = GridSearchCV(..., scoring="roc_auc", ...) is called. It probably defaults to either _ovo or _ovr.
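
One way to check (a quick inspection sketch; _kwargs is a private scorer attribute, so this is a debugging aid, not stable API):

from sklearn.metrics import get_scorer

# Look at the scorer object behind the "roc_auc" string.
scorer = get_scorer("roc_auc")
print(scorer)          # repr shows the wrapped metric and its fixed kwargs
print(scorer._kwargs)  # any multi_class / average settings baked in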