Open kemaldahha opened 1 year ago
Thanks for the note! I can confirm that I am seeing this issue in sklearn 1.3.0 as well (but not in 1.2.2). I just submitted a PR, #1060, to fix it.
I came across this lecture by @rasbt. According to his explanation, a StackingClassifier has been included in sklearn, so I adjusted the code to use the sklearn version of StackingClassifier:
from sklearn import datasets
iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
# from mlxtend.classifier import StackingClassifier
import numpy as np
import warnings
warnings.simplefilter('ignore')
clf1 = KNeighborsClassifier(n_neighbors=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()
estimators = [("clf1", clf1),
              ("clf2", clf2),
              ("clf3", clf3)]
lr = LogisticRegression()
sclf = StackingClassifier(estimators=estimators,
                          final_estimator=lr)
print('3-fold cross validation:\n')
for clf, label in zip([clf1, clf2, clf3, sclf],
                      ['KNN',
                       'Random Forest',
                       'Naive Bayes',
                       'StackingClassifier']):
    scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring="accuracy")
    print("Accuracy: %0.2f (+/- %0.2f) [%s]"
          % (scores.mean(), scores.std(), label))
Now I do get an output more in line with what I expect, though not exactly the same as in the mlxtend StackingClassifier documentation (Example 1):
3-fold cross validation:
Accuracy: 0.91 (+/- 0.01) [KNN]
Accuracy: 0.95 (+/- 0.01) [Random Forest]
Accuracy: 0.91 (+/- 0.02) [Naive Bayes]
Accuracy: 0.93 (+/- 0.02) [StackingClassifier]
Perhaps sklearn's StackingClassifier implementation is different from mlxtend's.
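One possible source of the gap, as far as I understand the two libraries: sklearn's StackingClassifier trains the final estimator on out-of-fold predictions (its cv parameter, 5-fold by default) and, with stack_method='auto', prefers predict_proba outputs as meta-features, whereas mlxtend's StackingClassifier fits the base classifiers on the full training set and passes predicted labels by default (use_probas=False). The sketch below only compares two stack_method settings on the sklearn side; stacking_scores is a hypothetical helper name, and this does not reproduce mlxtend's behaviour exactly.

```python
from sklearn import datasets, model_selection
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

estimators = [("clf1", KNeighborsClassifier(n_neighbors=1)),
              ("clf2", RandomForestClassifier(random_state=1)),
              ("clf3", GaussianNB())]


def stacking_scores(stack_method):
    """Hypothetical helper: 3-fold accuracy for a given stack_method."""
    sclf = StackingClassifier(estimators=estimators,
                              final_estimator=LogisticRegression(),
                              stack_method=stack_method,  # 'auto' or 'predict'
                              cv=5)  # folds used internally for the meta-features
    return model_selection.cross_val_score(sclf, X, y, cv=3, scoring="accuracy")


for method in ("auto", "predict"):
    scores = stacking_scores(method)
    print("Accuracy: %0.2f (+/- %0.2f) [stack_method=%r]"
          % (scores.mean(), scores.std(), method))
```

With stack_method='predict' the final estimator sees predicted labels rather than probabilities, which is closer to mlxtend's default, although the internal cross-validation still differs.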
I am wondering whether we should still use mlxtend's StackingClassifier or whether it is deprecated and we should use sklearn's implementation instead?
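For the other metrics mentioned in the original report (e.g. roc_auc), here is a minimal sketch with sklearn's StackingClassifier and an explicit scoring argument. Two assumptions on my side: since iris has three classes, I use the multi-class scorer string 'roc_auc_ovr' instead of plain 'roc_auc', and I pass error_score='raise' so that a failing scorer raises the underlying exception instead of silently producing nan. auc_scores is a hypothetical helper name.

```python
from sklearn import datasets, model_selection
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

sclf = StackingClassifier(
    estimators=[("clf1", KNeighborsClassifier(n_neighbors=1)),
                ("clf2", RandomForestClassifier(random_state=1)),
                ("clf3", GaussianNB())],
    final_estimator=LogisticRegression())


def auc_scores(clf):
    """Hypothetical helper: 3-fold one-vs-rest ROC AUC for a classifier."""
    return model_selection.cross_val_score(
        clf, X, y, cv=3,
        scoring="roc_auc_ovr",   # multi-class variant; needs predict_proba
        error_score="raise")     # surface the real error instead of nan


scores = auc_scores(sclf)
print("ROC AUC (OvR): %0.2f (+/- %0.2f) [StackingClassifier]"
      % (scores.mean(), scores.std()))
```

Setting error_score='raise' is also a quick way to see the full traceback behind a nan score without having to dig through repeated warnings.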
Thanks for the reply. I posted my second comment before I read your reply, apologies.
Hi, I am trying to run the code below (Example 1 from the StackingClassifier documentation):
I get the following output:
The expected output is that the score for StackingClassifier should be a number like:
When I print the warning by commenting out warnings.simplefilter('ignore'), I get the output below (I truncated it, as the warning is repeated several times):
The problem seems to be related to the scoring argument in scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring='accuracy'). If I remove that argument, then the default scoring is used (accuracy, I think), and then I get the expected output, which is the same as in the example in the documentation:
However, I would like to be able to use other scoring metrics as well (e.g. roc_auc), but then I have to provide the argument explicitly and I get the nan score again for StackingClassifier.
I already checked issues #423 and #426, which mention a similar warning/error (AttributeError: 'StackingClassifier' object has no attribute 'classes_'), but I couldn't figure it out based on those issues.
I am using: