Closed ptoews closed 2 years ago
I've just realized that a part of this is happening intentionally as explained here: https://github.com/rasbt/mlxtend/issues/456#issuecomment-434427387 I.e. the classifiers used for feature selection and the surrounding grid search are supposed to be separate. I understand that there are good use cases for this, however since I'm at least the third person that expected the opposite I suggest to either default to sharing the classifier or at the very least document it clearly. It is very difficult to even find out what happened.
But there must still be something wrong because in above code the DebugClassifier
is never fitted with the configured hyperparameter choices. Or am I missing even more?
Also, now I'm wondering how I can couple the hyperparameters of feature selection and grid search.
Thanks for taking the time to put together this informative example and illustration of the issue. I agree with you that this is a bug or confusing behavior at the least.
It's probably due to cloning the estimator like you suggested. To get the desired behavior, I believe that setting
sfs1 = SFS(...
clone_estimator=False)
should do the trick. This should be mentioned in the documentation. Also, it would be great to have a warning message if someone is using the defaults in the grid search context -- not sure how to catch this use case scenario well though.
Thanks for your response!
I just tried out your suggestion of disabling cloning the estimator, but the scores are still the same. I had another look at the DebugClassifier
prints and here is what it says for the example above (with SFS being given cv=False
for less messages):
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Setting params: {'max_depth': 1}
max_depth after setparams: 1
Fitting with max_depth = 1
Predicting with max_depth = 1
Fitting with max_depth = 1
Predicting with max_depth = 1
Fitting with max_depth = 1
Predicting with max_depth = 1
Fitting with max_depth = 1
Predicting with max_depth = 1
Fitting with max_depth = 1
Predicting with max_depth = 1
Fitting with max_depth = 1
Predicting with max_depth = 1
Fitting with max_depth = 1
Predicting with max_depth = 1
Fitting with max_depth = 1
Predicting with max_depth = 1
Fitting with max_depth = 1
Predicting with max_depth = 1
Fitting with max_depth = 10
Predicting with max_depth = 10
[CV] END ........................sfs__estimator__max_depth=1; total time= 0.0s
So the configured hyperparameter is used for fitting and predicting, but at the end there is still a classifier with the default settings used. Is it possible that it now does feature selection with the hyperparameters set by grid search but in the end it still uses default parameters? I don't understand what the purpose of the last fit/predict step is.
In comparison, here is the output with the pipeline containing only the DebugClassifier
:
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Setting params: {'max_depth': 1}
max_depth after setparams: 1
Fitting with max_depth = 1
Predicting with max_depth = 1
[CV] END ...................................clf__max_depth=1; total time= 0.0s
So the default parameters appear only when SFS is used.
Regarding the documentation I would suggest to definitely mention this in the example section, since "GridSearch" and "SequentialFeatureSelection" were the keywords that made me think this was exactly what I was looking for, but it is actually a different way of combining these methods. Maybe even make it two examples, one with a shared classifier, and one with clearly different classifiers for feature selection and grid search, so the user is aware of multiple options existing in this context.
Thanks for looking into this further!
So the configured hyperparameter is used for fitting and predicting, but at the end there is still a classifier with the default settings used. Is it possible that it now does feature selection with the hyperparameters set by grid search but in the end it still uses default parameters? I don't understand what the purpose of the last fit/predict step is.
I was confused about that at first too -- this shouldn't happen. But then looking over your code more closely, this is because of the following line:
clf = DebugClassifier(max_depth=10)
sfs1 = SFS(estimator=clf,
k_features=3,
forward=True,
floating=False,
scoring='accuracy',
cv=5)
pipe = Pipeline([('sfs', sfs1),
('clf', clf)]) ### <---------------------------
param_grid = [
{#'sfs__k_features': [1, 4],
'sfs__estimator__max_depth': [1, 5]}
]
gs = GridSearchCV(estimator=pipe,
param_grid=param_grid,
scoring='accuracy',
n_jobs=1,
cv=5,
#iid=True,
refit=False)
I think what's happening here is that it is tuning the sequential feature selector internally, and then it is somehow using the original classifier. I think that's because GridSearchCV may have made a clone of it. I am not exactly sure.
To avoid this issue, you can make it two different entities. One is used internally by the sequential feature selector, and one is used externally.
E.g.,
clf1 = DebugClassifier(max_depth=10)
clf2 = DebugClassifier(max_depth=7)
sfs1 = SequentialFeatureSelector(estimator=clf1,
k_features=3,
forward=True,
floating=False,
scoring='accuracy',
clone_estimator=False,
cv=5)
pipe = Pipeline([('sfs', sfs1),
('clf2', clf2)])
param_grid = [
{
'sfs__estimator__max_depth': [1, 5],
'clf2__max_depth': [1, 5],
}]
gs = GridSearchCV(estimator=pipe,
param_grid=param_grid,
scoring='accuracy',
n_jobs=1,
cv=2,
#iid=True,
refit=False)
Broadly, it is best to think of the classifier in the SequentialFeatureSelector
as being decoupled from the one in the Pipeline
. Unfortunately, I don't think there is a way around that except maybe for writing a custom Pipeline
/GridSearchCV
class.
But yeah, this should probably be made more clear in the documentation.
Thanks for the tips, I believe the following solution based on your suggestions should do what I want: I encapsulated the classifier in the SFS and disabled cloning, which allowed GridSearch to configure the hyperparameters of the classifier that is also used for feature selection. But GridSearch wants to score/predict with the given estimator, which isn't supported by SFS, so I added a simple wrapper that delegates the predicting while transforming the data with the optimized feature selection.
# Same imports as above
class SFSWrapperEstimator(SFS):
def predict(self, X, *args, **kwargs):
return self.estimator.predict(self.transform(X), *args, **kwargs)
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=123)
clf = KNeighborsClassifier()
sfs1 = SFSWrapperEstimator(estimator=clf,
k_features=3,
forward=True,
floating=False,
scoring='accuracy',
cv=False,
clone_estimator=False)
param_grid = [{
'estimator__n_neighbors': [1, 5]
}]
gs = GridSearchCV(estimator=sfs1,
param_grid=param_grid,
scoring='accuracy',
n_jobs=1,
cv=5,
#iid=True,
refit=True,
)
# run gridearch
gs = gs.fit(X_train, y_train)
for i in range(len(gs.cv_results_['params'])):
print(gs.cv_results_['params'][i], 'test acc.:', gs.cv_results_['mean_test_score'][i])
gs.best_estimator_.estimator.n_neighbors
Output:
{'estimator__n_neighbors': 1} test acc.: 0.37499999999999994
{'estimator__n_neighbors': 5} test acc.: 0.6583333333333334
5
I checked the output of the DebugClassifier
as well and everything looks good, but not sure if I missed something. If you think this is useful, maybe the prediction support could be added to SFS or the code could be added to the examples. But I feel that the wrapping could be done in a more elegant way.
Edit: Just realized that SFS doesn't refit the estimator after feature selection but leaves it with the last tried combination.
This is nice. Yeah, I think it should be added to the library (or the documentation at least).
Edit: Just realized that SFS doesn't refit the estimator after feature selection but leaves it with the last tried combination.
It could be a general GridSearchCV
behavior. I.e., the best estimator from GridSearchCV
behaves correctly:
Whether I run
param_grid = [{
'estimator__n_neighbors': [50, 3]
}]
gs = GridSearchCV(estimator=sfs1,
param_grid=param_grid,
scoring='accuracy',
n_jobs=1,
cv=5,
#iid=True,
refit=True,
)
# run gridearch
gs = gs.fit(X_train, y_train)
for i in range(len(gs.cv_results_['params'])):
print(gs.cv_results_['params'][i], 'test acc.:', gs.cv_results_['mean_test_score'][i])
gs.best_estimator_.estimator.n_neighbors
or run the following (order of params flipped)
param_grid = [{
'estimator__n_neighbors': [3, 50]
}]
gs = GridSearchCV(estimator=sfs1,
param_grid=param_grid,
scoring='accuracy',
n_jobs=1,
cv=5,
#iid=True,
refit=True,
)
# run gridearch
gs = gs.fit(X_train, y_train)
for i in range(len(gs.cv_results_['params'])):
print(gs.cv_results_['params'][i], 'test acc.:', gs.cv_results_['mean_test_score'][i])
gs.best_estimator_.estimator.n_neighbors
it both results in the same best estimator params.
When I then check
sfs1.estimator.get_params()['n_neighbors']
it prints 5 (the default value passed in before grid search, that's probably GridSearchCV
is working with a clone)
Interesting. But that doesn't seem to have been the issue here, because I accessed the (cloned) optimized SFS via gs.best_estimator_
. I still think that SFS doesn't necessarily have a reference to the best estimator it fitted, because I noticed that after GridSearch fitted SFS it couldn't evaluate its score, since the last iteration was fitted with 3 features but the evaluation called SFS with just 1 feature as a result of the optimization. Looking at the code of SFS.fit()
I couldn't see any updates to the estimator
attribute after optimization either.
So I updated my solution by also overriding fit()
and simply fitting the internal estimator with the optimized feature selection afterwards, like this:
class SFSWrapperEstimator(SFS):
def fit(self, X, y, **kwargs):
super().fit(X, y, **kwargs)
self.estimator.fit(self.transform(X), y, **kwargs)
def predict(self, X, *args, **kwargs):
return self.estimator.predict(self.transform(X), *args, **kwargs)
It's quite ugly and inefficient, a better solution would be to store all the fitted estimators of SFS or at least their parameters to be able to provide the best estimator afterwards in a dedicated attribute and without additional fitting. However, so far this seems to work, and I will continue to check if everything is correct now.
Yesterday when you first mentioned that I checked the SFS code as well and couldn't find anything about the refitting after optimization either. It is a bit hard to find out what's going on exactly because there are so many layers (SFS, pipeline, GridSearchCV). The proposed workaround with the SFSWrapperEstimator
seems like a good one for making sure it works as intended. We can add that to the documentation for now until we may find a better solution.
Sounds good. Apart from that, using the initial approach still seems to cause a bug (cited below), right?
But there must still be something wrong because in above code the DebugClassifier is never fitted with the configured hyperparameter choices.
It doesn't affect my suggested solution but might still be an issue for others.
I just had some time to revisit this. I think the issue(s) can currently be avoided by not using refit=True
and using
pipe.set_params(**gs.best_params_).fit(X_train, y_train)
instead. I am updated the docs via #875 accordingly. Also, at this point it may make sense not to change the SFS behavior but to focus more on the replacement via #834
Describe the bug
When using sklearn's
GridSearchCV
withSequentialFeatureSelector
, the configured hyperparameter values are not properly propagated to the actual classifier that is used for fitting and predicting. I put together a MWE below that is based on example 8 in the docs, the only major change is the custom classifier.In the output listed in the docs you can see that the score doesn't change with the
k
parameter of the KNN, which is very strange. While searching for similar issues I found that this has already been mentioned in multiple other issues, e.g. #456 and #511. Below you can see the unexpected behavior in the suggested approach.Steps/Code to Reproduce
Expected Results
Actual Results
As you can see, the value
1
for the hyperparametermax_depth
is correctly configured for some classifier, however while fitting and predicting it appears that a different classifier is used, where the default value ofmax_depth=10
is still set.Versions