Closed RNarayan73 closed 1 year ago
Hi @RNarayan73, I understand what you're trying to accomplish. There are a few things to note:
from sklearn.datasets import load_iris
from lightgbm import LGBMClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn_genetic import GAFeatureSelectionCV
from sklearn.model_selection import GridSearchCV

iris = load_iris()

feature_selection = GAFeatureSelectionCV(
    LGBMClassifier(),
    generations=5,
    population_size=5,
    n_jobs=-1,
)
grid_search = GridSearchCV(
    SGDClassifier(),
    param_grid={'alpha': [10e-04, 10e-03, 10e-02, 10e-01, 10e+00]},
)
ga_search_pipe = Pipeline([("dim", feature_selection), ("clf", grid_search)])
ga_search_pipe.fit(iris.data, iris.target)
Having said that, I'll investigate what is causing the error. At first sight, it seems to have to do with the way set_params works when cloning the underlying estimator.
I hope this helps
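For readers unfamiliar with the set_params/clone interaction mentioned above, here is a minimal sketch of how scikit-learn addresses nested parameters and clones a pipeline whose last step is itself a search object. It uses only scikit-learn classes (StandardScaler stands in for any earlier step, and the parameter values are arbitrary), and it is not a diagnosis of the actual bug, only an illustration of the mechanism a wrapping *SearchCV relies on.

```python
from sklearn.base import clone
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A search object used as the final pipeline step, as in the example above.
inner_search = GridSearchCV(SGDClassifier(), param_grid={"alpha": [1e-3, 1e-2]})
pipe = Pipeline([("scale", StandardScaler()), ("clf", inner_search)])

# Nested parameters are addressed as <step>__<param>, one "__" per level.
pipe.set_params(clf__estimator__alpha=0.5)
assert pipe.get_params()["clf__estimator__alpha"] == 0.5

# clone() returns a new, unfitted estimator with the same parameters;
# nested estimators are cloned as well, so no objects are shared.
cloned = clone(pipe)
assert cloned is not pipe
assert cloned.named_steps["clf"] is not inner_search
assert cloned.get_params()["clf__estimator__alpha"] == 0.5
```

An outer search clones the whole pipeline and calls set_params on the copy for every candidate, so each step has to survive this get_params/set_params round trip.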
Hello @rodrigo-arenas
Thank you for your reply and for investigating the issue further.
In general, I wouldn't suggest mixing feature selection and hyperparameter tuning in the same iteration. This not only creates a large model (a whole feature selection run for each hyperparameter candidate) but also has other consequences for the optimization.
With regard to your comment below, there are different approaches. Yes, it is a more challenging problem over a wider search space, but given the stochastic nature of ML, doing both together in fact improves the robustness of the model. Furthermore, having them together in the pipeline is the only way to tune the hyperparameters of the feature selection step as well. There is a good amount of literature supporting this approach, and I share some links below that advocate it. https://machinelearningmastery.com/machine-learning-modeling-pipelines/
I hope you will be able to fix it soon.
Regards
Narayan
@RNarayan73 this has been solved in PR #128. You can clone the repo to test it out.
System information
Windows 10
Sklearn-genetic-opt version: 0.10.1
Describe the bug
Importing the module with
from sklearn_genetic import GAFeatureSelectionCV, ExponentialAdapter
throws an error:
ImportError: cannot import name '_estimator_has' from 'sklearn.feature_selection._from_model' (F:\anaconda\lib\site-packages\sklearn\feature_selection\_from_model.py)
Hi @ananzibian, the error you are showing is not related to this bug.
I think what is happening is that you have an old version of scikit-learn that doesn't have the _estimator_has function.
Please install a more recent version, for example:
pip install scikit-learn==1.2.1
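Before reinstalling, it can help to confirm which scikit-learn version is actually being imported and whether the helper is present. The sketch below is one way to check. The module path and the name _estimator_has are taken from the traceback above, while module_has_attr and report are hypothetical helper names introduced here. Because _estimator_has is a private scikit-learn function, it may also be absent in releases other than the one suggested above.

```python
import importlib


def module_has_attr(module_name: str, attr: str) -> bool:
    """Return True if module_name imports cleanly and exposes attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


def report() -> str:
    """Describe the installed scikit-learn and whether the helper is found."""
    try:
        import sklearn
    except ImportError:
        return "scikit-learn is not installed"
    found = module_has_attr(
        "sklearn.feature_selection._from_model", "_estimator_has"
    )
    return f"scikit-learn {sklearn.__version__}: _estimator_has importable = {found}"


if __name__ == "__main__":
    print(report())
```

Running this with the same interpreter that raised the ImportError shows whether the environment picked up the version you expect.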
@RNarayan73 this has been fixed and released in version 0.10.1
@rodrigo-arenas, thank you for the fix.
System information
OS Platform and Distribution: Windows 11 Home
Sklearn-genetic-opt version: 0.10.0
deap version: 1.3.3
Scikit-learn version: 1.2.1
Python version: 3.10.1

Describe the bug
When including GAFeatureSelectionCV as a transformer within a pipeline to carry out feature selection and then running GridSearchCV or GASearchCV on the pipeline to optimise hyperparameters, it throws up an error:

To Reproduce
Steps to reproduce the behavior:

Expected behavior
The pipeline should be fitted without any errors.

Additional context
This situation arises when trying to wrap a whole pipeline with a hyperparameter tuning class such as GridSearchCV or GASearchCV. The purpose of including the pipeline within *SearchCV is to optimise hyperparameters of additional transform steps before the 'dim' step along with the hyperparameters of the classifier, although such steps are not shown above for brevity.
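The setup described under "Additional context" can be sketched roughly as follows. To keep the example self-contained, scikit-learn's SelectKBest is swapped in for GAFeatureSelectionCV in the 'dim' step, and StandardScaler plays the role of the additional transform step. The step names, grid values, and cv=3 are assumptions made for illustration, not the original reproduction code.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Transform step -> feature selection ('dim') -> classifier ('clf').
# SelectKBest is a stand-in for GAFeatureSelectionCV.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("dim", SelectKBest(score_func=f_classif)),
    ("clf", SGDClassifier(random_state=0)),
])

# One grid covering the transform, the selector, and the classifier,
# using the <step>__<param> naming convention.
param_grid = {
    "scale__with_mean": [True, False],
    "dim__k": [2, 3, 4],
    "clf__alpha": [1e-3, 1e-2, 1e-1],
}

# The outer search clones the whole pipeline for every candidate and fold.
search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

With GAFeatureSelectionCV in the 'dim' slot, each candidate and fold would run a full genetic feature selection, which is the cost raised earlier in the thread.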