[FEATURE] Allow Pipelines in GASearchCV, vs. Estimators Only

windowshopr commented 2 years ago

It would be cool if GASearchCV accepted scikit-learn Pipeline objects as well as plain estimators. For example:

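from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn_genetic import GASearchCV

# Scale the features, then fit the classifier
pipeline = Pipeline([('scaler', MinMaxScaler()),
                     ('estimator', AdaBoostClassifier())])

# curr_params is the hyperparameter search space, defined elsewhere
evolved_estimator = GASearchCV(estimator=pipeline,
                               scoring='balanced_accuracy',
                               cv=TimeSeriesSplit(n_splits=3),
                               population_size=30,
                               generations=30,
                               tournament_size=3,
                               elitism=True,
                               crossover_probability=0.8,
                               mutation_probability=0.1,
                               param_grid=curr_params,
                               criteria='max',
                               algorithm='eaMuPlusLambda',
                               n_jobs=-1,
                               verbose=True,
                               keep_top_k=1)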

I imagine there could be a check somewhere in the code, something like:

if isinstance(self.estimator, sklearn.pipeline.Pipeline):
    self.estimator = self.estimator['estimator']

That way, it could pull the base estimator out of the pipeline and the rest of the code could work as before. The main benefit would be knowing that feature scaling is handled properly during cross-validation: the scaler should be fit on the training folds only and then used to transform the test fold, all inside the CV loop. Would using a scikit-learn pipeline allow this?
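To make the fold-wise scaling concrete, here is a minimal sketch using plain scikit-learn (no GASearchCV involved). The synthetic data, the seed, and the names `fold_minima` and `fit_on_train_only` are assumptions for illustration only. It checks that, when a pipeline is cloned and refit per fold, the scaler's statistics come from the training fold alone:

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

pipeline = Pipeline([("scaler", MinMaxScaler()),
                     ("estimator", AdaBoostClassifier(n_estimators=20, random_state=0))])
cv = TimeSeriesSplit(n_splits=3)

# cross_val_score clones and refits the whole pipeline on each training fold,
# so the scaler never sees the test fold while fitting
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="balanced_accuracy")

# Manual version of the same loop, to inspect what the scaler learned per fold
fold_minima = []
fit_on_train_only = []
for train_idx, test_idx in cv.split(X):
    fitted = clone(pipeline).fit(X[train_idx], y[train_idx])
    scaler = fitted.named_steps["scaler"]
    fold_minima.append(scaler.data_min_.copy())
    # The learned minimum should equal the minimum of the training rows only
    fit_on_train_only.append(np.allclose(scaler.data_min_, X[train_idx].min(axis=0)))

print(scores)
print(fit_on_train_only)
```

Any search object that evaluates candidates through scikit-learn's cross-validation machinery should get this behavior for free when it is handed the whole pipeline rather than the bare estimator.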

rodrigo-arenas commented 2 years ago

Hi @windowshopr, this is already possible in sklearn-genetic-opt. The transformations are applied within each of the folds. You can see an example here of how to use pipelines with this package.
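For readers wondering how a search reaches the inner estimator without unwrapping the pipeline, here is a small sketch of the standard scikit-learn `step__param` naming convention. It uses only scikit-learn. That GASearchCV's `param_grid` keys follow the same convention is an assumption here, not something confirmed in this thread, and the variable names `tunable` and `inner` are illustrative:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

pipeline = Pipeline([("scaler", MinMaxScaler()),
                     ("estimator", AdaBoostClassifier())])

# Parameters of a step are exposed as "<step name>__<parameter name>"
tunable = sorted(name for name in pipeline.get_params()
                 if name.startswith("estimator__"))
print(tunable)

# Setting them through the pipeline updates the wrapped estimator in place
pipeline.set_params(estimator__n_estimators=25, estimator__learning_rate=0.5)
inner = pipeline.named_steps["estimator"]
print(inner.n_estimators, inner.learning_rate)
```

Under that assumption, a search space for the pipeline above would use keys such as `estimator__n_estimators` rather than `n_estimators`, and no special-case check for `Pipeline` would be needed.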

windowshopr commented 2 years ago

Thanks Rodrigo!

rodrigo-arenas / Sklearn-genetic-opt

[FEATURE] Allow Pipelines in GASearchCV, vs. Estimators Only #78