rodrigo-arenas / Sklearn-genetic-opt

ML hyperparameters tuning and features selection, using evolutionary algorithms.
https://sklearn-genetic-opt.readthedocs.io
MIT License

AttributeError: FitnessMax when running GASearchCV on a pipeline containing GAFeatureSelectionCV #131

Closed RNarayan73 closed 1 year ago

RNarayan73 commented 1 year ago

System information
OS Platform and Distribution: Windows 11 Home
Sklearn-genetic-opt version: 0.10.1
deap version: 1.3.3
Scikit-learn version: 1.2.1
Python version: 3.10.10

Describe the bug
When GAFeatureSelectionCV is included as a transformer within a pipeline to carry out feature selection, and GASearchCV is then run on that pipeline to optimise hyperparameters, it initially emits these warnings:

C:\Users\naray\Miniconda3\envs\skl_310\lib\site-packages\deap\creator.py:138: RuntimeWarning: A class named 'FitnessMax' has already been created and it will be overwritten. Consider deleting previous creation of that class or rename it.
C:\Users\naray\Miniconda3\envs\skl_310\lib\site-packages\deap\creator.py:138: RuntimeWarning: A class named 'Individual' has already been created and it will be overwritten. Consider deleting previous creation of that class or rename it.

It seems to then run through various generations successfully, with logs like this printed out (truncated for brevity):

gen nevals fitness fitness_std fitness_max fitness_min
0 5 0.939988 0.012589 0.959893 0.920083
1 10 0.94795 0.00975139 0.959893 0.939988
2 10 0.951872 0.0160428 0.959893 0.919786
3 10 0.963815 0.00480292 0.969697 0.959893
4 10 0.961854 0.0114332 0.969697 0.940285
5 10 0.959774 0.0154184 0.969697 0.929887
gen nevals fitness fitness_std fitness_max fitness_min
0 5 0.908794 0.0116677 0.930778 0.900772
1 10 0.918835 0.00975139 0.930778 0.910873
2 10 0.920737 0.0177374 0.950386 0.900772
3 10 0.916756 0.0172643 0.950386 0.900772
4 10 0.950327 0.0108482 0.96019 0.930481
5 10 0.92038 0.0748142 0.96019 0.770945
..... ..... .....

before finally throwing up the following error:


AttributeError                            Traceback (most recent call last)
Cell In[101], line 8
      2 from sklearn_genetic import GASearchCV
      4 ga_search_pipe = GASearchCV(test_pipe, generations=5, population_size=5,
      5                             param_grid={'clf__alpha': skg.space.Continuous(10e-2, 10e0, 'log-uniform')},
      6                            )
----> 8 ga_search_pipe.fit(iris.data, iris.target)
     10 grid_search_pipe.predict(iris.data)

File ~\Miniconda3\envs\skl_310\lib\site-packages\sklearn_genetic\genetic_search.py:543, in GASearchCV.fit(self, X, y, callbacks)
    536 self._hof.keys.insert(0, self.bestscore)
    538 self.hof = {
    539     k: {key: self._hof[k][n] for n, key in enumerate(self.space.parameters)}
    540     for k in range(len(self._hof))
    541 }
--> 543 del self.creator.FitnessMax
    544 del self.creator.Individual
    546 return self

AttributeError: FitnessMax
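
The sequence behind this traceback can be sketched without DEAP. The snippet below is a simplified stand-in, not sklearn-genetic-opt's code: `registry` plays the role of the shared `deap.creator` module, and `ToyEstimator` is a hypothetical class that registers `FitnessMax` at the start of `fit` and deletes it at the end, much as the line `del self.creator.FitnessMax` in the traceback does. It assumes the inner and outer estimators share one namespace in the same process, which the overwrite warnings suggest but the report does not confirm.

```python
import types

# Stand-in for the shared, module-level namespace (hypothetical; plays the role of deap.creator).
registry = types.SimpleNamespace()


class ToyEstimator:
    """Hypothetical estimator that registers a class on fit and removes it afterwards."""

    def __init__(self, inner=None):
        self.inner = inner  # optional nested estimator, like a pipeline step

    def fit(self):
        # Registering overwrites any class of the same name already present.
        registry.FitnessMax = type("FitnessMax", (), {})
        if self.inner is not None:
            # The nested fit registers and then deletes the same name.
            self.inner.fit()
        # Raises AttributeError if the nested fit already removed the name.
        del registry.FitnessMax
        return self


ToyEstimator().fit()  # a single estimator cleans up without error

try:
    ToyEstimator(inner=ToyEstimator()).fit()
    outcome = "ok"
except AttributeError as exc:
    outcome = f"AttributeError: {exc}"

print(outcome)
```

In this sketch the nested case fails on the outer `del`, because the inner `fit` has already removed the name that both estimators share.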

To Reproduce
Steps to reproduce the behavior:

from sklearn.datasets import load_iris

from lightgbm import LGBMClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn_genetic import GAFeatureSelectionCV
import sklearn_genetic as skg
from sklearn_genetic import GASearchCV

iris = load_iris()

test_pipe = Pipeline([
                    # 1 Feature Selection using GAFeatureSelectionCV
                      ('dim', GAFeatureSelectionCV(LGBMClassifier(), 
                                                   generations=5, population_size=5, 
                                                   n_jobs=-1, 
                                                  )
                      ), 

                      ('clf', SGDClassifier())
                     ]
                    )

ga_search_pipe = GASearchCV(test_pipe, generations=5, population_size=5, 
                            param_grid={'clf__alpha': skg.space.Continuous(10e-2, 10e0, 'log-uniform')},
                           )

ga_search_pipe.fit(iris.data, iris.target)

Expected behavior
The pipeline should be fitted without any errors.

rodrigo-arenas commented 1 year ago

Hi @RNarayan73. Unfortunately, this bug is related to DEAP and how it handles some objects when using multiprocessing: it creates some objects that can't be created again in another thread or process. I've been aware of this for some time and have already made the changes suggested by DEAP to handle it, but they don't work 100% of the time. Other users have suggested reinstalling DEAP.

Check these threads: DEAP bug 117 Stackoverflow
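
One defensive pattern for the cleanup step is to tolerate a name that is already gone. The sketch below is an assumption for illustration only, not the change made in sklearn-genetic-opt or one recommended by DEAP: `discard_attr` is a hypothetical helper, and `shared` stands in for the `deap.creator` module.

```python
import types


def discard_attr(namespace, name):
    """Delete `name` from `namespace` if present; return True if something was removed."""
    try:
        delattr(namespace, name)
    except AttributeError:
        # Another fit already removed it, so there is nothing left to clean up.
        return False
    return True


# Hypothetical stand-in for the namespace holding the dynamically created classes.
shared = types.SimpleNamespace(FitnessMax=object, Individual=list)

first = discard_attr(shared, "FitnessMax")   # removes the attribute
second = discard_attr(shared, "FitnessMax")  # already gone, no exception
print(first, second)
```

A guard like this would only suppress the error from a second deletion. It would not stop one estimator from overwriting another's classes, which is what the `RuntimeWarning` messages in the report describe.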

RNarayan73 commented 1 year ago

@rodrigo-arenas thanks for the links. The threads above refer to the use of scoop, which I don't have installed. I am using standard joblib for multiprocessing. Will installing scoop help, or is it only for those developing on deap?
Regards,
Narayan

rodrigo-arenas commented 1 year ago

I don't think installing scoop would help, since all the cross-validation and multiprocessing done inside scikit-learn use joblib.

RNarayan73 commented 1 year ago

Thanks for your help, @rodrigo-arenas. You've been one of the most responsive and committed developers I've had the pleasure of working with! Keep up the good work!
Regards,
Narayan

rodrigo-arenas commented 1 year ago

I'm happy to help. Let me know if anything else comes up.