manuel-calzolari / sklearn-genetic

Genetic feature selection module for scikit-learn
https://sklearn-genetic.readthedocs.io
GNU Lesser General Public License v3.0

Threads close on AttributeError when run in ipython #17

Open nightvision04 opened 3 years ago

nightvision04 commented 3 years ago

In a Jupyter notebook, I can run the following without issue:

from sklearn.neighbors import KNeighborsClassifier
from genetic_selection import GeneticSelectionCV

# X and y are the feature matrix and target vector (not shown in this report).
estimator = KNeighborsClassifier(n_neighbors=16)
selector = GeneticSelectionCV(estimator,
                              cv=10,
                              verbose=1,
                              scoring="accuracy",
                              max_features=3,
                              n_population=1000,
                              crossover_proba=0.5,
                              mutation_proba=0.2,
                              n_generations=40,
                              crossover_independent_proba=0.5,
                              mutation_independent_proba=0.05,
                              tournament_size=3,
                              n_gen_no_change=10,
                              caching=True,
                              n_jobs=4)
selector = selector.fit(X, y)

However, as soon as I run it a second time in the same IPython cell, all of the DEAP threads raise an exception. I've included the stack trace below.

Essentially, the above code can't run in a loop in IPython. Are some threads not being closed properly because of an interaction between the GIL and IPython?
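
A note on threads versus processes: the worker names in the trace below (SpawnPoolWorker-58, SpawnPoolWorker-60) and the multiprocessing\pool.py frames indicate that the workers are separate processes created with the "spawn" start method, not threads, so the GIL is unlikely to be the cause. As a minimal sketch using only the standard library, this shows which start method an interpreter uses:

```python
import multiprocessing

# "spawn" starts each worker as a fresh interpreter, so everything sent to a
# worker must be picklable and importable there; "fork" copies the parent.
start_method = multiprocessing.get_start_method()
print("start method:", start_method)
print("available:", multiprocessing.get_all_start_methods())
```

On Windows only "spawn" is available, which matches the worker names in the trace.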

AttributeError: Can't get attribute 'FitnessMulti' on <module 'deap.creator' from 'D:\\anaconda\\envs\\a\\lib\\site-packages\\deap\\creator.py'>
AttributeError: Can't get attribute 'FitnessMulti' on <module 'deap.creator' from 'D:\\anaconda\\envs\\a\\lib\\site-packages\\deap\\creator.py'>
Process SpawnPoolWorker-58:
Traceback (most recent call last):
  File "D:\anaconda\envs\a\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "D:\anaconda\envs\a\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "D:\anaconda\envs\a\lib\multiprocessing\pool.py", line 110, in worker
    task = get()
  File "D:\anaconda\envs\a\lib\multiprocessing\queues.py", line 354, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'FitnessMulti' on <module 'deap.creator' from 'D:\\anaconda\\envs\\a\\lib\\site-packages\\deap\\creator.py'>
Process SpawnPoolWorker-60:
Traceback (most recent call last):
  File "D:\anaconda\envs\a\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "D:\anaconda\envs\a\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "D:\anaconda\envs\a\lib\multiprocessing\pool.py", line 110, in worker
    task = get()
  File "D:\anaconda\envs\a\lib\multiprocessing\queues.py", line 354, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'FitnessMulti' on <module 'deap.creator' from 'D:\\anaconda\\envs\\a\\lib\\site-packages\\deap\\creator.py'>

manuel-calzolari commented 3 years ago

What versions of Python and sklearn-genetic are you using?
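
A small sketch for collecting the relevant versions in one go. It assumes Python 3.8 or later, where importlib.metadata is part of the standard library; on older interpreters such as 3.6, `pip list` gives the same information. The package list is an assumption about what matters here and can be adjusted.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onward

# Distribution names as published on PyPI; adjust if your install differs.
PACKAGES = ["sklearn-genetic", "deap", "scikit-learn", "ipython", "notebook"]


def collect_versions(packages=PACKAGES):
    """Return a dict mapping 'python' and each package name to a version string."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, version in collect_versions().items():
    print(f"{name}: {version}")
```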

nightvision04 commented 3 years ago

Latest sklearn-genetic and Python 3.6.

On Fri., Dec. 11, 2020, 7:09 a.m. Manuel Calzolari, <notifications@github.com> wrote:

What versions of Python and sklearn-genetic are you using?

manuel-calzolari commented 3 years ago

I think it may be related to DEAP's issue #268, but I'm not able to reproduce your specific issue.
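
One plausible reading of the AttributeError, not verified against the reporter's setup: DEAP's creator builds classes at runtime and attaches them to the deap.creator module. A worker started with "spawn" imports that module afresh, and if the class was never created in the worker, unpickling an object of that class fails. The sketch below reproduces this failure mode with the standard library only. The names fake_creator and create are hypothetical stand-ins for illustration, not DEAP's API, and the missing class is simulated by deleting the attribute rather than by starting a real worker process.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module whose classes are created at runtime.
fake_creator = types.ModuleType("fake_creator")
sys.modules["fake_creator"] = fake_creator


def create(name, base):
    """Build a class at runtime and attach it to fake_creator."""
    cls = type(name, (base,), {"__module__": "fake_creator"})
    setattr(fake_creator, name, cls)
    return cls


FitnessMulti = create("FitnessMulti", object)

# Pickle stores only a reference ("fake_creator", "FitnessMulti"), not the class.
payload = pickle.dumps(FitnessMulti())

# Simulate a fresh worker: the module exists, but create() never ran there.
delattr(fake_creator, "FitnessMulti")

try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If this is the mechanism, the class has to exist in every worker before tasks are unpickled there, which would explain why the failure depends on how and when the workers are started.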

I created the following Windows 10-based environment:

  • Python 3.6.12
  • IPython 7.16.1
  • notebook 6.1.4
  • deap 1.3.1
  • sklearn-genetic 0.3.0

Cell 1:

import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from genetic_selection import GeneticSelectionCV

iris = datasets.load_iris()
E = np.random.uniform(0, 0.1, size=(len(iris.data), 20))
X = np.hstack((iris.data, E))
y = iris.target

Cell 2:

estimator = KNeighborsClassifier(n_neighbors=16)
selector = GeneticSelectionCV(estimator,
                              cv=10,
                              verbose=1,
                              scoring="accuracy",
                              max_features=3,
                              n_population=1000,
                              crossover_proba=0.5,
                              mutation_proba=0.2,
                              n_generations=40,
                              crossover_independent_proba=0.5,
                              mutation_independent_proba=0.05,
                              tournament_size=3,
                              n_gen_no_change=10,
                              caching=True,
                              n_jobs=4)
selector = selector.fit(X, y)

However, I don't get any crash when I run the second cell multiple times.

nightvision04 commented 3 years ago

I appreciate the thorough test! There must be something unique about my setup. Since then I've also tested in the Python console and can't get it to work in a loop.

If updating helps, I'll let you know.

On Sat., Dec. 12, 2020, 11:29 a.m. Manuel Calzolari, <notifications@github.com> wrote:

I think it may be related to DEAP's issue #268, but I'm not able to reproduce your specific issue.

I created the following Windows 10 based environment:

  • Python 3.6.12
  • IPython 7.16.1
  • notebook 6.1.4
  • deap 1.3.1
  • sklearn-genetic 0.3.0
