
Grid Search CV hangs with n_jobs anything other than 1 #10533

Closed: dhanush-ai1990 closed this issue 6 years ago

dhanush-ai1990 commented 6 years ago

Below is the source code:

import sqlite3
import re
import time
import csv
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.learning_curve import learning_curve
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from matplotlib import pyplot as pl
from matplotlib.backends.backend_pdf import PdfPages
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_blobs
from sklearn import svm
from sklearn.externals import joblib
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
import glob, os

# MultiNomial

from xgboost import XGBClassifier
import xgboost
from tpot import TPOTClassifier

tpot_config = {
    'xgboost.XGBClassifier': {
        'n_estimators': [50, 100, 150, 200],
        'max_depth': [2, 3, 4, 5, 6, 7, 8, 9],
        'min_child_weight': [2, 3, 4, 5],
        'colsample_bytree': [0.2, 0.6, 0.8],
        'colsample_bylevel': [0.2, 0.6, 0.8],
        'objective': ['multi:softprob']
    }
}

param = {
    'n_estimators': [200],
    'max_depth': [3, 5, 4],
    'min_child_weight': [4, 5, 3],
    'colsample_bytree': [0.2, 0.6, 0.8],
    'colsample_bylevel': [0.2, 0.6, 0.8]
}

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=2,
                                    max_eval_time_mins=30, random_state=42, verbosity=200,
                                    config_dict=tpot_config, n_jobs=10)

clf = XGBClassifier(objective='multi:softprob')

gsearch1 = GridSearchCV(estimator=clf, param_grid=param, cv=2, verbose=100, n_jobs=-1)
gsearch1.fit(X_train, y_train)
print gsearch1.best_score_
print gsearch1.best_params_

Code relevant to data loading is omitted. Everything works fine with n_jobs=1.

I am using Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 26 2016, 12:10:39) and scikit-learn 0.19.1.

Because of this issue, TPOT also hangs, as it uses scikit-learn's GridSearchCV for its internal operations.

chenhe95 commented 6 years ago

The only place I see n_jobs specified is in pipeline_optimizer, which I don't see used anywhere.

dhanush-ai1990 commented 6 years ago

I have edited the typo in the code above. The problem is not caused by the code itself, though.

chenhe95 commented 6 years ago

Try running this example and set n_jobs to whatever you want; does it run?

http://scikit-learn.org/stable/auto_examples/plot_kernel_ridge_regression.html#sphx-glr-auto-examples-plot-kernel-ridge-regression-py

I was able to get it working with the same versions of scikit-learn and Python, and it didn't matter what n_jobs was.

rth commented 6 years ago

It's a known issue with the multiprocessing module.

See FAQ and https://github.com/dmlc/xgboost/issues/2163 as well as https://github.com/scikit-learn/scikit-learn/issues/5115 and https://github.com/scikit-learn/scikit-learn/issues/6627.
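The usual mitigations from those threads are to protect the script's entry point and to keep xgboost's own OpenMP threads from colliding with the forked workers. A rough sketch applied to the snippet above, not a guaranteed fix (nthread was the xgboost parameter name in versions of that era; newer releases call it n_jobs):

from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV

if __name__ == '__main__':
    # one OpenMP thread per fit; GridSearchCV's n_jobs provides the parallelism
    clf = XGBClassifier(objective='multi:softprob', nthread=1)
    gsearch = GridSearchCV(estimator=clf, param_grid=param, cv=2, n_jobs=-1)
    gsearch.fit(X_train, y_train)  # param, X_train, y_train as in the original snippet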

bogdad commented 6 years ago

hey, I was able to work around it:

using sklearn with a 'hacked' external joblib updated to master https://github.com/scikit-learn/scikit-learn/compare/master...bogdad:updating_external_joblib_to_1c57c18e9?expand=1 (currently it needs an external cloudpickle, which can be installed with pip3 install cloudpickle)

and

%env LOKY_PICKLER='cloudpickle'
import multiprocessing
multiprocessing.set_start_method('forkserver')

Just setting multiprocessing.set_start_method('forkserver') is not enough, because the default pickler in current sklearn cannot marshal lambdas and other objects defined in the __main__ module: https://github.com/joblib/joblib/issues/263
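In a plain (non-IPython) script, the %env line becomes an os.environ assignment, which has to happen before joblib/sklearn are imported. A minimal sketch, assuming the patched joblib and cloudpickle from above are installed:

import os
os.environ['LOKY_PICKLER'] = 'cloudpickle'  # must be set before sklearn/joblib are imported

import multiprocessing

if __name__ == '__main__':
    # forkserver starts workers from a fresh server process instead of fork(),
    # so they do not inherit xgboost's OpenMP state from the parent
    multiprocessing.set_start_method('forkserver')
    # ... build the estimator and run GridSearchCV here ...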

jnothman commented 6 years ago

In the upcoming release, we will have a setting to use joblib master (via the environment variable SKLEARN_SITE_JOBLIB=1 at this stage). joblib master is not yet stable enough to be included in the release.
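For anyone who wants to try this once the release is out: the variable needs to be set before scikit-learn is imported, either in the shell or at the top of the script. A sketch, assuming a suitable joblib is installed alongside:

import os
os.environ['SKLEARN_SITE_JOBLIB'] = '1'  # before any sklearn import
import sklearn  # now backed by the separately installed joblib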

lejafar commented 6 years ago

@jnothman when is this release expected?

jnothman commented 6 years ago

Over the coming month or so...

jeyendranbalakrishnan commented 6 years ago

Any update about this upcoming release? Thanks!

bitmanlger commented 5 years ago

Tested with 0.20rc1 and SKLEARN_SITE_JOBLIB=1, and it appears to work. :)

jnothman commented 5 years ago

It should also be possible without site joblib in 0.20! :)

AlphaRandom commented 5 years ago

Hi all, I've encountered the same issue using GridSearchCV with n_jobs=-1. Below is the message I got:

Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information.

How can I modify my code to run parallel computation with GridSearchCV? Thanks

rth commented 5 years ago

@AlphaRandom The error message is pretty explicit, you need to put your code under,

if __name__ == '__main__':
   # parallel code here

see joblib docs for more information. Alternatively upgrading to scikit-learn 0.20 (which includes joblib 0.12) should make this unnecessary.
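For completeness, a self-contained script following that pattern, with a toy dataset and estimator purely for illustration:

from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

if __name__ == '__main__':
    # everything that triggers parallel work lives under the guard
    X, y = make_blobs(n_samples=200, centers=3, random_state=0)
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          {'n_estimators': [10, 50], 'max_depth': [3, 5]},
                          cv=3, n_jobs=-1)
    search.fit(X, y)
    print(search.best_params_)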

AlphaRandom commented 5 years ago

Yes, many thanks. I figured that out after posting my question :) ... I only hope to speed up my computations without tying up my PC for days :) ... At the moment the task manager shows the CPU at 100% and the number of threads I expected.

kimardenmiller commented 5 years ago

Using scikit-learn 0.20 and n_jobs=-1, I'm getting:

/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Using TensorFlow backend.
[the DeprecationWarning and the TensorFlow backend line above repeat once per worker process]
exception calling callback for <Future at 0x11e76da20 state=finished raised BrokenProcessPool>
sklearn.externals.joblib.externals.loky.process_executor._RemoteTraceback: 
'''
Traceback (most recent call last):
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 393, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'create_model' on <module 'sklearn.externals.joblib.externals.loky.backend.popen_loky_posix' from '/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/backend/popen_loky_posix.py'>
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 375, in __call__
    self.parallel.dispatch_next()
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 797, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 825, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 782, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 506, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/reusable_executor.py", line 151, in submit
    fn, *args, **kwargs)
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 1016, in submit
    raise self._flags.broken
sklearn.externals.joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
sklearn.externals.joblib.externals.loky.process_executor._RemoteTraceback: 
'''
[same remote traceback as above]
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/Utility/Optimize_Grid/optimizers.py", line 76, in <module>
    grid_result = grid.fit(X, Y)
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 722, in fit
    self._run_search(evaluate_candidates)
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 1191, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 711, in evaluate_candidates
    cv.split(X, y, groups)))
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 996, in __call__
    self.retrieve()
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 899, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 517, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 375, in __call__
    self.parallel.dispatch_next()
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 797, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 825, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 782, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 506, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/reusable_executor.py", line 151, in submit
    fn, *args, **kwargs)
  File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 1016, in submit
    raise self._flags.broken
sklearn.externals.joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Process finished with exit code 1

amueller commented 5 years ago

@kimardenmiller see #12250 and #12413
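The AttributeError: Can't get attribute 'create_model' in the log above is the usual symptom of worker processes being unable to re-import a function defined in __main__. Assuming the Keras scikit-learn wrapper is in play (the build_fn pattern suggested by the "Using TensorFlow backend" lines), the common fix is to move the builder into an importable module. A sketch, with a hypothetical module name my_models:

# my_models.py -- top-level module, so loky workers can unpickle create_model by reference
def create_model():
    from keras.models import Sequential
    from keras.layers import Dense
    model = Sequential([Dense(1, input_dim=10, activation='sigmoid')])
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model

# train.py
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from my_models import create_model

if __name__ == '__main__':
    clf = KerasClassifier(build_fn=create_model, verbose=0)
    grid = GridSearchCV(clf, {'epochs': [5, 10]}, cv=2, n_jobs=-1)
    # grid.fit(X, Y)  # X, Y as in the reporter's script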

fuster-10 commented 4 years ago

Hello all,

Could someone please summarize the workaround for this issue?

I am experiencing a similar situation to the one described at the beginning of the thread.

I saw that someone was able to work around it with the information at this link: https://github.com/scikit-learn/scikit-learn/compare/master...bogdad:updating_external_joblib_to_1c57c18e9?expand=1

%env LOKY_PICKLER='cloudpickle'
import multiprocessing
multiprocessing.set_start_method('forkserver')

I am on a Windows platform. Could someone please confirm that this is also the solution on Windows?

Thanks in advance

Óscar

gaohaoyue commented 3 years ago

I resolved a similar issue by using model = xgb.XGBRegressor() instead of model = XGBClassifier(random_state=126).

fuster-10 commented 3 years ago

Hello,

How did you manage to solve the issue?

Best

Óscar

gaohaoyue commented 3 years ago

@kunfuster, this may be unrelated, but I resolved the issue by calling XGBRegressor instead of XGBClassifier, because I realized that my output data type is float rather than boolean or integer. I then fed the model into the grid search and it worked fine without freezing. This is good material you might be interested in checking out: https://www.datacamp.com/community/tutorials/xgboost-in-python. According to the article:

The next step is to instantiate an XGBoost regressor object by calling the XGBRegressor() class from the XGBoost library with the hyper-parameters passed as arguments. For classification problems, you would have used the XGBClassifier() class.

Hope it helps.
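Put differently, the deciding factor here was the dtype of the target, and that check can be made explicit up front. A small sketch, assuming a numeric numpy array y:

import numpy as np
from xgboost import XGBClassifier, XGBRegressor

def pick_xgb_model(y):
    # Continuous (float) targets call for a regressor; discrete labels for a classifier.
    if np.issubdtype(np.asarray(y).dtype, np.floating):
        return XGBRegressor()
    return XGBClassifier()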

alexAmaguaya95 commented 2 years ago

I'm using LightGBM with RandomizedSearchCV and I am hitting this issue:

OverflowError: cannot fit 'int' into an index-sized integer

I was using this code:

cv_iterator = StratifiedKFold(n_splits=3, shuffle=True, random_state=None)
search_space = {
    'boosting_type': ['gbdt', 'dart', 'goss', 'rf'],
    'num_leaves': list(range(2, 40, 1))
}
rd_grid_search_clf = RandomizedSearchCV(lgb.LGBMClassifier(n_jobs=-1), search_space,
                                        return_train_score=True, scoring='f1_macro',
                                        n_jobs=-1, refit=True, cv=cv_iterator,
                                        verbose=1, n_iter=50, error_score=np.nan,
                                        random_state=None)

rd_grid_search_clf.fit(X_train, y_train)