Closed dhanush-ai1990 closed 6 years ago
The only place I see n_jobs specified is in pipeline_optimizer, which I don't see used anywhere.
I edited the typo in the code above. Well the problem is not because of the code at all.
Try running this code and add n_jobs to whatever you want. Does it run?
I was able to get it working using the same version of scikit-learn and Python, and it didn't matter what n_jobs was.
It's a known issue with the multiprocessing module.
See FAQ and https://github.com/dmlc/xgboost/issues/2163 as well as https://github.com/scikit-learn/scikit-learn/issues/5115 and https://github.com/scikit-learn/scikit-learn/issues/6627.
Hey, I was able to work around it by using sklearn with a 'hacked' external joblib updated to master:
https://github.com/scikit-learn/scikit-learn/compare/master...bogdad:updating_external_joblib_to_1c57c18e9?expand=1
(currently it needs an external cloudpickle, which can be installed with pip3 install cloudpickle)
and
%env LOKY_PICKLER='cloudpickle'
import multiprocessing
multiprocessing.set_start_method('forkserver')
Just setting multiprocessing.set_start_method('forkserver') is not enough, as the default pickler in current sklearn is not able to marshal __main__ module lambdas and similar objects: https://github.com/joblib/joblib/issues/263
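To illustrate the pickling limitation mentioned above, here is a minimal sketch that uses only the standard library. The standard pickle module serializes functions by reference (module name plus qualified name), so a lambda, which has no importable name, is rejected, whereas cloudpickle serializes such functions by value. The helper name can_pickle is hypothetical and not part of scikit-learn or joblib.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A function from an importable module is pickled by reference:
# only its module and name are stored, not its code.
print(can_pickle(math.sqrt))        # True
# A lambda has no importable name, so the standard pickler rejects it.
print(can_pickle(lambda x: x + 1))  # False
```

This is why changing the start method alone may not help: a worker started with forkserver or spawn has to receive its tasks in pickled form.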
In the upcoming release, we will have a setting to use joblib master (environment variable SKLEARN_SITE_JOBLIB=1 at this stage). joblib master is not quite stable enough yet to be included in the release.
@jnothman when is this release expected?
Over the coming month or so...
Any update about this upcoming release? Thanks!
tested with 0.20rc1 and SKLEARN_SITE_JOBLIB=1 and it appears to work. :)
It should be possible also without site joblib in 0.20! :)
Hi all, I've encountered the same issue when using GridSearchCV with n_jobs=-1. Below is the message I got:
Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information
How can I modify my code to run parallel computation with GridSearchCV? Thanks
@AlphaRandom The error message is pretty explicit, you need to put your code under,
if __name__ == '__main__':
    # parallel code here
see joblib docs for more information. Alternatively upgrading to scikit-learn 0.20 (which includes joblib 0.12) should make this unnecessary.
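As a rough illustration of the guard described above, here is a minimal sketch using only the standard-library multiprocessing module rather than joblib or GridSearchCV. The names square and main are hypothetical. The same pattern would apply to a script that calls a scikit-learn estimator with n_jobs greater than 1 on a platform without fork.

```python
import multiprocessing


def square(x):
    """Work done in each worker; defined at module level so workers can find it."""
    return x * x


def main():
    # On platforms without fork (for example Windows), each worker re-imports
    # this script. Without the guard at the bottom, that import would try to
    # start another pool, which is the situation the error message warns about.
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))


if __name__ == "__main__":
    main()
```

Keeping the parallel call inside a function and invoking it only under the guard means that importing the script has no side effects.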
Yes, many thanks. I figured that out after posting my question :) I only hope to speed up my computations without keeping my PC on for days :) At the moment, in the Task Manager I can see the CPU at 100% and the expected number of threads.
Using scikit-learn 0.20 and n_jobs=-1 I'm getting:
/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
Using TensorFlow backend.
exception calling callback for <Future at 0x11e76da20 state=finished raised BrokenProcessPool>
sklearn.externals.joblib.externals.loky.process_executor._RemoteTraceback:
'''
Traceback (most recent call last):
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 393, in _process_worker
call_item = call_queue.get(block=True, timeout=timeout)
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'create_model' on <module 'sklearn.externals.joblib.externals.loky.backend.popen_loky_posix' from '/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/backend/popen_loky_posix.py'>
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 375, in __call__
self.parallel.dispatch_next()
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 797, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 825, in dispatch_one_batch
self._dispatch(tasks)
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 782, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 506, in apply_async
future = self._workers.submit(SafeFunction(func))
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/reusable_executor.py", line 151, in submit
fn, *args, **kwargs)
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 1016, in submit
raise self._flags.broken
sklearn.externals.joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
sklearn.externals.joblib.externals.loky.process_executor._RemoteTraceback:
'''
Traceback (most recent call last):
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 393, in _process_worker
call_item = call_queue.get(block=True, timeout=timeout)
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'create_model' on <module 'sklearn.externals.joblib.externals.loky.backend.popen_loky_posix' from '/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/backend/popen_loky_posix.py'>
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/Utility/Optimize_Grid/optimizers.py", line 76, in <module>
grid_result = grid.fit(X, Y)
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 722, in fit
self._run_search(evaluate_candidates)
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 1191, in _run_search
evaluate_candidates(ParameterGrid(self.param_grid))
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 711, in evaluate_candidates
cv.split(X, y, groups)))
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 996, in __call__
self.retrieve()
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 899, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 517, in wrap_future_result
return future.result(timeout=timeout)
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 375, in __call__
self.parallel.dispatch_next()
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 797, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 825, in dispatch_one_batch
self._dispatch(tasks)
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 782, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 506, in apply_async
future = self._workers.submit(SafeFunction(func))
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/reusable_executor.py", line 151, in submit
fn, *args, **kwargs)
File "/Users/kimardenmiller/Dropbox/PyCharm/Investing/Deep_Qlearning_MktCapGVA/venv/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 1016, in submit
raise self._flags.broken
sklearn.externals.joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Process finished with exit code 1
@kimardenmiller see #12250 and #12413
Hello all,
Could someone please summarize what the workaround for this issue is?
I am experiencing a similar situation as the one mentioned at the beginning of the post.
I saw that someone was able to work around it with the information at this link: https://github.com/scikit-learn/scikit-learn/compare/master...bogdad:updating_external_joblib_to_1c57c18e9?expand=1
%env LOKY_PICKLER='cloudpickle'
import multiprocessing
multiprocessing.set_start_method('forkserver')
I am on a Windows platform. Could someone please confirm that this is the solution for a Windows platform?
Thanks in advance
Óscar
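On the Windows question, one relevant fact is that the standard-library multiprocessing module offers only the 'spawn' start method on Windows, so set_start_method('forkserver') raises a ValueError there. The sketch below shows how one might check what the current platform supports before choosing a method. The helper pick_start_method is hypothetical, and this only covers the start-method part of the workaround, not the joblib or cloudpickle part.

```python
import multiprocessing


def pick_start_method(preferred="forkserver"):
    """Return preferred if this platform supports it, otherwise 'spawn'."""
    available = multiprocessing.get_all_start_methods()
    return preferred if preferred in available else "spawn"


if __name__ == "__main__":
    print("available:", multiprocessing.get_all_start_methods())
    # get_context returns a context bound to one start method without
    # changing the global default for the whole program.
    ctx = multiprocessing.get_context(pick_start_method())
    print("using:", ctx.get_start_method())
```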
I resolved a similar issue by using model = xgb.XGBRegressor() instead of model = XGBClassifier(random_state=126).
Hello,
How did you manage to solve the issue?
Best
Óscar
@kunfuster, this may be unrelated, but I resolved the issue by calling XGBRegressor instead of XGBClassifier, because I realized that my output data type is float rather than boolean or integer. I then fed the model into the grid search and it worked fine without freezing. This is good material you might be interested in checking out: https://www.datacamp.com/community/tutorials/xgboost-in-python.
According to the article:
The next step is to instantiate an XGBoost regressor object by calling the XGBRegressor() class from the XGBoost library with the hyper-parameters passed as arguments. For classification problems, you would have used the XGBClassifier() class.
Hope it helps.
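The check described above (float targets suggest a regressor, integer or boolean targets a classifier) can be sketched as a small heuristic. The function guess_task is hypothetical and deliberately simple. scikit-learn also provides sklearn.utils.multiclass.type_of_target for a more thorough inspection of the target.

```python
def guess_task(y):
    """Heuristic: any non-integer target value suggests a regression problem."""
    for value in y:
        if not float(value).is_integer():
            return "regression"
    return "classification"


print(guess_task([0, 1, 1, 0]))      # classification
print(guess_task([0.37, 1.25, 2.0])) # regression
```

Integer-valued floats such as 1.0 are treated as class labels here, which is an assumption that may not hold for every dataset.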
I'm using LightGBM with RandomizedSearchCV and get this error:
OverflowError: cannot fit 'int' into an index-sized integer
I was using this code
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# X_train and y_train are the training data, loaded elsewhere (not shown in the post).
cv_iterator = StratifiedKFold(n_splits=3, shuffle=True, random_state=None)
search_space = {
    'boosting_type': ['gbdt', 'dart', 'goss', 'rf'],
    'num_leaves': list(range(2, 40, 1)),
}
rd_grid_search_clf = RandomizedSearchCV(
    lgb.LGBMClassifier(n_jobs=-1), search_space, return_train_score=True,
    scoring='f1_macro', n_jobs=-1, refit=True, cv=cv_iterator, verbose=1,
    n_iter=50, error_score=np.nan, random_state=None)
rd_grid_search_clf.fit(X_train, y_train)
Below is the source code:
import sqlite3
import re
import time
import csv
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.learning_curve import learning_curve
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from matplotlib import pyplot as pl
from matplotlib.backends.backend_pdf import PdfPages
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_blobs
from sklearn import svm
from sklearn.externals import joblib
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
import glob, os

# MultiNomial
from xgboost import XGBClassifier
import xgboost
from tpot import TPOTClassifier

tpot_config = {
    'xgboost.XGBClassifier': {
        'n_estimators': [50, 100, 150, 200],
        'max_depth': [2, 3, 4, 5, 6, 7, 8, 9],
        'min_child_weight': [2, 3, 4, 5],
        'colsample_bytree': [0.2, 0.6, 0.8],
        'colsample_bylevel': [0.2, 0.6, 0.8],
        'objective': ['multi:softprob']
    }
}

param = {
    'n_estimators': [200],
    'max_depth': [3, 5, 4],
    'min_child_weight': [4, 5, 3],
    'colsample_bytree': [0.2, 0.6, 0.8],
    'colsample_bylevel': [0.2, 0.6, 0.8]
}

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=2,
                                    max_eval_time_mins=30, random_state=42,
                                    verbosity=200, config_dict=tpot_config,
                                    n_jobs=10)

clf = XGBClassifier(objective='multi:softprob')

gsearch1 = GridSearchCV(estimator=clf, param_grid=param, cv=2, verbose=100, n_jobs=-1)
gsearch1.fit(X_train, y_train)

print(gsearch1.best_score_)
print(gsearch1.best_params_)
Omitted code relevant to data loading. It works fine for n_jobs =1
I am using Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 26 2016, 12:10:39). scikit-learn (0.19.1)
Because of this issue, TPOT also hangs, as it uses scikit-learn's GridSearchCV for internal operations.
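For a sense of how much work the grid search above dispatches to the workers, the number of candidates and fits can be counted with plain Python. This is only an illustrative sketch: the dictionary repeats the param grid from the post, and the variable names n_candidates and n_fits are hypothetical.

```python
param = {
    'n_estimators': [200],
    'max_depth': [3, 5, 4],
    'min_child_weight': [4, 5, 3],
    'colsample_bytree': [0.2, 0.6, 0.8],
    'colsample_bylevel': [0.2, 0.6, 0.8],
}
cv = 2  # number of cross-validation folds, as in the post

# A grid search tries every combination, so the candidate count is the
# product of the list lengths.
n_candidates = 1
for values in param.values():
    n_candidates *= len(values)

# Each candidate is fitted once per fold (the final refit is not counted).
n_fits = n_candidates * cv
print(n_candidates, n_fits)  # 81 162
```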