Closed stsmall closed 6 years ago
This looks like a version issue to me. It seems sklearn has changed the grid search function. Can you roll back sklearn to a version where you no longer get the deprecation warning and see if it works? I think that might be v0.18.
Hi @andrewkern, I downgraded to v0.18.0 and while the deprecation warning is still there, the errors are not. It will take a bit until I know for sure, but seems like it did the trick! thanks for the assist!
now we just have to update the code to deal with how scikit-learn broke it.... ugh
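A minimal sketch of one way the import could be made version-aware. It assumes only what the deprecation warning quoted in this thread states: the grid search classes live in sklearn.model_selection from 0.18 onward, and sklearn.grid_search is removed in 0.20. The helper names (parse_major_minor, grid_search_module_name, load_grid_search_cv) are hypothetical and are not part of FILET or scikit-learn.

```python
import importlib
import re


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '0.19.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def grid_search_module_name(sklearn_version):
    """Pick the module that provides GridSearchCV for a given sklearn version."""
    # model_selection exists from 0.18; grid_search is removed in 0.20.
    if parse_major_minor(sklearn_version) >= (0, 18):
        return "sklearn.model_selection"
    return "sklearn.grid_search"


def load_grid_search_cv():
    """Import GridSearchCV from whichever module the installed sklearn provides."""
    import sklearn  # imported lazily so the helpers above work without sklearn

    module = importlib.import_module(grid_search_module_name(sklearn.__version__))
    return module.GridSearchCV
```

With sklearn v0.19.2, the version reported in this thread, grid_search_module_name returns "sklearn.model_selection". A plain try/except ImportError around the two imports would be a simpler alternative. As far as I know, grid_scores_ was removed along with the old module in 0.20 in favor of cv_results_, so the reporting code would likely need updating as well.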
Asdf
-- Dan Schrider Assistant Professor Department of Genetics University of North Carolina at Chapel Hill email: drs@unc.edu phone: (919) 966-1764 website: https://www.schriderlab.org/
Hi @andrewkern, @dschride I followed the example downloaded with FILET, but seem to be running into an error during the training step, specifically that trainFiletClassifier.py stops with an error.
Any help or suggestions are greatly appreciated! Thanks, @stsmall
python 2.7 (anaconda version)
scipy v1.0.1
numpy v1.13.3
sklearn v0.19.2
anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
anaconda2/lib/python2.7/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
training set size after balancing: 29940
Checking accuracy when distinguishing among all 3 classes
Using extraTreesClassifier
Traceback (most recent call last):
File "trainFiletClassifier.py", line 81, in
grid_search.fit(X, y)
File "anaconda2/lib/python2.7/site-packages/sklearn/grid_search.py", line 838, in fit
return self._fit(X, y, ParameterGrid(self.param_grid))
File "anaconda2/lib/python2.7/site-packages/sklearn/grid_search.py", line 574, in _fit
for parameters in parameter_iterable
File "anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 789, in __call__
self.retrieve()
File "anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 740, in retrieve
raise exception
sklearn.externals.joblib.my_exceptions.JoblibValueError: JoblibValueError
Multiprocessing exception:
........................................................................... FILET/trainFiletClassifier.py in()
76 clf, mlType, paramGrid = ExtraTreesClassifier(n_estimators=100, random_state=0), "extraTreesClassifier", param_grid_forest
77
78 sys.stderr.write("Using %s\n" %(mlType))
79 grid_search = GridSearchCV(clf,param_grid=param_grid_forest,cv=10,n_jobs=10)
80 start = time()
---> 81 grid_search.fit(X, y)
82 sys.stderr.write("GridSearchCV took %.2f seconds for %d candidate parameter settings.\n"
83 % (time() - start, len(grid_search.grid_scores_)))
84 print "Results for %s" %(mlType)
85 report(grid_search.grid_scores_)
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/grid_search.py in fit(self=GridSearchCV(cv=10, error_score='raise', ...='2*n_jobs', refit=True, scoring=None, verbose=0), X=array([[ 6.53900000e-03, 1.00000000e-06, 3....543860e+02, 1.00000000e+00, 1.00000000e+00]]), y=['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', ...])
833 y : array-like, shape = [n_samples] or [n_samples, n_output], optional
834 Target relative to X for classification or regression;
835 None for unsupervised learning.
836
837 """
--> 838 return self._fit(X, y, ParameterGrid(self.param_grid))
self._fit = <bound method GridSearchCV._fit of GridSearchCV(...'2*n_jobs', refit=True, scoring=None, verbose=0)>
X = array([[ 6.53900000e-03, 1.00000000e-06, 3....543860e+02, 1.00000000e+00, 1.00000000e+00]])
y = ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', ...]
self.param_grid = {'bootstrap': [True, False], 'criterion': ['gini', 'entropy'], 'max_depth': [3, 10, None], 'max_features': [1, 3, 4, 22], 'min_samples_leaf': [1, 3, 10], 'min_samples_split': [1, 3, 10]}
839
840
841 class RandomizedSearchCV(BaseSearchCV):
842 """Randomized search on hyper parameters.
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/grid_search.py in _fit(self=GridSearchCV(cv=10, error_score='raise', ...='2*n_jobs', refit=True, scoring=None, verbose=0), X=array([[ 6.53900000e-03, 1.00000000e-06, 3....543860e+02, 1.00000000e+00, 1.00000000e+00]]), y=['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', ...], parameter_iterable=)
569 )(
570 delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
571 train, test, self.verbose, parameters,
572 self.fit_params, return_parameters=True,
573 error_score=self.error_score)
--> 574 for parameters in parameter_iterable
parameters = undefined
parameter_iterable =
575 for train, test in cv)
576
577 # Out is a list of triplet: score, estimator, n_test_samples
578 n_fits = len(out)
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=10), iterable=<generator object>)
784 if pre_dispatch == "all" or n_jobs == 1:
785 # The iterable was consumed all at once by the above for loop.
786 # No need to wait for async callbacks to trigger to
787 # consumption.
788 self._iterating = False
--> 789 self.retrieve()
self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=10)>
790 # Make sure that we get a last message telling us we are done
791 elapsed_time = time.time() - self._start_time
792 self._print('Done %3i out of %3i | elapsed: %s finished',
793 (len(self._output), len(self._output),
Sub-process traceback:
ValueError Thu Aug 23 12:04:10 2018
PID: 51133 Python 2.7.14: anaconda2/bin/python
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=)
126 def __init__(self, iterator_slice):
127 self.items = list(iterator_slice)
128 self._size = len(self.items)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
func =
args = (ExtraTreesClassifier(bootstrap=True, class_weigh...lse, random_state=0, verbose=0, warm_start=False), memmap([[ 6.53900000e-03, 1.00000000e-06, 3...543860e+02, 1.00000000e+00, 1.00000000e+00]]), ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', ...], , array([ 998, 999, 1000, ..., 29937, 29938, 29939]), array([ 0, 1, 2, ..., 20955, 20956, 20957]), 0, {'bootstrap': True, 'criterion': 'gini', 'max_depth': 3, 'max_features': 1, 'min_samples_leaf': 1, 'min_samples_split': 1}, {})
kwargs = {'error_score': 'raise', 'return_parameters': True}
self.items = [(, (ExtraTreesClassifier(bootstrap=True, class_weigh...lse, random_state=0, verbose=0, warm_start=False), memmap([[ 6.53900000e-03, 1.00000000e-06, 3...543860e+02, 1.00000000e+00, 1.00000000e+00]]), ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', ...], , array([ 998, 999, 1000, ..., 29937, 29938, 29939]), array([ 0, 1, 2, ..., 20955, 20956, 20957]), 0, {'bootstrap': True, 'criterion': 'gini', 'max_depth': 3, 'max_features': 1, 'min_samples_leaf': 1, 'min_samples_split': 1}, {}), {'error_score': 'raise', 'return_parameters': True})]
132
133 def __len__(self):
134 return self._size
135
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.py in _fit_and_score(estimator=ExtraTreesClassifier(bootstrap=True, class_weigh...lse, random_state=0, verbose=0, warm_start=False), X=memmap([[ 6.53900000e-03, 1.00000000e-06, 3...543860e+02, 1.00000000e+00, 1.00000000e+00]]), y=['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', ...], scorer=, train=array([ 998, 999, 1000, ..., 29937, 29938, 29939]), test=array([ 0, 1, 2, ..., 20955, 20956, 20957]), verbose=0, parameters={'bootstrap': True, 'criterion': 'gini', 'max_depth': 3, 'max_features': 1, 'min_samples_leaf': 1, 'min_samples_split': 1}, fit_params={}, return_train_score=False, return_parameters=True, error_score='raise')
1670
1671 try:
1672 if y_train is None:
1673 estimator.fit(X_train, **fit_params)
1674 else:
-> 1675 estimator.fit(X_train, y_train, **fit_params)
estimator.fit = <bound method ExtraTreesClassifier.fit of ExtraT...se, random_state=0, verbose=0, warm_start=False)>
X_train = memmap([[ 1.05620000e-02, 1.00000000e-06, 5...543860e+02, 1.00000000e+00, 1.00000000e+00]])
y_train = ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', ...]
fit_params = {}
1676
1677 except Exception as e:
1678 if error_score == 'raise':
1679 raise
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/ensemble/forest.py in fit(self=ExtraTreesClassifier(bootstrap=True, class_weigh...lse, random_state=0, verbose=0, warm_start=False), X=array([[ 1.05619999e-02, 9.99999997e-07, 5.....00000000e+00, 1.00000000e+00]], dtype=float32), y=array([[ 0.], [ 0.], [ 0.], ..., [ 2.], [ 2.], [ 2.]]), sample_weight=None)
323 trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
324 backend="threading")(
325 delayed(_parallel_build_trees)(
326 t, self, X, y, sample_weight, i, len(trees),
327 verbose=self.verbose, class_weight=self.class_weight)
--> 328 for i, t in enumerate(trees))
i = 99
329
330 # Collect newly grown trees
331 self.estimators_.extend(trees)
332
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=1), iterable=<generator object>)
774 self.n_completed_tasks = 0
775 try:
776 # Only set self._iterating to True if at least a batch
777 # was dispatched. In particular this covers the edge
778 # case of Parallel used with an exhausted iterator.
--> 779 while self.dispatch_one_batch(iterator):
self.dispatch_one_batch = <bound method Parallel.dispatch_one_batch of Parallel(n_jobs=1)>
iterator = <generator object >
780 self._iterating = True
781 else:
782 self._iterating = False
783
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self=Parallel(n_jobs=1), iterator=<generator object>)
620 tasks = BatchedCalls(itertools.islice(iterator, batch_size))
621 if len(tasks) == 0:
622 # No more tasks available in the iterator: tell caller to stop.
623 return False
624 else:
--> 625 self._dispatch(tasks)
self._dispatch = <bound method Parallel._dispatch of Parallel(n_jobs=1)>
tasks =
626 return True
627
628 def _print(self, msg, msg_args):
629 """Display the message on stout or stderr depending on verbosity"""
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self=Parallel(n_jobs=1), batch=)
583 self.n_dispatched_tasks += len(batch)
584 self.n_dispatched_batches += 1
585
586 dispatch_timestamp = time.time()
587 cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588 job = self._backend.apply_async(batch, callback=cb)
job = undefined
self._backend.apply_async = <bound method SequentialBackend.apply_async of <...lib._parallel_backends.SequentialBackend object>>
batch =
cb =
589 self._jobs.append(job)
590
591 def dispatch_next(self):
592 """Dispatch more data for parallel processing
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self=, func=, callback=)
106 raise ValueError('n_jobs == 0 in Parallel has no meaning')
107 return 1
108
109 def apply_async(self, func, callback=None):
110 """Schedule a func to be run"""
--> 111 result = ImmediateResult(func)
result = undefined
func =
112 if callback:
113 callback(result)
114 return result
115
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self=, batch=)
327
328 class ImmediateResult(object):
329 def __init__(self, batch):
330 # Don't delay the application, to avoid keeping the input
331 # arguments in memory
--> 332 self.results = batch()
self.results = undefined
batch =
333
334 def get(self):
335 return self.results
336
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=)
126 def __init__(self, iterator_slice):
127 self.items = list(iterator_slice)
128 self._size = len(self.items)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
func =
args = (ExtraTreeClassifier(class_weight=None, criterion...dom_state=209652396,
splitter='random'), ExtraTreesClassifier(bootstrap=True, class_weigh...lse, random_state=0, verbose=0, warm_start=False), array([[ 1.05619999e-02, 9.99999997e-07, 5.....00000000e+00, 1.00000000e+00]], dtype=float32), array([[ 0.],
[ 0.],
[ 0.],
...,
[ 2.],
[ 2.],
[ 2.]]), None, 0, 100)
kwargs = {'class_weight': None, 'verbose': 0}
self.items = [(, (ExtraTreeClassifier(class_weight=None, criterion...dom_state=209652396,
splitter='random'), ExtraTreesClassifier(bootstrap=True, class_weigh...lse, random_state=0, verbose=0, warm_start=False), array([[ 1.05619999e-02, 9.99999997e-07, 5.....00000000e+00, 1.00000000e+00]], dtype=float32), array([[ 0.],
[ 0.],
[ 0.],
...,
[ 2.],
[ 2.],
[ 2.]]), None, 0, 100), {'class_weight': None, 'verbose': 0})]
132
133 def __len__(self):
134 return self._size
135
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/ensemble/forest.py in _parallel_build_trees(tree=ExtraTreeClassifier(class_weight=None, criterion...dom_state=209652396, splitter='random'), forest=ExtraTreesClassifier(bootstrap=True, class_weigh...lse, random_state=0, verbose=0, warm_start=False), X=array([[ 1.05619999e-02, 9.99999997e-07, 5.....00000000e+00, 1.00000000e+00]], dtype=float32), y=array([[ 0.], [ 0.], [ 0.], ..., [ 2.], [ 2.], [ 2.]]), sample_weight=None, tree_idx=0, n_trees=100, verbose=0, class_weight=None)
116 warnings.simplefilter('ignore', DeprecationWarning)
117 curr_sample_weight = compute_sample_weight('auto', y, indices)
118 elif class_weight == 'balanced_subsample':
119 curr_sample_weight = compute_sample_weight('balanced', y, indices)
120
--> 121 tree.fit(X, y, sample_weight=curr_sample_weight, check_input=False)
tree.fit = <bound method ExtraTreeClassifier.fit of ExtraTr...om_state=209652396, splitter='random')>
X = array([[ 1.05619999e-02, 9.99999997e-07, 5.....00000000e+00, 1.00000000e+00]], dtype=float32)
y = array([[ 0.], [ 0.], [ 0.], ..., [ 2.], [ 2.], [ 2.]])
sample_weight = None
curr_sample_weight = array([ 0., 0., 1., ..., 0., 1., 0.])
122 else:
123 tree.fit(X, y, sample_weight=sample_weight, check_input=False)
124
125 return tree
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/tree/tree.py in fit(self=ExtraTreeClassifier(class_weight=None, criterion...dom_state=209652396, splitter='random'), X=array([[ 1.05619999e-02, 9.99999997e-07, 5.....00000000e+00, 1.00000000e+00]], dtype=float32), y=array([[ 0.], [ 0.], [ 0.], ..., [ 2.], [ 2.], [ 2.]]), sample_weight=array([ 0., 0., 1., ..., 0., 1., 0.]), check_input=False, X_idx_sorted=None)
785
786 super(DecisionTreeClassifier, self).fit(
787 X, y,
788 sample_weight=sample_weight,
789 check_input=check_input,
--> 790 X_idx_sorted=X_idx_sorted)
X_idx_sorted = None
791 return self
792
793 def predict_proba(self, X, check_input=True):
794 """Predict class probabilities of the input samples X.
........................................................................... anaconda2/lib/python2.7/site-packages/sklearn/tree/tree.py in fit(self=ExtraTreeClassifier(class_weight=None, criterion...dom_state=209652396, splitter='random'), X=array([[ 1.05619999e-02, 9.99999997e-07, 5.....00000000e+00, 1.00000000e+00]], dtype=float32), y=array([[ 0.], [ 0.], [ 0.], ..., [ 2.], [ 2.], [ 2.]]), sample_weight=array([ 0., 0., 1., ..., 0., 1., 0.]), check_input=False, X_idx_sorted=None)
189 if isinstance(self.min_samples_split, (numbers.Integral, np.integer)):
190 if not 2 <= self.min_samples_split:
191 raise ValueError("min_samples_split must be an integer "
192 "greater than 1 or a float in (0.0, 1.0]; "
193 "got the integer %s"
--> 194 % self.min_samples_split)
self.min_samples_split = 1
195 min_samples_split = self.min_samples_split
196 else: # float
197 if not 0. < self.min_samples_split <= 1.:
198 raise ValueError("min_samples_split must be an integer "
ValueError: min_samples_split must be an integer greater than 1 or a float in (0.0, 1.0]; got the integer 1
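The final ValueError points at the 'min_samples_split': [1, 3, 10] entry of the parameter grid shown in the traceback: the value 1 is rejected. Below is a minimal sketch of a workaround, assuming it is acceptable to raise that entry to 2, the smallest integer the message allows. raise_min_samples_split is a hypothetical helper, not part of FILET or scikit-learn.

```python
import numbers


def raise_min_samples_split(param_grid, minimum=2):
    """Return a copy of param_grid whose integer min_samples_split values are >= minimum."""
    fixed = dict(param_grid)  # shallow copy; the caller's dict is left untouched
    if "min_samples_split" in fixed:
        values = []
        for value in fixed["min_samples_split"]:
            # Floats are left alone; the error message allows floats in (0.0, 1.0].
            if isinstance(value, numbers.Integral) and value < minimum:
                value = minimum
            if value not in values:  # avoid duplicate candidates in the grid
                values.append(value)
        fixed["min_samples_split"] = values
    return fixed


# Parameter grid as printed in the traceback (self.param_grid).
param_grid_forest = {
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 10, None],
    "max_features": [1, 3, 4, 22],
    "min_samples_leaf": [1, 3, 10],
    "min_samples_split": [1, 3, 10],
}

fixed_grid = raise_min_samples_split(param_grid_forest)
print(fixed_grid["min_samples_split"])  # [2, 3, 10]
```

The fixed grid could then be passed as param_grid to GridSearchCV in place of the original. Editing the list by hand in trainFiletClassifier.py would work just as well; the helper only makes the change explicit.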