TerminatedWorkerError - Githubissues

arindam2007b commented 5 years ago

I am using the pylift module on an AWS EC2 Linux instance with the code below, and I am getting two different errors.

    up = TransformedOutcome(df_fil, col_treatment='Treatment', col_outcome='Outcome',
                            col_policy='prop_scores', stratify=df_fil['Treatment'],
                            sklearn_model=XGBClassifier)

    param_grid = {
        # 'estimator': XGBClassifier(),
        'param_grid': {
            'max_depth': range(1, 8, 1),
            'learning_rate': [x/100 for x in range(1, 12, 4)],
            'colsample_bytree': [x/10 for x in range(3, 10, 1)],
            'min_child_weight': range(1, 6, 1),
            'scale_pos_weight': [x/10 for x in range(12, 18, 1)],
        },
        'n_jobs': -1,
    }

    up.grid_search(**param_grid, cv=2)

I get the following error when running the above code:

    Fitting 2 folds for each of 7 candidates, totalling 14 fits
    [Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
    [Parallel(n_jobs=-1)]: Done 3 out of 14 | elapsed: 1.0min remaining: 3.7min
    [Parallel(n_jobs=-1)]: Done 8 out of 14 | elapsed: 1.0min remaining: 45.1s

    TerminatedWorkerError Traceback (most recent call last)
    in
    ----> 1 up.grid_search(**param_grid,cv=2)

    ~/anaconda3/lib/python3.7/site-packages/pylift/methods/base.py in grid_search(self, **kwargs)
        337 self.grid_search_params.update(kwargs)
        338 self.grid_search_ = GridSearchCV(**self.grid_search_params)
    --> 339 self.grid_search_.fit(self.x_train, self.transformed_y_train)
        340 return self.grid_search_
        341

    ~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
        685 return results
        686
    --> 687 self._run_search(evaluate_candidates)
        688
        689 # For multi-metric evaluation, store the best_index_, best_params_ and

    ~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
       1146 def _run_search(self, evaluate_candidates):
       1147 """Search all candidates in param_grid"""
    -> 1148 evaluate_candidates(ParameterGrid(self.param_grid))
       1149
       1150

    ~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
        664 for parameters, (train, test)
        665 in product(candidate_params,
    --> 666 cv.split(X, y, groups)))
        667
        668 if len(out) < 1:

    ~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
        932
        933 with self._backend.retrieval_context():
    --> 934 self.retrieve()
        935 # Make sure that we get a last message telling us we are done
        936 elapsed_time = time.time() - self._start_time

    ~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
        831 try:
        832 if getattr(self._backend, 'supports_timeout', False):
    --> 833 self._output.extend(job.get(timeout=self.timeout))
        834 else:
        835 self._output.extend(job.get())

    ~/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
        519 AsyncResults.get from multiprocessing."""
        520 try:
    --> 521 return future.result(timeout=timeout)
        522 except LokyTimeoutError:
        523 raise TimeoutError()

    ~/anaconda3/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
        430 raise CancelledError()
        431 elif self._state == FINISHED:
    --> 432 return self.__get_result()
        433 else:
        434 raise TimeoutError()

    ~/anaconda3/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
        382 def __get_result(self):
        383 if self._exception:
    --> 384 raise self._exception
        385 else:
        386 return self._result

    TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGABRT(-6)}

When I remove `n_jobs=-1` from `param_grid`, i.e. with the code below:

    param_grid = {
        # 'estimator': XGBClassifier(),
        'param_grid': {
            'max_depth': range(1, 8, 1),
            'learning_rate': [x/100 for x in range(1, 12, 4)],
            'colsample_bytree': [x/10 for x in range(3, 10, 1)],
            'min_child_weight': range(1, 6, 1),
            'scale_pos_weight': [x/10 for x in range(12, 18, 1)],
        }
    }

    up.grid_search(**param_grid, cv=2)

I get the following error:

    terminate called after throwing an instance of 'std::bad_alloc'
      what(): std::bad_alloc

I am running the above code in a Jupyter notebook. I don't think it can be a memory error, because I have ample memory (64 GB, with 8 cores). I am using the Python 3.7 Anaconda distribution.

rsyi commented 5 years ago

First thing to check: the sklearn_model argument should be a Regressor object, not a Classifier. Try replacing XGBClassifier() with XGBRegressor()?
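To illustrate why a regressor is the right fit here, below is a minimal sketch of the standard transformed-outcome formula, `y * (w - p) / (p * (1 - p))`. It assumes pylift's `TransformedOutcome` builds its training target along these lines; the function name `transformed_outcome` and the sample rows are hypothetical and not taken from pylift itself.

```python
def transformed_outcome(outcome, treatment, policy):
    """Standard transformed-outcome target: y * (w - p) / (p * (1 - p)).

    outcome   -- observed outcome y (e.g. 0 or 1)
    treatment -- treatment flag w (1 = treated, 0 = control)
    policy    -- treatment probability p, strictly between 0 and 1
    """
    return outcome * (treatment - policy) / (policy * (1.0 - policy))


# Hypothetical rows: (outcome, treatment, policy)
rows = [
    (1, 1, 0.5),   # treated, converted
    (1, 0, 0.5),   # control, converted
    (0, 1, 0.5),   # treated, did not convert
    (1, 1, 0.25),  # treated, converted, lower treatment probability
]

targets = [transformed_outcome(y, w, p) for y, w, p in rows]
print(targets)
```

The resulting targets are real-valued and can be negative, so they are not class labels. That is why the model fitted on them should be a regressor such as `XGBRegressor` rather than a classifier.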

arindam2007b commented 5 years ago

Thanks, this helped.

wayfair / pylift

TerminatedWorkerError #34
