smazzanti / mrmr

mRMR (minimum-Redundancy-Maximum-Relevance) for automatic feature selection at scale.
MIT License
531 stars 79 forks

A task has failed to un-serialize. #38

Open leummas opened 8 months ago

leummas commented 8 months ago

Running the example code, I'm receiving the following error:


_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/envs/feature_creation_env/lib/python3.12/site-packages/joblib/externals/loky/process_executor.py", line 426, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/feature_creation_env/lib/python3.12/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/feature_creation_env/lib/python3.12/site-packages/mrmr/__init__.py", line 1, in <module>
    from . import bigquery
  File "/opt/conda/envs/feature_creation_env/lib/python3.12/site-packages/mrmr/bigquery.py", line 3, in <module>
    from .main import mrmr_base, groupstats2fstat
  File "/opt/conda/envs/feature_creation_env/lib/python3.12/site-packages/mrmr/main.py", line 1, in <module>
    import pandas as pd
  File "/opt/conda/envs/feature_creation_env/lib/python3.12/site-packages/pandas/__init__.py", line 46, in <module>
    from pandas.core.api import (
  File "/opt/conda/envs/feature_creation_env/lib/python3.12/site-packages/pandas/core/api.py", line 47, in <module>
    from pandas.core.groupby import (
  File "/opt/conda/envs/feature_creation_env/lib/python3.12/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.generic import (
  File "/opt/conda/envs/feature_creation_env/lib/python3.12/site-packages/pandas/core/groupby/generic.py", line 67, in <module>
    from pandas.core.frame import DataFrame
...
--> 754     raise self._result
    755 return self._result
    756 finally:

BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
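
Since the message asks that all arguments be picklable, one quick diagnostic is to round-trip each argument through the standard `pickle` module before calling the selection function. This is only a rough sketch: `find_unpicklable` is a hypothetical helper written for illustration, not part of mrmr or joblib. joblib's loky backend may use a more permissive pickler than plain `pickle`, so a failure here is a hint rather than proof. Note also that the traceback above fails while importing pandas inside the worker process, so the arguments may pass this check and the cause may lie elsewhere.

```python
import pickle


def find_unpicklable(**named_args):
    """Return {name: error text} for each argument that fails a pickle round trip.

    Hypothetical diagnostic helper; an empty dict means every argument
    survived pickle.dumps followed by pickle.loads.
    """
    failures = {}
    for name, value in named_args.items():
        try:
            pickle.loads(pickle.dumps(value))
        except Exception as exc:  # the exception type varies with the object
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Plain containers pickle fine; a lambda does not with the stdlib pickler.
    report = find_unpicklable(data=[1, 2, 3], callback=lambda x: x)
    for name, error in report.items():
        print(f"{name} is not picklable: {error}")
```

If the report comes back empty for the DataFrame and target passed to mrmr, the arguments are probably not the problem, and the worker-side import failure shown in the traceback is the more likely place to look.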

benhorvath commented 6 months ago

I am also getting a similar error, with both classification and regression functions. Looking at my activity monitor, I notice the user CPU shoots up to close to 100% right before the error triggers. Something isn't quite right with the parallel jobs code.