Closed kraxli closed 1 year ago
Stale issue
I'm hitting the same issue when running this in a Jupyter notebook. Is there a fix?
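My code (sample version):
import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

# 1,000,000 rows x 5 columns of uniform random floats
df = pd.DataFrame(
    np.random.rand(1000000, 5),
    columns=['a', 'b', 'c', 'd', 'e']
)
profile = ProfileReport(df, title='Pandas Profiling Report')
profile.to_notebook_iframe()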
Error after executing the code above:
Summarize dataset: 50%
9/18 [00:18<00:14, 1.63s/it, Calculate phi_k correlation]
exception calling callback for <Future at 0x7fe3084a3310 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
self.parallel.dispatch_next()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 792, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
future = self._workers.submit(SafeFunction(func))
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 178, in submit
fn, *args, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1102, in submit
raise self._flags.broken
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 1044, in __call__
while self.dispatch_one_batch(iterator):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
future = self._workers.submit(SafeFunction(func))
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 178, in submit
fn, *args, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1102, in submit
raise self._flags.broken
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
The exit codes of the workers are {EXIT(1), EXIT(1), EXIT(1)}
exception calling callback for <Future at 0x7fe3084a6950 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
self.parallel.dispatch_next()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 792, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
future = self._workers.submit(SafeFunction(func))
AttributeError: 'NoneType' object has no attribute 'submit'
exception calling callback for <Future at 0x7fe3084a6850 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
self.parallel.dispatch_next()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 792, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
future = self._workers.submit(SafeFunction(func))
AttributeError: 'NoneType' object has no attribute 'submit'
exception calling callback for <Future at 0x7fe3084a6750 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
self.parallel.dispatch_next()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 792, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
future = self._workers.submit(SafeFunction(func))
AttributeError: 'NoneType' object has no attribute 'submit'
exception calling callback for <Future at 0x7fe3084a6fd0 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
self.parallel.dispatch_next()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 792, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
future = self._workers.submit(SafeFunction(func))
AttributeError: 'NoneType' object has no attribute 'submit'
exception calling callback for <Future at 0x7fe3084a6610 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
self.parallel.dispatch_next()
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 792, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
future = self._workers.submit(SafeFunction(func))
AttributeError: 'NoneType' object has no attribute 'submit'
---------------------------------------------------------------------------
TerminatedWorkerError Traceback (most recent call last)
<ipython-input-5-3827eec15fb0> in <module>
1 #profile = ProfileReport(df._to_pandas(), title='Pandas Profiling Report')
2 profile = ProfileReport(df, title='Pandas Profiling Report')
----> 3 profile.to_notebook_iframe()
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas_profiling/profile_report.py in to_notebook_iframe(self)
400 with warnings.catch_warnings():
401 warnings.simplefilter("ignore")
--> 402 display(get_notebook_iframe(self.config, self))
403
404 def to_widgets(self) -> None:
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/widget/notebook.py in get_notebook_iframe(config, profile)
73 output = get_notebook_iframe_src(config, profile)
74 elif attribute == IframeAttribute.srcdoc:
---> 75 output = get_notebook_iframe_srcdoc(config, profile)
76 else:
77 raise ValueError(
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/widget/notebook.py in get_notebook_iframe_srcdoc(config, profile)
27 width = config.notebook.iframe.width
28 height = config.notebook.iframe.height
---> 29 src = html.escape(profile.to_html())
30
31 iframe = f'<iframe width="{width}" height="{height}" srcdoc="{src}" frameborder="0" allowfullscreen></iframe>'
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas_profiling/profile_report.py in to_html(self)
370
371 """
--> 372 return self.html
373
374 def to_json(self) -> str:
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas_profiling/profile_report.py in html(self)
187 def html(self) -> str:
188 if self._html is None:
--> 189 self._html = self._render_html()
190 return self._html
191
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas_profiling/profile_report.py in _render_html(self)
289 from pandas_profiling.report.presentation.flavours import HTMLReport
290
--> 291 report = self.report
292
293 with tqdm(
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas_profiling/profile_report.py in report(self)
181 def report(self) -> Root:
182 if self._report is None:
--> 183 self._report = get_report_structure(self.config, self.description_set)
184 return self._report
185
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas_profiling/profile_report.py in description_set(self)
168 self.summarizer,
169 self.typeset,
--> 170 self._sample,
171 )
172 return self._description_set
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas_profiling/model/describe.py in describe(config, df, summarizer, typeset, sample)
98 pbar.set_postfix_str(f"Calculate {correlation_name} correlation")
99 correlations[correlation_name] = calculate_correlation(
--> 100 config, df, correlation_name, series_description
101 )
102 pbar.update()
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas_profiling/model/correlations.py in calculate_correlation(config, df, correlation_name, summary)
183 try:
184 correlation = correlation_measures[correlation_name].compute(
--> 185 config, df, summary
186 )
187 except (ValueError, AssertionError, TypeError, DataError, IndexError) as e:
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas_profiling/model/correlations.py in compute(config, df, summary)
138 from phik import phik_matrix
139
--> 140 correlation = phik_matrix(df[selcols], interval_cols=list(intcols))
141
142 return correlation
/opt/conda/envs/rapids/lib/python3.7/site-packages/phik/phik.py in phik_matrix(df, interval_cols, bins, quantile, noise_correction, dropna, drop_underflow, drop_overflow, verbose)
217
218 return phik_from_rebinned_df(
--> 219 data_binned, noise_correction, dropna=dropna, drop_underflow=drop_underflow, drop_overflow=drop_overflow
220 )
221
/opt/conda/envs/rapids/lib/python3.7/site-packages/phik/phik.py in phik_from_rebinned_df(data_binned, noise_correction, dropna, drop_underflow, drop_overflow)
143 phik_list = Parallel(n_jobs=NCORES)(
144 delayed(_calc_phik)(co, data_binned[list(co)], noise_correction)
--> 145 for co in itertools.combinations_with_replacement(data_binned.columns.values, 2)
146 )
147
/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
1042 self._iterating = self._original_iterator is not None
1043
-> 1044 while self.dispatch_one_batch(iterator):
1045 pass
1046
/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
857 return False
858 else:
--> 859 self._dispatch(tasks)
860 return True
861
/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py in _dispatch(self, batch)
775 with self._lock:
776 job_idx = len(self._jobs)
--> 777 job = self._backend.apply_async(batch, callback=cb)
778 # A job can complete so quickly than its callback is
779 # called before we get here, causing self._jobs to
/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
529 def apply_async(self, func, callback=None):
530 """Schedule a func to be run"""
--> 531 future = self._workers.submit(SafeFunction(func))
532 future.get = functools.partial(self.wrap_future_result, future)
533 if callback is not None:
/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py in submit(self, fn, *args, **kwargs)
176 with self._submit_resize_lock:
177 return super(_ReusablePoolExecutor, self).submit(
--> 178 fn, *args, **kwargs)
179
180 def _resize(self, max_workers):
/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py in submit(self, fn, *args, **kwargs)
1100 with self._flags.shutdown_lock:
1101 if self._flags.broken is not None:
-> 1102 raise self._flags.broken
1103 if self._flags.shutdown:
1104 raise ShutdownExecutorError(
TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
The exit codes of the workers are {EXIT(1), EXIT(1), EXIT(1)}
Environment:
Jupyter Notebook
pip 21.1.1 from /opt/conda/envs/rapids/lib/python3.7/site-packages/pip (python 3.7)
Package Version
--------------------------------- ------------------------
absl-py 0.12.0
aiobotocore 1.3.0
aiohttp 3.7.4
aioitertools 0.7.1
altgraph 0.17
anyio 2.2.0
appdirs 1.4.4
argon2-cffi 20.1.0
astunparse 1.6.3
async-generator 1.10
async-timeout 3.0.1
attrs 20.3.0
backcall 0.2.0
backports.functools-lru-cache 1.6.4
blazingsql 0.19.0a0
bleach 3.3.0
bokeh 2.2.3
botocore 1.20.49
Bottleneck 1.3.2
brotlipy 0.7.0
bsql-engine 0.6
cached-property 1.5.2
cachetools 4.2.2
certifi 2020.12.5
cffi 1.14.5
chardet 4.0.0
click 7.1.2
click-plugins 1.1.1
cligj 0.7.1
cloudpickle 1.6.0
colorcet 2.0.6
confluent-kafka 1.5.0
cryptography 3.4.7
cudf 0.19.2
cudf-kafka 0.19.2
cugraph 0.19.0+0.gd72b90b0.dirty
cuml 0.19.0
cupy 8.6.0
cusignal 0.19.0
cuspatial 0.19.0
custreamz 0.19.2
cuxfilter 0.19.1
cycler 0.10.0
Cython 0.29.23
cytoolz 0.11.0
dask 2021.4.0
dask-cuda 0.19.0
dask-cudf 0.19.2
dask-glm 0.2.0
dask-labextension 4.0.1
dask-ml 1.8.0
datashader 0.11.1
datashape 0.5.4
decorator 4.4.2
defusedxml 0.7.1
deprecation 2.1.0
distributed 2021.4.0
entrypoints 0.3
fa2 0.3.5
fastavro 1.4.0
fastrlock 0.6
filterpy 1.4.5
Fiona 1.8.19
flatbuffers 1.12
fsspec 2021.4.0
future 0.18.2
gast 0.4.0
GDAL 3.2.2
geopandas 0.8.1
google-auth 1.30.0
google-auth-oauthlib 0.4.4
google-pasta 0.2.0
greenlet 1.0.0
grpcio 1.34.1
h5py 3.1.0
HeapDict 1.0.1
holoviews 1.14.3
htmlmin 0.1.12
idna 2.10
imagecodecs 2021.3.31
ImageHash 4.2.0
imageio 2.9.0
importlib-metadata 3.10.1
iniconfig 1.1.1
ipykernel 5.5.3
ipython 7.15.0
ipython-genutils 0.2.0
ipywidgets 7.6.3
jedi 0.17.2
Jinja2 2.11.3
jmespath 0.10.0
joblib 1.0.1
JPype1 1.2.1
json5 0.9.5
jsonschema 3.2.0
jupyter-client 6.1.12
jupyter-contrib-core 0.3.3
jupyter-contrib-nbextensions 0.5.1
jupyter-core 4.7.1
jupyter-highlight-selected-word 0.2.0
jupyter-latex-envs 1.4.6
jupyter-nbextensions-configurator 0.4.1
jupyter-packaging 0.9.2
jupyter-server 1.6.4
jupyter-server-proxy 3.0.2
jupyterlab 2.1.5
jupyterlab-nvdashboard 0.5.0
jupyterlab-pygments 0.1.2
jupyterlab-server 1.2.0
jupyterlab-widgets 1.0.0
keras-nightly 2.5.0.dev2021032900
Keras-Preprocessing 1.1.2
kiwisolver 1.3.1
llvmlite 0.36.0
locket 0.2.0
lxml 4.6.3
Markdown 3.3.4
MarkupSafe 1.1.1
matplotlib 3.4.2
missingno 0.4.2
mistune 0.8.4
modin 0.9.1
more-itertools 8.7.0
msgpack 1.0.2
multidict 5.1.0
multimethod 1.4
multipledispatch 0.6.0
munch 2.5.0
nbclient 0.5.3
nbconvert 6.0.7
nbformat 5.1.3
nest-asyncio 1.5.1
netifaces 0.10.9
networkx 2.5.1
notebook 6.4.0
numba 0.53.1
numpy 1.19.5
nvtx 0.2.3
oauthlib 3.1.0
olefile 0.46
opt-einsum 3.3.0
packaging 20.9
pandas 1.2.3
pandas-profiling 3.0.0
pandocfilters 1.4.2
panel 0.10.3
param 1.10.1
parso 0.7.1
partd 1.2.0
patsy 0.5.1
pexpect 4.8.0
phik 0.11.2
pickle5 0.0.11
pickleshare 0.7.5
Pillow 8.1.2
pip 21.1.1
pluggy 0.13.1
pooch 1.3.0
prometheus-client 0.10.1
prompt-toolkit 3.0.18
protobuf 3.15.8
psutil 5.8.0
ptyprocess 0.7.0
py 1.10.0
pyarrow 1.0.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.20
pyct 0.4.6
pydantic 1.8.2
pydeck 0.5.0
pyee 7.0.4
Pygments 2.8.1
PyHive 0.6.3
pyinstaller 4.3
pyinstaller-hooks-contrib 2021.1
pynndescent 0.5.2
pynvml 8.0.4
pyOpenSSL 20.0.1
pyparsing 2.4.7
pypi 2.1
pyppeteer 0.2.2
pyproj 3.0.1
pyrsistent 0.17.3
PySocks 1.7.1
pytest 6.2.3
python-dateutil 2.8.1
pytz 2021.1
pyviz-comms 2.0.1
PyWavelets 1.1.1
PyYAML 5.4.1
pyzmq 22.0.3
requests 2.25.1
requests-oauthlib 1.3.0
rmm 0.19.0
rsa 4.7.2
Rtree 0.9.7
s3fs 2021.4.0
sasl 0.2.1
scikeras 0.3.3
scikit-image 0.18.1
scikit-learn 0.23.1
scipy 1.6.0
seaborn 0.11.1
Send2Trash 1.5.0
setuptools 56.2.0
Shapely 1.7.1
simpervisor 0.4
six 1.15.0
sniffio 1.2.0
sortedcontainers 2.3.0
SQLAlchemy 1.4.11
statsmodels 0.12.2
streamz 0.6.2
tangled-up-in-unicode 0.1.0
tblib 1.7.0
tensorboard 2.5.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
tensorflow 2.5.0
tensorflow-estimator 2.5.0
termcolor 1.1.0
terminado 0.9.4
testpath 0.4.4
threadpoolctl 2.1.0
thrift 0.13.0
thrift-sasl 0.4.2
tifffile 2021.4.8
toml 0.10.2
tomlkit 0.7.0
toolz 0.11.1
tornado 6.1
tqdm 4.60.0
traitlets 5.0.5
treelite 1.1.0
treelite-runtime 1.1.0
typing-extensions 3.7.4.3
ucx-py 0.19.0
umap-learn 0.5.1
urllib3 1.26.4
visions 0.7.1
wcwidth 0.2.5
webencodings 0.5.1
websockets 8.1
Werkzeug 2.0.1
wheel 0.36.2
widgetsnbextension 3.5.1
wrapt 1.12.1
xarray 0.17.0
xgboost 1.4.0
yapf 0.31.0
yarl 1.6.3
zict 2.0.0
zipp 3.4.1
Note:
!{sys.executable} -m apt-get update && apt-get install -y build-essential
The issue occurs while phik parallelizes its correlation computation.
As a workaround, parallelization can be disabled by replacing phik.phik_matrix with a near-identical function whose njobs default is 1 instead of -1.
from typing import Optional, Union

import numpy as np
import pandas as pd
import phik
from phik.binning import auto_bin_data
from phik.phik import phik_from_rebinned_df

# Same as phik.phik_matrix except for the default value of njobs
def phik_matrix_nJobsDefVal(
    df: pd.DataFrame,
    interval_cols: Optional[list] = None,
    bins: Union[int, list, np.ndarray, dict] = 10,
    quantile: bool = False,
    noise_correction: bool = True,
    dropna: bool = True,
    drop_underflow: bool = True,
    drop_overflow: bool = True,
    verbose: bool = True,
    njobs: int = 1,
) -> pd.DataFrame:
    """
    Correlation matrix of bivariate gaussian derived from chi2-value
    Chi2-value gets converted into correlation coefficient of bivariate gauss
    with correlation value rho, assuming given binning and number of records.
    Correlation coefficient value is between 0 and 1.
    Bivariate gaussian's range is set to [-5,5] by construction.
    :param pd.DataFrame df: input data
    :param list interval_cols: column names of columns with interval variables.
    :param bins: number of bins, or a list of bin edges (same for all columns), or a dictionary where per column the bins are specified. (default=10)\
    E.g.: bins = {'mileage':5, 'driver_age':[18,25,35,45,55,65,125]}
    :param quantile: when bins is an integer, uniform bins (False) or bins based on quantiles (True)
    :param bool noise_correction: apply noise correction in phik calculation
    :param bool dropna: remove NaN values with True
    :param bool drop_underflow: do not take into account records in underflow bin when True (relevant when binning\
    a numeric variable)
    :param bool drop_overflow: do not take into account records in overflow bin when True (relevant when binning\
    a numeric variable)
    :param bool verbose: if False, do not print all interval columns that are guessed
    :param int njobs: number of parallel jobs used for calculation of phik. default here is 1 (no parallel jobs); phik's own default is -1.
    :return: phik correlation matrix
    """
    # Bin the data exactly as phik.phik_matrix does
    data_binned, binning_dict = auto_bin_data(
        df=df,
        interval_cols=interval_cols,
        bins=bins,
        quantile=quantile,
        dropna=dropna,
        verbose=verbose,
    )
    # njobs=1 keeps the pairwise computation in the current process
    return phik_from_rebinned_df(
        data_binned,
        noise_correction,
        dropna=dropna,
        drop_underflow=drop_underflow,
        drop_overflow=drop_overflow,
        njobs=njobs,
    )

# Replace the function that `from phik import phik_matrix` resolves to
phik.phik_matrix = phik_matrix_nJobsDefVal
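For a sense of what is being parallelized: the traceback above shows phik_from_rebinned_df dispatching one _calc_phik call per entry of itertools.combinations_with_replacement over the column names. The following is only a stdlib sketch of that enumeration for the five columns of the sample DataFrame. The helper name phik_pair_tasks is made up for illustration and is not part of phik.

```python
import itertools


def phik_pair_tasks(columns):
    """List the column pairs, mirroring the combinations_with_replacement
    call visible in the traceback (hypothetical helper, not a phik API)."""
    return list(itertools.combinations_with_replacement(columns, 2))


columns = ["a", "b", "c", "d", "e"]
tasks = phik_pair_tasks(columns)

print(len(tasks))   # 15, i.e. n * (n + 1) / 2 for n = 5
print(tasks[:3])    # [('a', 'a'), ('a', 'b'), ('a', 'c')]
```

With five columns that is only 15 tasks, so the task count by itself seems unlikely to be what takes the workers down.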
Just bumped into this issue and the fix really works, thanks @vishalsrao. Shouldn't we be able to pass an argument (or configure via the environment) for the number of jobs passed to phik_matrix, rather than replacing the function entirely? The replacement will get out of sync with phik at some point.
@kretes Thanks for the bump Tomasz. If anyone is interested, feel free to contribute a PR!
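One way to avoid copying the function body, pending a proper configuration option, might be to wrap the original function and only pin the keyword default. The sketch below is an assumption-laden illustration, not a tested fix: with_default_kwargs and fake_phik_matrix are hypothetical names, and the stand-in function exists only so the example runs without phik installed.

```python
import functools


def with_default_kwargs(func, **defaults):
    """Return a wrapper that calls func with `defaults` unless the caller
    passes those keyword arguments explicitly (hypothetical helper)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        merged = {**defaults, **kwargs}  # explicit caller kwargs win
        return func(*args, **merged)
    return wrapper


# Stand-in for phik.phik_matrix so the sketch is self-contained.
def fake_phik_matrix(df, njobs=-1):
    return {"rows": len(df), "njobs": njobs}


serial_phik_matrix = with_default_kwargs(fake_phik_matrix, njobs=1)

result_default = serial_phik_matrix([1, 2, 3])            # njobs pinned to 1
result_override = serial_phik_matrix([1, 2, 3], njobs=4)  # caller override kept
print(result_default)
print(result_override)
```

Assuming the installed phik.phik_matrix accepts an njobs keyword, as the workaround above implies, the same helper could be applied as phik.phik_matrix = with_default_kwargs(phik.phik_matrix, njobs=1) before building the report. The phik 0.11.2 signature shown in the traceback has no njobs parameter, so this would presumably raise a TypeError there. The wrapper also only handles njobs passed by keyword.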
Hi,
We were not able to reproduce this with the current version. My guess is that it is environment related.
The solution proposed above consists of deactivating the call to joblib.Parallel in the phik library, but it does not solve the underlying issue. You might want to report it to PhiK directly: https://github.com/KaveIO/PhiK
Feel free to re-open if you have a way to reproduce it consistently.
Following up on #456:
I am running into a TerminatedWorkerError.
Minimal example:
Returns the error:
TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {EXIT(1)}
Environment:
Thanks, David