The problem ...
fitgrid.lmer(..., parallel=True) broke sometime in recent months, in the course of unpinning the conda package dependencies in 0.4.X, updating for pymer4 0.7.X, and switching from TravisCI to Actions.
The first signs were occasional, unpredictable failures running fitgrid 0.5.0.dev0 parallel lmer in Jupyter notebooks on CentOS 7. Sometimes it worked, sometimes it didn't.
Then (fortunately) came clear and consistent failures when benchmarking parallel lmer on an Ubuntu 20.04 test machine.
Diagnosis ...
The issue appears related to multiprocessing.Pool() and/or Python package loading (what gets loaded, and when?) ... see the MREs, where pass/fail depends on from pymer4 import Lmer even though Lmer() is not used there.
It slipped past the pytests. test_models.py::test_lmer_correctness_parallel() passes because the results are correct and the test itself imports Lmer from pymer4 to verify the fitgrid results.
A simple parallel lmer smoke test without the Lmer import in function or module scope fails. WTF.
First stab at an easy fix, so far so good ...
moved the Lmer import in lmer_single() here (models.py:175) up to module scope
it works for locally built conda packages, in the Python 3.6, 3.7, and 3.8 interpreters and under pytest, on Ubuntu 20.04
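Why moving the import might help can be sketched with the standard library alone. The traceback in this issue ends in signal.signal() being called off the main thread while rpy2 initializes at import time. The sketch below assumes that is the failure mode. standin_rpy2 is a hypothetical stand-in module, not rpy2 itself, and import_in_thread() and demo() are names invented for illustration.

```python
import importlib
import signal
import sys
import tempfile
import threading
from pathlib import Path

# Hypothetical stand-in for rpy2: it registers a SIGINT handler at import time.
STANDIN_SOURCE = (
    "import signal\n"
    "signal.signal(signal.SIGINT, signal.default_int_handler)\n"
)


def import_in_thread(name):
    """Import `name` in a non-main thread; return the exception raised, or None."""
    outcome = []

    def target():
        try:
            importlib.import_module(name)
            outcome.append(None)
        except Exception as exc:
            outcome.append(exc)

    thread = threading.Thread(target=target)
    thread.start()
    thread.join()
    return outcome[0]


def demo():
    """Return (error from deferred import, error from import after eager import)."""
    previous = signal.getsignal(signal.SIGINT)
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "standin_rpy2.py").write_text(STANDIN_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Deferred import: the module body first runs off the main thread.
            deferred = import_in_thread("standin_rpy2")
            # Module-scope style: import in the main thread first, so the
            # later import in a thread is only a sys.modules lookup.
            importlib.import_module("standin_rpy2")
            eager = import_in_thread("standin_rpy2")
            return deferred, eager
        finally:
            # Undo the sketch's side effects.
            sys.path.remove(tmp)
            sys.modules.pop("standin_rpy2", None)
            if previous is not None:
                signal.signal(signal.SIGINT, previous)


if __name__ == "__main__":
    deferred, eager = demo()
    print(type(deferred).__name__, eager)
```

If the assumption holds, the first element is a ValueError and the second is None, which matches the observed behavior: once the package has been imported in the main thread, later imports elsewhere no longer run its initialization code.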
My next steps ...
add the parallel lmer smoke test to test_models.py
check the fix for user installs from conda cloud
check compatibility with Jupyter
MREs on local system, derived from the Actions CI
fitgrid at a53fbeee4
conda 4.9.2 and Python 3.6, 3.7, 3.8
# this succeeds, and also with the import moved up to module scope
import fitgrid

def test_smoke_lmer_parallel():
    from pymer4 import Lmer  # not used below, but pass/fail depends on it
    epochs = fitgrid.generate(n_samples=2, n_channels=2)
    RHS = 'continuous + (continuous | categorical)'
    _ = fitgrid.lmer(epochs, RHS=RHS, parallel=True, n_cores=2)

# without from pymer4 import Lmer this fails
import fitgrid

def test_smoke_lmer_parallel():
    epochs = fitgrid.generate(n_samples=2, n_channels=2)
    RHS = 'continuous + (continuous | categorical)'
    _ = fitgrid.lmer(epochs, RHS=RHS, parallel=True, n_cores=2)
Different error messages make one issue look like two ...
py 3.6, rpy2 3.3.6
py 3.7, rpy2 3.4.2
Error: C stack usage 210161152500 is too close to the limit
Error: C stack usage 210161152548 is too close to the limit
Error: C stack usage 210161152020 is too close to the limit
Fatal error: unable to initialize the JIT
py 3.8, rpy2 3.4.2
Exception in thread Thread-7:
Traceback (most recent call last):
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
    task = get()
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/site-packages/pymer4/__init__.py", line 5, in <module>
    from .models import Lmer, Lm, Lm2
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/site-packages/pymer4/models/__init__.py", line 5, in <module>
    from .Lmer import Lmer
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/site-packages/pymer4/models/Lmer.py", line 9, in <module>
    from rpy2.robjects.packages import importr
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/site-packages/rpy2/robjects/__init__.py", line 19, in <module>
    from rpy2.robjects.robject import RObjectMixin, RObject
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/site-packages/rpy2/robjects/robject.py", line 10, in <module>
    rpy2.rinterface.initr_simple()
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/site-packages/rpy2/rinterface.py", line 859, in initr_simple
    _post_initr_setup()
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/site-packages/rpy2/rinterface.py", line 940, in _post_initr_setup
    signal.signal(signal.SIGINT, _sigint_handler)
  File "/home/turbach/miniconda3/envs/mkgpux_benchmark_py37/lib/python3.7/signal.py", line 47, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
ValueError: signal only works in main thread
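The traceback above suggests one reading of "what and when": the pool's result-handler thread unpickles a worker's result, and unpickling imports the defining package if the parent process has not imported it yet. A minimal standard-library sketch of that mechanism follows. standin_pkg is a hypothetical stand-in module, not pymer4, and a plain thread named result-handler stands in for the one multiprocessing.Pool starts.

```python
import importlib
import pickle
import sys
import tempfile
import textwrap
import threading
from pathlib import Path

# Hypothetical stand-in for a package whose module body has side effects.
STANDIN_SOURCE = textwrap.dedent(
    """
    import threading

    # Record which thread executed the module body.
    IMPORT_THREAD = threading.current_thread().name

    class Result:
        pass
    """
)


def unpickle_in_fresh_thread():
    """Return the name of the thread that ended up importing the stand-in."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "standin_pkg.py").write_text(STANDIN_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Build a pickle the way a worker process would, then forget the
            # module so this process looks like a parent that never imported it.
            standin = importlib.import_module("standin_pkg")
            payload = pickle.dumps(standin.Result())
            del sys.modules["standin_pkg"]

            # Unpickle in a helper thread, as a pool's result handler does.
            handler = threading.Thread(
                target=pickle.loads, args=(payload,), name="result-handler"
            )
            handler.start()
            handler.join()
            return sys.modules["standin_pkg"].IMPORT_THREAD
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("standin_pkg", None)


if __name__ == "__main__":
    print(unpickle_in_fresh_thread())  # prints "result-handler"
```

If the same thing happens with pymer4, an earlier from pymer4 import Lmer in the parent, used or not, would mean the handler thread finds the package already loaded and never runs the rpy2 initialization itself.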
Local conda build and test environment for the MREs (Python 3.8 shown):
conda build --python=3.8 -c defaults -c conda-forge -c ejolly conda
conda create -n test_env_py3.8 fitgrid -c local -c defaults -c conda-forge -c ejolly
conda activate test_env_py3.8
conda install pytest