Closed rabyj closed 8 months ago
do the hangups also happen with pytest -s
yup. Traceback is not very helpful, since it's just the timeout one, but I'll still share for reference
```text +++++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++++ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of Thread-6 (139742417913600) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ File "/usr/lib/python3.8/threading.py", line 890, in _bootstrap self._bootstrap_inner() File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner self.run() File "/usr/lib/python3.8/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/usr/lib/python3.8/multiprocessing/pool.py", line 576, in _handle_results task = get() File "/usr/lib/python3.8/multiprocessing/connection.py", line 250, in recv buf = self._recv_bytes() File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes buf = self._recv(4) File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv chunk = read(handle, remaining) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of Thread-5 (139742283679488) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ File "/usr/lib/python3.8/threading.py", line 890, in _bootstrap self._bootstrap_inner() File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner self.run() File "/usr/lib/python3.8/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/usr/lib/python3.8/multiprocessing/pool.py", line 528, in _handle_tasks for taskseq, set_length in iter(taskqueue.get, None): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of Thread-4 (139739615721216) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ File "/usr/lib/python3.8/threading.py", line 890, in _bootstrap self._bootstrap_inner() File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner self.run() File "/usr/lib/python3.8/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/usr/lib/python3.8/multiprocessing/pool.py", line 519, in _handle_workers cls._wait_for_updates(current_sentinels, change_notifier) File "/usr/lib/python3.8/multiprocessing/pool.py", line 499, in _wait_for_updates wait(sentinels, timeout=timeout) File "/usr/lib/python3.8/multiprocessing/connection.py", line 931, in wait ready = selector.select(timeout) File "/usr/lib/python3.8/selectors.py", line 415, in select fd_event_list = self._selector.poll(timeout) +++++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++++ ```
```text
============================================== FAILURES ===============================================
_______________________________________ test_binary_classifier ________________________________________
test_dir = PosixPath('/tmp/pytest-of-rabyj/pytest-98/other_estimators1')
@pytest.mark.timeout(120)
def test_binary_classifier(test_dir: Path):
"""Test if binary models are properly supported."""
# Setting up environment variables
# import multiprocessing as mp # pylint: disable=import-outside-toplevel
# mp.set_start_method("spawn", force=True)
os.environ["MIN_CLASS_SIZE"] = "1"
os.environ["LABEL_LIST"] = '["female", "male"]'
epi_ml.core.estimators.NFOLD_PREDICT = 2
# Loading paths
datasource = EpiAtlasTreatmentTestData.default_test_data(
label_category="sex"
).epiatlas_dataset.datasource
current_dir = Path(__file__).parent.resolve()
hparams_file = current_dir.parent / "fixtures" / "other_estimators_hparams.json"
sys.argv = [
"other_estimators.py",
"--predict",
"sex",
f"{datasource.hdf5_file}",
f"{datasource.chromsize_file}",
f"{datasource.metadata_file}",
str(test_dir),
"--hyperparams",
f"{hparams_file}",
]
> main_module()
mains/other_estimators_test.py:80:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../epi_ml/other_estimators.py:226: in main
estimators.run_predictions(ea_handler, estimator, model_name, cli.logdir)
../epi_ml/core/estimators.py:400: in run_predictions
pool.starmap(func, items)
/usr/lib/python3.8/multiprocessing/pool.py:372: in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
/usr/lib/python3.8/multiprocessing/pool.py:765: in get
self.wait(timeout)
/usr/lib/python3.8/multiprocessing/pool.py:762: in wait
self._event.wait(timeout)
/usr/lib/python3.8/threading.py:558: in wait
signaled = self._cond.wait(timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self =
I'm still uncertain as to why this issue occurs and why the fix works, so I would greatly appreciate any insights!
Not sure either, but perhaps using a different timeout method might yield different results?
perhaps using a different timeout method might yield different results?
The timeout was to avoid the specific test hanging forever, so the test suite can finish. I actually installed a timeout module (pytest-timeout: 2.1.0
) BECAUSE it was hanging forever and causing my test suite to not be able to finish.
With this in mind, I honestly don't understand how the timeout method would be related to the bug :/
With this in mind, I honestly don't understand how the timeout method would be related to the bug :/
Out of insights then, sorry.
Same is happening for me with multiprocessing.pool.ThreadPool
, but the fix is not working in that case. I assume it's because "spawn" is not being called in this context. Any suggestions? Thanks!
@lorinkoz
Recently, I thought the problem had came back for me, but by using pytest -s
I noticed something else entirely was happening. So do try to debug using that.
Otherwise, make sure the following code happens before any multiprocessing
import, so make sure there's no way the multiprocessing
module is imported before your individual test (e.g. imported through another import)
import multiprocessing as mp
mp.set_start_method("spawn", force=True)
Those are the only two things that come to mind, hope it helps.
Thanks you this helped me
TL;DR: If you're experiencing a hanging issue during tests with a program using multiprocessing, try adding the following code at the beginning of your problematic test function:
def a_test(): import multiprocessing as mp mp.set_start_method("spawn", force=True) # [... rest of test ...]
Description
I've encountered an odd test hanging issue with a program that uses multiprocessing. The program completes:
- When run individually from the command line
- As part of a test function invoked alone via pytest
The program hangs indefinitely when run as part of a full suite of tests.
I wanted to at least document what I experienced even if I can't create a simple minimal example, since I found a workaround that could help others, and no other issue mentioning something like this. All issues are for single tests.
Environment
python:
3.8.10
pytest:7.2.0
pytest-mock:3.10.0
OS:Ubuntu 20.04.6 LTS
Details
The problem occurs with a test that runs a program to completion, which internally uses multiprocessing (via pool.starmap on a generator). When the test is run individually, it completes without problems within roughly 5 seconds. However, when the test is included in a full test suite and run with pytest, it hangs indefinitely (for over 6 minutes at least).
The issue persists even when using
concurrent.futures
in place ofmultiprocessing
(without apartial
as seen below).My workaround is using
mp.set_start_method("spawn", force=True)
at the start of the test.Fix in immediate context:
func = partial(run_prediction, estimator=estimator, name=name, logdir=logdir) items = enumerate(ea_handler.yield_split()) mp.set_start_method("spawn", force=True) # --> the fix, also works by being at the complete start of the test with mp.Pool(nb_workers) as pool: pool.starmap(func, items)
While the fix makes the test run slower (about 20 seconds), at least it doesn't hang indefinitely.
I'm still uncertain as to why this issue occurs and why the fix works, so I would greatly appreciate any insights!
Complete pip Environment
pip list output - click me
@rabyj thanks for sharing this. I had a similar problem using the pytest-console-scripts
library to test some scripts. In my case, it was happening on the GitlabCI, the test was stuck until the job reached the timeout. Using the multiprocessing spawn
allowed the test to work without issues.
EDIT: I found that if I make sure to explicitly close all files that are opened in a sub-process, I no longer need to set mp.set_start_method("spawn", force=True
.
I ran into this problem as well. However, mp.set_start_method("spawn", force=True)
causes pickle errors in my code ("object is not the same..."), which do not happen with the defaults. Are there alternative ways to get around the hanging sub-process issue that avoid using spawn
?
Closing this for now then.
For the record: pytest just runs each test function in sequence, if the test function is blocking somehow, probably the code being tested needs some adjustment -- my point is that pytest does not have a "known issue" with multiprocessing at all, it all depends on the user code running inside the tests.
If this issue comes up again, it is probably best to work on a MWE so a proper solution can be discussed.
for future visitors: I ran into this issue and the spawn
vs fork
change did not resolve it, but updating freezegun did!
@pierrebarbera freezegun was not part of my modules, seems like a package specific problem.
Then again, maybe MY problem was package specific but I never found out the culprit
I'm still uncertain as to why this issue occurs and why the fix works, so I would greatly appreciate any insights!
Then again, maybe MY problem was package specific but I never found out the culprit
I ran into a similar issue, but to add complication, I had both a repeatable hanging example and a repeatable non-issue example.
I'd need to explain a bunch of off-topic stuff here about my case, so, instead, I'll just keep this to some insights I found while debugging that are relevant here:
pytests
runner
and terminal
reference for frezzegun and pytest
spawn
nor fork
would fix a mocked call from join to check duration, not while frezzegun is altering time anyway ... how do you timeout a value that is frozen?spawn
instead of fork
to fixspawn
strategy plays safer with threading, fork
strategy does not (hint: remember the python GIL is interpreter scope when reading the docs, and it becomes clear if you use fork
you must solve all concurrency problems in all of your code and all relevant dependencies, to avoid deadlocking)spawn
? well, as @pierrebarbera observed: spawn
is slower, you a basically running Python for each Process
and the extra overhead does not likely (not being familiar with the exact implementation) does not scale up well for problems, but it might still scale wide.fork
strategy) running when the test hangs. :shrug: spawn
as the start method with full test suites via pytests (albeit I only tested python3.9.20-python3.13.0)multiprocessing.Process
instance's join()
's timeout value, it can hang without a context switch, (so only one process completes, and the other never starts) which means you should solve the race condition in your test code and include a timeout for join()
if speed is preferred over low-energy approach like sentinels and pools.:thinking: Hopefully this provides some of the insights you were looking for.
[!WARNING] This is comment, is only meant to provide some insights for debugging similar issues, rather than leaving the readers in the dark. This is not intended as an ultimate root cause. ymmv. :shrug:
TL;DR: If you're experiencing a hanging issue during tests with a program using multiprocessing, try adding the following code at the beginning of your problematic test function:
Description
I've encountered an odd test hanging issue with a program that uses multiprocessing. The program completes:
The program hangs indefinitely when run as part of a full suite of tests.
I wanted to at least document what I experienced even if I can't create a simple minimal example, since I found a workaround that could help others, and no other issue mentioning something like this. All issues are for single tests.
Environment
python:
3.8.10
pytest:7.2.0
pytest-mock:3.10.0
OS:Ubuntu 20.04.6 LTS
Details
The problem occurs with a test that runs a program to completion, which internally uses multiprocessing (via pool.starmap on a generator). When the test is run individually, it completes without problems within roughly 5 seconds. However, when the test is included in a full test suite and run with pytest, it hangs indefinitely (for over 6 minutes at least).
The issue persists even when using
concurrent.futures
in place ofmultiprocessing
(without apartial
as seen below).My workaround is using
mp.set_start_method("spawn", force=True)
at the start of the test.Fix in immediate context:
While the fix makes the test run slower (about 20 seconds), at least it doesn't hang indefinitely.
I'm still uncertain as to why this issue occurs and why the fix works, so I would greatly appreciate any insights!
Complete pip Environment
pip list output - click me
```text Package Version Location ----------------------------- ---------------- ------------------------------------------------------------------- absl-py 1.4.0 aiohttp 3.8.4 aiosignal 1.3.1 alabaster 0.7.13 alembic 1.11.1 anyio 3.7.0 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 astroid 2.15.5 asttokens 2.2.1 async-timeout 4.0.2 attrs 23.1.0 autopage 0.5.1 Babel 2.12.1 backcall 0.2.0 beautifulsoup4 4.12.2 black 23.1.0 bleach 6.0.0 cachetools 5.3.1 certifi 2023.5.7 cffi 1.15.1 cfgv 3.3.1 charset-normalizer 3.1.0 click 8.1.3 cliff 4.3.0 cloudpickle 2.2.1 cmaes 0.9.1 cmd2 2.4.3 colorama 0.4.6 colorlog 6.7.0 comet-ml 3.21.0 comm 0.1.3 commonmark 0.9.1 configobj 5.0.8 cycler 0.11.0 debugpy 1.6.7 decorator 5.1.1 defusedxml 0.7.1 dill 0.3.6 distlib 0.3.6 doc2dash 2.4.1 docutils 0.20.1 dulwich 0.21.5 EpiLaP 0.1 $HOME/Projects/sources/epi_ml/src/python everett 3.2.0 exceptiongroup 1.1.1 executing 1.2.0 fastjsonschema 2.17.1 filelock 3.12.0 fonttools 4.39.4 frozenlist 1.3.3 fsspec 2023.5.0 gitdb 4.0.10 GitPython 3.1.31 google-auth 2.19.0 google-auth-oauthlib 1.0.0 greenlet 2.0.2 grpcio 1.54.2 h5py 3.6.0 identify 2.5.24 idna 3.4 imagesize 1.4.1 imbalanced-learn 0.8.0 importlib-metadata 4.13.0 importlib-resources 5.12.0 iniconfig 2.0.0 ipykernel 6.23.1 ipython 8.12.2 ipython-genutils 0.2.0 ipywidgets 8.0.6 isort 5.12.0 jedi 0.18.2 Jinja2 3.1.2 joblib 1.2.0 jsonschema 4.17.3 jupyter 1.0.0 jupyter-client 8.2.0 jupyter-console 6.6.3 jupyter-core 5.3.0 jupyter-events 0.6.3 jupyter-server 2.6.0 jupyter-server-mathjax 0.2.6 jupyter-server-terminals 0.4.4 jupyterlab-pygments 0.2.2 jupyterlab-widgets 3.0.7 kaleido 0.2.1 kiwisolver 1.4.4 lazy-object-proxy 1.9.0 lightgbm 3.3.5 llvmlite 0.40.0 Mako 1.2.4 Markdown 3.4.3 MarkupSafe 2.1.2 matplotlib 3.5.3 matplotlib-inline 0.1.6 mccabe 0.7.0 mistune 2.0.5 multidict 6.0.4 mypy-extensions 1.0.0 nbclassic 0.5.3 nbclient 0.7.2 nbconvert 7.2.9 nbdime 3.1.0 nbformat 5.7.0 nest-asyncio 1.5.6 nodeenv 1.8.0 notebook 6.5.4 notebook-shim 0.2.3 numba 0.57.0 numpy 1.22.4 nvidia-ml-py3 7.352.0 oauthlib 3.2.2 optuna 3.0.6 overrides 7.3.1 packaging 23.1 pandas 1.4.4 pandocfilters 1.5.0 parso 0.8.3 pathspec 0.11.1 pbr 5.11.1 pexpect 4.8.0 pickleshare 0.7.5 Pillow 9.5.0 pip 20.0.2 pkg-resources 0.0.0 pkgutil-resolve-name 1.3.10 platformdirs 3.5.1 plotly 5.11.0 pluggy 1.0.0 pre-commit 3.0.4 prettytable 3.7.0 prometheus-client 0.17.0 prompt-toolkit 3.0.38 protobuf 3.20.1 psutil 5.9.5 ptyprocess 0.7.0 pure-eval 0.2.2 pyaml 23.5.9 pyasn1 0.5.0 pyasn1-modules 0.3.0 pycparser 2.21 pyDeprecate 0.3.2 Pygments 2.15.1 pylint 2.16.4 pyparsing 3.0.9 pyperclip 1.8.2 pyrsistent 0.19.3 pytest 7.2.0 pytest-mock 3.10.0 pytest-timeout 2.1.0 python-dateutil 2.8.2 python-json-logger 2.0.7 pytorch-lightning 1.6.5 pytz 2023.3 PyYAML 6.0 pyzmq 25.1.0 qtconsole 5.4.3 QtPy 2.3.1 requests 2.31.0 requests-oauthlib 1.3.1 requests-toolbelt 1.0.0 rfc3339-validator 0.1.4 rfc3986-validator 0.1.1 rich 11.2.0 rsa 4.9 scikit-learn 1.1.2 scikit-optimize 0.9.0+unofficial scipy 1.8.1 semantic-version 2.10.0 Send2Trash 1.8.2 setuptools 44.0.0 shap 0.41.0 six 1.16.0 slicer 0.0.7 smmap 5.0.0 sniffio 1.3.0 snowballstemmer 2.2.0 soupsieve 2.4.1 sphinx 7.0.1 sphinx-gallery 0.11.1 sphinxcontrib-applehelp 1.0.4 sphinxcontrib-devhelp 1.0.2 sphinxcontrib-htmlhelp 2.0.1 sphinxcontrib-jsmath 1.0.1 sphinxcontrib-qthelp 1.0.3 sphinxcontrib-serializinghtml 1.1.5 SQLAlchemy 2.0.15 stack-data 0.6.2 stevedore 5.1.0 tabulate 0.8.10 tenacity 8.2.2 tensorboard 2.13.0 tensorboard-data-server 0.7.0 terminado 0.17.1 threadpoolctl 3.1.0 tinycss2 1.2.1 toml 0.10.2 tomli 2.0.1 tomlkit 0.11.8 torch 1.10.2 torchinfo 1.6.3 torchmetrics 0.7.2 tornado 6.3.2 tqdm 4.65.0 traitlets 5.9.0 typing-extensions 4.4.0 urllib3 2.0.2 virtualenv 20.23.0 wcwidth 0.2.6 webencodings 0.5.1 websocket-client 1.5.2 Werkzeug 2.3.4 wheel 0.40.0 widgetsnbextension 4.0.7 wrapt 1.15.0 wurlitzer 3.0.3 yarl 1.9.2 zipp 3.15.0 ```