jenyacazacu opened this issue 6 years ago (status: open)
It's really not clear what's happening there. It's only clear that Python itself segfaults, which is quite a feat and a real pain to debug.
OK, thanks. Do you have any suggestions as to where I can start debugging?
Unfortunately not; I'm not familiar with the surrounding libraries in use.
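As a generic first step (not something suggested in the thread itself), it helps to confirm that the process really dies from SIGSEGV and to capture faulthandler's per-thread Python traceback. A minimal sketch, assuming a POSIX system. `run_with_faulthandler` and `died_from_segfault` are hypothetical helper names, and `ctypes.string_at(0)` is only a deliberate NULL-pointer read used to demonstrate a crash:

```python
import signal
import subprocess
import sys
from typing import Tuple


def run_with_faulthandler(code: str) -> Tuple[int, str]:
    """Run `code` in a fresh interpreter started with `-X faulthandler`.

    Returns (returncode, stderr). On a fatal signal such as SIGSEGV,
    faulthandler writes the Python traceback of every thread to stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


def died_from_segfault(returncode: int) -> bool:
    # On POSIX, subprocess reports "killed by signal N" as returncode -N.
    return returncode == -signal.SIGSEGV
```

Pointing `code` at a small script that imports the suspect libraries and runs the failing test body (for example via `pytest.main([...])`) narrows things down without xdist in the loop.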
[gw4] [ 96%] PASSED tests/warehouse/test_warehouse_supplier_refunds.py::TestDeleteSupplierRefund::test_delete_refund_after_delete_cell
10:32:56 [gw5] [ 96%] PASSED tests/warehouse/test_warehouse_supplier_refunds.py::TestDeleteSupplierRefund::test_delete_refund_after_good_has_become_serial
10:32:56 [gw3] node down: Not properly terminated
10:32:56 [gw3] [ 96%] FAILED tests/warehouse/test_warehouse_residue.py::TestResidueControlValidation::test_close_residue_rule_via_mask
10:32:56 replacing crashed worker gw3
10:32:56 [gw8] linux Python 3.7.6 cwd: /opt/buildagent/work/61408fc2f9cf70c9
10:32:57 [gw8] Python 3.7.6 (default, Mar 17 2020, 13:08:12) -- [GCC 7.5.0]
10:33:10 tests/warehouse/test_warehouse_residue.py::TestResidueControlValidation::test_field_validation_positive
10:33:10 [gw2] node down: Not properly terminated
10:33:10 [gw2] [ 97%] FAILED tests/warehouse/test_warehouse_residue.py::TestResidueControlValidation::test_field_validation_negative[Abc]
10:33:10 replacing crashed worker gw2
10:33:10 [gw9] linux Python 3.7.6 cwd: /opt/buildagent/work/61408fc2f9cf70c9
10:33:10 [gw9] Python 3.7.6 (default, Mar 17 2020, 13:08:12) -- [GCC 7.5.0]
10:33:16 tests/warehouse/test_warehouse_residue.py::TestResidueControlValidation::test_field_validation_negative[ ]
10:33:16 [gw7] node down: Not properly terminated
10:33:16 [gw7] [ 97%] FAILED tests/warehouse/test_warehouse_residue.py::TestCreateAndDeleteResidueControl::test_check_table_icon
10:33:16 replacing crashed worker gw7
10:33:16 [gw10] linux Python 3.7.6 cwd: /opt/buildagent/work/61408fc2f9cf70c9
10:33:17 [gw10] Python 3.7.6 (default, Mar 17 2020, 13:08:12) -- [GCC 7.5.0]
10:34:01 tests/warehouse/test_warehouse_residue.py::TestResidueControlValidation::test_field_validation_negative[0]
10:34:01 [gw0] node down: Not properly terminated
10:34:01 [gw0] [ 97%] FAILED tests/warehouse/test_warehouse_residue.py::TestResidueControlValidation::test_close_residue_rule_modal
10:34:01 replacing crashed worker gw0
10:34:01 [gw11] linux Python 3.7.6 cwd: /opt/buildagent/work/61408fc2f9cf70c9
10:34:02 [gw11] Python 3.7.6 (default, Mar 17 2020, 13:08:12) -- [GCC 7.5.0]
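With several workers crashing, the first useful fact is which test each worker was running when it died. A hypothetical helper (not part of pytest-xdist) that extracts this from a log in the format above. It assumes the exact `node down: Not properly terminated` and `[gwN] [ NN%] FAILED <nodeid>` lines shown here, and node ids without spaces. Keep in mind that the reported test is only the one in progress at the time of the crash; state left behind by earlier tests in the same worker may be the real trigger.

```python
import re
from typing import Dict, List

# Matches the two pytest-xdist line formats seen in the log:
#   "[gw3] node down: Not properly terminated"
#   "[gw3] [ 96%] FAILED tests/...::test_name"
# Assumes node ids contain no spaces.
EVENT = re.compile(
    r"\[(?P<down>gw\d+)\] node down: Not properly terminated"
    r"|\[(?P<worker>gw\d+)\] \[\s*\d+%\] FAILED (?P<test>\S+)"
)


def crashed_tests(log_text: str) -> Dict[str, List[str]]:
    """Map each crashed worker id to the test(s) reported FAILED after its crash."""
    crashed = set()
    result: Dict[str, List[str]] = {}
    # finditer scans in order, so it also works on logs whose lines were fused.
    for match in EVENT.finditer(log_text):
        if match.group("down"):
            crashed.add(match.group("down"))
        elif match.group("worker") in crashed:
            # Ordinary assertion failures on healthy workers are ignored.
            result.setdefault(match.group("worker"), []).append(match.group("test"))
    return result
```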
I have seen this type of error occur when combining pytest-xdist with other libraries that deal with multithreaded or multiprocessing code. One common pain point is the Annoy library from Spotify, especially older versions. It performs many multithreaded operations without giving the end user control of the threading, and in extremely mysterious and undebuggable ways pytest-xdist triggers some bad interaction with that threading, which ends in a segfault and a crashed worker. Worse, this can differ between OSes and Python versions.
I am debugging an example right now where on Mac OS with Python 3.6.9, everything is fine and pytest-xdist runs my test suite (with parallel workers) without issue. If I take the same code and put it in an equivalent Ubuntu Docker image and run the tests, I get a segfault from execnet and crashed workers. But if I run the tests serially in the Docker image, all tests pass.
These issues are also virtually impossible to boil down to simplified reproducible examples: if I knew all the factors required to reproduce it minimally, that would probably already be the solution. It's very, very hard to use pytest-xdist in these situations.
I'm seeing a similar error (Python segfault + "node down: Not properly terminated"). It seems to fail consistently in my Ubuntu CI runs (these logs will be unavailable soon) and on my MacBook, but only on CPython 3.11 (3.11.0-beta.3 on Ubuntu and 3.11.0b3+ [f9d0240] on macOS).
nox > python run_tests.py -m 'not longrunning' --script-launch-mode=subprocess -s tests_regression
I have seen this type of error occur when combining pytest-xdist with other libraries that deal with multithreaded or multi-processing code.
All tests perform image comparison in parallel (diffpdf.py), but the crash does not occur in that stage of the test; it occurs during rinohtype's rendering, which is single-threaded.
It's always _testrst[png] that segfaults. When running without pytest-xdist (-n 0), all is well. It is possible to reproduce the issue by running only the _testrst[png] test case:
git clone https://github.com/brechtm/rinohtype.git
cd rinohtype
git checkout bd4b4157
poetry install
poetry run nox -r -s "regression-3.11(wheel)" -- -k png
UPDATE: When the tests are run directly instead of through nox, the issue doesn't occur!
.nox/regression-3-11-wheel/bin/python run_tests.py -m 'not longrunning' --script-launch-mode=subprocess -k png tests_regression
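One hypothesis worth ruling out (not confirmed in this thread) is that running through nox gives the test process a different environment than the direct invocation. A minimal sketch for checking it: dump `os.environ` in each run, for example from `conftest.py`, and diff the two snapshots. The function names and file paths are hypothetical:

```python
import json
import os
from typing import Dict, Mapping, Optional, Tuple


def dump_env(path: str) -> None:
    """Write the current process environment to `path` as sorted JSON."""
    with open(path, "w") as fh:
        json.dump(dict(os.environ), fh, indent=2, sort_keys=True)


def load_env(path: str) -> Dict[str, str]:
    """Read back a snapshot written by dump_env."""
    with open(path) as fh:
        return json.load(fh)


def env_diff(
    a: Mapping[str, str], b: Mapping[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (value_in_a, value_in_b)} for every variable that differs.

    A variable missing from one side is reported as None on that side.
    """
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}
```

For example, call `dump_env("env-nox.json")` in one run and `dump_env("env-direct.json")` in the other, then print `env_diff(load_env("env-nox.json"), load_env("env-direct.json"))`.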
~Similar problem, also with xdist: https://github.com/deltachat/deltachat-core-rust/actions/runs/4194093763/jobs/7271810435~
EDIT: this turned out to be a bug in our code, not pytest: https://github.com/deltachat/deltachat-core-rust/pull/4153
I was confronted with the same problem in PyPOTS testing here: https://github.com/WenjieDu/PyPOTS/actions/runs/4577128474/jobs/8082161160
No solution, but for others who ever come across this issue, here is what happened in my case.
When running pytest tests in parallel with xdist, my tests crash with:
[gw5] node down: Not properly terminated
[gw5] FAILED tests/test.py::test_with_botorch
replacing crashed worker gw5
This crash happens only when I have:
class A:
    def x(self):
        import botorch  # only used in this function
but not when:
import botorch

class A:
    def x(self):
        ...  # uses botorch
I also cannot reproduce it on macOS; it only happens in CI on DevOps with ubuntu-latest.
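Since the crash only appears when `botorch` is imported lazily inside the method, one workaround to try (untested here, and not a root-cause fix) is forcing the import to happen at worker startup, which mirrors the module-level variant that works. A sketch for a `conftest.py`, where `EAGER_IMPORTS` and `preimport` are hypothetical names and `pytest_configure` is the standard pytest hook:

```python
import importlib
from typing import Iterable, List

# Hypothetical list: heavy native-extension packages that tests import lazily.
EAGER_IMPORTS = ["botorch"]


def preimport(module_names: Iterable[str]) -> List[str]:
    """Import each module now; return the names that could not be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def pytest_configure(config):
    # pytest calls this hook in the controller and in every xdist worker,
    # so each worker performs the import at startup instead of mid-test.
    preimport(EAGER_IMPORTS)
```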
@basnijholt
Just curious, are you still seeing the same crash with that construction?
Did you experiment with the number of threads, etc.?
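One way to run the thread-count experiment is to cap the native thread pools before any heavy library is loaded. A minimal sketch, assuming the libraries in play honor the standard OpenMP, MKL and OpenBLAS variables (for PyTorch-based code such as botorch, `torch.set_num_threads` is a separate knob); `limit_native_threads` is a hypothetical helper meant for the top of `conftest.py`:

```python
import os
from typing import Dict

# OpenMP, Intel MKL and OpenBLAS read these variables when their thread pools
# are initialized, so they must be set before numpy/torch/etc. are imported.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_native_threads(n: int = 1) -> Dict[str, str]:
    """Cap native thread pools at `n` unless a value is already set.

    Returns the effective value of each variable.
    """
    for var in THREAD_ENV_VARS:
        # setdefault keeps any value the CI configuration already provides.
        os.environ.setdefault(var, str(n))
    return {var: os.environ[var] for var in THREAD_ENV_VARS}
```

Comparing a run with `limit_native_threads(1)` against one without it shows whether thread count affects the crash rate.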
We recently moved to Python 3.4, and when running our tests in parallel with pytest-xdist (-n), the tests sometimes fail with the error below. The traceback comes from faulthandler. The failures consistently occur in tests that use Celery tasks and have the following form:
@override_settings(task_always_eager=True)
def test_example(self):
    ...
Packages used: execnet-1.5.0 pytest-3.0.6 xdist-1.15.0
Here is the error log:
platform linux -- Python 3.4.5, pytest-3.0.6, py-1.4.32, pluggy-0.4.0 Django settings: gameserver.settings.test (from environment variable) rootdir: /home/jcazacu/Repos/main_repo/ares-game-server, inifile: tox.ini plugins: xdist-1.15.0, faulthandler-1.4.1, django-3.1.2, cov-2.5.1, celery-4.0.2 gw0 [535] / gw1 [535] / gw2 [535] / gw3 [535] scheduling tests via LoadScheduling .....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Fatal Python error: Segmentation fault
Thread 0x00007fbd16d3d700 (most recent call first):
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/execnet/gateway_base.py", line 386 in read
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/execnet/gateway_base.py", line 418 in from_io
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/execnet/gateway_base.py", line 954 in _thread_receiver
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/execnet/gateway_base.py", line 213 in run
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/execnet/gateway_base.py", line 277 in _perform_spawn
Current thread 0x00007fbd1eed2740 (most recent call first):
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/django/db/backends/sqlite3/base.py", line 335 in execute
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/django/db/backends/utils.py", line 62 in execute
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/django/db/backends/base/base.py", line 288 in _savepoint_rollback
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/django/db/backends/base/base.py", line 328 in savepoint_rollback
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/django/db/transaction.py", line 243 in __exit__
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/django/test/testcases.py", line 1004 in _rollback_atomics
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/django/test/testcases.py", line 1066 in _fixture_teardown
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/django/test/testcases.py", line 908 in _post_teardown
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/channels/tests/base.py", line 57 in _post_teardown
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/django/test/testcases.py", line 216 in __call__
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/unittest.py", line 157 in runtest
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/runner.py", line 104 in pytest_runtest_call
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 614 in execute
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 265 in __init__
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 248 in _wrapped_call
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 613 in execute
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 334 in <lambda>
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 339 in _hookexec
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 745 in __call__
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/runner.py", line 151 in <lambda>
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/runner.py", line 163 in __init__
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/runner.py", line 151 in call_runtest_hook
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/runner.py", line 133 in call_and_report
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/runner.py", line 79 in runtestprotocol
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/runner.py", line 66 in pytest_runtest_protocol
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 614 in execute
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 265 in __init__
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 248 in _wrapped_call
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 613 in execute
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 265 in __init__
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 248 in _wrapped_call
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 613 in execute
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 334 in <lambda>
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 339 in _hookexec
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 745 in __call__
File "", line 77 in run_tests
File "", line 61 in pytest_runtestloop
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 614 in execute
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 334 in <lambda>
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 339 in _hookexec
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 745 in __call__
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/main.py", line 133 in _main
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/main.py", line 98 in wrap_session
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/main.py", line 127 in pytest_cmdline_main
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 614 in execute
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 334 in <lambda>
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 339 in _hookexec
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/_pytest/vendored_packages/pluggy.py", line 745 in __call__
File "", line 159 in
File "", line 1 in do_exec
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/execnet/gateway_base.py", line 1072 in executetask
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/execnet/gateway_base.py", line 213 in run
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/execnet/gateway_base.py", line 277 in _perform_spawn
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/execnet/gateway_base.py", line 259 in integrate_as_primary_thread
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/execnet/gateway_base.py", line 1047 in serve
File "/home/jcazacu/Repos/main_repo/ares-game-server/.tox/test34/lib/python3.4/site-packages/execnet/gateway_base.py", line 1534 in serve
File "", line 8 in
File "", line 1 in
......[gw1] node down: Not properly terminated
fReplacing crashed slave gw1
Any ideas?
For now, we have moved the task tests to run serially, since the failures are so unpredictable.
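That stopgap can be kept in one place with a custom marker. A minimal sketch, assuming a hypothetical `serial` marker registered in the project's pytest config (for example under `markers` in `tox.ini`, which is this project's inifile) and applied to the Celery task tests. `split_runs` and `run_split` are hypothetical names; `-m` selects by marker and `-p no:xdist` disables the xdist plugin for the second run:

```python
import subprocess
import sys
from typing import List


def split_runs(workers: int, marker: str = "serial") -> List[List[str]]:
    """Two pytest invocations: unmarked tests in parallel, marked tests serially."""
    base = [sys.executable, "-m", "pytest"]
    return [
        base + ["-n", str(workers), "-m", "not " + marker],
        base + ["-p", "no:xdist", "-m", marker],
    ]


def run_split(workers: int = 4, marker: str = "serial") -> int:
    """Run both invocations and return the first real failure code, else 0."""
    codes = [subprocess.call(cmd) for cmd in split_runs(workers, marker)]
    # pytest exits with 5 when no tests were collected; don't count that as failure.
    failures = [code for code in codes if code not in (0, 5)]
    return failures[0] if failures else 0
```

A CI step could then call `sys.exit(run_split(4))`. Note that if `addopts` in the config already contains `-n`, the serial run would need that option removed, since `-n` is unknown once xdist is disabled.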