telles-simbiose opened 7 years ago
Those processes are indeed xdist slaves; however, with the amount of information you provided, we are unable to help.
I don't know if it's related, but I've seen parallel runs with xdist hang for a long time on my Jenkins, with the most recent output being `scheduling tests via LoadScheduling`.
@RonnyPfannschmidt, what kind of information should I provide? It happened only once; I was not able to reproduce it a second time.
@telles-simbiose all involved packages and versions; it would also help if we could take a look at the test suite.
Deadlocks triggered by rare race conditions are not uncommon in distributed systems, and xdist running slaves is a distributed system.
We commonly (about 1 in 20 runs) hit this issue on Jenkins and on local machines.
15:56:10 ============================= test session starts ==============================
15:56:10 platform linux -- Python 3.6.4, pytest-3.3.2, py-1.5.2, pluggy-0.6.0 -- /****_env/bin/python3.6
15:56:10 cachedir: .cache
15:56:10 rootdir: /opt/****, inifile: setup.cfg
15:56:10 plugins: xdist-1.22.0, forked-0.2, cov-2.4.0
15:56:10 gw0 I / gw1 I / gw2 I / gw3 I / gw4 I / gw5 I / gw6 I / gw7 I / gw8 I / gw9 I / gw10 I / gw11 I / gw12 I / gw13 I / gw14 I / gw15 I / gw16 I / gw17 I / gw18 I / gw19 I / gw20 I / gw21 I / gw22 I / gw23 I / gw24 I / gw25 I / gw26 I / gw27 I / gw28 I / gw29 I / gw30 I / gw31 I / gw32 I / gw33 I / gw34 I / gw35 I / gw36 I / gw37 I / gw38 I / gw39 I
15:56:10 [gw0] linux Python 3.6.4 cwd: /opt/****
15:56:11 [gw1] linux Python 3.6.4 cwd: /opt/****
[... similar lines for gw2 through gw39, 15:56:11 to 15:56:19 ...]
15:56:19 [gw0] Python 3.6.4 (default, Mar 9 2018, 23:26:36) -- [GCC 5.4.0 20160609]
[... similar lines for gw1 through gw39, ending at 15:56:20 ...]
17:09:46 Cancelling nested steps due to timeout
17:09:46 Sending interrupt signal to process
17:09:49 Makefile:240: recipe for target 'test_integration2' failed
17:09:49 make: *** [test_integration2] Terminated
17:09:49 Terminated
[Pipeline] }
[Pipeline] // timeout
[Pipeline] sh
17:09:49 [dai-tests-2gpu_dev-NUMSKB2Q3XX56UBUAOGHD33FZR7UOQYGSSWMHKM3RXLMS6ITB5GQ] Running shell script
17:09:49 + echo Jenkins job ABORTED due to exception, non-zero errorcode, or Jenkins timeout: script returned exit code 143
[Pipeline] sh
17:09:49 [tests-2gpu_dev-NUMSKB2Q3XX56UBUAOGHD33FZR7UOQYGSSWMHKM3RXLMS6ITB5GQ] Running shell script
17:09:49 + mkdir -p build/test-reports
17:09:49 + echo <?xml version="1.0" encoding="utf-8"?>
17:09:49 <testsuite errors="0" failures="1" name="pytest" skips="0" tests="0" time="0">
17:09:49 <testcase classname="tests" file="tests" name="tests" time="0"><failure message="JOB WAS KILLED BECAUSE Exception">script returned exit code 143</failure></testcase>
17:09:49 </testsuite>
jenkins 22406 0.0 0.0 1148 4 ? Ss 01:30 0:00 | \_ /dev/init -- ./run.sh /opt/h2oai make test
jenkins 2536 0.0 0.0 19716 3156 ? S 01:32 0:00 | \_ /bin/bash -c pytest --color=yes --durations=10 -s -v --fulltrace --full-trace --junit-xml=build/test-reports/unit-test_test_models.xml -n auto tests/test_models 2> ./tmp/test_models.29966_2018.03.05-09:32:12
jenkins 2537 0.0 0.0 3063628 29844 ? Sl 01:32 0:01 | \_ /h2oai_env/bin/python3.6 /env/bin/pytest --color=yes --durations=10 -s -v --fulltrace --full-trace --junit-xml=build/test-reports/unit-test_test_models.xml -n auto tests/test_models
jenkins 2541 0.2 0.1 3606224 161340 ? Sl 01:32 0:09 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2544 0.2 0.1 3604924 159784 ? Sl 01:32 0:10 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2547 0.1 0.1 3604840 159948 ? Sl 01:32 0:08 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2551 0.1 0.1 3604924 159984 ? Sl 01:32 0:09 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2555 0.1 0.1 3604864 160076 ? Sl 01:32 0:08 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2559 0.1 0.1 3604928 159904 ? Sl 01:32 0:09 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2567 0.1 0.1 3604928 159876 ? Sl 01:32 0:09 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2574 0.1 0.1 3604928 160152 ? Sl 01:32 0:09 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2580 0.2 0.1 3604852 160100 ? Sl 01:32 0:10 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2636 0.1 0.1 3604928 159868 ? Sl 01:32 0:08 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2669 0.1 0.1 3604832 160048 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2735 0.1 0.1 3604844 159968 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2780 0.1 0.1 3651120 160128 ? Sl 01:32 0:06 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2825 0.1 0.1 3604872 159976 ? Sl 01:32 0:06 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2871 0.1 0.1 3604832 160176 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2914 0.1 0.1 3604840 159736 ? Sl 01:32 0:08 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 2959 0.1 0.1 3604848 159996 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 3006 0.1 0.1 3604920 159820 ? Sl 01:32 0:06 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 3012 0.1 0.1 3604852 159868 ? Sl 01:32 0:08 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 3059 0.1 0.1 3604852 160052 ? Sl 01:32 0:08 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 3104 0.1 0.1 3604940 160652 ? Sl 01:32 0:08 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 3186 0.1 0.1 3604836 160056 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 3284 0.1 0.1 3604824 160316 ? Sl 01:32 0:06 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 3472 0.1 0.1 3604876 160136 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 3494 0.1 0.1 3604852 159680 ? Sl 01:32 0:05 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 3623 0.1 0.0 3382212 122060 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 7939 0.0 0.0 3382232 75236 ? Sl 01:32 0:00 | | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 7962 0.0 0.0 3388608 76856 ? S 01:32 0:00 | | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 3890 0.1 0.1 3604884 160320 ? Sl 01:32 0:08 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 4064 0.1 0.1 3604936 160388 ? Sl 01:32 0:06 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 4162 0.1 0.1 3604884 160260 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 4253 0.1 0.1 3604840 160472 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 4485 0.1 0.1 3604840 159932 ? Sl 01:32 0:06 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 4879 0.1 0.1 3604868 160344 ? Sl 01:32 0:06 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 4914 0.1 0.1 3604836 160148 ? Sl 01:32 0:08 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 5151 0.1 0.1 3604836 160108 ? Sl 01:32 0:09 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 5505 0.1 0.1 3604928 160212 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 5728 0.1 0.1 3604924 160408 ? Sl 01:32 0:08 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 6014 0.1 0.1 3604964 160328 ? Sl 01:32 0:05 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 6205 0.1 0.1 3604836 160404 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 6330 0.1 0.1 3604928 160224 ? Sl 01:32 0:07 | \_ /h2oai_env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 6497 0.1 0.1 3604848 159980 ? Sl 01:32 0:09 | \_ /env/bin/python3.6 -u -c import sys;exec(eval(sys.stdin.readline()))
We see a similar issue here:
py2-e2e runtests: commands[0] | py.test -vv -n 8 tests/e2e
============================= test session starts ==============================
platform linux2 -- Python 2.7.12, pytest-3.7.0, py-1.5.4, pluggy-0.7.1 -- /usr/share/mlt/.venv/bin/python2
cachedir: .pytest_cache
rootdir: /usr/share/mlt, inifile: tox.ini
plugins: xdist-1.22.5, forked-0.2, cov-2.5.1
[gw0] linux2 Python 2.7.12 cwd: /usr/share/mlt
[gw1] linux2 Python 2.7.12 cwd: /usr/share/mlt
[gw2] linux2 Python 2.7.12 cwd: /usr/share/mlt
[gw3] linux2 Python 2.7.12 cwd: /usr/share/mlt
[gw4] linux2 Python 2.7.12 cwd: /usr/share/mlt
[gw5] linux2 Python 2.7.12 cwd: /usr/share/mlt
[gw6] linux2 Python 2.7.12 cwd: /usr/share/mlt
[gw7] linux2 Python 2.7.12 cwd: /usr/share/mlt
[gw0] Python 2.7.12 (default, Dec 4 2017, 14:50:18) -- [GCC 5.4.0 20160609]
[gw2] Python 2.7.12 (default, Dec 4 2017, 14:50:18) -- [GCC 5.4.0 20160609]
[gw1] Python 2.7.12 (default, Dec 4 2017, 14:50:18) -- [GCC 5.4.0 20160609]
[gw4] Python 2.7.12 (default, Dec 4 2017, 14:50:18) -- [GCC 5.4.0 20160609]
[gw3] Python 2.7.12 (default, Dec 4 2017, 14:50:18) -- [GCC 5.4.0 20160609]
[gw7] Python 2.7.12 (default, Dec 4 2017, 14:50:18) -- [GCC 5.4.0 20160609]
[gw5] Python 2.7.12 (default, Dec 4 2017, 14:50:18) -- [GCC 5.4.0 20160609]
[gw6] Python 2.7.12 (default, Dec 4 2017, 14:50:18) -- [GCC 5.4.0 20160609]
gw0 [20] / gw1 [20] / gw2 [20] / gw3 [20] / gw4 [20] / gw5 [20] / gw6 [20] / gw7 [20]
scheduling tests via LoadScheduling
tests/e2e/test_config_updates.py::TestConfig::test_add_remove_config
tests/e2e/test_config_updates.py::TestConfig::test_update_config
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[pytorch]
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[hello-world]
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[experiments]
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[pytorch-distributed]
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[tf-distributed]
tests/e2e/test_config_updates.py::TestConfig::test_config_list
[gw6] [ 5%] PASSED tests/e2e/test_config_updates.py::TestConfig::test_config_list
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[tf-dist-mnist]
[gw4] [ 10%] PASSED tests/e2e/test_config_updates.py::TestConfig::test_add_remove_config
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[horovod]
[gw3] [ 15%] PASSED tests/e2e/test_config_updates.py::TestConfig::test_update_config
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[tensorboard]
[gw1] [ 20%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[hello-world]
[gw5] [ 25%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[tf-distributed]
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploy_enable_sync
[gw3] [ 30%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[tensorboard]
[gw5] [ 35%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploy_enable_sync
[gw7] [ 40%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[pytorch]
[gw2] [ 45%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[experiments]
[gw0] [ 50%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[pytorch-distributed]
[gw4] [ 55%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[horovod]
[gw6] [ 60%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploying_templates[tf-dist-mnist]
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_debug_wrapper
tests/e2e/test_templates.py::TestTemplates::test_templates
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_interactive_deploy[hello-world]
tests/e2e/test_templates.py::TestTemplates::test_local_templates
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_interactive_deploy[tf-distributed]
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploy_check_logs
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_no_push_deploy
tests/e2e/test_deploy_flow.py::TestDeployFlow::test_watch_build_and_deploy_no_push
[gw3] [ 65%] PASSED tests/e2e/test_templates.py::TestTemplates::test_templates
[gw0] [ 70%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_interactive_deploy[tf-distributed]
[gw7] [ 75%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_deploy_check_logs
[gw5] [ 80%] PASSED tests/e2e/test_templates.py::TestTemplates::test_local_templates
[gw2] [ 85%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_interactive_deploy[hello-world]
[gw6] [ 90%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_watch_build_and_deploy_no_push
[gw1] [ 95%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_no_push_deploy
[gw4] [100%] PASSED tests/e2e/test_deploy_flow.py::TestDeployFlow::test_debug_wrapper
Makefile:177: recipe for target 'test-e2e-all-circleci' failed
make: *** [test-e2e-all-circleci] Terminated
Too long with no output (exceeded 20m0s)
Here the tests seem to finish but then still hang. In this specific case, the python2 tests seemed to finish but hung, and then the python3 tests never started (we use tox to trigger the tests).
Yeah, I've suffered from this issue sometimes. `--fulltrace` shows the lock here:
platform linux -- Python 3.6.1, pytest-5.0.1, py-1.5.3, pluggy-0.12.0 -- /usr/bin/python3.6
cachedir: .pytest_cache
rootdir: /home/BlackHobbiT/path/to/test
inifile: pytest.ini
plugins: forked-1.0.2, xdist-1.29.0, allure-pytest-2.7.0
[gw0] linux Python 3.6.1 cwd: /home/BlackHobbiT/path/to/test
[gw0] Python 3.6.1 (default, Sep 7 2017, 16:36:03) -- [GCC 6.3.0 20170406]
config = <_pytest.config.Config object at 0x7f1cac7e4da0>, doit = <function _main at 0x7f1caccb0f28>
def wrap_session(config, doit):
"""Skeleton command line program"""
session = Session(config)
session.exitstatus = ExitCode.OK
initstate = 0
try:
try:
config._do_configure()
initstate = 1
config.hook.pytest_sessionstart(session=session)
initstate = 2
> session.exitstatus = doit(config, session) or 0
/usr/local/lib/python3.6/dist-packages/_pytest/main.py:213:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
config = <_pytest.config.Config object at 0x7f1cac7e4da0>, session = <Session session exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=16>
def _main(config, session):
""" default command line protocol for initialization, session,
running tests and reporting. """
config.hook.pytest_collection(session=session)
> config.hook.pytest_runtestloop(session=session)
/usr/local/lib/python3.6/dist-packages/_pytest/main.py:257:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <_HookCaller 'pytest_runtestloop'>, args = (), kwargs = {'session': <Session session exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=16>}
notincall = set()
def __call__(self, *args, **kwargs):
if args:
raise TypeError("hook calling supports only keyword arguments")
assert not self.is_historic()
if self.spec and self.spec.argnames:
notincall = (
set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
)
if notincall:
warnings.warn(
"Argument(s) {} which are declared in the hookspec "
"can not be found in this hook call".format(tuple(notincall)),
stacklevel=2,
)
> return self._hookexec(self, self.get_hookimpls(), kwargs)
/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py:289:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <_pytest.config.PytestPluginManager object at 0x7f1caee2ab00>, hook = <_HookCaller 'pytest_runtestloop'>
methods = [<HookImpl plugin_name='main', plugin=<module '_pytest.main' from '/usr/local/lib/python3.6/dist-packages/_pytest/main...1cab5f7f98>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7f1cab5ce2b0>>]
kwargs = {'session': <Session session exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=16>}
def _hookexec(self, hook, methods, kwargs):
# called from all hookcaller instances.
# enable_tracing will set its own wrapping function at self._inner_hookexec
> return self._inner_hookexec(hook, methods, kwargs)
/usr/local/lib/python3.6/dist-packages/pluggy/manager.py:87:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
hook = <_HookCaller 'pytest_runtestloop'>
methods = [<HookImpl plugin_name='main', plugin=<module '_pytest.main' from '/usr/local/lib/python3.6/dist-packages/_pytest/main...1cab5f7f98>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7f1cab5ce2b0>>]
kwargs = {'session': <Session session exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=16>}
self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
methods,
kwargs,
> firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
)
/usr/local/lib/python3.6/dist-packages/pluggy/manager.py:81:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <xdist.dsession.DSession object at 0x7f1cab5f7f98>
def pytest_runtestloop(self):
self.sched = self.config.hook.pytest_xdist_make_scheduler(
config=self.config, log=self.log
)
assert self.sched is not None
self.shouldstop = False
while not self.session_finished:
> self.loop_once()
/usr/local/lib/python3.6/dist-packages/xdist/dsession.py:115:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <xdist.dsession.DSession object at 0x7f1cab5f7f98>
def loop_once(self):
"""Process one callback from one of the workers."""
while 1:
if not self._active_nodes:
# If everything has died stop looping
self.triggershutdown()
raise RuntimeError("Unexpectedly no active workers available")
try:
> eventcall = self.queue.get(timeout=2.0)
/usr/local/lib/python3.6/dist-packages/xdist/dsession.py:129:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <queue.Queue object at 0x7f1cab5f76d8>, block = True, timeout = 2.0
def get(self, block=True, timeout=None):
'''Remove and return an item from the queue.
If optional args 'block' is true and 'timeout' is None (the default),
block if necessary until an item is available. If 'timeout' is
a non-negative number, it blocks at most 'timeout' seconds and raises
the Empty exception if no item was available within that time.
Otherwise ('block' is false), return an item if one is immediately
available, else raise the Empty exception ('timeout' is ignored
in that case).
'''
with self.not_empty:
if not block:
if not self._qsize():
raise Empty
elif timeout is None:
while not self._qsize():
self.not_empty.wait()
elif timeout < 0:
raise ValueError("'timeout' must be a non-negative number")
else:
endtime = time() + timeout
while not self._qsize():
remaining = endtime - time()
if remaining <= 0.0:
raise Empty
> self.not_empty.wait(remaining)
/usr/lib/python3.6/queue.py:173:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Condition(<unlocked _thread.lock object at 0x7f1cab4f2120>, 0)>, timeout = 1.9999943980947137
def wait(self, timeout=None):
"""Wait until notified or until a timeout occurs.
If the calling thread has not acquired the lock when this method is
called, a RuntimeError is raised.
This method releases the underlying lock, and then blocks until it is
awakened by a notify() or notify_all() call for the same condition
variable in another thread, or until the optional timeout occurs. Once
awakened or timed out, it re-acquires the lock and returns.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof).
When the underlying lock is an RLock, it is not released using its
release() method, since this may not actually unlock the lock when it
was acquired multiple times recursively. Instead, an internal interface
of the RLock class is used, which really unlocks it even when it has
been recursively acquired several times. Another internal interface is
then used to restore the recursion level when the lock is reacquired.
"""
if not self._is_owned():
raise RuntimeError("cannot wait on un-acquired lock")
waiter = _allocate_lock()
waiter.acquire()
self._waiters.append(waiter)
saved_state = self._release_save()
gotit = False
try: # restore state no matter what (e.g., KeyboardInterrupt)
if timeout is None:
waiter.acquire()
gotit = True
else:
if timeout > 0:
> gotit = waiter.acquire(True, timeout)
E KeyboardInterrupt
/usr/lib/python3.6/threading.py:299: KeyboardInterrupt
Try using pytest-timeout: `pytest --timeout=<seconds>`.
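For anyone trying this, here is a sketch of wiring pytest-timeout in via the ini file instead of the command line (the timeout value is just an example; tune it to your suite):

```ini
# pytest.ini -- hypothetical values
[pytest]
# Fail any single test that runs longer than 300 seconds instead of
# letting the whole xdist session hang.
timeout = 300
# The "thread" method can interrupt tests blocked in C code, at the
# cost of not tearing the test down cleanly.
timeout_method = thread
```

Note this only bounds individual tests; it does not help if the hang is in xdist's control loop itself, after all tests have finished.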
Hey @telles-simbiose and @BlackHobbiT,
Did you manage to make it work? We suffer from the same issue: the test run hangs at 93% with all the workers busy, and only killing the stuck process in the task manager lets the run continue. When that worker is killed, the report for its test is also lost.
Thanks
I'm facing a very weird issue. When I run my tests in parallel, only one browser opens, even when I execute `pytest -n 2`. After the first test succeeds, pytest just hangs and does not even time out. Has anyone faced this issue?
py.test -vv -n 2
=================================================================================================== test session starts ===================================================================================================
platform darwin -- Python 3.7.3, pytest-5.4.1, py-1.10.0, pluggy-0.13.1 --
///*
cachedir: .pytest_cache
rootdir: //*, inifile: pytest.ini
plugins: xdist-1.31.0, allure-pytest-2.8.40, repeat-0.9.1, forked-1.1.3, timeout-1.4.2
[gw0] darwin Python 3.7.3 cwd:
/*/
[gw1] darwin Python 3.7.3 cwd:
//*
collected 8 items
[gw0] Python 3.7.3 (default, Apr 24 2020, 18:51:23) -- [Clang 11.0.3 (clang-1103.0.32.62)]
[gw1] Python 3.7.3 (default, Apr 24 2020, 18:51:23) -- [Clang 11.0.3 (clang-1103.0.32.62)]
gw0 [8] / gw1 [8]
scheduling tests via MarkBasedScheduler
test1 [gw1] [ 12%] PASSED test1
It will get stuck after this.
@tamaskakuszi as far as I remember, wiping the pycache dirs sometimes helps.
@RonnyPfannschmidt we're also seeing this intermittently. If you'd like access to one of our environments, I can make that happen. Thanks!
@JacobCallahan shoot me more details on the work channel
Any updates on this topic, @RonnyPfannschmidt @JacobCallahan?
We are facing this in the CI at the company I work for. We use xdist to run the tests in parallel, and it seems this happens from time to time when fail fast is enabled and the test session is aborted. We are then left with a zombie process that, as far as I can tell, is stuck running this command: https://github.com/pytest-dev/execnet/blame/d7ca9815734a4efb168c3ef997858e38c040fc70/execnet/gateway_io.py#L58 That would make sense, as we are using xdist.
I don't really understand what this line is supposed to do, but it looks like some old workaround, possibly?
I could also create an issue in execnet as well, if that is of any use.
this line bootstraps execnet; the rest is fed as commands over stdio
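For anyone curious, a minimal sketch of what that bootstrap one-liner does (simulated here with `StringIO` rather than a real pipe to a child process):

```python
import io

# The parent writes repr() of the bootstrap source to the child's stdin;
# the child's whole command line is just
#   python -u -c "import sys;exec(eval(sys.stdin.readline()))"
bootstrap_source = "result = 2 + 2\n"

# Simulated child: read one line, eval() turns the repr back into the
# source string, exec() runs it. This is why a hung worker shows up in
# pstree on that command, blocked reading a file descriptor -- it is
# waiting for the parent to feed it code (or shutdown commands) over
# that pipe.
fake_stdin = io.StringIO(repr(bootstrap_source) + "\n")
namespace = {}
exec(eval(fake_stdin.readline()), namespace)
print(namespace["result"])  # prints 4
```

So the command itself is just a tiny loader; the actual execnet worker code arrives afterwards over stdin.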
Okay, any idea why processes can be left hanging on that command? The command shows up in pstree, waiting to read from a file descriptor (stdin, I guess), but nothing is being written there by any process.
That seems like the control process died and the worker is waiting for the shutdown command. Fetching a stack trace with gdb is only partially helpful, as the IO is handled in multiple threads and the state of the worker is unclear.
Thanks for your time Ronny. I tried digging around with gdb, but I basically only found a reference back to the code I mentioned; the rest of the trace was in C, so it went a bit over my head. Is the interrupt from pytest then possibly not handled correctly? We only use the built-in fail fast in pytest and don't kill anything extra.
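In case it helps others debugging similar hangs: the Python-level stacks of a stuck worker can often be dumped without gdb via the standard-library `faulthandler` module. A sketch (assuming a Unix platform; this is not part of xdist itself):

```python
import faulthandler
import signal
import sys

# Hypothetical conftest.py fragment: make every process (including each
# xdist worker, since conftest.py is imported by the workers too) dump
# the Python stacks of all its threads to stderr on SIGUSR1. A hung
# worker can then be inspected with `kill -USR1 <pid>`.
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)

# The same dump can also be triggered programmatically:
faulthandler.dump_traceback(file=sys.stderr, all_threads=True)
```

This only shows Python frames; if the worker is stuck inside C code, as suspected above, gdb (or py-spy with `--native`) is still needed to see the native side.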
@Bruniz that's unclear; it's entirely possible the suite is hanging somewhere in C and the shutdown isn't reaching it. With the information provided I'm unable to make an educated guess.
@Bruniz by fail fast, do you mean -x?
having a paste of command + command output would be a big help
I'm seeing this problem in our CI now, too.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
jenkins 1 0.0 0.0 1000 4 ? Ss 02:32 0:00 /run/podman-init -- ./build.sh
jenkins 7 0.0 0.0 4072 3272 ? S 02:32 0:00 /bin/bash -e ./build.sh
jenkins 227 1.6 0.0 850404 36284 ? Sl 02:34 0:31 /usr/bin/python3 -P /usr/bin/tox
jenkins 387 0.3 0.0 86652 79016 ? S 02:35 0:05 /var/tmp/jenkins/tox/.pkg/bin/python /usr/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta
jenkins 450 0.0 0.0 4204 3320 ? S 02:35 0:00 /bin/bash /home/jenkins/pbench/exec-tests /var/tmp/jenkins/tox/py39
jenkins 560 0.0 0.0 2332 916 ? S 02:35 0:00 /usr/bin/time --format=\n\nCommand: '%C'\nExit status: %x\nTimings: user %Us, system %Ss, elapsed %es (%E, %P)\nMemory: max RSS %Mk, minor pf: %R, major pf: %F, swaps %W\nContext switches: inv %c, vol %w, signals %k\nI/O: fs in %I, fs out %O, socket in %r, socket out %s\n pytest --tb=native -n auto --basetemp=/var/tmp/jenkins/tox/py39/tmp --cov=/var/tmp/jenkins/tox/py39/lib/python3.9/site-packages/pbench --cov-report xml:cov/report.xml -rs --pyargs pbench.test.unit.common pbench.test.unit.agent pbench.test.functional.agent pbench.test.unit.client pbench.test.unit.server
jenkins 561 2.2 0.0 2587628 241432 ? Sl 02:35 0:41 /var/tmp/jenkins/tox/py39/bin/python /var/tmp/jenkins/tox/py39/bin/pytest --tb=native -n auto --basetemp=/var/tmp/jenkins/tox/py39/tmp --cov=/var/tmp/jenkins/tox/py39/lib/python3.9/site-packages/pbench --cov-report xml:cov/report.xml -rs --pyargs pbench.test.unit.common pbench.test.unit.agent pbench.test.functional.agent pbench.test.unit.client pbench.test.unit.server
jenkins 567 0.5 0.0 307108 149232 ? Sl 02:35 0:09 /var/tmp/jenkins/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 570 0.4 0.0 299348 140964 ? Sl 02:35 0:07 /var/tmp/jenkins/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 573 0.5 0.0 308988 150952 ? Sl 02:35 0:10 /var/tmp/jenkins/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 576 0.4 0.0 302416 144392 ? Sl 02:35 0:07 /var/tmp/jenkins/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 579 0.5 0.0 311952 153348 ? Sl 02:35 0:10 /var/tmp/jenkins/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 582 0.5 0.0 305180 149068 ? Sl 02:35 0:09 /var/tmp/jenkins/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
jenkins 585 0.6 0.0 313972 155868 ? Sl 02:35 0:11 /var/tmp/jenkins/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
[and 25 more worker processes in similar states]
What other information can I provide?
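One piece of information that tends to help with hangs like this is a Python-level stack trace from each stuck worker. A sketch of one way to get that (my own suggestion, not something pytest or xdist provides out of the box) is to register a faulthandler in conftest.py, so that sending SIGUSR1 to a stuck worker PID from the ps listing prints its current traceback without killing the process:

```python
import faulthandler
import signal
import sys

# Sketch: with this in conftest.py, `kill -USR1 <worker-pid>` makes a
# stuck worker print its current Python traceback to stderr while
# continuing to run (Unix only).
faulthandler.register(signal.SIGUSR1, file=sys.stderr, chain=True)

# For demonstration only: dump the current traceback to a file, which is
# what the SIGUSR1 handler would do for a genuinely hung worker.
with open("traceback.txt", "w") as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
print(open("traceback.txt").read())
```

An external tool such as py-spy can produce a similar dump without any changes to the test suite.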
$ /var/tmp/jenkins/tox/py39/bin/python --version
Python 3.9.16
$ /var/tmp/jenkins/tox/py39/bin/pytest --version
pytest 7.4.0
Here's the log leading up to the hang:
py39: install_deps> pip install --cache-dir=/var/tmp/jenkins/tox/cache --progress-bar off --prefix=/var/tmp/jenkins/tox/py39 -r /home/jenkins/pbench/agent/requirements.txt -r /home/jenkins/pbench/agent/test-requirements.txt -r /home/jenkins/pbench/client/requirements.txt -r /home/jenkins/pbench/server/requirements.txt -r /home/jenkins/pbench/server/test-requirements.txt
.pkg: install_requires> python -I -m pip install 'setuptools>=46.1.3' 'wheel>=0.34.2'
.pkg: _optional_hooks> python /usr/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta
.pkg: get_requires_for_build_sdist> python /usr/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta
.pkg: install_requires_for_build_sdist> python -I -m pip install pbr
.pkg: prepare_metadata_for_build_wheel> python /usr/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta
.pkg: build_sdist> python /usr/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta
py39: install_package> pip install --cache-dir=/var/tmp/jenkins/tox/cache --progress-bar off --prefix=/var/tmp/jenkins/tox/py39 --force-reinstall --no-deps /var/tmp/jenkins/tox/.tmp/package/1/pbench-0.0.1.dev2674.tar.gz
py39: commands[0]> bash -c '/home/jenkins/pbench/exec-tests /var/tmp/jenkins/tox/py39 '
Running agent,client,server python3-based unit tests via pytest
Package Version
------------------------ -------------
alembic 1.11.1
aniso8601 9.0.1
appier 1.21.7
bcrypt 4.0.1
Bcrypt-Flask 1.0.2
beautifulsoup4 4.9.3
blinker 1.6.2
boto3 1.17.97
botocore 1.20.97
bottle 0.12.25
bs4 0.0.1
cachetools 4.2.2
certifi 2021.5.30
cffi 1.15.1
chardet 4.0.0
click 8.1.6
coverage 7.2.7
cryptography 41.0.2
docutils 0.20.1
elasticsearch 7.13.4
exceptiongroup 1.1.2
execnet 2.0.2
filelock 3.12.2
Flask 2.3.2
Flask-Cors 4.0.0
Flask-HTTPAuth 4.8.0
Flask-JWT-Extended 4.5.2
Flask-Migrate 4.0.4
Flask-RESTful 0.3.10
Flask-SQLAlchemy 3.0.5
freezegun 1.2.2
gitdb 4.0.10
GitPython 3.1.32
google-api 0.1.12
google-api-core 1.30.0
google-api-python-client 2.9.0
google-auth 1.31.0
google-auth-httplib2 0.1.0
google-auth-oauthlib 0.4.4
googleapis-common-protos 1.53.0
greenlet 2.0.2
gunicorn 21.2.0
hiredis 2.2.3
httplib2 0.19.1
humanize 4.7.0
idna 2.10
ifaddr 0.2.0
importlib-metadata 6.8.0
iniconfig 2.0.0
itsdangerous 2.1.2
Jinja2 3.1.2
jmespath 0.10.0
lockfile 0.12.2
Mako 1.2.4
MarkupSafe 2.1.3
mock 5.1.0
oauthlib 3.1.1
packaging 20.9
pbench 0.0.1.dev2674
pip 22.3.1
pluggy 1.2.0
pquisby 0.0.12
protobuf 3.17.3
psutil 5.9.5
psycopg2 2.9.6
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.21
pyesbulk 2.1.1
PyJWT 2.8.0
pyparsing 2.4.7
pytest 7.4.0
pytest-cov 4.1.0
pytest-dependency 0.5.1
pytest-freezegun 0.4.2
pytest-helpers-namespace 2019.1.8
pytest-mock 3.11.1
pytest-xdist 3.3.1
python-daemon 3.0.1
python-dateutil 2.8.1
python-pidfile 3.1.1
pytz 2019.1
PyYAML 6.0.1
redis 3.5.3
requests 2.25.1
requests-mock 1.11.0
requests-oauthlib 1.3.0
responses 0.23.1
rsa 4.7.2
s3transfer 0.4.2
sdnotify 0.3.2
setuptools 65.5.1
sh 2.0.4
six 1.16.0
smmap 5.0.0
soupsieve 2.2.1
SQLAlchemy 2.0.19
SQLAlchemy-Utils 0.41.1
state-signals 1.0.1
tomli 2.0.1
types-PyYAML 6.0.12.11
typing_extensions 4.7.1
uritemplate 3.0.1
urllib3 1.26.5
Werkzeug 2.3.6
wheel 0.38.4
zipp 3.16.2
============================= test session starts ==============================
platform linux -- Python 3.9.16, pytest-7.4.0, pluggy-1.2.0
cachedir: /var/tmp/jenkins/tox/py39/.pytest_cache
rootdir: /home/jenkins/pbench
configfile: pytest.ini
plugins: cov-4.1.0, freezegun-0.4.2, helpers-namespace-2019.1.8, requests-mock-1.11.0, mock-3.11.1, xdist-3.3.1, dependency-0.5.1
created: 32/32 workers
32 workers [1521 items]
[and nothing more...]
Trying the tests in my development environment, they hang too. Since I'm running interactively there, I get slightly more output from pytest:
=============== test session starts ================
platform linux -- Python 3.9.16, pytest-7.4.0, pluggy-1.2.0
cachedir: /var/tmp/wscales/tox/py39/.pytest_cache
rootdir: /home/wscales/pbench
configfile: pytest.ini
plugins: requests-mock-1.11.0, xdist-3.3.1, mock-3.11.1, helpers-namespace-2019.1.8, freezegun-0.4.2, dependency-0.5.1, cov-4.1.0
8 workers [1521 items]
..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................ss....s.s.....s.........s..........................s..s...s.....s...s.s............................................................................................................................s..s..s...s......ss..s..s..s......s.s......ss.sss....s......s.....................................................
The ps output looks pretty much the same (only there are 8 workers instead of 32):
$ ps -auxww
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
wscales 1 0.0 0.0 1084 4 pts/0 Ss 04:32 0:00 /run/podman-init -- tox
wscales 2 2.4 0.0 776700 38664 pts/0 Sl+ 04:32 0:04 /usr/bin/python3 -P /usr/bin/tox
wscales 162 4.4 0.1 88460 81804 pts/0 S+ 04:32 0:06 /var/tmp/wscales/tox/.pkg/bin/python /usr/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta
wscales 226 0.0 0.0 4204 3244 pts/0 S+ 04:32 0:00 /bin/bash /home/wscales/pbench/exec-tests /var/tmp/wscales/tox/py39
wscales 270 0.0 0.0 2332 912 pts/0 S+ 04:32 0:00 /usr/bin/time --format=\n\nCommand: '%C'\nExit status: %x\nTimings: user %Us, system %Ss, elapsed %es (%E, %P)\nMemory: max RSS %Mk, minor pf: %R, major pf: %F, swaps %W\nContext switches: inv %c, vol %w, signals %k\nI/O: fs in %I, fs out %O, socket in %r, socket out %s\n pytest --tb=native -n auto --basetemp=/var/tmp/wscales/tox/py39/tmp --cov=/var/tmp/wscales/tox/py39/lib/python3.9/site-packages/pbench --cov-report html:/var/tmp/wscales/tox/py39/cov/html -rs --pyargs pbench.test.unit.common pbench.test.unit.agent pbench.test.functional.agent pbench.test.unit.client pbench.test.unit.server
wscales 271 7.8 0.1 698088 107188 pts/0 Sl+ 04:32 0:09 /var/tmp/wscales/tox/py39/bin/python /var/tmp/wscales/tox/py39/bin/pytest --tb=native -n auto --basetemp=/var/tmp/wscales/tox/py39/tmp --cov=/var/tmp/wscales/tox/py39/lib/python3.9/site-packages/pbench --cov-report html:/var/tmp/wscales/tox/py39/cov/html -rs --pyargs pbench.test.unit.common pbench.test.unit.agent pbench.test.functional.agent pbench.test.unit.client pbench.test.unit.server
wscales 276 15.3 0.2 322884 164796 pts/0 Sl+ 04:32 0:18 /var/tmp/wscales/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
wscales 280 5.6 0.1 215148 126652 pts/0 Sl+ 04:32 0:06 /var/tmp/wscales/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
wscales 284 15.6 0.3 355472 196948 pts/0 Sl+ 04:32 0:18 /var/tmp/wscales/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
wscales 288 14.8 0.3 359424 197008 pts/0 Sl+ 04:32 0:17 /var/tmp/wscales/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
wscales 292 13.5 0.2 324888 166400 pts/0 Sl+ 04:32 0:16 /var/tmp/wscales/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
wscales 296 13.3 0.2 326292 167844 pts/0 Sl+ 04:32 0:16 /var/tmp/wscales/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
wscales 300 13.5 0.2 319792 161728 pts/0 Sl+ 04:32 0:16 /var/tmp/wscales/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
wscales 304 11.6 0.2 341464 180100 pts/0 Sl+ 04:32 0:14 /var/tmp/wscales/tox/py39/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
Nevermind!...
It turned out that one parametrize'd scenario of one test was sitting in an infinite loop, and that caused the test run to hang a few tests short of the end. Once I found and addressed the problem, the tests no longer hang.
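To illustrate the failure mode (this is a hypothetical sketch, not the actual pbench test): a single parameter value can turn a normally fast loop into an infinite one, so one xdist worker never finishes and the whole session stalls a few tests short of the end.

```python
# Hypothetical reproduction of the failure mode described above:
# one parametrized scenario never terminates.
def advance(step, target=10):
    pos = 0
    while pos < target:   # with step == 0, pos never changes: infinite loop
        pos += step
    return pos

# Fine for step=1 or step=2; advance(0) would spin forever, and under
# xdist the only visible symptom is a session that never completes.
print(advance(1))  # 10
```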
That's great, but was there some way that I should have been able to find this more easily?
The timeout plugin tends to be a great help for hangs.
Would anybody like to contribute a change to the docs mentioning pytest-timeout? That would be excellent.
😁 Indeed, especially if it points out the fact that it is hard to figure out which test is hanging without it!
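For reference, a minimal pytest-timeout setup might look like this (a sketch; the 300-second limit is an arbitrary choice, not a recommended value):

```ini
# pytest.ini -- abort any single test that runs longer than 300 seconds.
# The "thread" method prints the stacks of all threads on timeout, which
# identifies exactly which test was hanging.
[pytest]
timeout = 300
timeout_method = thread
```

A per-test override is also available via the @pytest.mark.timeout(N) marker.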
@nicoddemus I'm wondering if xdist should make sure to identify all currently running tests and their phases whenever a node exceeds a predetermined timeframe.
An even further expansion of this might be printing stack traces.
Sounds good @RonnyPfannschmidt; indeed it makes sense for a new option to at least warn the user if a test has been running for X seconds (configurable, perhaps with a reasonable default of, say, 120s). However, I would leave the job of cancelling long-running tests to pytest-timeout.
Indeed, a debugging print is fine, but the terminate gun ought to be opt-in.
Hello everyone. Last night I left my test suite running until this morning, but I noticed that it hadn't finished running all the tests. Looking at htop, I noticed some strange processes that had been running for a really long time, as shown in this screenshot. Looking at the test output, I saw that the last tests were all run by the same worker, gw2 (there were 4 workers running). Since there were 3 import sys;exec(eval(sys.stdin.readline())) processes running for 13+ hours, I think those 3 workers were just stuck somehow.