Closed ogrisel closed 1 year ago
Actually I spoke too quickly, I can reproduce with:
pip install pytest-repeat
pytest --count=10000 -vlxk test_error_in_nested_call_keeps_resource_tracker_silent
[...]
_______________________________________________ ERROR at teardown of TestExecutorInitializer.test_error_nested_call_keeps_resource_tracker_silent[1981-10000] ________________________________________________
self = <tests.test_reusable_executor.TestExecutorInitializer object at 0x49d56afb0>
method = <bound method TestExecutorInitializer.test_error_nested_call_keeps_resource_tracker_silent of <tests.test_reusable_executor.TestExecutorInitializer object at 0x49d56afb0>>
def teardown_method(self, method):
"""Make sure the executor can be recovered after the tests"""
executor = get_reusable_executor(max_workers=2)
assert executor.submit(math.sqrt, 1).result() == 1
# There can be less than 2 workers because of the worker timeout
> _check_subprocesses_number(executor, expected_max_process_number=2)
executor = <loky.reusable_executor._ReusablePoolExecutor object at 0x4bca0d1b0>
method = <bound method TestExecutorInitializer.test_error_nested_call_keeps_resource_tracker_silent of <tests.test_reusable_executor.TestExecutorInitializer object at 0x49d56afb0>>
self = <tests.test_reusable_executor.TestExecutorInitializer object at 0x49d56afb0>
tests/_executor_mixin.py:250:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
executor = <loky.reusable_executor._ReusablePoolExecutor object at 0x4bca0d1b0>, expected_process_number = None, expected_max_process_number = 2, patience = 100
def _check_subprocesses_number(executor=None, expected_process_number=None,
expected_max_process_number=None, patience=100):
if not psutil:
# psutil is not installed, we cannot check the number of subprocesses
return
# Wait for terminating processes to disappear
children_cmdlines = _running_children_with_cmdline(psutil.Process())
pids_cmdlines = [(c.pid, cmdline) for c, cmdline in children_cmdlines]
children_pids = {pid for pid, _ in pids_cmdlines}
if executor is not None:
worker_pids = set(executor._processes.keys())
else:
# Bypass pids checks when executor has been garbage
# collected
worker_pids = children_pids
if expected_process_number is not None:
try:
assert len(children_pids) == expected_process_number, pids_cmdlines
assert len(worker_pids) == expected_process_number, pids_cmdlines
assert worker_pids == children_pids, pids_cmdlines
except AssertionError:
if expected_process_number != 0:
raise
# there is a race condition with the /proc/<pid>/ system clean up
# and our utilization of psutil. The Process is considered alive by
# psutil even though it have been terminated. Wait for the system
# clean up in this case.
for _ in range(patience):
if not _running_children_with_cmdline(psutil.Process()):
break
time.sleep(.1)
else:
raise
if expected_max_process_number is not None:
assert len(children_pids) <= expected_max_process_number, pids_cmdlines
> assert len(worker_pids) <= expected_max_process_number, pids_cmdlines
E AssertionError: [(78570, '/Users/ogrisel/mambaforge/envs/dev/bin/python3.10 -m loky.backend.popen_loky_posix --process-name LokyProcess-1 --pipe 28')]
children_cmdlines = [(psutil.Process(pid=78570, name='python3.10', status='running', started='10:23:11'), '/Users/ogrisel/mambaforge/envs/dev/bin/python3.10 -m loky.backend.popen_loky_posix --process-name LokyProcess-1 --pipe 28')]
children_pids = {78570}
executor = <loky.reusable_executor._ReusablePoolExecutor object at 0x4bca0d1b0>
expected_max_process_number = 2
expected_process_number = None
patience = 100
pids_cmdlines = [(78570, '/Users/ogrisel/mambaforge/envs/dev/bin/python3.10 -m loky.backend.popen_loky_posix --process-name LokyProcess-1 --pipe 28')]
worker_pids = {78570, 80561, 80562}
tests/_executor_mixin.py:106: AssertionError
tests/_executor_mixin.py:106: AssertionError
------------------------------------------------------------------------------------------- Captured stderr setup --------------------------------------------------------------------------------------------
[DEBUG:MainProcess:MainThread] Reusing existing executor with max_workers=2.
-------------------------------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------------------------------
[INFO:LokyProcess-4:MainThread] Shutting down worker after timeout 10.000s
[DEBUG:LokyProcess-4:MainThread] Exited cleanly
[INFO:LokyProcess-4:MainThread] process shutting down
[DEBUG:LokyProcess-4:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG:LokyProcess-4:MainThread] running the remaining "atexit" finalizers
[INFO:LokyProcess-4:MainThread] process exiting with exitcode 0
------------------------------------------------------------------------------------------ Captured stderr teardown ------------------------------------------------------------------------------------------
[DEBUG:MainProcess:MainThread] Reusing existing executor with max_workers=2.
[DEBUG:MainProcess:MainThread] created semlock with handle 29 and name "None"
[DEBUG:MainProcess:ExecutorManagerThread] created semlock with handle 27 and name "None"
[DEBUG:MainProcess:ExecutorManagerThread] launched python with pid 80561 and cmd:
['/Users/ogrisel/mambaforge/envs/dev/bin/python3.10', '-m', 'loky.backend.popen_loky_posix', '--process-name', 'LokyProcess-5', '--pipe', '31']
[DEBUG:MainProcess:ExecutorManagerThread] Adjust process count : {78570: <LokyProcess name='LokyProcess-1' pid=78570 parent=74703 started>, 80561: <LokyProcess name='LokyProcess-5' pid=80561 parent=74703 started>}
[DEBUG:MainProcess:MainThread] launched python with pid 80562 and cmd:
['/Users/ogrisel/mambaforge/envs/dev/bin/python3.10', '-m', 'loky.backend.popen_loky_posix', '--process-name', 'LokyProcess-6', '--pipe', '32']
[DEBUG:MainProcess:MainThread] Adjust process count : {78570: <LokyProcess name='LokyProcess-1' pid=78570 parent=74703 started>, 80561: <LokyProcess name='LokyProcess-5' pid=80561 parent=74703 started>, 80562: <LokyProcess name='LokyProcess-6' pid=80562 parent=74703 started>}
============================================================================================== warnings summary ==============================================================================================
tests/test_reusable_executor.py::TestExecutorInitializer::test_error_nested_call_keeps_resource_tracker_silent[1981-10000]
/Users/ogrisel/code/loky/loky/process_executor.py:697: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================================== short test summary info ===========================================================================================
ERROR tests/test_reusable_executor.py::TestExecutorInitializer::test_error_nested_call_keeps_resource_tracker_silent[1981-10000] - AssertionError: [(78570, '/Users/ogrisel/mambaforge/envs/dev/bin/python3...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================================== 1981 passed, 3070000 deselected, 1 warning, 1 error in 208.96s (0:03:28) ==================================================================
[INFO:LokyProcess-1:MainThread] Shutting down worker on sentinel
[DEBUG:LokyProcess-1:MainThread] Exited cleanly
[INFO:LokyProcess-1:MainThread] process shutting down
[DEBUG:LokyProcess-1:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG:LokyProcess-1:MainThread] running the remaining "atexit" finalizers
[INFO:LokyProcess-1:MainThread] process exiting with exitcode 0
[DEBUG:LokyProcess-6:MainThread] recreated blocker with handle 17 and name "/loky-74703-amd4qe41"
[DEBUG:LokyProcess-5:MainThread] recreated blocker with handle 17 and name "/loky-74703-amd4qe41"
[DEBUG:LokyProcess-6:MainThread] recreated blocker with handle 18 and name "/loky-74703-0v_leqrn"
[DEBUG:LokyProcess-5:MainThread] recreated blocker with handle 18 and name "/loky-74703-0v_leqrn"
[DEBUG:LokyProcess-6:MainThread] recreated blocker with handle 19 and name "/loky-74703-ppxc0x1k"
[DEBUG:LokyProcess-5:MainThread] recreated blocker with handle 19 and name "/loky-74703-ppxc0x1k"
[DEBUG:LokyProcess-6:MainThread] recreated blocker with handle 22 and name "/loky-74703-o75tgnqy"
[DEBUG:LokyProcess-5:MainThread] recreated blocker with handle 22 and name "/loky-74703-o75tgnqy"
[DEBUG:LokyProcess-6:MainThread] recreated blocker with handle 23 and name "/loky-74703-hqg2l_0m"
[DEBUG:LokyProcess-5:MainThread] recreated blocker with handle 23 and name "/loky-74703-hqg2l_0m"
[DEBUG:LokyProcess-6:MainThread] recreated blocker with handle 11 and name "/loky-74703-wwh3lnfy"
[DEBUG:LokyProcess-5:MainThread] recreated blocker with handle 11 and name "/loky-74703-wwh3lnfy"
[DEBUG:LokyProcess-6:MainThread] recreated blocker with handle 29 and name "/loky-74703-r4gy5e2_"
[DEBUG:LokyProcess-5:MainThread] recreated blocker with handle 27 and name "/loky-74703-rax6nky6"
[INFO:LokyProcess-6:MainThread] child process calling self.run()
[INFO:LokyProcess-5:MainThread] child process calling self.run()
[DEBUG:LokyProcess-6:MainThread] Worker started with timeout=10
[DEBUG:LokyProcess-5:MainThread] Worker started with timeout=10
[INFO:LokyProcess-6:MainThread] Shutting down worker on sentinel
[DEBUG:LokyProcess-6:MainThread] Exited cleanly
[INFO:LokyProcess-6:MainThread] process shutting down
[DEBUG:LokyProcess-6:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG:LokyProcess-6:MainThread] running the remaining "atexit" finalizers
[INFO:LokyProcess-6:MainThread] process exiting with exitcode 0
[INFO:LokyProcess-5:MainThread] Shutting down worker on sentinel
[DEBUG:LokyProcess-5:MainThread] Exited cleanly
[INFO:LokyProcess-5:MainThread] process shutting down
[DEBUG:LokyProcess-5:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG:LokyProcess-5:MainThread] running the remaining "atexit" finalizers
[INFO:LokyProcess-5:MainThread] process exiting with exitcode 0
[INFO:MainProcess:MainThread] process shutting down
will mark the test as XFAIL for now.
Actually this might not be the same problem...
Adapted reproducer for #369 to be part of the loky test suite.
Unfortunately I could never trigger the race condition locally so far.