python / cpython

The Python programming language
https://www.python.org/
Other
60.06k stars 29.08k forks source link

`WithThreadsTestPool.test_release_task_refs`: flaky test or possible free-threading bug #118413

Closed colesbury closed 2 weeks ago

colesbury commented 3 weeks ago

This might be a free-threading related bug or possibly a flaky test. We should investigate and try to figure out the root cause.

======================================================================
FAIL: test_release_task_refs (test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\_test_multiprocessing.py", line 2817, in test_release_task_refs
    self.assertEqual(set(wr() for wr in refs), {None})
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Items in the first set but not the second:
<test._test_multiprocessing.CountedObject object at 0x03E7CA50>

Originally reported by hugovk in https://github.com/python/cpython/issues/118331#issuecomment-208372155 (https://github.com/python/cpython/actions/runs/8884999534/attempts/1).

Linked PRs

colesbury commented 2 weeks ago

I'm able to reproduce this locally on Linux with a few modifications:

First, apply https://gist.github.com/colesbury/f8ba31699eb6fcc79a29a2dd4e41e817 to enable the test and run it in a loop.

Then run multiple test instances in parallel on a single CPU:

taskset -c 0 ./python -m test -j 8 test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs test_multiprocessing_spawn.test_threads -m test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_release_task_refs
colesbury commented 2 weeks ago

The fix is to move the time.sleep(DELTA) before the gc.collect():

https://github.com/python/cpython/blob/c408c36e9b346f9f15a34e98a5596f311df65efa/Lib/test/_test_multiprocessing.py#L2816-L2817

The time.sleep() is because the worker processes hold onto the result briefly after it's returned to the main thread. In the free-threaded build, we want the gc.collect() to happen after the workers no longer reference the the CountedObject() values because of the some objects passed across threads have their destructors delayed.

colesbury commented 2 weeks ago

The test is fixed now and re-enabled in the free-threaded build.