python / cpython

The Python programming language
https://www.python.org/
Other
60.06k stars 29.09k forks source link

``test_free_threading`` deadlocks #118444

Open Eclips4 opened 3 weeks ago

Eclips4 commented 3 weeks ago

Bug report

Bug description:

Windows default (non free-threading) debug build, the current main branch, happens during running the regrtest:

...a lot of the same lines
0:40:12 load avg: 0.00 running (1): test_free_threading (29 min 58 sec)
0:40:42 load avg: 0.00 running (1): test_free_threading (30 min 28 sec)
0:41:12 load avg: 0.00 running (1): test_free_threading (30 min 58 sec)
0:41:42 load avg: 0.00 running (1): test_free_threading (31 min 28 sec)
0:42:12 load avg: 0.00 running (1): test_free_threading (31 min 58 sec)
0:42:42 load avg: 0.00 running (1): test_free_threading (32 min 28 sec)
0:43:12 load avg: 0.00 running (1): test_free_threading (32 min 58 sec)
0:43:42 load avg: 0.00 running (1): test_free_threading (33 min 28 sec)
0:44:12 load avg: 0.00 running (1): test_free_threading (33 min 58 sec)
0:44:42 load avg: 0.00 running (1): test_free_threading (34 min 28 sec)
0:45:12 load avg: 0.00 running (1): test_free_threading (34 min 58 sec)
0:45:42 load avg: 0.00 running (1): test_free_threading (35 min 28 sec)
0:46:12 load avg: 0.00 running (1): test_free_threading (35 min 58 sec)
0:46:42 load avg: 0.00 running (1): test_free_threading (36 min 28 sec)
0:47:12 load avg: 0.00 running (1): test_free_threading (36 min 58 sec)
0:47:42 load avg: 0.00 running (1): test_free_threading (37 min 28 sec)
0:48:12 load avg: 0.00 running (1): test_free_threading (37 min 58 sec)
0:48:42 load avg: 0.00 running (1): test_free_threading (38 min 28 sec)
0:49:12 load avg: 0.00 running (1): test_free_threading (38 min 58 sec)
0:49:42 load avg: 0.00 running (1): test_free_threading (39 min 28 sec)
0:50:12 load avg: 0.00 running (1): test_free_threading (39 min 58 sec)
0:50:42 load avg: 0.00 running (1): test_free_threading (40 min 28 sec)
0:51:12 load avg: 0.00 running (1): test_free_threading (40 min 58 sec)
0:51:42 load avg: 0.00 running (1): test_free_threading (41 min 28 sec)
0:52:12 load avg: 0.00 running (1): test_free_threading (41 min 58 sec)
0:52:42 load avg: 0.00 running (1): test_free_threading (42 min 28 sec)
0:53:12 load avg: 0.00 running (1): test_free_threading (42 min 58 sec)
0:53:42 load avg: 0.00 running (1): test_free_threading (43 min 28 sec)
0:54:12 load avg: 0.00 running (1): test_free_threading (43 min 58 sec)
0:54:42 load avg: 0.00 running (1): test_free_threading (44 min 28 sec)
0:55:12 load avg: 0.00 running (1): test_free_threading (44 min 58 sec)
0:55:42 load avg: 0.00 running (1): test_free_threading (45 min 28 sec)
0:56:12 load avg: 0.00 running (1): test_free_threading (45 min 58 sec)

CPython versions tested on:

CPython main branch

Operating systems tested on:

Windows

colesbury commented 3 weeks ago

Thanks @Eclips4. Was this in GitHub actions or one of the buildbots?

It might be fixed by https://github.com/python/cpython/pull/118412, or it might be a different bug.

Eclips4 commented 3 weeks ago

Thanks @Eclips4. Was this in GitHub actions or one of the buildbots?

It might be fixed by #118412, or it might be a different bug.

It's happened in my local setup. Now I'm seeing this:

./python -m test -R 3:3 test_free_threading
Running Debug|x64 interpreter...
Using random seed: 2036708708
0:00:00 Run 1 test sequentially
0:00:00 [1/1] test_free_threading
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
Assertion failed: _PyOpcode_Deopt[frame->instr_ptr->op.code] == SEND || _PyOpcode_Deopt[frame->instr_ptr->op.code] == FOR_ITER || _PyOpcode_Deopt[frame->instr_ptr->op.code] == INTERPRETER_EXIT || _PyOpcode_Deopt[frame->instr_ptr->op.code] == ENTER_EXECUTOR, file C:\Users\KIRILL-1\CLionProjects\cpython\Python\generated_cases.c.h, line 6042
Fatal Python error: Aborted
Eclips4 commented 3 weeks ago

It might be fixed by #118412, or it might be a different bug.

Unfortunately, seems it doesn't:

git branch
  3.12
* gh-118332-stop-the-world-deadlock
  main
PS C:\Users\KIRILL-1\CLionProjects\cpython> ./python -m test -R 3:3 test_free_threading
Running Debug|x64 interpreter...
Using random seed: 109577804
0:00:00 Run 1 test sequentially
0:00:00 [1/1] test_free_threading
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
// Ten minutes of waiting and no single run has passed
Eclips4 commented 2 weeks ago

@colesbury Seems there's no deadlock, this test just takes a lot of time:

./python -m test -R 3:3 test_free_threading
Running Debug|x64 interpreter...
Using random seed: 2042239677
0:00:00 Run 1 test sequentially
0:00:00 [1/1] test_free_threading
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XX. X..
test_free_threading leaked [20, 0, 0] memory blocks, sum=20 (this is fine)
test_free_threading passed in 45 min

== Tests result: SUCCESS ==

1 test OK.

Total duration: 45 min
Total tests: run=5
Total test files: run=1/1
Result: SUCCESS

So, sorry for the false report. I'm going to close this.

Eclips4 commented 2 weeks ago

One thing that I would like to mention is that this test takes a long time to complete on Windows, compared to Linux (Ubuntu 20.04 on WSL).

Windows:

 ./python -m test -v test_free_threading
Running Debug|x64 interpreter...
== CPython 3.13.0a6+ (heads/main:4a5ad8469a, May 2 2024, 11:40:34) [MSC v.1933 64 bit (AMD64)]
== Windows-10-10.0.19043-SP0 little-endian
== Python build: debug
== cwd: C:\Users\KIRILL-1\CLionProjects\cpython\build\test_python_worker_22240æ
== CPU count: 16
== encodings: locale=cp1251 FS=utf-8
== resources: all test resources are disabled, use -u option to unskip tests

Using random seed: 932841113
0:00:00 Run 1 test sequentially
0:00:00 [1/1] test_free_threading
test_register_callback (test.test_free_threading.test_monitoring.MonitoringMisc.test_register_callback) ... ok
test_instrumentation (test.test_free_threading.test_monitoring.MonitoringMultiThreaded.test_instrumentation) ... ok
test_instrumentation (test.test_free_threading.test_monitoring.SetPreTraceMultiThreaded.test_instrumentation) ... ok
test_instrumentation (test.test_free_threading.test_monitoring.SetProfileMultiThreaded.test_instrumentation) ... ok
test_instrumentation (test.test_free_threading.test_monitoring.SetTraceMultiThreaded.test_instrumentation) ... ok

----------------------------------------------------------------------
Ran 5 tests in 453.125s

OK
test_free_threading passed in 7 min 33 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 7 min 33 sec
Total tests: run=5
Total test files: run=1/1
Result: SUCCESS

Linux:

./python -m test -v test_free_threading
== CPython 3.13.0a6+ (heads/main:f8e088df2a, May 2 2024, 12:44:40) [GCC 9.4.0]
== Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31 little-endian
== Python build: debug
== cwd: /home/eclips4/CLionProjects/cpython/build/test_python_worker_27033æ
== CPU count: 16
== encodings: locale=UTF-8 FS=utf-8
== resources: all test resources are disabled, use -u option to unskip tests

Using random seed: 2450947718
0:00:00 load avg: 0.00 Run 1 test sequentially
0:00:00 load avg: 0.00 [1/1] test_free_threading
test_register_callback (test.test_free_threading.test_monitoring.MonitoringMisc.test_register_callback) ... ok
test_instrumentation (test.test_free_threading.test_monitoring.MonitoringMultiThreaded.test_instrumentation) ... ok
test_instrumentation (test.test_free_threading.test_monitoring.SetPreTraceMultiThreaded.test_instrumentation) ... ok
test_instrumentation (test.test_free_threading.test_monitoring.SetProfileMultiThreaded.test_instrumentation) ... ok
test_instrumentation (test.test_free_threading.test_monitoring.SetTraceMultiThreaded.test_instrumentation) ... ok

----------------------------------------------------------------------
Ran 5 tests in 41.137s

OK
test_free_threading passed in 41.2 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 41.2 sec
Total tests: run=5
Total test files: run=1/1
Result: SUCCESS

So, this test on Windows is about 10 times slower, which seems weird to me.

Eclips4 commented 2 weeks ago

FYI, there's a buildbot failure: https://github.com/python/cpython/pull/118438#issuecomment-2090108886