python / cpython

The Python programming language
https://www.python.org/
Other
60.06k stars 29.09k forks source link

Assertion failure with multiple threads & legacy opcode tracing in free-threaded build #118415

Closed chgnrdv closed 2 weeks ago

chgnrdv commented 3 weeks ago

Crash report

What happened?

Bisected to 33da0e844c922b3dcded75fbb9b7be67cb013a17.

import sys
import threading

def trace(frame, event, arg):
    frame.f_trace_opcodes = True
    return trace

sys.settrace(trace)

l = threading._RLock()

def f():
    while True:
        with l:
            pass

t = threading.Thread(target=f)
t.start()
while True:
    with l:
        pass
t.join()
$ ./python -X faulthandler -X gil=0 repro.py
python: Python/instrumentation.c:1216: _Py_call_instrumentation_line: Assertion `is_version_up_to_date(code, tstate->interp)' failed.
Fatal Python error: Aborted

Current thread 0x00007f7bc3712700 (most recent call first):
  File "/home/radislav/projects/cpython/Lib/threading.py", line 209 in acquire
  File "/home/radislav/projects/cpython/repro.py", line 14 in f
  File "/home/radislav/projects/cpython/Lib/threading.py", line 990 in run
  File "/home/radislav/projects/cpython/Lib/threading.py", line 1039 in _bootstrap_inner
  File "/home/radislav/projects/cpython/Lib/threading.py", line 1010 in _bootstrap

Thread 0x00007f7bc39fb280 (most recent call first):
  File "/home/radislav/projects/cpython/Lib/threading.py", line 206 in acquire
  File "/home/radislav/projects/cpython/repro.py", line 20 in <module>
Aborted (core dumped)

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.13.0a6+ (heads/main:5a90de0d4c, Apr 30 2024, 00:01:27) [GCC 10.2.1 20210110]

Linked PRs

colesbury commented 3 weeks ago

cc @DinoV

colesbury commented 2 weeks ago

In the macOS nogil buildbot there is a slightly different assertion error that may be due to the same underlying issue:

test_instrumentation (test.test_free_threading.test_monitoring.MonitoringMultiThreaded.test_instrumentation) ... Assertion failed: (instrumentation_cross_checks(interp, code)), function get_tools_for_instruction, file instrumentation.c, line 1022.
Fatal Python error: Aborted
Thread 0x0000000171d0f000 (most recent call first):
  File "/Users/ec2-user/buildbot/buildarea/3.x.itamaro-macos-arm64-aws.macos-with-brew.refleak.nogil/build/Lib/test/test_free_threading/test_monitoring.py", line 134 in callback
  File "/Users/ec2-user/buildbot/buildarea/3.x.itamaro-macos-arm64-aws.macos-with-brew.refleak.nogil/build/Lib/test/test_free_threading/test_monitoring.py", line 37Assertion failed: (instrumentation_cross_checks(tstate->interp, code)), function _Py_call_instrumentation_line, file instrumentation.c, line 1221.
 in
1 test failed again:
    test_free_threading

https://buildbot.python.org/all/#/builders/1368/builds/894/steps/5/logs/stdio

chgnrdv commented 2 weeks ago

@colesbury there are multiple assertion errors that can occur with this repro, including the one from buildbot.

Sometimes Python segfaults/aborts in faulthandler when trying to dump traceback. I'm not sure if it's a different issue or just Python being in some invalid state because of instrumentation version mismatch.

python: Python/instrumentation.c:1344: _Py_call_instrumentation_instruction: Assertion `is_version_up_to_date(code, tstate->interp)' failed.

Thread 1 "python" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) c
Continuing.
Fatal Python error: Aborted

Thread 0x00007ffff7979700 (most recent call first):
  File "/home/radislav/projects/cpython/Lib/threading.py", line 208 in acquire

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00005555558c810e in _PyFrame_GetCode (f=0x7ffff79789c0) at ./Include/object.h:334
334     return ob->ob_type;
(gdb) bt
#0  0x00005555558c810e in _PyFrame_GetCode (f=0x7ffff79789c0) at ./Include/object.h:334
#1  dump_frame (fd=fd@entry=2, frame=frame@entry=0x7ffff79789c0) at Python/traceback.c:893
#2  0x00005555558c8346 in dump_traceback (fd=fd@entry=2, tstate=tstate@entry=0x555555ca12c0, write_header=write_header@entry=0) at Python/traceback.c:974
#3  0x00005555558c8496 in _Py_DumpTracebackThreads (fd=fd@entry=2, interp=<optimized out>, interp@entry=0x0, current_tstate=0x555555c11a78 <_PyRuntime+321656>)
    at Python/traceback.c:1090
#4  0x00005555558db32a in faulthandler_dump_traceback (fd=fd@entry=2, all_threads=1, interp=<optimized out>) at ./Modules/faulthandler.c:205
#5  0x00005555558db63b in faulthandler_fatal_error (signum=6) at ./Modules/faulthandler.c:323
#6  <signal handler called>
#7  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#8  0x00007ffff7c85537 in __GI_abort () at abort.c:79
#9  0x00007ffff7c8540f in __assert_fail_base (fmt=0x7ffff7dfc688 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x555555a10480 "is_version_up_to_date(code, tstate->interp)", file=0x555555a0f8f8 "Python/instrumentation.c", line=1344, function=<optimized out>)
    at assert.c:92
#10 0x00007ffff7c94662 in __GI___assert_fail (assertion=assertion@entry=0x555555a10480 "is_version_up_to_date(code, tstate->interp)", 
    file=file@entry=0x555555a0f8f8 "Python/instrumentation.c", line=line@entry=1344, 
    function=function@entry=0x555555a10a80 <__PRETTY_FUNCTION__.24> "_Py_call_instrumentation_instruction") at assert.c:101
#11 0x00005555558859b0 in _Py_call_instrumentation_instruction (tstate=tstate@entry=0x555555c11a78 <_PyRuntime+321656>, frame=frame@entry=0x7ffff7fc1098, 
    instr=instr@entry=0x200003c7f16) at Python/instrumentation.c:1344
#12 0x000055555581ae86 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555c11a78 <_PyRuntime+321656>, frame=0x7ffff7fc1098, throwflag=throwflag@entry=0)
    at Python/generated_cases.c.h:3164
#13 0x00005555558267d7 in _PyEval_EvalFrame (throwflag=0, frame=<optimized out>, tstate=0x555555c11a78 <_PyRuntime+321656>) at ./Include/internal/pycore_ceval.h:118
#14 _PyEval_Vector (tstate=<optimized out>, func=<optimized out>, locals=locals@entry=0x0, args=0x7fffffffdae8, argcount=1, kwnames=0x0) at Python/ceval.c:1814
#15 0x000055555568012c in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:413
#16 0x0000555555684b6c in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7fffffffdae8, callable=0x200009815d0, tstate=0x555555c11a78 <_PyRuntime+321656>)
    at ./Include/internal/pycore_call.h:168
#17 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:70
#18 0x0000555555680a30 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=0, args=0x0, callable=0x2000034c070, tstate=0x555555c11a78 <_PyRuntime+321656>)
    at ./Include/internal/pycore_call.h:168
#19 PyObject_CallNoArgs (func=func@entry=0x2000034c070) at Objects/call.c:106
#20 0x000055555580f5f9 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555c11a78 <_PyRuntime+321656>, frame=0x7ffff7fc1020, throwflag=throwflag@entry=0)
    at Python/generated_cases.c.h:89
#21 0x00005555558267d7 in _PyEval_EvalFrame (throwflag=0, frame=<optimized out>, tstate=0x555555c11a78 <_PyRuntime+321656>) at ./Include/internal/pycore_ceval.h:118
#22 _PyEval_Vector (tstate=tstate@entry=0x555555c11a78 <_PyRuntime+321656>, func=func@entry=0x20000980610, locals=locals@entry=0x20000746070, args=args@entry=0x0, 
    argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:1814
#23 0x0000555555826885 in PyEval_EvalCode (co=co@entry=0x200003e3010, globals=globals@entry=0x20000746070, locals=locals@entry=0x20000746070) at Python/ceval.c:602
#24 0x00005555558aa671 in run_eval_code_obj (tstate=tstate@entry=0x555555c11a78 <_PyRuntime+321656>, co=co@entry=0x200003e3010, globals=globals@entry=0x20000746070, 
    locals=locals@entry=0x20000746070) at Python/pythonrun.c:1291
--Type <RET> for more, q to quit, c to continue without paging--
#25 0x00005555558ac843 in run_mod (mod=mod@entry=0x200009957f8, filename=filename@entry=0x20000aa1870, globals=globals@entry=0x20000746070, 
    locals=locals@entry=0x20000746070, flags=flags@entry=0x7fffffffdf78, arena=arena@entry=0x20000031bd0, interactive_src=<optimized out>, 
    generate_new_source=<optimized out>) at Python/pythonrun.c:1376
#26 0x00005555558acbc6 in pyrun_file (fp=fp@entry=0x555555c9eed0, filename=filename@entry=0x20000aa1870, start=start@entry=257, globals=globals@entry=0x20000746070, 
    locals=locals@entry=0x20000746070, closeit=closeit@entry=1, flags=0x7fffffffdf78) at Python/pythonrun.c:1212
#27 0x00005555558ae021 in _PyRun_SimpleFileObject (fp=fp@entry=0x555555c9eed0, filename=filename@entry=0x20000aa1870, closeit=closeit@entry=1, 
    flags=flags@entry=0x7fffffffdf78) at Python/pythonrun.c:461
#28 0x00005555558ae3a6 in _PyRun_AnyFileObject (fp=fp@entry=0x555555c9eed0, filename=filename@entry=0x20000aa1870, closeit=closeit@entry=1, 
    flags=flags@entry=0x7fffffffdf78) at Python/pythonrun.c:77
#29 0x00005555558d72c6 in pymain_run_file_obj (program_name=program_name@entry=0x20000960890, filename=filename@entry=0x20000aa1870, skip_source_first_line=0)
    at Modules/main.c:357
#30 0x00005555558d759d in pymain_run_file (config=config@entry=0x555555be3d58 <_PyRuntime+133976>) at Modules/main.c:376
#31 0x00005555558d8c62 in pymain_run_python (exitcode=exitcode@entry=0x7fffffffe0fc) at Modules/main.c:628
#32 0x00005555558d8cd0 in Py_RunMain () at Modules/main.c:707
#33 0x00005555558d8d24 in pymain_main (args=args@entry=0x7fffffffe140) at Modules/main.c:737
#34 0x00005555558d8d99 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:761
#35 0x00005555555d37de in main (argc=<optimized out>, argv=<optimized out>) at ./Programs/python.c:15
python: Python/instrumentation.c:1220: _Py_call_instrumentation_line: Assertion `is_version_up_to_date(code, tstate->interp)' failed.

Thread 2 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7979700 (LWP 1262946)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) c
Continuing.
Fatal Python error: Aborted

Current thread 0x00007ffff7979700 (most recent call first):
  File "/home/radislav/projects/cpython/Lib/threading.py", line 203 in acquire
  File "/home/radislav/projects/cpython/crash5.1.py", line 16 in f
  File "/home/radislav/projects/cpython/Lib/threading.py", line 990 in run
  File "/home/radislav/projects/cpython/Lib/threading.py", line 1039 in _bootstrap_inner
  File "/home/radislav/projects/cpython/Lib/threading.py", line 1010 in _bootstrap

Thread 0x00007ffff7c62280 (most recent call first):
  File "/home/radislav/projects/cpython/crash5.1.py"python: ./Include/internal/pycore_frame.h:77: _PyFrame_GetCode: Assertion `PyCode_Check(f->f_executable)' failed.

Thread 2 "python" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  in ../sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7c85537 in __GI_abort () at abort.c:79
#2  0x00007ffff7c8540f in __assert_fail_base (fmt=0x7ffff7dfc688 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x555555999b57 "PyCode_Check(f->f_executable)", 
    file=0x555555999ed8 "./Include/internal/pycore_frame.h", line=77, function=<optimized out>) at assert.c:92
#3  0x00007ffff7c94662 in __GI___assert_fail (assertion=assertion@entry=0x555555999b57 "PyCode_Check(f->f_executable)", 
    file=file@entry=0x555555999ed8 "./Include/internal/pycore_frame.h", line=line@entry=77, 
    function=function@entry=0x555555a060f0 <__PRETTY_FUNCTION__.7> "_PyFrame_GetCode") at assert.c:101
#4  0x000055555585a93e in _PyFrame_GetCode (f=0x7ffff7fc1180) at ./Include/internal/pycore_frame.h:77
#5  PyUnstable_InterpreterFrame_GetLine (frame=frame@entry=0x7ffff7fc1180) at Python/frame.c:152
#6  0x00005555558c8166 in dump_frame (fd=fd@entry=2, frame=frame@entry=0x7ffff7fc1180) at Python/traceback.c:905
#7  0x00005555558c8346 in dump_traceback (fd=fd@entry=2, tstate=tstate@entry=0x555555c11a78 <_PyRuntime+321656>, write_header=write_header@entry=0)
    at Python/traceback.c:974
#8  0x00005555558c8496 in _Py_DumpTracebackThreads (fd=fd@entry=2, interp=<optimized out>, interp@entry=0x0, current_tstate=0x555555ca12c0) at Python/traceback.c:1090
#9  0x00005555558db32a in faulthandler_dump_traceback (fd=fd@entry=2, all_threads=1, interp=<optimized out>) at ./Modules/faulthandler.c:205
#10 0x00005555558db63b in faulthandler_fatal_error (signum=6) at ./Modules/faulthandler.c:323
#11 <signal handler called>
#12 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#13 0x00007ffff7c85537 in __GI_abort () at abort.c:79
#14 0x00007ffff7c8540f in __assert_fail_base (fmt=0x7ffff7dfc688 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x555555a10480 "is_version_up_to_date(code, tstate->interp)", file=0x555555a0f8f8 "Python/instrumentation.c", line=1220, function=<optimized out>)
    at assert.c:92
#15 0x00007ffff7c94662 in __GI___assert_fail (assertion=assertion@entry=0x555555a10480 "is_version_up_to_date(code, tstate->interp)", 
    file=file@entry=0x555555a0f8f8 "Python/instrumentation.c", line=line@entry=1220, 
    function=function@entry=0x555555a10b10 <__PRETTY_FUNCTION__.28> "_Py_call_instrumentation_line") at assert.c:101
#16 0x00005555558850d9 in _Py_call_instrumentation_line (tstate=tstate@entry=0x555555ca12c0, frame=frame@entry=0x7ffff7fbd200, instr=instr@entry=0x200003c7efe, 
    prev=0x200003c7efc) at Python/instrumentation.c:1220
#17 0x00005555558257eb in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ca12c0, frame=0x7ffff7fbd200, throwflag=throwflag@entry=0) at Python/ceval.c:812
#18 0x00005555558267d7 in _PyEval_EvalFrame (throwflag=0, frame=<optimized out>, tstate=0x555555ca12c0) at ./Include/internal/pycore_ceval.h:118
#19 _PyEval_Vector (tstate=<optimized out>, func=<optimized out>, locals=locals@entry=0x0, args=0x7ffff7978af8, argcount=1, kwnames=0x0) at Python/ceval.c:1814
#20 0x000055555568012c in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:413
#21 0x0000555555684b6c in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7ffff7978af8, callable=0x200009815d0, tstate=0x555555ca12c0)
    at ./Include/internal/pycore_call.h:168
#22 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:70
#23 0x0000555555680a30 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=0, args=0x0, callable=0x20002070550, tstate=0x555555ca12c0)
    at ./Include/internal/pycore_call.h:168
#24 PyObject_CallNoArgs (func=func@entry=0x20002070550) at Objects/call.c:106
#25 0x000055555580f5f9 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ca12c0, frame=0x7ffff7fbd188, throwflag=throwflag@entry=0)
--Type <RET> for more, q to quit, c to continue without paging--
    at Python/generated_cases.c.h:89
#26 0x00005555558267d7 in _PyEval_EvalFrame (throwflag=0, frame=<optimized out>, tstate=0x555555ca12c0) at ./Include/internal/pycore_ceval.h:118
#27 _PyEval_Vector (tstate=<optimized out>, func=<optimized out>, locals=locals@entry=0x0, args=0x7ffff7978da8, argcount=1, kwnames=0x0) at Python/ceval.c:1814
#28 0x000055555568012c in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:413
#29 0x0000555555684b6c in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7ffff7978da8, callable=0x20000988b90, tstate=0x555555ca12c0)
    at ./Include/internal/pycore_call.h:168
#30 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:70
#31 0x0000555555682a68 in _PyVectorcall_Call (tstate=tstate@entry=0x555555ca12c0, func=0x5555556848a1 <method_vectorcall>, callable=callable@entry=0x2000034c250, 
    tuple=tuple@entry=0x555555be1d48 <_PyRuntime+125768>, kwargs=kwargs@entry=0x0) at Objects/call.c:273
#32 0x0000555555682e14 in _PyObject_Call (tstate=0x555555ca12c0, callable=0x2000034c250, args=0x555555be1d48 <_PyRuntime+125768>, kwargs=0x0) at Objects/call.c:348
#33 0x0000555555682e6d in PyObject_Call (callable=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at Objects/call.c:373
#34 0x00005555559572e8 in thread_run (boot_raw=boot_raw@entry=0x555555c8d6f0) at ./Modules/_threadmodule.c:337
#35 0x00005555558c4773 in pythread_wrapper (arg=<optimized out>) at Python/thread_pthread.h:241
#36 0x00007ffff7f8dea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#37 0x00007ffff7d5ea2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
DinoV commented 2 weeks ago

Yep, I was seeing the different variations when working on #118496. The assertions all need locks to see a consistent world view with the updates that are happening to the local events. Races against them should be relatively benign as they're asserting that we have a consistent set of events between the active monitors and the local monitors. If those race we're just delivering events longer or not long enough.

But we could also hit another assertion about assert(next_opcode != 0);, and that's the bulk of the fix. Currently the instrumentation publishes and unpublished the original non-optimized opcode as events are enabled and disabled, and that race is not benign. So now we always keep the original non-optimized opcodes aronud.

colesbury commented 2 weeks ago

This is fixed now