python / cpython

The Python programming language
https://www.python.org

`PyFrame_GetBack` segfaults if called in a `sys.settrace` hook on `3.11.1` #100536

Closed sumerc closed 1 year ago

sumerc commented 1 year ago

Crash report

When `PyFrame_GetBack` is called from a `sys.settrace` hook in a C extension, it segfaults with some libraries (spacy). A reproducer is here: https://gist.github.com/sumerc/b254f38c5a620b8d47aba7398b3c7791.
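For readers unfamiliar with the setup, here is a minimal pure-Python sketch of the same pattern: a trace hook that walks back through the caller frames on every `call` event. It is only an analog, since the real reproducer in the gist is a C extension that calls `PyFrame_GetBack` directly; the helper name `frame_depths` and the toy functions are hypothetical, and this sketch is not expected to trigger the crash by itself.

```python
import sys


def frame_depths(func):
    """Run func under a trace hook and record the caller-chain length of each new frame."""
    seen = []

    def tracer(frame, event, arg):
        if event == "call":
            depth = 0
            back = frame.f_back  # Python-level counterpart of PyFrame_GetBack
            while back is not None:
                depth += 1
                back = back.f_back
            seen.append((frame.f_code.co_name, depth))
        return None  # no per-line tracing is needed for this sketch

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(previous)  # always restore the earlier hook
    return seen


def inner():
    return 1


def outer():
    return inner()


calls = frame_depths(outer)
print(calls)
```

Each entry pairs a function name with the number of `f_back` links above it, so `inner` should sit exactly one level deeper than `outer`.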

Error messages


A gdb stack trace follows. For more information: the last frame executed was <frame at 0x7fffd25b2d40, file 'thinc/backends/numpy_ops.pyx', line 1, code __Pyx_PyMODINIT_FUNC PyInit_numpy_ops(void)>. I also have another application that crashes at the same point but with a different frame: <frame at 0x7ff6568e4520, file 'stringsource', line 282, code __init__>. Interestingly, both are Cython functions. Might this be related to Cython?

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff767a548 in _PyFrame_IsIncomplete (frame=0xcdcdcdcdcdcdcdcd) at ./Include/internal/pycore_frame.h:147
147     ./Include/internal/pycore_frame.h: No such file or directory.
(gdb) bt
#0  0x00007ffff767a548 in _PyFrame_IsIncomplete (frame=0xcdcdcdcdcdcdcdcd) at ./Include/internal/pycore_frame.h:147
#1  0x00007ffff767d9e7 in PyFrame_GetBack (frame=0x7fffd03ef120) at Objects/frameobject.c:1326
#2  0x00007ffff60e7bce in PyTraceFunction (obj=0x0, frame=0x7fffd03ef120, what=0, arg=0x0) at main.c:33
#3  0x00007fffd02db71b in ?? () from /home/supo/.pyenv/versions/3.11.1-debug/lib/python3.11/site-packages/thinc/backends/numpy_ops.cpython-311-x86_64-linux-gnu.so
#4  0x00007fffd02d4697 in ?? () from /home/supo/.pyenv/versions/3.11.1-debug/lib/python3.11/site-packages/thinc/backends/numpy_ops.cpython-311-x86_64-linux-gnu.so
#5  0x00007ffff76bab22 in PyModule_ExecDef (module=0x7fffd16273b0, def=0x7fffd038ba60) at Objects/moduleobject.c:419
#6  0x00007ffff77f38ec in exec_builtin_or_dynamic (mod=0x7fffd16273b0) at Python/import.c:2333
#7  0x00007ffff77f3ab8 in _imp_exec_dynamic_impl (module=0x7ffff7f83bf0, mod=0x7fffd16273b0) at Python/import.c:2407
#8  0x00007ffff77ef0b0 in _imp_exec_dynamic (module=0x7ffff7f83bf0, mod=0x7fffd16273b0) at Python/clinic/import.c.h:474
#9  0x00007ffff76b9170 in cfunction_vectorcall_O (func=0x7ffff7f882f0, args=0x7fffd03e1ff8, nargsf=1, kwnames=0x0) at Objects/methodobject.c:514
#10 0x00007ffff764bf31 in _PyVectorcall_Call (tstate=0x7ffff7d8ba38 <_PyRuntime+166328>, func=0x7ffff76b9065 <cfunction_vectorcall_O>, callable=0x7ffff7f882f0,
    tuple=0x7fffd03e1fe0, kwargs=0x7fffd03fc4d0) at Objects/call.c:245
#11 0x00007ffff764c2d9 in _PyObject_Call (tstate=0x7ffff7d8ba38 <_PyRuntime+166328>, callable=0x7ffff7f882f0, args=0x7fffd03e1fe0, kwargs=0x7fffd03fc4d0)
    at Objects/call.c:328
#12 0x00007ffff764c3cb in PyObject_Call (callable=0x7ffff7f882f0, args=0x7fffd03e1fe0, kwargs=0x7fffd03fc4d0) at Objects/call.c:355
#13 0x00007ffff77b4158 in do_call_core (tstate=0x7ffff7d8ba38 <_PyRuntime+166328>, func=0x7ffff7f882f0, callargs=0x7fffd03e1fe0, kwdict=0x7fffd03fc4d0,
    use_tracing=255) at Python/ceval.c:7329

Your environment

I have reproduced the same error with Python 3.11.0rc1 on Ubuntu 18.04 (x86_64) and on an M1 Mac. The same code runs fine on 3.9.

Update: Reproduced the segfault with 3.11.1 final release, too.

@pablogsal, any idea?

pablogsal commented 1 year ago

I can take a look when I am back from holidays.

CC @markshannon @iritkatriel @brandtbucher

pablogsal commented 1 year ago

From the stack trace, this looks like another ownership problem.

pablogsal commented 1 year ago

Meanwhile, could you refine the reproducer so it doesn't use spacy? If it is too hard, don't worry, we will manage :)

byllyfish commented 1 year ago

This looks like the same bug as #99110. The segfault is tripping over an uninitialized variable. The PRs have been merged.
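The "uninitialized variable" reading is consistent with the `frame=0xcdcdcdcdcdcdcdcd` value in the gdb trace above: in debug builds, CPython's memory allocator fills freshly allocated blocks with the byte `0xCD`, and the crashing interpreter was installed as `3.11.1-debug`. The following sketch shows how such a pointer can be recognized; the helper name `classify_pointer` is hypothetical, and the fill-byte table reflects the debug allocator of CPython 3.8 and later.

```python
# Fill bytes written by CPython's debug memory allocator (3.8 and later).
DEBUG_FILL_BYTES = {
    0xCD: "allocated but never initialized (PYMEM_CLEANBYTE)",
    0xDD: "already freed (PYMEM_DEADBYTE)",
    0xFD: "guard bytes around a block (PYMEM_FORBIDDENBYTE)",
}


def classify_pointer(value, width=8):
    """Return a description if value is one debug fill byte repeated, else None."""
    raw = value.to_bytes(width, "little")
    first = raw[0]
    if all(byte == first for byte in raw):
        return DEBUG_FILL_BYTES.get(first)
    return None


# The bogus frame pointer from the trace, then a normal-looking one.
print(classify_pointer(0xCDCDCDCDCDCDCDCD))
print(classify_pointer(0x7FFFD03EF120))
```

A pointer made entirely of `0xCD` bytes suggests a field that was read before anything was stored in it, rather than a use-after-free, which would more likely show `0xDD`.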

sumerc commented 1 year ago

could you refine the reproducer so it doesn't use spacy

The only idea that comes to mind is to call a few arbitrary Cython functions and see what happens, but I have not found time to do that yet.

This looks like the same bug as https://github.com/python/cpython/issues/99110. The segfault is tripping over an uninitialized variable. The PRs have been merged.

I could not test this as I could not install spacy for the head version. I think we need to find a simpler reproducer.

lsmith77 commented 1 year ago

Maybe someone from Explosion (the company behind spacy) can help: https://mastodon.green/@lsmith/109591644106783822

sumerc commented 1 year ago

OK, I have verified that this issue is fixed by the fix for https://github.com/python/cpython/issues/99110, just as @byllyfish suggested.

I used the test code in #100182 to reproduce the problem without spacy, and then confirmed that 3.12 does not reproduce the error.

I have also verified the original spacy issue is fixed by manually backporting the fix to 3.11.1.

lsmith77 commented 1 year ago

Based on https://github.com/python/cpython/commit/57e727af3fda446dc79d65e2d17297d1194892ed I therefore assume it will also be fixed in 3.11.2 to be released in February 2023, correct? https://peps.python.org/pep-0664/

pablogsal commented 1 year ago

3.11.2 to be released in February 2023, correct?

Correct

I am closing this issue, as the problem seems to be fixed in main and the fix for https://github.com/python/cpython/issues/99110 has been backported to 3.11.
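For tools that install a tracing hook from a C extension, a version guard along these lines may be useful until users upgrade. It is only a sketch: the cut-off of 3.11.2 is taken from the answer in this thread, and the function name `frame_back_bug_possible` is hypothetical.

```python
import sys

# Assumption taken from this thread: the backported fix first ships in 3.11.2.
FIRST_FIXED_311 = (3, 11, 2)


def frame_back_bug_possible(version=None):
    """Return True for 3.11 releases older than 3.11.2, False for anything else."""
    if version is None:
        version = sys.version_info
    major_minor_micro = tuple(version)[:3]
    return (3, 11, 0) <= major_minor_micro < FIRST_FIXED_311


if frame_back_bug_possible():
    print("This interpreter may crash in PyFrame_GetBack inside a trace hook.")
```

The check treats 3.11 release candidates as affected too, which matches the 3.11.0rc1 report earlier in the thread.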

pablogsal commented 1 year ago

Backport here: https://github.com/python/cpython/pull/100478