python / cpython

The Python programming language
https://www.python.org
Other
63.85k stars 30.56k forks source link

SIGSEGV with generators after direct change in gi_frame #125038

Open efimov-mikhail opened 1 month ago

efimov-mikhail commented 1 month ago

Crash report

What happened?

There is a segmentation fault with simple code snippet:

g = (x for x in range(10))
g.gi_frame.f_locals['.0'] = range(20)
list(g)
print("No segfault")

There is a SIGSEGV on my linux machine (Debian GNU/Linux 10) with both main branch Python and 3.13 version. Message "No segfault" is printed on Python 3.7.3 (default, Oct 31 2022, 14:04:00).

CPython versions tested on:

3.13, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.14.0a0 (heads/peg_parser_remove_redundant_functions:e1b62e5cf79, Oct 3 2024, 14:38:46) [GCC 8.3.0]

Linked PRs

JelleZijlstra commented 1 month ago

Backtrace on MacOS:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x00000001001d0c5c python.exe`_PyEval_EvalFrameDefault(tstate=<unavailable>, frame=0x0000000100bcf898, throwflag=0) at generated_cases.c.h:3628:36 [opt]
    frame #2: 0x00000001000a48e0 python.exe`gen_send_ex2 [inlined] _PyEval_EvalFrame(tstate=0x00000001005a8738, frame=<unavailable>, throwflag=0) at pycore_ceval.h:119:16 [opt]
    frame #3: 0x00000001000a48d4 python.exe`gen_send_ex2(gen=0x0000000100bcf850, arg=0x0000000000000000, presult=0x000000016fdfe378, exc=0, closing=<unavailable>) at genobject.c:245:24 [opt]
    frame #4: 0x00000001000a2f28 python.exe`gen_iternext(self=<unavailable>) at genobject.c:612:9 [opt]
    frame #5: 0x00000001000ba0a4 python.exe`list_extend_iter_lock_held(self=0x00000001010ca490, iterable=0x00000001010ca490) at listobject.c:1194:26 [opt]
    frame #6: 0x00000001000b6fb8 python.exe`_list_extend(self=<unavailable>, iterable=<unavailable>) at listobject.c:1367:15 [opt] [artificial]
    frame #7: 0x00000001000bd578 python.exe`list___init___impl(self=0x00000001010ca490, iterable=0x0000000100bcf850) at listobject.c:3391:13 [opt]
    frame #8: 0x00000001000b8ef0 python.exe`list_vectorcall(type=<unavailable>, args=0x00000001009dc690, nargsf=<unavailable>, kwnames=<unavailable>) at listobject.c:3415:13 [opt]
    frame #9: 0x00000001000855f8 python.exe`_PyObject_VectorcallTstate(tstate=0x00000001005a8738, callable=0x0000000100530f30, args=0x00000001009dc690, nargsf=9223372036854775809, kwnames=0x0000000000000000) at pycore_call.h:167:11 [opt]

The crash is in the FOR_ITER instruction; it blindly accesses tp_iternext without verifying that the object being iterated over has this attribute.

The repro case is a bit dubious as it's mutating f_locals, but probably we should indeed support this. The fix would be as simple as inserting a NULL check.

efimov-mikhail commented 1 month ago

Thank you for such a concrete comment! It seems like I made fix by myself after it.

markshannon commented 1 month ago

I think there are two reasonable approaches to fixing this:

  1. Strengthen the compiler/interpreter
  2. Prohibit setting non-identifier keys in the underlying frame

Given that setting an internal variable like .0 is obviously a suspect thing to do, I'd be fine with (2). However, it might break code that stores additional, possibly non-identifier, keys in the locals() dict.

markshannon commented 1 month ago

I can segfault 3.12 and earlier with a variant of this.

g = (x for x in range(10))
gen_expr_func = types.FunctionType(g.gi_code, {})
list(gen_expr_func(range(20)))
print("No segfault")

The root cause is the same: the generator expression does not check that the value passed to it is an iterator.

markshannon commented 1 month ago

So it looks like (1) is the correct fix. See https://github.com/python/cpython/pull/125178#issuecomment-2402332695 for a way to do this.

efimov-mikhail commented 1 month ago

IMHO, additional GET_ITER before FOR_ITER for genexpr is a best way to prevent all such crashes. I will try to provide code changes by myself.

JelleZijlstra commented 1 month ago

The initial fix is merged into main, though we'll want to eventually implement a different fix.

I thought it was worth investigating whether a similar crash was possible with async genexps. However, it seems that the GET_ANEXT impression is implemented more robustly and I couldn't find a crash by manipulating the locals of an asyncgen frame.

efimov-mikhail commented 1 month ago

However, it seems that the GET_ANEXT impression is implemented more robustly and I couldn't find a crash by manipulating the locals of an asyncgen frame.

Yes, it seems so. Actually, function _PyEval_GetANext from Python/ceval.c contains NULL check for type->tp_as_async->am_anext method. TypeError will be raised if no such method presents. This code looks very similar to one of the previous variants of fixing crash in my PR.

Maybe, we can remove NULL check from the GET_ANEXT bytecode implementation. And place GET_AITER instruction before GET_ANEXT, in similarity with "sync" generators. This change could possible improve perfomance a little bit. But I'm not sure that it's worth it.

On the other side, TypeErrors in GET_AITER and GET_ANEXT slightly differs in case of am_anext == NULL. Maybe we should use the same text in those functions.

cc @JelleZijlstra, @markshannon

ncoghlan commented 3 weeks ago

See #125723 for another generator f_locals related crash that may be confounding the investigation attempts here (I don't think it's a duplicate, but it's a segfault where we have no idea of the root cause, so that assessment is purely based on the reproducer having a very different structure from the ones described here)