python-greenlet / greenlet

Lightweight in-process concurrent programming
Other
1.63k stars 247 forks source link

Can't access full stack of suspended greenlet on 3.12 #388

Closed oremanj closed 8 months ago

oremanj commented 9 months ago

The gr_frame attribute refers to the innermost frame (the frame that called switch()), but its f_back attribute points nowhere.

% python3.12
Python 3.12.0 (v3.12.0:0fb18b02c8, Oct  2 2023, 09:45:56) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import greenlet
>>> def inner(): greenlet.getcurrent().parent.switch(42)
...
>>> def outer(): inner()
...
>>> gr = greenlet.greenlet(outer)
>>> gr.switch()
42
>>> gr.gr_frame
<frame at 0x10438b7e0, file '<stdin>', line 1, code inner>
>>> gr.gr_frame.f_back
>>> greenlet.__version__
'3.0.1'
>>>

On 3.11 I see instead:

>>> gr.gr_frame
<frame at 0x7feb24db2700, file '<stdin>', line 1, code inner>
>>> gr.gr_frame.f_back
<frame at 0x7feb24db2660, file '<stdin>', line 1, code outer>
>>> gr.gr_frame.f_back.f_back
>>>

which is what I would expect from the documentation. This looks related to some logic in TPythonState.cpp that moves the previous ptr into greenlet-local storage when suspending:

    if (frame) {
        this->_prev_frame = frame->f_frame->previous;
        frame->f_frame->previous = nullptr;
    }

How is greenlet introspection supposed to work on 3.12?

oremanj commented 9 months ago

OK, I dug into this a bit more. On 3.12, the linked list of PyInterpreterFrames contains some entries on the C stack (one per C-level call to the frame evaluation function) as a performance optimization to avoid needing to check on every Python return whether a C return is necessary. Switching greenlets spills portions of the stack to the heap, but we don't update the PyInterpreterFrame linked list to point to the new location (and in general there's nothing to prevent a single PyInterpreterFrame from being split across the heap/stack boundary, either). greenlet clears out the previous field of the top frame in a suspended greenlet so that walking f_back produces NULL instead of accessing nonsense, but that makes it impossible to walk the greenlet stack.

It's worth noting that the current approach is not sufficient to prevent crashes, because it's still possible to get a reference to a frame that is not the innermost at the time of its greenlet's suspension:

% python
Python 3.12.0 (v3.12.0:0fb18b02c8, Oct  2 2023, 09:45:56) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, greenlet
>>> def outer():
...   global frame
...   frame = sys._getframe(0)
...   inner()
...
>>> def inner():
...   greenlet.getcurrent().parent.switch(42)
...
>>> gr = greenlet.greenlet(outer)
>>> gr.switch()
42
>>> frame.f_back
zsh: segmentation fault  python

I've developed a patch that provides a gr.gr_stack attribute (which returns the full list of frames in the greenlet) that works on 3.12 by mapping addresses in the saved portion of the stack to the heap copy. However, it still has the problem that the returned other-than-top frames can crash the interpreter if you access their f_back attributes.

If we want to totally prevent this crash, I think we would need to walk the frame list on every greenlet suspension. There are three options I see for doing that:

Obviously none of these is particularly appealing from a performance perspective. To mitigate that impact, any of them could also be done from the gr_frame accessor (and set a flag to remember to undo the change the next time the greenlet is resumed) if we're not worried about frame objects that leak out of the greenlet through other means.

Thoughts?

jamadden commented 9 months ago

Thanks for your report and detailed analysis! It's much appreciated. What follows are some uncoordinated, half-baked thoughts I'm tossing out in between meetings:

I suspect at least some of the problem predates 3.12; 3.11 is when CPython started storing frames on the C stack (IIRC; it could have been 3.10).

I'm looking to reduce switching overhead, not increase it. We've fallen behind in that area in recent versions of CPython.

If the potential for crashing predates 3.12 (I'm not totally sure there off the top of my head), I'm not recalling any crash reports related to accessing previously captured frames from switched out greenlets, so that doesn't seem to be a common pattern.

Unfortunately for us, the direction CPython development is going looks to be imposing more and more limits on what greenlet can do in the name of interpreter performance (which is a perfectly reasonable tradeoff!); for example, tracing is of limited utility under 3.12 because the control information for that too has moved to inaccessible places. So I'm afraid we're going to have to get used to compromises. Or just wait for the no-GIL version of CPython 3.13 and switch to threads 😄

That all in mind, I think I'd be more inclined to do something like make gr_frame produce an AttributeError on those versions of Python where it isn't supported than try to introduce heroic workarounds that tie us deeper into the changing implementation details of CPython.

oremanj commented 8 months ago

I suspect at least some of the problem predates 3.12; 3.11 is when CPython started storing frames on the C stack (IIRC; it could have been 3.10).

No, this is new in 3.12. The PyInterpreterFrame structure existed in 3.11 but none of them were on the C stack; in 3.11 all interpreter frames were contained either in a generator object, frame object, or on the per-thread frame stack. The change that created problems for us is https://github.com/python/cpython/pull/96319 and its rationale is described (lightly) in https://github.com/python/cpython/issues/96421 .

I'm looking to reduce switching overhead, not increase it. We've fallen behind in that area in recent versions of CPython.

Sure, that makes sense to me.

I think I'd be more inclined to do something like make gr_frame produce an AttributeError on those versions of Python where it isn't supported

I think that not being able to introspect a suspended greenlet's stack is a pretty major usability problem for debugging tools. Would you be willing to consider a solution where the ugly stuff is in the gr_frame accessor and there's basically no performance penalty if gr_frame isn't used? I think I'd be able to implement that. A gr_frame getter would need to modify the saved greenlet state in order to make the returned frame objects valid to access from outside of the greenlet, and it would need to set a flag to remember to reverse those modifications when the greenlet is next resumed. We could document that the returned frame objects are only safe to access until the greenlet is next resumed.

than try to introduce heroic workarounds that tie us deeper into the changing implementation details of CPython.

I think that ship has sailed, honestly -- the greenlet package is tied to CPython implementation details for better or for worse. It's reasonable to decide that some feature is no longer supportable, at which point I agree that it's better to fail cleanly than to produce an incorrect result; but I don't think we're at that point yet wrt frame extraction.

There might also be a palatable upstream change that would restore greenlet's ability to extract a suspended greenlet's stack. I'll file an issue about that and see what the CPython maintainers say.

oremanj commented 8 months ago

No response from CPython, but I think we can solve this satisfactorily on our end - see attached PR. It's actually slightly faster than the 3.12 status quo if you don't access gr_frame.