Open ericsnowcurrently opened 2 years ago
Some possible "solutions":
stack_pointer
to check synchronization_PyFrame_GetStackPointer()
directly insteadstack_pointer = _PyFrame_SetStackPointer(frame);
right before every call/use of stack_pointer
(which makes the local irrelevant)_PyFrame_SetStackPointer()
, etc. to the local variable somehow (unrealistic)For starters, it would be nice to properly define the invariant. There are obviously cases when stack_pointer
goes out of sync with frame->stacktop
, and currently it's not clear if they are intentional – and if so, when the eventual sync is expected.
Same goes for next_instr
/frame->prev_instr
(and anything else that'll be made local in the future).
The invariant is well defined.
For years, the invariant has been that if the stack-pointer stored in the frame was invalid (used to be NULL
, now its -1
) then the stack pointer was being handled in the interpreter, _PyEval_EvalFrameDefault
.
https://github.com/python/cpython/blob/main/Python/ceval.c#L1706 explains why the in-memory stack offset is made invalid when the stack pointer is held in a register.
There are obviously cases when stack_pointer goes out of sync with frame->stacktop
When are they out of sync?
I should say that the stack pointer is (along with the instruction pointer) the hottest variable int the entire VM and needs to be in a register. No harm in adding more asserts, though.
When are they out of sync?
See #94215 (and my comment at https://github.com/python/cpython/issues/94215#issuecomment-1178073348).
Where in the code?
They can only be out of sync when both are valid and reachable.
In other words, it can only happen in _PyEval_EvalFrameDefault
when frame->stacktop >= 0
.
That should only happen in a few places dealing with tracing and the like.
Looking at the code, it appears that we don't set frame->stacktop = -1;
after TRACE_FUNCTION_ENTRY();
. I don't know if this is intentional or not, but it seems incorrect to me.
While considering the situation of gh-94215, I realized there is a broader concern worth discussing, of which that issue is a specific instance. FWIW, this isn't some massive, pervasive, nor high-priority problem that needs immediate attention. It is worth consideration at some point, though, as it's bitten us already and I expect it will bite us again.
Context:
In the eval loop (in ceval.c), we copy some interpreter/thread/frame state into local variables, to reduce overhead a little. Of these variables, several get modified during execution. One of them,
stack_pointer
, must be kept in sync with theframe->stacktop
, which can be modified via_PyFrame_SetStackPointer()
,_PyFrame_StackPop()
, and_PyFrame_StackPush()
. We passstack_pointer
in various calls (e.g. vectorcall) throughout the eval loop, so it is important that it stay in sync, at least at the points where we use it. (I covered this a bit more in the other issue.)The Concern:
From where I'm at, it's hard to have a lot of confidence that we are preserving the invaraiant (that
stack_pointer
is properly in sync with the frame state) in all cases. There doesn't seem to be any structural mechanism by which we get guarantees (relative to use of_PyFrame_SetStackPointer()
, etc.). Instead it seems like knowing when to synchronizestack_pointer
, and when we can avoid unnecessary synchronization, currently requires deep knowledge of a lot of pieces.Consequently, it seems like it is relatively easy to break the invariant accidentally. The fact that it happened to someone as knowledgeable as @markshannon really says a lot.
Am I out of touch here? If not, is it worth doing something about all that?
CC @iritkatriel @tiran @markshannon @pablogsal