Open 15r10nk opened 2 years ago
The issue is that when "inlining" a pure-Python call like deco, we must save the frame's instruction pointer to point to the next instruction, which means that f_lasti will be the code unit prior to that: the CALL's last CACHE entry. No other opcode (including non-inlined calls) has this requirement, so f_lasti should always(?) point to a "real" instruction within their bodies.
Personally, I think the docs you linked to cover this:
Logically, this space is part of the preceding instruction.
So, in my opinion, the correct thing to do here is for tools to scan backwards over any caches when trying to find the last "real" instruction.
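A minimal sketch of that backwards scan, assuming f_lasti is a byte offset into co_code and that cache entries show up there as CACHE code units. The helper name last_real_offset is hypothetical, and the sample bytes are synthetic rather than real compiler output:

```python
import dis

# CACHE only exists on Python 3.11+; on older versions this is None.
CACHE = dis.opmap.get("CACHE")


def last_real_offset(co_code, lasti, cache_opcode=CACHE):
    """Return the byte offset of the nearest non-CACHE code unit at or before lasti."""
    if cache_opcode is None or lasti < 0:
        # No cache entries on this version, or the frame has not started yet.
        return lasti
    offset = lasti
    while offset > 0 and co_code[offset] == cache_opcode:
        offset -= 2  # each code unit is two bytes: opcode, then oparg
    return offset


# Synthetic bytecode: one instruction (arbitrary opcode 171) followed by
# three cache entries (opcode 0 here), then another instruction (opcode 1).
sample = bytes([171, 1, 0, 0, 0, 0, 0, 0, 1, 0])
print(last_real_offset(sample, 6, cache_opcode=0))
```

With a live frame, this would presumably be called as last_real_offset(frame.f_code.co_code, frame.f_lasti).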
I can see why we might want to make this more consistent, though. If others agree, we can look into ways of making that happen, but I'm worried that they'll either come with a performance cost for many types of calls, or be too invasive to justify in a patch release.
I see you updated your comment. I read the original wording as "this is completely wrong".
Not for 3.11, but maybe we could change the INSTRUCTION_START() macro in ceval.c to something like:
- #define INSTRUCTION_START(op) (frame->prev_instr = next_instr++)
+ #define INSTRUCTION_START(op) do { \
+     frame->prev_instr = next_instr; \
+     next_instr += (1 + INLINE_CACHE_ENTRIES_ ## op); /* fixme for adaptive instructions */ \
+ } while (0)
Then we could remove most JUMPBY(INLINE_CACHE_ENTRIES_...) in ceval.c and reconfigure the compiler to take this into account.
I'm not sure whether the resulting code would look better or worse, but I believe it would make f_lasti consistently meaningful.
We can work with the current implementation. Our workaround is already to scan backwards to the last real instruction.
"... and will instruct the interpreter to skip over them at runtime."
This confused me and I got the impression that this might be a bug. Maybe the docs could say what should be done if f_lasti points to a CACHE.
Actually, I think I misunderstood: what I wrote would make prev_instr always point to the next instruction.
@carljm had an interesting idea: leave prev_instr alone when inlining calls, and jump over the caches when resuming with something like:
next_instr += _PyOpcode_Caches[_PyOpcode_Deopt[_Py_OPCODE(next_instr[-1])]];
This should "fix" the issue, but we should definitely measure its performance impact, since it adds quite a bit of logic to every return from an inlined call. I'll experiment with this approach over the next couple of days.
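To make the resume-time arithmetic concrete, here is a toy Python model of that one-liner. It is only a sketch: the CACHES and DEOPT tables are hypothetical stand-ins for _PyOpcode_Caches and _PyOpcode_Deopt, the cache counts are illustrative rather than CPython's real values, and CALL_PY_EXACT_ARGS is used purely as an example of a specialized opcode name.

```python
# Toy model: a code object is a list of opcode names, one per code unit.
# Both tables are hypothetical stand-ins; the counts are illustrative only.
CACHES = {"CALL": 3, "LOAD_ATTR": 2}      # base opcode -> number of cache entries
DEOPT = {"CALL_PY_EXACT_ARGS": "CALL"}    # specialized form -> base opcode


def resume_index(code_units, prev_instr):
    """Index at which execution resumes after returning from an inlined call.

    prev_instr is left pointing at the call instruction itself, so its
    trailing cache entries are skipped only here, on resume.
    """
    next_instr = prev_instr + 1
    opcode = code_units[next_instr - 1]   # mirrors next_instr[-1] in the C version
    base = DEOPT.get(opcode, opcode)      # map a specialized opcode to its base form
    return next_instr + CACHES.get(base, 0)


code_units = ["LOAD_NAME", "CALL_PY_EXACT_ARGS", "CACHE", "CACHE", "CACHE", "POP_TOP"]
print(code_units[resume_index(code_units, 1)])
```

In this model the saved index always names a real instruction, at the price of two table lookups on every return from an inlined call, which is the cost that would need measuring.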
The situation is probably very different following https://github.com/python/cpython/pull/109095. Could you check whether this issue is still relevant?
Indeed, a slightly tweaked example (since dis doesn't yield CACHE instructions anymore) correctly shows the CALL instruction for both examples.
It does still reproduce on 3.12, but I'm not sure that's really worth changing at this point?
The instruction pointer f_lasti points to a CACHE opcode under some conditions. This is not completely wrong, because this CACHE opcode comes after the expected CALL opcode, but the documentation says that CACHE opcodes should be skipped, which is not the case here.
script:
output (Python 3.11.0rc2+):