python / cpython

The Python programming language
https://www.python.org
Other
63.89k stars 30.58k forks source link

`f_lasti` points to `CACHE` opcode during inlined `CALL` #96970

Open 15r10nk opened 2 years ago

15r10nk commented 2 years ago

The instruction pointer f_lasti points under some conditions to a CACHE opcode. This is not completely wrong, because this cache opcode comes after the expected CALL opcode, but the documetation says that CACHE opcodes should be skipped. Which it is not in this case.

script:

import inspect
import dis

frame = inspect.currentframe()

class Tester:
    def __call__(self, f):
        deco(f)

tester = Tester()

def deco(f):
    assert f.__name__ == "foo"
    instructions = list(dis.get_instructions(frame.f_code, show_caches=True))
    opname = instructions[frame.f_lasti // 2].opname
    print(f"{frame.f_lasti = }\n{opname = }")
    assert opname == "CALL", opname

print("correct")

@tester
def foo():
    pass

print()
print("incorrect")

@deco
def foo():
    pass

output (Python 3.11.0rc2+):

correct
frame.f_lasti = 132
opname = 'CALL'

incorrect
frame.f_lasti = 204
opname = 'CACHE'
Traceback (most recent call last):
  File "/home/frank/projects/executing/example.py", line 35, in <module>
    @deco
     ^^^^
  File "/home/frank/projects/executing/example.py", line 20, in deco
    assert opname == "CALL", opname
AssertionError: CACHE
brandtbucher commented 2 years ago

The issue is that when "inlining" a pure-Python call like deco, we must save the frame's instruction pointer to point to the next instruction, which means that f_lasti will be the code unit prior to that: the CALL's last CACHE entry. No other opcode (including non-inlined calls) has this requirement, so f_lasti should always(?) point to a "real" instruction within their bodies.

Personally, I think the docs you linked to cover this:

Logically, this space is part of the preceding instruction.

So, in my opinion, the correct thing to do here is for tools to scan backwards over any caches when trying to find the last "real" instruction.

I can see why we might want to make this more consistent, though. If others agree, we can look into ways of making that happen, but I'm worried that they'll either come with a performance cost for many types of calls, or be too invasive to justify in a patch release.

brandtbucher commented 2 years ago

I see you updated your comment. I read the original wording as "this is completely wrong".

sweeneyde commented 2 years ago

Not for 3.11, but maybe we could change the INSTRUCTION_START() macro in ceval.c to something like:

- #define INSTRUCTION_START(op) (frame->prev_instr = next_instr++)
+ #define INSTRUCTION_START(op) do {
+     frame->prev_instr = next_instr;
+     next_instr += (1 + INLINE_CACHE_ENTRIES_ ## op); // fixme for adaptive instructions
+ while (0)

Then we could remove most JUMPBY(INLINE_CACHE_ENTRIES_...) in ceval.c and reconfigure the compiler to take this into account.

I'm not sure whether the resulting code would look better or worse, but I believe it would make f_lasti consistently meaningful.

15r10nk commented 2 years ago

We can work with the current implementation. Our work around already is that we scan backwards to the last instruction.

and will instruct the interpreter to skip over them at runtime.

This confused me and I got the impression that this might be a bug. Maybe the docs could say what should be done if f_lasti points to a CACHE.

sweeneyde commented 2 years ago

Actually, I think I misunderstood, what I wrote would make prev_instr always point to the next instruction.

brandtbucher commented 2 years ago

@carljm had an interesting idea: leave prev_instr alone when inlining calls, and jump over the caches when resuming with something like:

next_instr += _PyOpcode_Caches[_PyOpcode_Deopt[_Py_OPCODE(next_instr[-1])]];

This should "fix" the issue, but we should definitely measure its performance impact, since it adds quite a bit of logic to every return from an inlined call. I'll experiment with this approach over the next couple of days.

iritkatriel commented 6 months ago

The situation is probably very different following https://github.com/python/cpython/pull/109095. Could you check whether this issue is still relevant?

brandtbucher commented 6 months ago

Indeed, a slightly tweaked example (since dis doesn't yield CACHE instructions anymore) correctly shows the CALL instruction for both examples.

It does still reproduce on 3.12, but I'm not sure that's really worth changing at this point?