python / cpython

The Python programming language
https://www.python.org

The linenumber in the ast changes the result of the bytecode compilation #100378

Closed by 15r10nk 1 year ago

15r10nk commented 1 year ago

Bug report

I ran into an unexpected issue while trying to build something.

I expected line numbers to have no effect on Python bytecode compilation, but I found something weird which I don't understand and which might be a bug.

compile() generates a JUMP_IF_FALSE_OR_POP instead of a POP_JUMP_IF_FALSE if the line numbers are changed.

script:


import dis
import ast

source="result = a and b or c"

# the second positional argument of ast.parse is the filename, so pass mode by keyword
tree=ast.parse(source,mode="exec")

codea=compile(tree,"<string>","exec")

for i,node in enumerate(ast.walk(tree)):
    node.lineno=i

codeb=compile(tree,"<string>","exec")

print("code a               | code b")

for insta,instb in zip(dis.get_instructions(codea),dis.get_instructions(codeb)):
    print(f"{insta.opname:<20} | {instb.opname}","<-- not equal" if insta.opname!=instb.opname else "")

print("the result seems to be the same")
for a in (True,False):
    for b in (True,False):
        for c in (True,False):
            print()
            print("a:",a,"b:",b,"c:",c)

            eval(codea)
            print("codea:",result)
            eval(codeb)
            print("codeb:",result)

output (Python 3.10.8):

code a               | code b
LOAD_NAME            | LOAD_NAME 
POP_JUMP_IF_FALSE    | JUMP_IF_FALSE_OR_POP <-- not equal
LOAD_NAME            | LOAD_NAME 
JUMP_IF_TRUE_OR_POP  | JUMP_IF_TRUE_OR_POP 
LOAD_NAME            | LOAD_NAME 
STORE_NAME           | STORE_NAME 
LOAD_CONST           | LOAD_CONST 
RETURN_VALUE         | RETURN_VALUE 
the result seems to be the same

a: True b: True c: True
codea: True
codeb: True

a: True b: True c: False
codea: True
codeb: True

a: True b: False c: True
codea: True
codeb: True

a: True b: False c: False
codea: False
codeb: False

a: False b: True c: True
codea: True
codeb: True

a: False b: True c: False
codea: False
codeb: False

a: False b: False c: True
codea: True
codeb: True

a: False b: False c: False
codea: False
codeb: False

I was not able to reproduce this with normal source code, but this does not mean that it is impossible.

Is there any explanation for that behavior?

carljm commented 1 year ago

Thanks for the clear repro script! This still repros in main branch, with the addition of node.end_lineno = i along with node.lineno = i (otherwise it fails with ValueError: AST node line range (2, 1) is not valid.)
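
A version of the repro with that adjustment might look like the sketch below. The helper names compile_with_unique_lines and run are made up for illustration, and each code object is executed in its own namespace instead of relying on module globals. The sketch only checks that both code objects compute the same result, since the exact opcodes emitted differ between Python versions.

```python
import ast
import itertools


def compile_with_unique_lines(source):
    """Compile source twice: as parsed, and with every AST node on its own line."""
    tree = ast.parse(source, mode="exec")
    original = compile(tree, "<string>", "exec")
    for i, node in enumerate(ast.walk(tree)):
        node.lineno = i
        node.end_lineno = i  # keep the line range valid for newer Python versions
    renumbered = compile(tree, "<string>", "exec")
    return original, renumbered


def run(code, a, b, c):
    """Execute a code object in a fresh namespace and return the value of result."""
    namespace = {"a": a, "b": b, "c": c}
    exec(code, namespace)
    return namespace["result"]


code_a, code_b = compile_with_unique_lines("result = a and b or c")

# Collect every input combination for which the two code objects disagree.
mismatches = [
    (a, b, c)
    for a, b, c in itertools.product((True, False), repeat=3)
    if run(code_a, a, b, c) != run(code_b, a, b, c)
]
print("inputs with differing results:", mismatches)
```

To compare the instructions themselves, the dis.get_instructions loop from the original script can be applied to code_a and code_b.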

In general it is expected (in recent Python versions, at least) that AST node line numbers can change the compiled bytecode, because the compiled bytecode is constrained by PEP 626. PEP 626 requires that tracing surface every executed line of code, which in turn requires that the location table attribute at least one bytecode instruction to each executed line. Sometimes this even requires inserting or preserving NOP instructions when the compiler is able to optimize away every instruction originating from a certain line. So if your question is just "is it expected that this can happen," the answer is just "yes."

I think the particular case you show here is an example of this. The naive compiled bytecode generates a JUMP_IF_FALSE_OR_POP which jumps to a JUMP_IF_TRUE_OR_POP. The compiler optimizes this to eliminate the double test, since the same value can't be both true and false, so it changes the JUMP_IF_FALSE_OR_POP to a POP_JUMP_IF_FALSE and makes it jump past the JUMP_IF_TRUE_OR_POP. This has the same effect, so is a safe optimization.

However, if all the AST nodes have different line numbers, the compiler decides that it cannot make this optimization without disrupting tracing, since the a is False case under tracing would then wrongly seem to bypass the line for the or bool-op (which ends up associated with the JUMP_IF_TRUE_OR_POP opcode.)
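
The rewrite and its line-number guard can be sketched as a toy model. This is not CPython's compiler code: the Instr class, the thread_bool_jump function, and the flat instruction list are all hypothetical, and the "same line" check is an assumption modeled on the explanation above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    opname: str
    line: int
    target: Optional[int] = None  # index of the jump target, if any


def thread_bool_jump(instrs: List[Instr], i: int) -> bool:
    """Rewrite instrs[i] in place if the toy rule applies; return True if rewritten."""
    inst = instrs[i]
    if inst.opname != "JUMP_IF_FALSE_OR_POP" or inst.target is None:
        return False
    target = instrs[inst.target]
    if target.opname != "JUMP_IF_TRUE_OR_POP":
        return False
    if inst.line != target.line:
        # Assumed guard: skipping the target would hide its line from a tracer.
        return False
    # A false value can never pass the "if true" test, so pop it now
    # and land just past the second test.
    inst.opname = "POP_JUMP_IF_FALSE"
    inst.target = inst.target + 1
    return True


def naive_program(lines):
    """Toy instruction list for: result = a and b or c (indices are jump targets)."""
    return [
        Instr("LOAD_NAME a", lines[0]),
        Instr("JUMP_IF_FALSE_OR_POP", lines[1], target=3),
        Instr("LOAD_NAME b", lines[2]),
        Instr("JUMP_IF_TRUE_OR_POP", lines[3], target=5),
        Instr("LOAD_NAME c", lines[4]),
        Instr("STORE_NAME result", lines[5]),
    ]


same_line = naive_program([1, 1, 1, 1, 1, 1])
distinct_lines = naive_program([1, 2, 3, 4, 5, 6])  # hypothetical line numbers

print(thread_bool_jump(same_line, 1), same_line[1])
print(thread_bool_jump(distinct_lines, 1), distinct_lines[1])
```

In this model the same-line program ends up with a POP_JUMP_IF_FALSE that jumps straight to the load of c, matching the "code a" column in the report, while the program with distinct line numbers is left untouched, matching "code b".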

15r10nk commented 1 year ago

Big thank you for this detailed explanation. This helps me a lot.