python / cpython

The Python programming language
https://www.python.org
Other
63.31k stars 30.31k forks source link

Remove `LOAD_CLOSURE` #105775

Closed markshannon closed 1 year ago

markshannon commented 1 year ago

LOAD_CLOSURE is identical to LOAD_FAST_CHECK in every way except its name and number.

The justification for its existence is that "We keep LOAD_CLOSURE so that the bytecode stays more readable.". Which is insufficient justification to keep it given that it:

Linked PRs

heysujal commented 1 year ago

I'd like to work on this issue

markshannon commented 1 year ago

Are you sure? This is more complicated than it looks as it involves some awkward compiler internals.

heysujal commented 1 year ago

Oh, I thought it would be just deleting some statements

markshannon commented 1 year ago

It should be. But the compiler handles local variables in a somewhat convoluted way

heysujal commented 1 year ago

Ok, then someone else can pick the issue up.

polynomialherder commented 1 year ago

I've been working through this -- I'll update if I have questions or get stuck

polynomialherder commented 1 year ago

@markshannon My plan is to add a member to the _PyCfgInstruction and _PyCompile_Instruction struct types instruction called i_fc or i_fromclosure that indicates whether the instruction originated in a closure, and then add a macro LOAD_FAST_CLOSURE (or similar) for adding the LOAD_FAST instructions with that member set to 1. Then I'll replace the existing LOAD_CLOSURE calls with those appropriately marked LOAD_FAST calls. Then downstream functions like fix_cell_offsets can check that member before processing. Let me know if there are any obvious problems with this design, or if you have thoughts on better approaches

For this unit of work, do we want to preserve the current performance characteristics to reduce risk? For instance, I see that replacing LOAD_CLOSURE with LOAD_FAST would immediately impact some of the optimization functions in flowgraph.c like insert_superinstructions and fast_scan_many_locals that presently don't operate on LOAD_CLOSURE instructions. I'm not sure what the consequences would be of allowing those functions to operate on LOAD_FAST instructions that would have previously been LOAD_CLOSURE, perhaps it would "just work" -- or whether we should add a check of that i_fc member to preserve the current behavior

Let me know your thoughts on this -- I'm pretty new to the cpython source code so still piecing together the local variable lifecycle. Thanks!

markshannon commented 1 year ago

Rather than add special cases to _PyCfgInstruction and _PyCompile_Instruction would it make more sense use the names of the variables as operands, rather than indices into two different lists? The numbering can then be done in the assembler.

There is no reason to treat LOAD_CLOSURE differently to LOAD_FAST in the optimizer.

carljm commented 1 year ago

If you don't want to tackle a full refactoring of how local variable and cell offsets are handled in this PR, a simpler incremental approach would be to make LOAD_CLOSURE a pseudo-instruction (similar to e.g. STORE_FAST_MAYBE_NULL) which is emitted in codegen (and thus can still be recognized in fix_cell_offsets) but is converted to LOAD_FAST in _PyCfg_ConvertPseudoOps. This would eliminate LOAD_CLOSURE as a real instruction that has to be handled at runtime.

The refactor of cell index handling could still be done as a follow-up PR, allowing removal of the LOAD_CLOSURE pseudo-op.

gvanrossum commented 1 year ago

(From your post to core-mentorship:)

When I first set up my cpython development environment, I followed the guidance on the developer’s guide and have been studying the compiler design page closely. From my reading of it, when I remove an opcode, I need to

  • Remove references to the opcode from the code and update any logic impacted
  • Remove the opcode from Lib/opcode.py and the dis module documentation
  • Run make regen-opcode regen-opcode-targets to regenerate Include/opcode.h and Python/opcode_targets.h
  • Update the magic number in Lib/importlib/_bootstrap_external.py and PC/launcher.c
  • "Add the new bytecode target"
  • Run make regen-importlib

The first step is probably most of the work, but I'm a little bit unclear about the second to last step.

The compiler design page in the developer guide has written

Changes to Lib/importlib/_bootstrap_external.py will take effect only after running make regen-importlib. Running this command before adding the new bytecode target to Python/ceval.c will result in an error. You should only run make regen-importlib after the new bytecode target has been added.

It makes sense to me that if we’re introducing a new opcode, we should ideally have logic to handle the new instructions in ceval.c, but it’s not clear to me what error would be triggered if we’re not referencing the new instruction, so I feel I’m not understanding something. I also don't see most opcodes referenced in ceval.c -- for instance, no LOAD_FAST, STOREFAST, or JUMP* instructions, at least not explicitly.

The instructions may well be out of date. (A PR to update them would be welcome of course!)

As of 3.12, the cases in the switch are generated by tooling in Tool/cases_generator (check out the README.md there). This reads Python/bytecodes.c and writes Python/generated_cases.c.h, which in #included by ceval.c in the switch body. IMPORTANT: Whenever you edit bytecodes.c, run make regen-cases to regenerate the output files. likely that Carl's suggestion is right: make LOAD_CLOSURE a pseudo op (search for "pseudo" in bytecodes.c) so the compiler can distinguish between it and LOAD_FAST if it needs to, make the assembler translate it to LOAD_FAST. I think that it's totally fine if a sequence of e.g. LOAD_FAST, LOAD_CLOSURE ends up being replaced by a single LOAD_FAST_LOAD_FAST instruction (I think that's what Mark meant).

If you are still stuck, can you push your work to a branch in your GitHub fork of the cpython repo and link it here, and post the errors you are getting (and from which command)?

polynomialherder commented 1 year ago

Thank you all for the tips, this helped a lot. I'll proceed with the approach you all suggested, will report back soon with progress

polynomialherder commented 1 year ago

As far as automated tests go, would adding a test for _PyCfg_ConvertPseudoOps be worthwhile? Or is there a better way to validate this change?

gvanrossum commented 1 year ago

Usually when messing around with the interpreter, once you get it to build it's pretty solid, and if you get it to pass the test suite, it's golden. It seems you've already reached that stage. :-) Platinum would be to pass the buildbots, esp. the refleak subset. IIUC only triagers and core devs can trigger a buildbot run (by setting a magic label on the PR), and will do so as part of the PR review. I can do that for you.

Whether a test for _PyCfg_ConvertPseudoOps is needed, maybe @iritkatriel has an opinion.

iritkatriel commented 1 year ago

_PyCfg_ConvertPseudoOps can be tested as part of the assembler. See Lib/test/test_compiler_assemble.py, which calls _PyCompile_Assemble. This is pretty new, so there aren't many tests there yet.

polynomialherder commented 1 year ago

Thanks @iritkatriel -- just to make sure before I proceed, are you suggesting that I expose _PyCfg_ConvertPseudoOps in the same way that _PyCompile_Assemble has been exposed in _testinternalcapi.c so that it can be called from Lib/test_compiler_assemble.py (perhaps via helpers in test.support.bytecode_helper)?

iritkatriel commented 1 year ago

I wouldn't do that. I'd just add a test to Lib/test/test_compiler_assemble.py that creates a code object from an instruction sequence that contains the pseudo ops, and then checks that the code works as expected.

If the pseudo ops are not replaced the code object can't possibly work.

polynomialherder commented 1 year ago

As I wrote in the Core-mentorship thread, I'll start to think now about how to refactor the cell/local variable tracking so that we can remove LOAD_CLOSURE altogether, unless anyone else is actively working on that piece right now. If anyone has any thoughts on how to proceed OTTOYH, let me know -- otherwise I'll ping back with questions

I'll also keep an eye on the PR which converts LOAD_CLOSURE to a pseudo-op and continue to make changes there as needed

polynomialherder commented 1 year ago

I'm still working through this -- I spent the last week writing simple examples and stepping through their evaluation in gdb to understand how the localsplus array is constructed by the compiler and ultimately referenced during evaluation

Something that confused me previously is that co_cellvars refers to cell variables contributed by the object under consideration to the scopes of child objects -- unlike co_freevars which refers to variables contributed to the object under consideration by its parent scopes (but excluding global variables).

In terms of possible approaches, @carljm commented that we could either refactor the cell/local offsets logic so that we can remove LOAD_CLOSURE altogether, or consider adding explicit LOAD_CLOSURE support to some optimizations -- I was planning to proceed with the former approach, but interested in whether @carljm has more to say on the latter or if anyone else has any thoughts (@markshannon? @iritkatriel?)

carljm commented 1 year ago

In terms of possible approaches, @carljm https://github.com/python/cpython/pull/106059#discussion_r1242397712 that we could either refactor the cell/local offsets logic so that we can remove LOAD_CLOSURE altogether, or consider adding explicit LOAD_CLOSURE support to some optimizations

I would say that the former is the ideal forward-looking approach, but may be high difficulty/complexity; it should be possible, but I haven't looked at it in detail. The latter is more of a short-term approach if for performance reasons we urgently want LOAD_CLOSURE to participate better in some specific bytecode optimizations, but we haven't been able to achieve the former yet.

markshannon commented 1 year ago

I don't think this is urgent. If it is too awkward to implement now, it can wait.

We plan (vaguely) to change the compiler to use names, rather than indices for load instructions. Once that change has been made, removing LOAD_CLOSURE should be almost trivial.