Closed markshannon closed 1 year ago
I'd like to work on this issue
Are you sure? This is more complicated than it looks as it involves some awkward compiler internals.
Oh, I thought it would be just deleting some statements
It should be. But the compiler handles local variables in a somewhat convoluted way
Ok, then someone else can pick the issue up.
I've been working through this -- I'll update if I have questions or get stuck
@markshannon My plan is to add a member to the _PyCfgInstruction and _PyCompile_Instruction struct types, called i_fc or i_fromclosure, that indicates whether the instruction originated in a closure, and then add a macro LOAD_FAST_CLOSURE (or similar) for adding the LOAD_FAST instructions with that member set to 1. Then I'll replace the existing LOAD_CLOSURE calls with appropriately marked LOAD_FAST calls, and downstream functions like fix_cell_offsets can check that member before processing. Let me know if there are any obvious problems with this design, or if you have thoughts on better approaches.
For this unit of work, do we want to preserve the current performance characteristics to reduce risk? For instance, I see that replacing LOAD_CLOSURE with LOAD_FAST would immediately impact some of the optimization functions in flowgraph.c, like insert_superinstructions and fast_scan_many_locals, that presently don't operate on LOAD_CLOSURE instructions. I'm not sure what the consequences would be of allowing those functions to operate on LOAD_FAST instructions that would previously have been LOAD_CLOSURE -- perhaps it would "just work" -- or whether we should add a check of that i_fc member to preserve the current behavior.
Let me know your thoughts on this -- I'm pretty new to the CPython source code, so I'm still piecing together the local variable lifecycle. Thanks!
Rather than add special cases to _PyCfgInstruction and _PyCompile_Instruction, would it make more sense to use the names of the variables as operands, rather than indices into two different lists? The numbering can then be done in the assembler.
There is no reason to treat LOAD_CLOSURE differently to LOAD_FAST in the optimizer.
If you don't want to tackle a full refactoring of how local variable and cell offsets are handled in this PR, a simpler incremental approach would be to make LOAD_CLOSURE a pseudo-instruction (similar to e.g. STORE_FAST_MAYBE_NULL) which is emitted in codegen (and thus can still be recognized in fix_cell_offsets) but is converted to LOAD_FAST in _PyCfg_ConvertPseudoOps. This would eliminate LOAD_CLOSURE as a real instruction that has to be handled at runtime.
The refactor of cell index handling could still be done as a follow-up PR, allowing removal of the LOAD_CLOSURE pseudo-op.
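For a concrete picture, here is a small script (an illustration of my own, not code from the PR; the exact opcodes shown depend on the interpreter version) that makes the current role of LOAD_CLOSURE visible:

```python
import dis

def outer():
    x = 1
    def inner():
        return x   # x comes from the enclosing scope, so outer has to build a closure
    return inner

# On current main the disassembly of outer contains a LOAD_CLOSURE for x just
# before the closure tuple is built for MAKE_FUNCTION. Once LOAD_CLOSURE is only
# a pseudo-instruction, the assembler should emit LOAD_FAST (or a LOAD_FAST_*
# superinstruction) at that point instead, with no change in behavior.
dis.dis(outer)
```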
(From your post to core-mentorship:)
When I first set up my cpython development environment, I followed the guidance on the developer’s guide and have been studying the compiler design page closely. From my reading of it, when I remove an opcode, I need to
- Remove references to the opcode from the code and update any logic impacted
- Remove the opcode from Lib/opcode.py and the dis module documentation
- Run make regen-opcode regen-opcode-targets to regenerate Include/opcode.h and Python/opcode_targets.h
- Update the magic number in Lib/importlib/_bootstrap_external.py and PC/launcher.c (a sketch of the relevant line follows this list)
- "Add the new bytecode target"
- Run make regen-importlib
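For the magic-number step, the relevant line lives in Lib/importlib/_bootstrap_external.py and looks roughly like this (the value shown is illustrative only; the real number is bumped with every bytecode change):

```python
# Illustrative sketch only -- the actual value changes whenever the bytecode
# changes, so .pyc files compiled for the old opcode set are invalidated.
MAGIC_NUMBER = (3531).to_bytes(2, 'little') + b'\r\n'
```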
The first step is probably most of the work, but I'm a little bit unclear about the second to last step.
The compiler design page in the developer guide says:
Changes to Lib/importlib/_bootstrap_external.py will take effect only after running make regen-importlib. Running this command before adding the new bytecode target to Python/ceval.c will result in an error. You should only run make regen-importlib after the new bytecode target has been added.
It makes sense to me that if we're introducing a new opcode, we should ideally have logic to handle the new instructions in ceval.c, but it's not clear to me what error would be triggered if we're not referencing the new instruction, so I feel I'm not understanding something. I also don't see most opcodes referenced in ceval.c -- for instance, no LOAD_FAST, STORE_FAST, or JUMP* instructions, at least not explicitly.
The instructions may well be out of date. (A PR to update them would be welcome of course!)
As of 3.12, the cases in the switch are generated by tooling in Tools/cases_generator (check out the README.md there). This reads Python/bytecodes.c and writes Python/generated_cases.c.h, which is #included by ceval.c in the switch body. IMPORTANT: Whenever you edit bytecodes.c, run make regen-cases to regenerate the output files.
It seems likely that Carl's suggestion is right: make LOAD_CLOSURE a pseudo op (search for "pseudo" in bytecodes.c) so the compiler can distinguish between it and LOAD_FAST if it needs to, and make the assembler translate it to LOAD_FAST. I think that it's totally fine if a sequence of e.g. LOAD_FAST, LOAD_CLOSURE ends up being replaced by a single LOAD_FAST_LOAD_FAST instruction (I think that's what Mark meant).
If you are still stuck, can you push your work to a branch in your GitHub fork of the cpython repo and link it here, and post the errors you are getting (and from which command)?
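As a rough illustration of that merging (my own sketch; whether the superinstruction actually appears depends on the version and on which optimizer passes ran), consecutive fast loads are already combined by the compiler today:

```python
import dis

def add(a, b):
    return a + b

# On recent development builds, the compiler's insert_superinstructions pass
# typically folds the two consecutive LOAD_FAST instructions for a and b into a
# single LOAD_FAST_LOAD_FAST. A LOAD_FAST produced from a former LOAD_CLOSURE
# should be eligible for the same folding.
dis.dis(add)
```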
Thank you all for the tips, this helped a lot. I'll proceed with the approach you all suggested, will report back soon with progress
As far as automated tests go, would adding a test for _PyCfg_ConvertPseudoOps be worthwhile? Or is there a better way to validate this change?
Usually when messing around with the interpreter, once you get it to build it's pretty solid, and if you get it to pass the test suite, it's golden. It seems you've already reached that stage. :-) Platinum would be to pass the buildbots, esp. the refleak subset. IIUC only triagers and core devs can trigger a buildbot run (by setting a magic label on the PR), and will do so as part of the PR review. I can do that for you.
Whether a test for _PyCfg_ConvertPseudoOps is needed, maybe @iritkatriel has an opinion.
_PyCfg_ConvertPseudoOps can be tested as part of the assembler. See Lib/test/test_compiler_assemble.py, which calls _PyCompile_Assemble. This is pretty new, so there aren't many tests there yet.
Thanks @iritkatriel -- just to make sure before I proceed, are you suggesting that I expose _PyCfg_ConvertPseudoOps in the same way that _PyCompile_Assemble has been exposed in _testinternalcapi.c, so that it can be called from Lib/test/test_compiler_assemble.py (perhaps via helpers in test.support.bytecode_helper)?
I wouldn't do that. I'd just add a test to Lib/test/test_compiler_assemble.py that creates a code object from an instruction sequence that contains the pseudo ops, and then checks that the code works as expected.
If the pseudo ops are not replaced the code object can't possibly work.
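Something roughly along these lines could complement that (a sketch of my own, not the assembler-level instruction-sequence test described above, and it would only pass once the pseudo-op conversion is in place):

```python
import dis
import unittest

class LoadClosureConversionTest(unittest.TestCase):
    def test_closure_compiles_without_load_closure(self):
        # Once LOAD_CLOSURE is lowered to LOAD_FAST during assembly, it should
        # never survive into a code object, and the closure must still work.
        def outer():
            x = 41
            def inner():
                return x + 1
            return inner

        self.assertEqual(outer()(), 42)
        opnames = {instr.opname for instr in dis.get_instructions(outer)}
        self.assertNotIn("LOAD_CLOSURE", opnames)

if __name__ == "__main__":
    unittest.main()
```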
As I wrote in the Core-mentorship thread, I'll start to think now about how to refactor the cell/local variable tracking so that we can remove LOAD_CLOSURE altogether, unless anyone else is actively working on that piece right now. If anyone has any thoughts on how to proceed OTTOYH, let me know -- otherwise I'll ping back with questions.
I'll also keep an eye on the PR which converts LOAD_CLOSURE to a pseudo-op and continue to make changes there as needed.
I'm still working through this -- I spent the last week writing simple examples and stepping through their evaluation in gdb to understand how the localsplus array is constructed by the compiler and ultimately referenced during evaluation.
Something that confused me previously is that co_cellvars refers to cell variables contributed by the object under consideration to the scopes of child objects -- unlike co_freevars, which refers to variables contributed to the object under consideration by its parent scopes (but excluding global variables).
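A tiny example of that distinction (my own illustration, using attributes documented in the data model):

```python
def outer():
    x = 1            # x is a cell variable of outer: outer supplies it to inner
    def inner():
        return x     # x is a free variable of inner: it comes from the enclosing scope
    return inner

print(outer.__code__.co_cellvars)   # ('x',)
print(outer.__code__.co_freevars)   # ()

inner = outer()
print(inner.__code__.co_cellvars)   # ()
print(inner.__code__.co_freevars)   # ('x',)
```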
In terms of possible approaches, @carljm commented (https://github.com/python/cpython/pull/106059#discussion_r1242397712) that we could either refactor the cell/local offsets logic so that we can remove LOAD_CLOSURE altogether, or consider adding explicit LOAD_CLOSURE support to some optimizations -- I was planning to proceed with the former approach, but I'm interested in whether @carljm has more to say on the latter, or if anyone else has any thoughts (@markshannon? @iritkatriel?)
I would say that the former is the ideal forward-looking approach, but it may be high difficulty/complexity; it should be possible, but I haven't looked at it in detail. The latter is more of a short-term approach, if for performance reasons we urgently want LOAD_CLOSURE to participate better in some specific bytecode optimizations but we haven't been able to achieve the former yet.
I don't think this is urgent. If it is too awkward to implement now, it can wait.
We plan (vaguely) to change the compiler to use names, rather than indices for load instructions.
Once that change has been made, removing LOAD_CLOSURE should be almost trivial.
LOAD_CLOSURE is identical to LOAD_FAST_CHECK in every way except its name and number. The justification for its existence is that "We keep LOAD_CLOSURE so that the bytecode stays more readable." That is insufficient justification to keep it, given that it: