python / devguide

The Python developer's guide
https://devguide.python.org/
Creative Commons Zero v1.0 Universal

Bytecode interpreter section: provide a full specification of each opcode #1078

Open Christopher-Chianelli opened 1 year ago

Christopher-Chianelli commented 1 year ago

Describe the enhancement or feature you'd like

The documentation for the dis module provides a summary of what each opcode does. However, the summary is not enough to fully understand an opcode's behavior. For instance, the documentation for SEND:

(https://docs.python.org/3/library/dis.html#opcode-SEND)

Sends None to the sub-generator of this generator. Used in yield from and await statements.

I propose a full spec be given in a format that looks like this:

Opcode Name

Stack Prior: ... [expected stack state]
Stack After: ... [new stack state]

Description of Opcode

Example sources that generate the opcode

For the SEND opcode, it would look like this:

SEND(target_delta)

Stack Prior:                            ... subgenerator, sent_value
Stack if subgenerator is not exhausted: ... subgenerator, yielded_value
Stack if subgenerator is exhausted:     ... subgenerator

Pops the top of the stack and sends it to the sub-generator of this generator. If the sub-generator is
not exhausted, the yielded value is pushed onto the stack. Otherwise, jumps forward by
target_delta, leaving subgenerator on the stack. Used to implement yield from and await statements.

Example Sources:
# yield from subgenerator is implemented as the following loop
# (with None initially at the top of the stack)
#
# SEND (sends the top of stack to the subgenerator)
# YIELD_VALUE (returns the yielded value to the caller)
# RESUME
# JUMP_BACKWARD_NO_INTERRUPT (to SEND)
# POP_TOP (target of SEND)
#
# Before the loop, GET_YIELD_FROM_ITER is used to get the generator
# that will act as the subgenerator
yield from subgenerator
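
The loop described above can be inspected with the standard dis module. The following is a minimal sketch, not part of the proposed specification: the names delegating and inner are hypothetical, and the exact instruction sequence depends on the CPython version (SEND appears in 3.11 and later, while older releases emit YIELD_FROM instead).

```python
import dis


def delegating(subgenerator):
    # A generator that delegates to another generator with `yield from`.
    result = yield from subgenerator
    return result


def inner():
    # Yields once, then returns whatever value was sent in.
    received = yield 1
    return received


# Collect the opcode names the compiler emitted for `delegating`.
opnames = [ins.opname for ins in dis.get_instructions(delegating)]
for name in opnames:
    print(name)

# Drive the delegation by hand to observe the behavior the opcodes implement.
gen = delegating(inner())
first = next(gen)  # value yielded by inner() is passed through to the caller
final = None
try:
    gen.send("hello")  # forwarded to inner(), which then returns it
except StopIteration as stop:
    final = stop.value  # return value of delegating()
print(first, final)
```

The printed opcode names are deliberately not listed here, since they vary between interpreter versions.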

This is similar to how the Java Virtual Machine specification documents its opcodes (https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html), with an additional section listing the source constructs that cause the opcode to be emitted.

Describe alternatives you've considered

Additional context

For the majority of CPython 3.11 bytecodes, I have already written documentation using the above format (in AsciiDoc): https://github.com/Christopher-Chianelli/optapy/blob/jpyinterpreter-docs/jpyinterpreter-docs/src/modules/ROOT/pages/opcodes/opcodes.adoc . I can convert the documentation to reStructuredText and create a PR to this repo if this issue is accepted.

encukou commented 1 year ago

IMO, this is changing way too fast to be documented here. The devguide is too version-independent.

AFAIK the stack effect info is nowadays in bytecodes.c, as (inputs -- outputs), as documented with the code.
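
For readers without a CPython checkout, the net stack effect can also be queried at runtime through dis.stack_effect. This is a rough sketch only: the helper name stack_effects is hypothetical, it reports a net count rather than the named (inputs -- outputs) form in bytecodes.c, and the result for SEND may differ between CPython versions, so it is printed rather than stated.

```python
import dis


def stack_effects(opname, oparg=None):
    """Return the net stack effect of an opcode for both branch outcomes."""
    opcode = dis.opmap[opname]
    return {
        "jump": dis.stack_effect(opcode, oparg, jump=True),
        "no_jump": dis.stack_effect(opcode, oparg, jump=False),
    }


# POP_TOP takes no argument and removes one item from the stack.
pop_top = stack_effects("POP_TOP")
# LOAD_CONST takes an argument (a constant index) and pushes one item.
load_const = stack_effects("LOAD_CONST", 0)
print("POP_TOP", pop_top)
print("LOAD_CONST", load_const)

# SEND only exists in newer interpreters; its effect is version-dependent.
if "SEND" in dis.opmap:
    print("SEND", stack_effects("SEND", 0))
```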

CAM-Gerlach commented 1 year ago

Might this belong instead in the bytecode instructions section of the dis documentation in the main CPython docs? That's where the opcodes are currently documented (e.g. SEND), and those docs are version-dependent.

Should this be moved to the CPython repo? Closed? Or what are the next steps here?

Christopher-Chianelli commented 1 year ago

I considered adding it to the dis documentation, but decided to create the issue here because the documentation can quickly become out of date if a bytecode developer forgets to update it. In my opinion, the only thing worse than not having a specification is having an incorrect one. If the documentation lives here, being out of date has less impact, since it would only affect CPython bytecode developers (rather than all dis users), who are hopefully familiar enough with bytecode changes to recognize when something goes out of date. I don't have a problem with moving this issue to the CPython repo so that it can be discussed whether the dis documentation should have a full specification.

encukou commented 1 year ago

Parts of the documentation can be auto-generated. I don't know if the source format is stable enough to maintain another consumer for it, though.
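
As a rough illustration of what auto-generation might look like, the sketch below fills in the mechanical part of a stub in the format proposed earlier in this thread and leaves the prose for a human. It is an assumption-laden example: net_stack_effect and render_stub are hypothetical names, and it reads from dis.stack_effect at runtime instead of parsing bytecodes.c, which is the source format the comment above refers to.

```python
import dis


def net_stack_effect(opname):
    """Best-effort net stack effect; returns None if it cannot be computed."""
    opcode = dis.opmap[opname]
    # Older interpreters reject an oparg for opcodes that take none (and
    # require one for opcodes that do), so try both call shapes.
    for args in ((opcode, 0), (opcode,)):
        try:
            # With jump left unset, the maximum over both branches is returned.
            return dis.stack_effect(*args)
        except ValueError:
            continue
    return None


def render_stub(opname):
    """Render a documentation stub with the generated part filled in."""
    effect = net_stack_effect(opname)
    effect_text = "unknown" if effect is None else f"{effect:+d}"
    return "\n".join([
        opname,
        "",
        f"Net stack effect: {effect_text}",
        "",
        "Description of Opcode: (to be written by hand)",
        "Example Sources: (to be written by hand)",
    ])


stub = render_stub("POP_TOP")
print(stub)
```

Only the stack effect line is generated here; the named stack states, the description, and the example sources would still need a stable machine-readable source or manual upkeep.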