google / starlark-go

Starlark in Go: the Starlark configuration language, implemented in Go
BSD 3-Clause "New" or "Revised" License
2.26k stars 204 forks source link

Why are the {C,ITER,}JMP instructions encoded with a 4-byte padding? #522

Closed mna closed 6 months ago

mna commented 6 months ago

Hello,

I'm trying to get a good grasp of the bytecode and VM and I was wondering why the JMP opcodes (including CJMP and ITERJMP) have their argument (the jump destination address) padded to 4 bytes in the []byte bytecode slice, but not the other opcodes that take an argument (e.g. CONSTANT)?

My initial hunch was that this was for Opcodes to always be "aligned" on modulo-4 indices, but that's clearly not the case since there are argument-less opcodes that encode to a single byte and opcodes with non-padded arguments, but also it's only the argument part that takes 4 bytes (so the complete jump instruction takes 5 bytes with the Opcode itself).

AFAICT there's no comment that explains this, and I checked the git commit when that was introduced but couldn't get a good understanding there either. It's very possible that it's something obvious that eludes me :)

Thank you and a happy new year! Martin

mna commented 6 months ago

After seeing https://github.com/google/starlark-go/issues/418, I believe that's the reason - the jump addresses are always 4 bytes so that they could be patched should a debugger or some other program insert breakpoints in the instructions. The other opcodes that take an argument don't need that because they point to static tables (names, globals, functions, etc.), not addresses that may vary if the bytecode is being manipulated.

Leaving open in case there's more to it, but feel free to close otherwise!

Thanks, Martin

adonovan commented 6 months ago

After seeing https://github.com/google/starlark-go/issues/418, I believe that's the reason...

In fact, the reason is not debugging: though I never had time to finish a prototype of the debugger, breakpoints can be implemented by defining a single-byte BREAKPOINT instruction and then replacing the first byte of any other instruction (not just JMP) with it.

The true reason is simplicity of implementation. If all instructions have a known size, then the address of each basic block can be computed easily before emitting the instruction stream. But if there are two kinds of jump instruction (wide and narrow target address), then you need a more complex algorithm. You can use either an optimistic or a pessimistic approach. The optimistic approach assumes that all jumps can be narrow, but may have to fix up jumps to a block whose address is too wide; this fixing-up may in turn may cause other block addresses to become wide, requiring further fix-ups, and so on. The pessimistic approach starts by assuming all jumps are wide, and optimizes ones with small addresses, which makes blocks more compact, enabling further optimization. Either way, you need a fixed-point iteration to select the most compact jump instruction for all cases.

If data supports this optimization, I would accept a PR to implement it, but it seems unlikely to be important. I think there are bigger gains to be made by changing the instruction set more generally so that there is no loop required to decode the argument of every instruction.