kyren / piccolo

An experimental stackless Lua VM implemented in pure Rust
Creative Commons Zero v1.0 Universal

Passing Lua bytecode to the compiler and getting an AST-Like return value #28

Closed bananasov closed 1 year ago

bananasov commented 1 year ago

How would I do this? I've skimmed through the compiler code, but as it's 23:30 at the time of writing, I can't fully look through the entire source to see whether Lua gets compiled to bytecode and then to instructions, or whether it goes straight from Lua to instructions and skips bytecode entirely.

I've looked through the compiler example and it seems to be exactly what I'm looking for, but I don't know if the prototype is a tree-like value that I can walk through with a simple for loop and a match statement.

bananasov commented 1 year ago

Does piccolo have any way to input bytecode instead of Lua source? As far as I can tell, there are only methods for reading source and not bytecode. I also can't find any method that can output the bytecode, to a file for example.

kyren commented 1 year ago

There's no way to input bytecode and get back an AST because that's not what the bytecode is in the first place. The bytecode is compiled instructions for a (simplified) abstract machine; it's basically the same as how PUC-Rio Lua works. What you're asking for would be very roughly equivalent to taking compiled machine code and getting Rust back out of it, though admittedly still a lot simpler than that. In any case, that's not what the bytecode is designed to do at all, and there are even many equivalent pieces of Lua code that compile to the same bytecode. I might also be misunderstanding the question, so sorry if that's the case.
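To illustrate the point about different sources compiling to the same bytecode, here is a minimal sketch using a toy expression language. The `Expr` and `Op` types and every function name are made up for this example and are not piccolo's types; the only claim is that compilation can discard information (here, parentheses), so there is no unique AST to recover.

```rust
// Toy expression language for illustration only; these are not piccolo's types.
#[derive(Debug)]
enum Expr {
    Number(i64),
    Add(Box<Expr>, Box<Expr>),
    Paren(Box<Expr>), // the source had explicit parentheses here
}

#[derive(Debug, PartialEq)]
enum Op {
    Push(i64),
    Add,
}

// Emit stack-machine instructions for `expr` in evaluation order.
fn emit(expr: &Expr, out: &mut Vec<Op>) {
    match expr {
        Expr::Number(n) => out.push(Op::Push(*n)),
        Expr::Add(left, right) => {
            emit(left, out);
            emit(right, out);
            out.push(Op::Add);
        }
        // Parentheses only guide parsing, so they leave no trace in the output.
        Expr::Paren(inner) => emit(inner, out),
    }
}

fn compile(expr: &Expr) -> Vec<Op> {
    let mut out = Vec::new();
    emit(expr, &mut out);
    out
}

fn num(n: i64) -> Box<Expr> {
    Box::new(Expr::Number(n))
}

// AST for the source text `1 + 2`.
fn plain() -> Expr {
    Expr::Add(num(1), num(2))
}

// AST for the source text `(1 + (2))`.
fn parenthesized() -> Expr {
    Expr::Paren(Box::new(Expr::Add(num(1), Box::new(Expr::Paren(num(2))))))
}

fn main() {
    let a = compile(&plain());
    let b = compile(&parenthesized());
    println!("{a:?}\n{b:?}\nsame bytecode: {}", a == b);
}
```

Given only the instruction list, nothing says which of the two trees produced it, which is the sense in which bytecode is not an AST in disguise.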

As far as bytecode serialization goes, I just haven't gotten around to it really? Where the bytecode lives currently is in an 'opcodes' field of FunctionProto; it's just a list of instructions, exactly as you'd think. There's another type now called... CompiledPrototype or something equally uncreative (I'm on mobile, I will check in a second) that's the same thing as FunctionProto but intended to be shared between garbage collector instances. It's really not very different though, so it's the same idea. This is the type that you would want to serialize as bytecode, I think.
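Since the original question was whether the prototype can be walked with a for loop and a match, here is a rough sketch of what that looks like for a flat instruction list. The `OpCode` variants, the `constants` field, and the struct layout are assumptions invented for the example; only the idea of an `opcodes` list on a prototype comes from the comment above, and piccolo's real definitions will differ.

```rust
// Hypothetical stand-ins for illustration; NOT piccolo's real definitions.
#[derive(Debug, Clone, Copy)]
enum OpCode {
    LoadConstant { dest: u8, constant: u16 },
    Add { dest: u8, left: u8, right: u8 },
    Return { start: u8, count: u8 },
}

struct FunctionProto {
    constants: Vec<i64>,
    opcodes: Vec<OpCode>,
}

/// Walk the flat instruction list and render one line of text per opcode.
fn disassemble(proto: &FunctionProto) -> Vec<String> {
    let mut lines = Vec::new();
    for (pc, op) in proto.opcodes.iter().enumerate() {
        let text = match *op {
            OpCode::LoadConstant { dest, constant } => {
                let value = proto.constants[constant as usize];
                format!("{pc:04} LOADK  r{dest} <- k{constant} ({value})")
            }
            OpCode::Add { dest, left, right } => {
                format!("{pc:04} ADD    r{dest} <- r{left} + r{right}")
            }
            OpCode::Return { start, count } => {
                format!("{pc:04} RETURN r{start}, {count}")
            }
        };
        lines.push(text);
    }
    lines
}

// A hand-built prototype standing in for the compiled form of `return 1 + 2`.
fn sample_proto() -> FunctionProto {
    FunctionProto {
        constants: vec![1, 2],
        opcodes: vec![
            OpCode::LoadConstant { dest: 0, constant: 0 },
            OpCode::LoadConstant { dest: 1, constant: 1 },
            OpCode::Add { dest: 0, left: 0, right: 1 },
            OpCode::Return { start: 0, count: 1 },
        ],
    }
}

fn main() {
    for line in disassemble(&sample_proto()) {
        println!("{line}");
    }
}
```

The walk is linear rather than tree-like: a for loop and a match are enough, but there is no nesting to recurse into.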

PUC-Rio Lua actually has a very annoying limitation: you can't share in-memory representations of compiled code between Lua states. This iirc mostly has to do with how "upvalues" work. I believe Lua, from the outside, only gives you a way to go from source / bytecode into a full "closure", so the only way to share code between Lua states is to compile the code to bytecode and then load it again into every state. The intention with the whole CompiledPrototype / FunctionProto business is to do a little better at that, because it's useful for stuff like game engines with several interpreter instances. Since most of the prototype stuff is shared, fully independent interpreter instances running the same code can be very lightweight. Compiled code could even be shared across garbage collectors if FunctionProto shared data with CompiledPrototype rather than just copying it.
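The sharing idea can be sketched with plain reference counting. This is an assumption-laden toy, not how piccolo does it: `SharedProto`, `Interpreter`, and the four opcodes are invented here, and `Arc` stands in for whatever mechanism the crate actually uses. It only shows code compiled once and run by interpreters that keep their own state.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-ins; piccolo's real prototype types look different.
enum Op {
    PushGlobal(&'static str),
    PushInt(i64),
    Add,
    SetGlobal(&'static str),
}

// Immutable compiled code. It holds no pointers into any interpreter,
// which is what makes it safe to share.
struct SharedProto {
    opcodes: Vec<Op>,
}

// One independent interpreter: private globals, shared code.
struct Interpreter {
    code: Arc<SharedProto>,
    globals: HashMap<&'static str, i64>,
}

impl Interpreter {
    fn new(code: Arc<SharedProto>) -> Self {
        Interpreter { code, globals: HashMap::new() }
    }

    // Execute the shared code once against this interpreter's own globals.
    fn run(&mut self) {
        let mut stack = Vec::new();
        for op in &self.code.opcodes {
            match *op {
                // Unset globals read as 0 in this toy.
                Op::PushGlobal(name) => stack.push(*self.globals.get(name).unwrap_or(&0)),
                Op::PushInt(n) => stack.push(n),
                Op::Add => {
                    let right = stack.pop().expect("stack underflow");
                    let left = stack.pop().expect("stack underflow");
                    stack.push(left + right);
                }
                Op::SetGlobal(name) => {
                    let value = stack.pop().expect("stack underflow");
                    self.globals.insert(name, value);
                }
            }
        }
    }
}

fn demo() -> (i64, i64, usize) {
    // Compiled once: the equivalent of `counter = counter + 1`.
    let proto = Arc::new(SharedProto {
        opcodes: vec![
            Op::PushGlobal("counter"),
            Op::PushInt(1),
            Op::Add,
            Op::SetGlobal("counter"),
        ],
    });
    let mut first = Interpreter::new(Arc::clone(&proto));
    let mut second = Interpreter::new(Arc::clone(&proto));
    for _ in 0..3 {
        first.run();
    }
    second.run();
    (first.globals["counter"], second.globals["counter"], Arc::strong_count(&proto))
}

fn main() {
    let (a, b, handles) = demo();
    println!("first = {a}, second = {b}, handles to the code = {handles}");
}
```

Each extra interpreter costs one pointer to the code plus its own globals, which is roughly the "very lightweight" property described above.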

You said "instructions" in your question and I'm not sure what you mean by that. There are no other steps: the compiler takes Lua source code and lexes / parses it into an AST, which is in the "parser" module. You can do that now if you want, it's just not terribly useful by itself. Then the AST is compiled to VM bytecode in a CompiledPrototype (which gets turned into a FunctionProto, which gets turned into a runnable Closure), and that is run by the VM in a big loop. There's no separate like... machine code AOT / JIT compilation step, it's just a boring ol' interpreter.
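The stages described above (lex, parse to an AST, compile to bytecode, run in a loop) can be sketched end to end for a deliberately tiny language that only adds integers. Everything here is hypothetical and unrelated to piccolo's actual modules; it is only meant to show where the AST sits relative to the bytecode.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Number(i64),
    Plus,
}

#[derive(Debug)]
enum Ast {
    Number(i64),
    Add(Box<Ast>, Box<Ast>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Push(i64),
    Add,
}

// Stage 1: source text to tokens (this toy lexer needs spaces between tokens).
fn lex(source: &str) -> Result<Vec<Token>, String> {
    source
        .split_whitespace()
        .map(|word| match word {
            "+" => Ok(Token::Plus),
            digits => digits
                .parse::<i64>()
                .map(Token::Number)
                .map_err(|e| format!("bad token {digits:?}: {e}")),
        })
        .collect()
}

// Stage 2: tokens to a left-associative tree.
fn parse(tokens: &[Token]) -> Result<Ast, String> {
    let mut iter = tokens.iter();
    let mut ast = match iter.next() {
        Some(Token::Number(n)) => Ast::Number(*n),
        other => return Err(format!("expected a number, found {other:?}")),
    };
    while let Some(token) = iter.next() {
        match (token, iter.next()) {
            (Token::Plus, Some(Token::Number(n))) => {
                ast = Ast::Add(Box::new(ast), Box::new(Ast::Number(*n)));
            }
            other => return Err(format!("unexpected {other:?}")),
        }
    }
    Ok(ast)
}

// Stage 3: tree to a flat instruction list.
fn compile(ast: &Ast, out: &mut Vec<Op>) {
    match ast {
        Ast::Number(n) => out.push(Op::Push(*n)),
        Ast::Add(left, right) => {
            compile(left, out);
            compile(right, out);
            out.push(Op::Add);
        }
    }
}

// Stage 4: the "big loop". Returns None on stack underflow.
fn run(code: &[Op]) -> Option<i64> {
    let mut stack = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        match code[pc] {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let right = stack.pop()?;
                let left = stack.pop()?;
                stack.push(left + right);
            }
        }
        pc += 1;
    }
    stack.pop()
}

fn eval(source: &str) -> Result<i64, String> {
    let tokens = lex(source)?;
    let ast = parse(&tokens)?;
    let mut code = Vec::new();
    compile(&ast, &mut code);
    run(&code).ok_or_else(|| "stack underflow".to_string())
}

fn main() {
    println!("{:?}", eval("1 + 2 + 39"));
}
```

The tree exists only between `parse` and `compile`; by the time `run` sees the program, it is a flat list and the interpreter never looks at the AST again.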

Does that help? Also, please know I empathize very strongly with the "it is 23:30 and I am tired and what the heck is all this 😔" feeling.

kyren commented 1 year ago

Unrelated, these type names are terrible and I want to change them, so if I change them before you look at it again I'm sorry.

kyren commented 1 year ago

Also, just to explain some more background...

This crate is less of a polished API and more of an undocumented set of weird lego blocks that you can make a Lua interpreter out of, so it's understandable that it's confusing to navigate.

The reason for this is really because... well, partially because of the WIP state of things, but also because using a high level API can be quite limiting; the limitations around sharing in-memory representations of function prototypes are a good example. Another example is implicit sharing: the current direction for the crate is towards having almost no required global state. The Root main_thread and globals table are mostly convention, and I might not even keep those at ALL and just leave the registry and string interner.

Using PUC-Rio Lua's C API does not allow you to easily control shared state like this. It's actually quite difficult to instantiate N different independent versions of a chunk / function / coroutine or whatever without having full-blown separate lua_States. The reason for that is partially trying to limit API complexity, but I think in large part it's because touching the PUC-Rio Lua garbage collector is incredibly unsafe, so the API has to shield the user from it entirely.

Since the garbage collector system here is completely safe, I don't have that problem, so I'd like to more or less expose everything and just let the user pick and choose what they want. You can do a lot of neat things with this... want 10,000 different coroutines all running the same code and preemptively scheduled? Easy! Want to ensure that each of those runs completely independently of the rest, with no possibility of interaction? No sweat. That's basically the kind of API I want to have, as unrestricted as I can manage, let the user do weird things if they want.
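As a rough picture of the "10,000 coroutines running the same code" scenario, the sketch below runs many toy coroutines round-robin, cutting each one off after a fixed instruction budget. All names (`Coroutine`, `step`, `run_all`, the three opcodes) are hypothetical and this is not piccolo's scheduling API; it only assumes that an interpreter which can pause between instructions can be preempted from the outside.

```rust
use std::rc::Rc;

// Hypothetical toy instruction set, not piccolo's.
#[derive(Clone, Copy)]
enum Op {
    Increment,              // counter += 1
    JumpIfLess(i64, usize), // if counter < limit, jump to target
    Halt,
}

struct Coroutine {
    code: Rc<[Op]>, // shared, read-only program
    pc: usize,      // private program counter
    counter: i64,   // private state; no other coroutine can reach it
    done: bool,
}

impl Coroutine {
    fn new(code: Rc<[Op]>) -> Self {
        Coroutine { code, pc: 0, counter: 0, done: false }
    }

    /// Run at most `budget` instructions, then hand control back.
    /// The coroutine does not cooperate; the caller decides when it stops.
    fn step(&mut self, budget: usize) {
        for _ in 0..budget {
            if self.done {
                break;
            }
            match self.code[self.pc] {
                Op::Increment => {
                    self.counter += 1;
                    self.pc += 1;
                }
                Op::JumpIfLess(limit, target) => {
                    self.pc = if self.counter < limit { target } else { self.pc + 1 };
                }
                Op::Halt => self.done = true,
            }
        }
    }
}

/// Round-robin over `count` coroutines until all have halted.
/// Returns each final counter and the number of scheduling rounds used.
fn run_all(count: usize, budget: usize) -> (Vec<i64>, usize) {
    // One program: count up to 5, then halt.
    let code: Rc<[Op]> = Rc::from(vec![Op::Increment, Op::JumpIfLess(5, 0), Op::Halt]);
    let mut coroutines: Vec<Coroutine> =
        (0..count).map(|_| Coroutine::new(Rc::clone(&code))).collect();
    let mut rounds = 0;
    while coroutines.iter().any(|c| !c.done) {
        for coroutine in coroutines.iter_mut() {
            coroutine.step(budget);
        }
        rounds += 1;
    }
    (coroutines.iter().map(|c| c.counter).collect(), rounds)
}

fn main() {
    let (counters, rounds) = run_all(10_000, 3);
    println!("{} coroutines finished in {rounds} rounds", counters.len());
}
```

Because each coroutine owns its `pc` and `counter` and only reads the shared program, there is no channel through which one could affect another in this toy.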

bananasov commented 1 year ago

That explains a lot, thank you very much. Don't worry about all the stuff I tried explaining; I am still very tired and had no idea what I was writing yesterday. Feel free to close this issue if needed.