japl-lang/japl: Official repository of the JAPL language

The JAPL compiler 2.0 & Notes on bytecode serialization #45

Open nocturn9x opened 3 years ago

nocturn9x commented 3 years ago

JAPL is evolving pretty quickly, and as more and more features get added, the need for a new and improved compiler tool suite arises. The current implementation uses a single-pass bytecode compiler whose parser is based on Pratt's technique. This is elegant for small toy languages, but it scales poorly when it comes to optimizations or more complex syntactic structures.

The proposal is to redesign the whole backend of the language using a recursive-descent, top-down approach (like the Python PoC available here) and to separate the parser from the compiler to improve modularity (right now the compiler and the parser are very tightly coupled). Another substantial change would be the use of an Abstract Syntax Tree (AST), making the compiler a two-pass compiler (maybe three-pass, if we reserve a whole compilation step for optimizing code). Using an AST makes it a lot easier to navigate code in a structured fashion and simplifies the implementation of crucial features such as closures (since we can simply go back n nodes in the tree to know which locals to capture, for instance).
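Below is a minimal sketch of what this could look like in Nim, the language JAPL is implemented in. The node kinds, token representation, and grammar are hypothetical illustrations rather than the actual JAPL AST; the point is only to show a recursive-descent parser producing a tree that a separate compiler pass can walk later.

```nim
# Hypothetical AST and recursive-descent parser sketch (not JAPL's real types).
type
  NodeKind = enum
    nkLiteral, nkBinaryExpr

  AstNode = ref object
    case kind: NodeKind
    of nkLiteral:
      value: string
    of nkBinaryExpr:
      op: string
      left, right: AstNode

  Parser = object
    tokens: seq[string]   # stand-in for real Token objects
    current: int

proc advance(p: var Parser): string =
  ## Consumes and returns the current token.
  result = p.tokens[p.current]
  inc p.current

proc parsePrimary(p: var Parser): AstNode =
  ## primary ::= literal
  AstNode(kind: nkLiteral, value: p.advance())

proc parseAddition(p: var Parser): AstNode =
  ## addition ::= primary (("+" | "-") primary)*
  result = p.parsePrimary()
  while p.current < p.tokens.len and p.tokens[p.current] in ["+", "-"]:
    let op = p.advance()
    result = AstNode(kind: nkBinaryExpr, op: op,
                     left: result, right: p.parsePrimary())
```

Because the parser only builds nodes, a later pass (the compiler proper, and possibly an optimizer) can walk the tree as many times as needed without re-parsing the source.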

Another important step is bytecode serialization. Some of the current features, such as the call stack for functions, were implemented with simplicity of use in mind and are therefore trickier to serialize properly. Specifically, the VM uses a CallFrame object to store all of a function's frame information: this object holds the function's stack, which is a copy of a subset of the VM's own stack. This is simple to understand, but it doesn't scale well and wastes a lot of resources. We propose changing this behavior to a calling convention inspired by the C programming language, which implies pushing three values onto the stack.
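As a rough illustration of the direction (the field names, and exactly which values end up on the stack, are assumptions made for this sketch, not the proposal's final layout), a C-style convention lets each frame be described by a couple of integers into one shared stack instead of a copied slice:

```nim
# Hypothetical sketch: frames index into one shared VM stack instead of
# holding a copy of a stack slice. Not the actual JAPL implementation.
type
  Value = object        # placeholder for JAPL's runtime value type
    data: int

  CallFrame = object
    returnAddress: int  # bytecode offset to resume at in the caller
    stackBase: int      # index into the shared stack where this frame begins

  VM = object
    stack: seq[Value]
    frames: seq[CallFrame]

proc getLocal(vm: VM, frame: CallFrame, slot: int): Value =
  ## A frame's locals are just slots of the shared stack, addressed
  ## relative to the frame's base pointer: no copying is needed.
  vm.stack[frame.stackBase + slot]
```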

Speaking of serialization, the VM should no longer depend on the existence of a compiler module and should NEVER interact with it directly. Since the new compiler will use Nim's own garbage collector, we don't want the GC to also kick in inside our runtime, as it does now (which slows things down and causes random segfaults, because the GC gets confused and frees objects that we actually still need). The compiler will serialize the bytecode to a file and the VM will deserialize it from that file, so there is no risk of GC-managed objects making their way into our precious runtime.
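A minimal sketch of such a round trip, assuming a hypothetical on-disk format (a magic string followed by a length-prefixed payload; none of this is the actual JAPL bytecode layout), could look like this:

```nim
# Hypothetical bytecode file round trip: the compiler writes raw bytes, the VM
# reads them back, and no compiler objects ever reach the runtime.
import std/streams

const MagicBytes = "JAPL"   # hypothetical magic string identifying the format

proc serialize(bytecode: seq[byte], path: string) =
  let s = newFileStream(path, fmWrite)
  defer: s.close()
  s.write(MagicBytes)
  s.write(uint32(bytecode.len))   # payload length, native endianness
  for b in bytecode:
    s.write(b)

proc deserialize(path: string): seq[byte] =
  let s = newFileStream(path, fmRead)
  defer: s.close()
  doAssert s.readStr(MagicBytes.len) == MagicBytes, "not a JAPL bytecode file"
  let length = int(s.readUint32())
  result = newSeq[byte](length)
  for i in 0 ..< length:
    result[i] = s.readUint8()
```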

ghost commented 3 years ago

Note: writing the bytecode to a file may sound slow, but it would be done anyway for caching purposes.
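For instance, a cache check can be as simple as comparing timestamps; the file extension and layout here are made up purely for illustration:

```nim
# Hypothetical timestamp-based cache check (not JAPL's actual caching scheme).
import std/[os, times]

proc cachedBytecodePath(sourcePath: string): string =
  ## Made-up location for the cached bytecode of a given source file.
  sourcePath.changeFileExt("japlc")

proc needsRecompile(sourcePath: string): bool =
  ## Recompile when no cached bytecode exists, or the source is newer than it.
  let cachePath = cachedBytecodePath(sourcePath)
  if not fileExists(cachePath):
    return true
  getLastModificationTime(sourcePath) > getLastModificationTime(cachePath)
```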

nocturn9x commented 3 years ago

@Productive2 Also consider that compilation would be done only once. Even if it took a few minutes (which is unrealistic for a bytecode compiler anyway), that's nothing compared to the countless hours the code might actually spend executing.