Project Proposal: Lox Programming Language Compiler and Bytecode Virtual Machine

Arthur-Chang016 commented 11 months ago

What will you do? Inspired by Crafting Interpreters I will be building a full-functioned Lox programming language compiler and a bytecode interpreter/ virtual machine in Rust language from scratch. Lox is a language that combines the syntax of Python and C with dynamic typing. And it has some extensions of object-oriented features and first-class functions.

As a new Rust user, one of the main difficulties of this project would be to correctly program Rust in the canonical way and pay best efforts to pass Rust's borrow checker.

How will you do it?

Lox Compiler This part will be built as fast and as easy as possible. Instead of using Bison- or Flex-like tools, the scanner and parser will be implemented from scratch. Pratt Parser Algorithm will be adopted in the front-end. Moreover, I won’t build the AST (abstract syntax tree). Instead, the bytecode will be generated immediately during parsing.
Bytecode Interpreter The language (ISA) for the interpreter is a stack machine that does data movements by mainly push and pop operations. This is not adopted in modern, high-performance architectures, but it can be very friendly for the front-end compiler development with quite acceptable performance. Furthermore, a Mark-Sweep Garbage Collection will be implemented as the memory management mechanism.

How will you empirically measure success? There are some official test cases. The most important metric of language implementation is to make it correct. I will write a test harness that uses these tests as input. I expect it will pass the tests that cover features of this language I implemented.

If time permits, I could find the hotspot of bytecode simulation and try to optimize it by either algorithm improvement or hacking Rust by using unsafe block.

I will do the performance comparison with both jlox and clox from official implementation. The former is AST tree-walking interpreter implemented by Java, and the latter is bytecode interpreter implemented by C. It could be especially interesting to compare the performance between C's and Rust's implementation (mine).

Team members:

Possible Future Works Build a compiler that compiles bytecode into native machine language (Arm64 for my computer) that can either be stack-machine-like style that doesn't need to take care of the register or register machine style that we should perform register allocation. The later one would be a huge project.

The above compiler can be used for the JIT compilation and we can also do the performance measurement on bytecode interpretation, stack style asm, or register style asm.

Or, build a profiler to measure performance between different backend implementation.

sampsyo commented 11 months ago

Sounds good overall!

Can you please expand on what you have written under "empirically measure success"? You have mentioned some test cases, which is great! What exactly will you do, and what outcomes do you expect? Please make a list. For example, it could include "I will write a test harness that uses these test files as input, and I expect 100% of the tests in this directory to pass." Or you could pick a subset of the tests.

And even if you don't implement any optimizations, I think you should measure performance and compare it against some sort of baseline (which baseline? you pick).

Smaller comments:

Pratt Parser Algorithm will be adopted in the front-end that is more likely to be used in industry.

Do you have a specific justification for this (or perhaps a specific baseline that Pratt is "more adoptable" than)? If I were to guess, I would have supposed that a simple recursive descent parser would be the most common in industry. :)

I won’t build the AST (abstract syntax tree). Instead, the bytecode will be generated immediately after parsing.

Do you mean "during parsing" (i.e., the parser will directly output bytecode)? Or is there some other intermediate data structure (which might be referred to as the "parse tree")?

I could find the hotspot of bytecode simulation and try to optimize it by either algorithm improvement or hacking Rust by using unsafe block.

I strongly suspect that there will be plenty of low-hanging fruit for optimization that does not require unsafe.

Arthur-Chang016 commented 11 months ago

Thanks for your detailed response. I added the performance test in the How will you empirically measure success?.

Do you have a specific justification for this

Sorry I don't have justification for this. I just read from this book that talks about it's more likely to implement front-end with Pratt Parser than generative tools like Bison or Flex.

sampsyo commented 11 months ago

Sounds good; thank you!

sampsyo / cs6120

Project Proposal: Lox Programming Language Compiler and Bytecode Virtual Machine #409