RFC: x64 Implementation Proposal

pstephens commented 7 years ago

This is a slightly longer version of the proposal posted on Twitter: https://twitter.com/peterastephens/status/806710260815822849

Originally I was thinking the x64 implementation would need to emit binary opcodes, ELF headers, etc. After taking a look at @dubek's malc LLVM implementation I think a simpler approach can be taken: the compiler can be written in mal and bootstrap itself by emitting x64 assembler source code which can then be assembled using nasm and linked using ld. See this gist for a quick POC I did.

There are some challenges:

I'll need to implement a JIT compiler in order to execute (eval ...). Nasm could work here but binary address space manipulation would still be required to patch the running process. A more likely plan would be to create a higher level mal based AST which could be transformed to either assembly source or could be assembled in process using built-in binary support. The bootstrapping process would then be to build the compiler using the existing mal language as well as an existing mal implementation using AOT compilation. The produced binary would then implement a super set of mal capable of manipulating binary data and patching its running process.
GC - A garbage collector would be a nice to have but not an initial requirement. A fixed size heap could be used (1GB for instance) and then the process could be terminated when OOM was reached. This would likely pass most of the mal test suite but would likely fail with heavy recursion. Clearly a rudimentary GC would be something to look at down the road and even a stop-the-world single generational GC would be useful and educational.
Data structures - the heap could be tuned for the native mal data structures plus a few extras to support JIT compilation. Nice to have would be a native implementation of the persistent data structures as described by Chris Okasaki.
Exception handling - this would require keeping detailed metadata on how each function's opcodes were laid out and then to perform stack analysis when an exception was thrown. This will be an area of research.
Interop - not quite sure what form this would take. Maybe the ability to make unsafe syscalls? With GC this would require pinning, native pointers, and complexity.

This mal implementation could be written using the 10 steps and could pass the test suite when fully implemented.

Question: should this be built as part of the mal repo proper? Or as a separate repo as with @dubek's malc?

Comments welcome.

dubek commented 7 years ago

That's one big project... Good luck!

This mal implementation could be written using the 10 steps and could pass the test suite when fully implemented.

If I understand correctly you suggest building a mal compiler, that is - takes mal source and outputs ELF executable binaries. How does this work as, for example, step0, which should run an endless loop echoing the user's input? Of course, if you have a full functioning compiler, you can compile the mal implementation of mal (mal/step0_repl.mal), run that, and pass all the tests. malc can do that (luckily the mal-in-mal doesn't use eval which malc doesn't yet support).

@kanaka and I discussed in the past how to add malc to the test suite, perhaps as yet another variant for the mal-in-mal code (see the common-lisp dir for an example of how to run/compile the same source code with different compilers). In that sense malc is in a separate repo (much like gcc and rustc are in separate repos).

dubek commented 7 years ago

BTW: @pstephens , you're welcome to join #mal on freenode if you're not already there.

kanaka commented 7 years ago

@pstephens Emitting raw machine/object code would definitely be an interesting challenge/puzzle, but it would be a lot of effort to learn not that much more than emitting assembly. I think the most interesting parts (in terms of both learning and results) are definitely the items that you enumerated (JIT/eval, GC, efficient persistent data structures, exceptions, interop). I think this is basically the conclusion you arrived at.

Also, unless you are really partial to x86 assembly (I certainly have nostalgia for it), I would suggest going the LLVM IR route. It's easier to emit (unless you're already an x86 assembly expert), more flexible (output to many architectures), and a more useful thing to learn (at least LLVM related stuff is hotter on a resume these days). But perhaps the biggest advantage is that you could leverage the well developed JIT infrastructure that already exists in LLVM.

Regarding where to do the project. I think it makes sense for the compiler itself to live in a separate repo. Once that exists, using that as an alternate build mode for the mal-in-mal in the mal project seems reasonable. I.e. we could have both @dubek's and your implementation as alternate compilers for building a compiled mal-in-mal implementation.

Also, definitely join #mal :-)

dubek commented 7 years ago

Another advantage for generating LLVM IR is that you can gain from optimizations at the IR level (LLVM's opt tool). For example, in malc a Mal expression like (* 3 (+ 2 5)) will be compiled something like mal_integer_to_raw(mal_mul(make_integer(3), mal_add(make_integer(2), make_integer(5)))) and opt will optimize all that to the literal 21 in the generated optimized LLVM code.

pstephens commented 7 years ago

@dubek wrote:

How does this work as, for example, step0, which should run an endless loop echoing the user's input?

I think the main distinction is AOT vs JIT. In order to support the REPL while also generating x64 a JIT will need to be implemented. Generating asm source will also be required to bootstrap from an existing mal implementation. Technically the whole thing could be written in plain assembly, but I kind of like the idea of using mal macros and fns for composition rather than the macro system of the assembler. AOT is a "nice to have" to improve startup timing but not required to pass the test suite.

This whole thing could also be written as an interpreter but would only end up being a lower level version of the C implementation. (Still interesting though!)

@kanaka wrote:

Emitting raw machine/object code would definitely be an interesting challenge/puzzle, but it would be a lot of effort to learn not that much more than emitting assembly.

Yeah, I've already decide to emit NASM assembly source rather than an ELF binary. This will make it possible to bootstrap from any existing mal implementation, similarly to how @dubek bootstraps the malc implementation. The JIT compiler could also shell out to NASM, but it shouldn't be terribly hard to write an in process assembler once the binary is bootstrapped. I'll only need to implement a subset of x64 anyway (no SSE, MMX, FP, etc.)

@kanaka wrote:

I would suggest going the LLVM IR route.

LLVM is also interesting and useful. And definitely better for the resume. But I do want to go pretty low level on how JIT/GC/Syscalls/Unwinding the Stack/etc. work. For example, I've already read one paper on GC, gotten 1/3 of the way through one of the AMD 64 reference volumes, and gotten part way through the persistent data structures book. So the learning strategy seems to be working.

For the "advanced mal course" it may be interesting to take a similar attack for other "virtual machines". I.e. target CIL, Java Byte Code, or LLVM IR. I use the term "virtual machine" loosely here because it may also make sense to cross compile to an existing high level language like JavaScript or Erlang.

@kanaka wrote:

I think it makes sense for the compiler itself to live in a separate repo.

Are you sure? I think this could follow the 11 step progression. It would have more "extra" files to cover the low level nature of the implementation. See https://github.com/pstephens/mal/tree/mal-x64/mal-x64 for my first take on this.

Or... this could be done separately as you suggest. But the 11 step progression would be less distinct. Maybe I'll keep implementing on the fork and we can defer this decision until later.

kanaka commented 7 years ago

On Sun, Dec 18, 2016 at 3:30 PM, Peter Stephens notifications@github.com wrote:

Yeah, I've already decide to emit NASM assembly source rather than an ELF binary. This will make it possible to bootstrap from any existing mal implementation, similarly to how @dubek https://github.com/dubek bootstraps the malc implementation. The JIT compiler could also shell out to NASM, but it shouldn't be terribly hard to write an assembler in assembler once the binary it bootstrapped. I'll only need to a subset of x64 anyway (no SSE, MMX, FP, etc.)

Sounds pretty hard to me. :-) But it would definitely be cool.

LLVM is also interesting and useful. And definitely better for the resume. But I do want to go pretty low level on how JIT/GC/Syscalls/Unwinding the Stack/etc. work. For example, I've already read one paper on GC, gotten 1/3 of the way through one of the AMD 64 reference volumes, and gotten part way through the persistent data structures book. So the strategy seems to be working.

Yep, if that's what "scratches the itch", then go for it!

Are you sure? I think this could follow the 11 step progression. It will have more "extra" files to cover the low level nature of the implementation. See https://github.com/pstephens/mal/tree/mal-x64/mal-x64 for my first take on this.

Or... this could be done separately as you suggest. But the 11 step progression would be less distinct. Maybe I keep implementing on the fork and we can defer this decision until later.

No problem at all as a branch. We can look at it later to see if it makes sense to merge it to master. If it follows the steps and the general code structure and can pass all the mandatory steps then I think it probably does make sense to merge it to the main tree.

I'm exited to see how it works out.

kanaka commented 5 years ago

@pstephens I'm going to close this since it has been open for a while and in the meantime Ben Dudson has done a full x86 assembly/nasm implementation. If you do ever get back around to this alternate approach, I would be happy to consider including it too. I'm happy to include multiple implementations for the same language if they have very different approaches that still fit the overall mal structure and pedagogy goal.

kanaka / mal

RFC: x64 Implementation Proposal #251