sabine / ocaml-to-wasm-overview

102 stars 4 forks source link

Ocaml to WASM - an Overview

:toc: :toclevels: 5

This is a collection of links and ideas and notes on compiling OCaml to WASM.

General Notes

From http://caml.inria.fr/pub/ml-archives/caml-list/2009/03/3a77bfcca0f90b763d127d1581d6a2f1.en.html (Xavier Leroy, 2009): .... 3- A language implementation like OCaml breaks down in four big parts: 1- Front-end compiler 2- Back-end compiler and code emitter 3- Run-time system 4- OS interface

[...]

6- Here is a schematic of the Caml compiler. (Use a fixed-width font.)

         |
         | parsing and preprocessing
         v
      Parsetree (untyped AST)
         |
         | type inference and checking
         v
      Typedtree (type-annotated AST)
         |
         | pattern-matching compilation, elimination of modules, classes
         v
      Lambda
       /  \
      /    \ closure conversion, inlining, uncurrying,
     v      \  data representation strategy
  Bytecode   \
              \
             Cmm
              |
              | code generation
              v
           Assembly code

....

A more recent high-level view of the compilation pipeline (from https://ocamllabs.slack.com/archives/C0JCHGE78/p1568626615023800, Sep 16, 2019): .... Source code parsing v Parsetree
typing
v Typedtree desugar pattern matching, modules, objects, etc; erase types, make explicit memory layout in terms of blocks and values
v Lambda (higher order lambda calculus based IR) make closure construction and usage explicit perform inlining
v Clambda (like Lambda but with explicit closures, direct/indirect calls) make block/value manipulation explicit make allocation explicit
v Cmm (tree-structured, explicit memory manipulation, C calls, etc) perform instruction selection, sequentialization into basic blocks, assignment of pseudo-registers
v Mach (block structured IR) liveness, register allocation, dead code elimination are Mach -> Mach transformations
v Linear (linear sequence of abstract assembly instructions, explicit register assignments) this step is heavily backend-dependent, implemented in emit.mlp

v Textual assembly code ....

Runtime / Garbage Collection

Both OCaml bytecode and OCaml native code comes with a runtime that provides functions needed to run the compiled program.

The runtime provides::

TODO: find out what else the runtime does.

Dealing with OCaml values' lifetimes in WASM::

Paths to WASM

Direct::

Indirect::

Direct Roads to WASM

1a) Lambda -> WASM

While there are currently no projects that translate OCaml's lambda IR to WASM, there are these:

1b) Cmm -> WASM

Starting from an already optimized version of the program is likely to result in a comparatively fast execution speed.

Generally, it appears that Cmm is a good starting point when compiling to WASM without using the WASM GC extension, since the memory representation has already been flattened at the Cmm stage.

1c) OCaml bytecode -> WASM

I am not aware of any projects that attempt translating from OCaml bytecode to WASM. Please let me know if you are.

An advantage is that the bytecode interpreter hardly ever changes at all (it is said to still be quite similar to what is laid out in https://caml.inria.fr/pub/papers/xleroy-zinc.pdf[the original report on ZINC]).

There is no dependency on compiler internals, as we can work on the bytecode output of ocamlc.

In the past, translating bytecode has proven to be a successful and maintainable strategy for compiling OCaml to different languages:

1d) bytecode interpreter in WASM

Indirect Roads to WASM

If there was a compiler from OCaml to LLVM, it would immediately enable compilation to WASM.

2a) Cmm -> LLVM

2b) OCaml bytecode -> LLVM

2c) machine code -> WASM

For compiling machine code to WASM, there apparently do not currently exist any solutions.

One would need to apply some kind of algorithm that transforms the control flow from a program-counter-based representation to the labeled continuations that can be seen in WASM, just like Emscripten's "Relooper" algorithm does for LLVM.

If there is an architecture whose machine code can be translated to WASM in a reasonably efficient fashion, and it turns out that OCaml already compiles to this architecture, this could be interesting.

If successful, this could, in the long run, help getting many other languages to compile to WASM as well.

My Collection of Links to Sort Through