Mutable global variables as "registers"?

Traditionally, OCaml has a single stack (https://github.com/ocaml-multicore/ocaml-multicore/wiki/Native-code-notes#stack-layout-at-caml_start_program-vanilla-amd64) that holds both information about control flow and the values of local variables. Traditionally, there are only caller-save registers. Generally, mapping registers of the existing compilation toolchain to local or global variables in WASM intuitively intuitively seemed appropriate to me. Currently, it is not.

If we bring OCaml's GC to WASM, we need to find (and modify, since we have a moving GC) the roots of the heap. Thus, we must store any value that could be a root in a place where we the GC can find it. Consequence: we cannot use local variables to hold heap pointers, since there is no way to iterate over local variables of the other frames on the stack. We can store heap pointers in the linear memory. Currently, if we would store heap pointers in global variables, any time we call out of a module, we would need to dump these global variables to the linear memory, and restore them at the callee site, so that they can be shared across WASM modules (e.g., to the GC module). Dumping and restoring a set of variables at every call and return site obviously incurs a high cost - so using WASM variables like registers doesn't currently seem to be a reasonable strategy.

The proposal on importing/exporting mutable global variables seems related: https://github.com/WebAssembly/mutable-global/blob/89e5be9d69f2afac7243b6d2ff36b9c8723efb77/proposals/mutable-global/Overview.md Even sharing a stack pointer (to the WASM linear memory) across modules seems to be a pain point, at the moment.

If we had this, it would be quite simple to "use global variables like registers" and, thus, reuse many of the existing compilation strategies of the optimizing OCaml compiler. Currently, it looks to me like there is a performance benefit to using global/local variables over the linear memory in Chrome, but not in Firefox. Generally, in the long run, I would expect a performance advantage for using variables instead of linear memory.

There's this other strategy for fulfilling the purpose that registers serve: we could choose a bunch of addresses on the WASM linear memory and use that in a similar fashion to registers.

For reference, in the LLVM backend (https://github.com/llvm/llvm-project/blob/6088f84398847152ad97eb1bc0b139a28e879b48/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp), registers are "stackified", where possible. Any non-stackifiable registers are compiled to WASM local variables. We could use the same strategy, but only if we drop the idea of bringing the OCaml GC to WASM. I.e. we would need a WASM GC.

@lukewagner please comment on whether this makes sense and correct where I am mistaken.

This is a good question to ask. There is a very high-level question of whether to use linear memory (and a GC implemented in wasm on top of linear memory) vs. the future GC proposal, but I'll assume we're going with the linear memory proposal, at least in the short-term.

I think if you're going with a linear memory GC, then, however registers are represented, you'll need a linked list in memory representing the GC references in caller frames. Given this, it might be ultimately most efficient to use a full shadow stack technique for storing all GC roots in linear memory, which allows the compiler to control exactly when GC roots are stored to memory. While mutable global variables may be the most optimizable in isolation, if their values must ultimately be written into linear memory anyway, then I think the wasm producer ultimately has the better opportunity to optimize if they can control all native loads and stores via linear memory.

Happy to discuss more! Thanks for the ping.

I had a look at the GC experiments in https://github.com/lars-t-hansen/moz-gc-experiments/blob/master/version2.md and the tests at https://hg.mozilla.org/mozilla-central/file/tip/js/src/jit-test/tests/wasm/gc/ which both look very interesting and promising. I could see us trying to directly compile to the prototype GC implementation to see how that works out. I'm not sure if typed structs accessed with fields are the right model. Doesn't that mean that, for every length of heap struct, the WASM producer would need to emit a type definition? What about strings on the heap?

For a model of the heap, I would expect being able to allocate tiny "linear memories" (heap structs), i.e. allocation being a function that I tell how many bytes/words/whatevers to allocate and that, if possible, returns a reference to a heap struct with the requested size. I would guess that the only types read and written to these heap structs are i32, i64, u32, u64, as well as some reference type that represents only heap references. In OCaml, one traditionally distinguishes heap references and primitive values this way: primitive values are shifted left and the rightmost bit is set to 1 (i.e. we only have 31-bit primitive values. In turn, we are able to identify primitive values during a GC run). Heap references are always aligned, and, thus, their rightmost bit is always 0.

One question that is looming over all this is... how do reference types (that can be passed freely across the language boundary) and heap-allocated struct types affect the overall system?

I mean, when we store anyrefs in the linear WASM memory, the engine needs to keep track of that, right? Clearly, we don't want to scan the full WASM linear memory when a GC run is triggered, so this task of maintaining a list of GC references would be offloaded to the WASM/JS engine.

What about struct field types (if the WASM GC adopts these)? It sounds like bookkeeping would need to be done at runtime in order to be able to error when a field is accessed with the wrong type? What about the runtime penalty and memory consumption overhead associated with this?

It seems to me like there is a long tail of consequences that could result in things growing to be really complex and a really tough problem to optimize for all these different languages that will eventually compile to WASM. If these issues are solved or solvable in a satisfying way, though, the GC proposal looks really interesting.

In the other case of bringing the existing GC to WASM linear memory: The WASM producer could separate the stack into a control stack (holds instructions) and a value stack (holds heap references). The control stack is represented by the WASM stack. The value stack is a region in WASM linear memory (people seem to like to call this "shadow stack" even though it doesn't necessarily shadow anything). Heap references are stackifiable to a limited extent: it must be proven that, every time a call to the GC can happen, there exist no stackified heap references on the WASM stack (or else the GC will not be able to change the heap reference if the memory region it points to is moved elsewhere). The good thing is that, thanks to the caller-save strategy, GC references must be written to linear memory anyways, if they are still needed. Since the GC is being called from alloc, and not running in a different thread, this is all fine.

It seems to me that it is also possible to split up the value stack further: in a heap reference stack (must be in linear WASM mem) and a primitive value stack (which could be compiled to WASM local variables if there is a performance advantage).

Please feel free to correct and/or point me at things I should look at in order to understand more. :)

Great questions. I think the key design choice that avoids a lot of the complexities you've mentioned is that reference types (see the reference types proposal, which is implemented in multiple engines and nearing shipping, for background) are always stored in statically-typed locations that are known to the GC (importantly, not including linear memory). With the initial reference-types proposal, this means only locals, globals, and tables, so if you want to store an anyref in linear memory, you must allocate a typed element in a table and store an index in linear memory (being careful to deallocate the table element when it is no longer needed). With the GC proposal a new kind of typed heap memory is added (instances of struct/array types) which can store references without the indirection through tables.

One thing I remembered: if you want to experiment with the GC proposal without blocking on implementation in existing browser engines, there was an interesting intern project last summer, cmm_of_wasm, which I think had experimentation with wasm GC as an explicit goal.

Ah, I missed that anyrefs from the reference types proposal can't be stored in the WASM linear memory directly (I don't know why I assumed that in the first place). That does make implementation simpler, indeed, by placing the burden on the WASM emitter. :slightly_smiling_face:

It currently looks like prototyping on cmm_of_wasm is indeed one of the better paths I can take - as this would allow giving some feedback on the specs before they get finalized and also to explore different compilation strategies.

sabine / ocaml-to-wasm-overview

Mutable global variables as "registers"? #1