Code generator prototyping strategies

eliotmoss commented 10 years ago

As we start to ramp up in Amherst, we're pondering how best to get things rolling (i.e., what a good starting point is) and where we want to end up.

Initial prototype thought

For our initial prototype we are thinking about using QEMU (locally pronounced KEE-mew), the Quick Emulator. QEMU is a whole machine emulator (but can also be configured to run as user-mode only) that supports emulating one hardware ISA on another one (and same to same, of course). Beng whole-machine it deals with devices, interrupts, memory mapping, etc., but the part of greatest interest to us is that it takes guest ISA code blocks and dynamically translates them to host machine code, maintaining the translations in a code cache. Thus parts of its design will be great for dynamic and adaptive code generation. QEMU works by translating guest code into QEMU IR, which is then passed to a target code generator. That code generator does a useful level of local register allocation and local optimizations (hence the Quick in Quick Emulator).

For our case, μVM IR would be the guest ISA and our target would the host ISA that QEMU targets. A part of how QEMU does things is that the guest register file is represented as a memory data structure and (to first approximation) the host registers are used for local register allocation. To finer approximation I believe QEMU can bind some guest registers to host registers, useful where the host register set is larger. For μVM IR the "registers" are the SSA values and we can probably get some mileage from the SSA form in telling the QEMU register allocator helpful things about when things "die", etc.

Downsides of this prototype approach include:

Dependence on QEMU
Code quality may be limited, particularly in that it will tend to push all variables to memory as it crosses from block to block
It is not clear what the situation is with QEMU and multithreading, though it may support concurrent execution

Upsides:

Fast path to getting native code going
Useful optimizations, especially for a dynamic environment
Back-end architecture design that we might want to use a guide in our own work
Already implements major platforms of interest

Eventual system design thought

Down the line we may want something like QEMU's code generation design, with more provision for (optional) higher levels of optimization and register allocation across larger scopes (that is, for whole functions, not just blocks/traces). But their internal IR, code generators, etc., might be a very useful base, as well as having a notion of the code cache design and so forth.

wks commented 10 years ago

My concerns are:

Is QEMU simpler than LLVM?
How does QEMU address memory management (especially garbage collection)? Specifically:
1. is there a way to mark some parts of data structures as "references"?
2. is there a way to mark part of the stack as "references"?

eliotmoss commented 9 years ago

In the QEMU approach it is as if we have a direct hardware machine that executes MicroVM IR. We were thinking along the lines of using register windows with a large number of registers (or variable size windows), one register for each local SSA variable. Thus, within a function the types of things would be obvious and with some liveness information you'd be all set for locating pointers. Put another way, the typing of the variables gives us the basis of very straightforward stack maps to use to drive GC. None of this has to do with QEMU itself -- it's merely an emulation engine for whatever we choose to emulate.

I am not 100% clear about what Lua does, but I can tell you that trying to run things where pointers are segregated from non-pointers will in the long run be overly constraining and a giant pain in the rear. If you're doing optimizing JIT you need a stack map plan and it's better to have that plan up front that to try to add it on later!

wks commented 9 years ago

For the Lua part, what I really proposed in the other issue is an interface between the Client and the uVM where handles to uVM values (like JNI handles), rather than concrete uVM numerical values or addresses, are used by the Client.

I was not proposing the Sable-like pointer-segregated object layout. My opinion on object layout is that it should be as close to the C struct layout defined by the platform ABI as possible.

eliotmoss commented 9 years ago

Do you mean that all communication / computation would be via handles? That sounds slow. Why is it helpful? Best - E

On November 21, 2014 6:43:34 PM EST, Kunshan Wang notifications@github.com wrote:

For the Lua part, what I really proposed in the other issue is an interface between the Client and the uVM where handles to uVM values (like JNI handles), rather than concrete uVM numerical values or addresses, are used by the Client.

I was not proposing the Sable-like pointer-segregated object layout. My opinion on object layout is that it should be as close to the C struct layout defined by the platform ABI as possible.

Reply to this email directly or view it on GitHub: https://github.com/microvm/microvm-meta/issues/19#issuecomment-64055730

Sent from my Android device with K-9 Mail. Please excuse my brevity.

wks commented 9 years ago

The main purpose is to simplify GC (especially to identify Client-held roots).

There are many reasons why the Client may hold uVM values including reference.

If the Client may hold references, there are several choices for the uVM:

Use naive reference counting. (no need to identify roots)
Expose raw pointers and conservatively scan Client stacks (how?).
Just before GC, call the Client to ask for a complete list of pointers to the uVM heap.
Like 2, also ask the Client to update their pointers in order to implement copying GC.
Record which objects are "exported" to the Client. Pin "exported" objects and expose raw pointers.
Record which objects are "exported" to the Client. Use an indirection table and expose table keys. This still allows "exported" objects to be moved.

The "handle" approach is generalised for 4 and 5. The "handle" can be anything so the implementation may choose 4 or 5. If it is 4, memory access via a "handle" can be a macro to a pointer operation (plus necessary read/write barriers, which may change in GC phases (like Sapphire)). If it is 5, there is one level of indirection.

Both 2 and 3 exposes the detail of tracing-based garbage collection to the Client. 1 requires the uVM to reach too deep into the Client. 0 is way too slow.

The API is designed for infrequent probing of uVM states. Currently it is inefficient to frequently use this interface to access the uVM memory. For example, it is inefficient to use the Client as an interpreter and operates on object in the uVM heap. (Steve also mentioned that the uVM is design as a (relatively) closed world of uVM IR code and the uVM heap, and is not designed to be frequently probed from the outside.)

I believe if frequent access is required, it is better to compile the part of code to uVM IR code and run on the uVM. I also suggested that John (our new Summer student who is interested in Python and Erlang) look at how to compile RPython to uVM IR code rather than hurrying to the tracing JIT Python compiler, because the PyPy Python interpreter is written in RPython. Although both Carl and I believe the tracing JIT is more profitable and have thought about using uVM only for optimised code, I still prefer the slower approach that investigates RPython and then PyPy JIT.

Regards, Kunshan

2014-11-21 23:55 GMT+00:00 Eliot Moss notifications@github.com:

Do you mean that all communication / computation would be via handles? That sounds slow. Why is it helpful? Best - E

On November 21, 2014 6:43:34 PM EST, Kunshan Wang notifications@github.com wrote:

For the Lua part, what I really proposed in the other issue is an interface between the Client and the uVM where handles to uVM values (like JNI handles), rather than concrete uVM numerical values or addresses, are used by the Client.

I was not proposing the Sable-like pointer-segregated object layout. My opinion on object layout is that it should be as close to the C struct layout defined by the platform ABI as possible.

Reply to this email directly or view it on GitHub: https://github.com/microvm/microvm-meta/issues/19#issuecomment-64055730

Sent from my Android device with K-9 Mail. Please excuse my brevity.

— Reply to this email directly or view it on GitHub.

wks commented 9 years ago

There is currently no way to do computations via the API (like asking the uVM to add two floats or int vectors).

All communications, as long as involving uVM values, will use handles. Here are some categories of Client-uVM communications that may use handles. NOTE: a handle holds ANY UVM VALUE, not just references.

Creating new uVM stacks and uVM threads. When creating stacks, the arguments to the stack-bottom function are expressed using a list of handles, each of which represents a uVM value. When creating a thread, it binds to a uVM stack. The "stack" type is a uVM type -- an opaque reference to a stack.
Accessing uVM memory. This include allocating in the heap (new, newhybrid), pointer arithmetic (getiref, getfieldiref, getelemiref, shiftiref, getfixedpartiref, getvarpartiref), reference type casting (refcast) and memory access (load, store, cmpxchg, atomicrmw) and memory fence (fence). They mirror the uVM instructions so they can be used by the Client.

TODO: There should be a better way to do pointer arithmetic and simplify those many API functions. Probably introduce something like the LLVM's "getelementptr".

Introspecting local variables in uVM stacks. Variables hold uVM values.
Constructing new stack frames. This action will need to supply the value of all local variables (at least all live local variable).

Another consideration other than helping GC is to bridge the gap of the different type systems between uVM and the Client.

ref, iref, weakref, func, thread, stack are obviously uVM-only transparent or opaque references and there is definitely no equivalent in the Client. For primitive types like int<1>, int<6>, int<8>, int<32>, int<52>, int<64>, int<256>, int<1024>, float, double, vector, the Client may not have equivalent types. Vector types are non-standard in C. Integers of 1, 6, 52 and other number of bits are unavailable in C. JavaScript and Lua do not have integer types. Python has all integer sizes available, but only one "int" type. Using handles can let the Client precisely express what uVM type/value it is using.

2014-11-21 23:55 GMT+00:00 Eliot Moss notifications@github.com:

Do you mean that all communication / computation would be via handles? That sounds slow. Why is it helpful? Best - E

On November 21, 2014 6:43:34 PM EST, Kunshan Wang < notifications@github.com> wrote:

For the Lua part, what I really proposed in the other issue is an interface between the Client and the uVM where handles to uVM values (like JNI handles), rather than concrete uVM numerical values or addresses, are used by the Client.

I was not proposing the Sable-like pointer-segregated object layout. My opinion on object layout is that it should be as close to the C struct layout defined by the platform ABI as possible.

Reply to this email directly or view it on GitHub: https://github.com/microvm/microvm-meta/issues/19#issuecomment-64055730

Sent from my Android device with K-9 Mail. Please excuse my brevity.

— Reply to this email directly or view it on GitHub https://github.com/microvm/microvm-meta/issues/19#issuecomment-64056743.

eliotmoss commented 9 years ago

Ok, I think I am understanding better now.

I would start with these interfaces as guides:

JNI JVMTI Jikes RVM "magics" (typesafe ones)

We will certainly need clarity as to the required support for reflection, particularly into the stack. Optimization can make variables "disappear", etc.

Regards -- EM

microvm / microvm-meta

Code generator prototyping strategies #19