Overview

Rationale

Some existing C libraries or system interfaces use call-back functions, i.e. user-provided function pointers which are called by C or system libraries. Mu should provide appropriate mechanisms to interface with those libraries.
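As a minimal illustration of the kind of interface meant here, the standard C qsort() takes a user-provided comparison function pointer and calls it back. The names compare_ints and sort_ints are made up for this sketch; an exposed Mu function would play the role of compare_ints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The call-back: qsort() invokes this through a function pointer. */
int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* negative, zero or positive */
}

/* Hypothetical helper: the library owns the loop, the user owns the comparison. */
void sort_ints(int *values, size_t count) {
    assert(values != NULL);
    qsort(values, count, sizeof values[0], compare_ints);
}
```

The library only ever sees a plain function pointer with a fixed signature, which is exactly what Mu has to be able to hand out.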

This is part of the (unsafe) native interface. See super issue: https://github.com/microvm/microvm-meta/issues/24

Exposing appropriate Mu functions as C-style function pointers

"Appropriate" Mu functions must only use the following types as their parameter types or return types: int<n>, float, double, vector<T>, ptr<T> or struct types whose components are these types. In the case of ptr<T>, T can also be array<T n> or hybrid<F V> where T, F and V are one of the above types. In other words, (traced) references and Mu-specific opaque types are not allowed.

The Mu ABI will be designed to be compatible with the C calling convention as defined by the platform ABI.

way 1: (simple) Mu functions are declared with the optional WITH_FP clauses to create their associated C-style function pointers. For example:

.funcdecl @some_func WITH_FP(@fp_some_func DEFAULT @COOKIE) <@sig>

.funcdef @other_func VERSION @other_func_v1 WITH_FP(@fp_other_func DEFAULT @COOKIE) <@sig2> (%param0) {
  ...
}

With the above definitions, @some_func has type func<@sig>, which is a Mu function reference value. @fp_some_func has type funcptr<@sig>, which is a C-style function pointer. Similarly @other_func is a func<@sig2>, while @fp_other_func is a funcptr<@sig2>. DEFAULT is the calling convention. @COOKIE is a "cookie" (see way 2 below).

The Mu IR program or the API can pass the function pointer to the native program. When called, the Mu function will run and return its return value to the native caller.

pros:
1. simple
2. The native funcptr is immediately available after loading the Mu bundle.
cons: does not support "closures" well. Some languages/implementations (e.g. LuaJIT) would like to expose closures (rather than just functions) to C as callbacks.

way 2: (complex) Mu functions are exposed with a run-time invocation of a Mu instruction or a Mu API message.

Format:

Instruction: fp = EXPOSE_MU_FUNC < sig > mufunc cookie
API: fpHandle = ca.exposeMuFunc( hMuFunc, hCookie )

The resulting fp has type funcptr<sig> and can be called from C. A function can be exposed multiple times, and the resulting function pointers are pairwise distinct. The cookie is an int<64> value associated with the resulting function pointer. If a Mu function is called through a particular function pointer, a special instruction NATIVE_COOKIE will return the associated cookie value.

Example:

%fp1 = EXPOSE_MU_FUNC <@sig> @some_func @some_int64_value
%fp2 = EXPOSE_MU_FUNC <@sig> @some_func @other_int64_value
...
UNEXPOSE_MU_FUNC %fp1
UNEXPOSE_MU_FUNC %fp2

  // in @some_func
  %cookie = NATIVE_COOKIE
  %eq = EQ <@i64> %cookie @some_int64_value
  ...

val hFP = ca.exposeMuFunc(hFunc, hSomeInt64Value)
...
ca.unexposeMuFunc(hFP)

Both %fp1 and %fp2 have type funcptr<@sig>. But if the Mu function @some_func is called from C via %fp1, the NATIVE_COOKIE instruction will return @some_int64_value. If called via %fp2, then NATIVE_COOKIE returns @other_int64_value instead.
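The intended semantics can be modelled in plain C. This is only a sketch under strong assumptions: it uses a fixed pool of pre-compiled trampolines, whereas a real Mu implementation would more likely generate stubs at run time, and the cookie would be per-thread state. All names (expose_mu_func, unexpose_mu_func, native_cookie, some_func) are hypothetical stand-ins for the instructions above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLOTS 4

typedef int (*callback_fp)(int);   /* plays the role of funcptr<@sig> */
typedef int (*mu_func)(int);       /* plays the role of func<@sig>    */

static mu_func slot_func[NUM_SLOTS];
static int64_t slot_cookie[NUM_SLOTS];
static int     slot_used[NUM_SLOTS];
static int64_t current_cookie;     /* what NATIVE_COOKIE would return */

/* Common handler: every trampoline lands here with its own slot index. */
int dispatch(int slot, int arg) {
    assert(slot_used[slot]);
    int64_t saved = current_cookie;          /* allow nested call-backs */
    current_cookie = slot_cookie[slot];
    int result = slot_func[slot](arg);
    current_cookie = saved;
    return result;
}

/* One distinct C function, hence one distinct address, per slot. */
#define TRAMPOLINE(n) int trampoline_##n(int arg) { return dispatch(n, arg); }
TRAMPOLINE(0) TRAMPOLINE(1) TRAMPOLINE(2) TRAMPOLINE(3)

static const callback_fp trampolines[NUM_SLOTS] = {
    trampoline_0, trampoline_1, trampoline_2, trampoline_3
};

/* Models EXPOSE_MU_FUNC: returns NULL when the pool is exhausted. */
callback_fp expose_mu_func(mu_func f, int64_t cookie) {
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            slot_func[i] = f;
            slot_cookie[i] = cookie;
            return trampolines[i];
        }
    }
    return NULL;
}

/* Models UNEXPOSE_MU_FUNC: frees the slot behind fp. */
void unexpose_mu_func(callback_fp fp) {
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (trampolines[i] == fp) slot_used[i] = 0;
    }
}

/* Models NATIVE_COOKIE. */
int64_t native_cookie(void) { return current_cookie; }

/* Stands in for @some_func: its behaviour depends on the cookie. */
int some_func(int arg) { return arg + (int)native_cookie(); }
```

Exposing some_func twice with cookies 100 and 200 yields two different pointers with the same signature; calling them with 1 gives 101 and 201 respectively, which is the property the cookie is meant to provide.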

pro: the cookie can be used to identify different closures and look up the contexts of the closures.
cons:
1. Not as simple as way 1.
2. Exposing a Mu function requires a Mu instruction or an API message. This makes "implementing the Mu client API directly as exposed Mu functions" difficult. (In this case, exposing a Mu function requires an API function, which is also an exposed Mu function.)

Contexts necessary for Mu functions to run

Even if a Mu function is exposed to the native program as a funcptr<sig>, some contexts must be set up so that the Mu function can make use of Mu-specific features. These include:

Thread-local garbage collection states: including thread-local allocation pools, and registering the thread for yielding as requested by the GC.
Stack context: Each Mu stack has an associated stack value (the opaque reference to the current stack). This is necessary for swap-stack.

Similar to the JNI's "attaching a native thread to the JVM", Mu will also require attaching Mu contexts to a native thread before any exposed Mu function pointers can be called.

If the native program is executed because some Mu program called the native function through the native interface (via CCALL), the context is already set up and the C program can safely call back to Mu.
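A rough C model of the attach requirement, assuming a C11 compiler with _Thread_local. The struct layout and the names mu_attach_current_thread, mu_detach_current_thread and exposed_entry are hypothetical; they only illustrate that an exposed function pointer needs per-thread state before any Mu code can run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread Mu context. */
typedef struct {
    bool  attached;
    void *alloc_pool;      /* stands in for the thread-local allocation pool */
    void *current_stack;   /* stands in for the opaque stack value */
} mu_thread_context;

/* Zero-initialised for every new thread, i.e. "not attached". */
static _Thread_local mu_thread_context tls_context;

void mu_attach_current_thread(void *pool, void *stack) {
    assert(!tls_context.attached);
    tls_context = (mu_thread_context){ true, pool, stack };
}

void mu_detach_current_thread(void) {
    assert(tls_context.attached);
    tls_context = (mu_thread_context){ false, NULL, NULL };
}

/* Entry check an exposed function pointer could perform.
   Returns 0 on success, -1 if the calling thread has no Mu context. */
int exposed_entry(int arg, int *result) {
    if (!tls_context.attached) return -1;
    *result = arg * 2;     /* stands in for the Mu function body */
    return 0;
}
```

When the thread arrived in C through a CCALL from Mu, the context would already be populated, so the check passes without an explicit attach.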

Mixed native/Mu stacks

With both C-to-Mu and Mu-to-C calls possible, a stack may contain a mix of C and Mu frames. This has implications for stack introspection and exception handling. Possible approaches are:

Stack introspection cannot go deeper than the last contiguous Mu frame from the top, i.e. introspection becomes unavailable as soon as it reaches a native frame. Exceptions may not go into native frames. This approach has the weakest promise from Mu, and is thus the easiest.
Mu can skip non-Mu frames and unwind to other Mu frames underneath.
Stack introspection and stack unwinding caused by exceptions can go through frames which are supported by the native debugger. This is harder than the previous one, but still practicable.
Support non-standard frames (such as JavaScript frames of SpiderMonkey or V8). Too hard.

Details

Related works

Support arbitrary C signatures defined by C code:
- LuaJIT: It can expose any Lua functions to C as function pointers.
- .NET: Reverse P/Invoke.
- GHC: Can generate C stubs. Arbitrary C signatures.
- libffi: Can create "closures" which have a user-defined signature. When a closure is called, it calls another function with a libffi-defined signature, making all arguments available.
- python: The ctypes module is based on libffi. It can make any Python function a C-callable function (with a user-defined signature).
- JNA: Based on libffi, it can create callback objects which implement the Callback or StdCallCallback interface. May need Java-side wrappers, but not C-side.
- jnr-ffi / jffi: A more extreme counterpart of libffi: most of it is implemented in Java, calling C functions only when necessary. It implements its stub generator in Java. It even has an x86/x64 assembler in Java.
- jffi is lower level, while jnr-ffi is a jffi-based counterpart of JNA.
Must use API-defined signature:
- Lua: The C function must have the int (*lua_CFunction)(lua_State *L) signature.
- JNI: May call any Java static or instance methods, but only through API functions (e.g. NativeType Call<type>Method(JNIEnv *env, jobject obj, jmethodID methodID, ...)). Wrappers may be created in C to work around this.
- CPython, SpiderMonkey, V8, ...: Through API functions.

More about closures

Many languages support closures. LuaJIT, one of the related works above, exposes Lua closures rather than plain stateless functions. For example:

ffi.cdef[[
typedef int (__stdcall *WNDENUMPROC)(void *hwnd, intptr_t l);
int EnumWindows(WNDENUMPROC func, intptr_t l);
]]

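function makeHandler(msg)
  local function f(hwnd, l)
    print(msg)
    return true -- keep enumerating; EnumWindows stops when the callback returns false
  end
  return f -- without this, makeHandler would return nil
end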

myHandler1 = makeHandler("I see a window!")
myHandler2 = makeHandler("Wow! A window!")

local count = 0
ffi.C.EnumWindows(myHandler1, 0) -- Implicit conversion to a callback via function pointer argument.
ffi.C.EnumWindows(myHandler2, 0) -- Implicit conversion to a callback via function pointer argument.

Both myHandler1 and myHandler2 are closures (in fact, all user-visible Lua function values are closures). But they are likely to share the same underlying bytecode, or JIT-compiled machine code, or (in a hypothetical Mu implementation) Mu IR function. The tricky part is that the native C function EnumWindows knows nothing about the concept of "closure". It only knows function pointers. This is why it is necessary to "associate" some value with the exposed function pointer to identify the closure.

LuaJIT implements this by constructing a jump table. Each exposed function pointer occupies a "slot" in the table. Each slot records the current PC (or the slot index) and then jumps to the common handler where register states are inspected. By knowing which slot the jump came from, it is possible to distinguish between different exposed function pointers and, thus, recover the closure context.

Mu differs from Lua in that Mu does not directly support closures. But if the higher-level language on top of Mu supports closures, Mu must provide the necessary mechanism for it.

Other implications of such "c-to-mu" interface being available:

New ways to start new Mu threads: "Creating a new stack and then creating a new thread on it via the API" used to be the only way to start a Mu program. If a C thread (possibly a pthread) can "become" a Mu thread by "attaching to the µVM", then an easier way to start a Mu program is just calling an exposed Mu function from C (probably the main() function).

Example C-based Mu loader:

#include <mu.h>

int main(int argc, char** argv) {
  Mu *mu = mu_new_instance();
  mu_load_bundle(mu, read_file("some_bundle.uir"));
  mu_attach_current_pthread(mu);
  int (*mu_main)(int, char**) = mu_get_exposed_native_function(mu, "@main.native");
  return mu_main(argc, argv);
}

Previous approach (via the Mu-Client API, which still has the advantage of being language-neutral):

#!/usr/bin/env python3

import sys

import mu

vm = mu.connect_to_remote_micro_vm_instance("192.168.0.1", 8080)  # You can't just call a remote procedure locally.
vm.load_bundle(read_file("some_bundle.uir"))

func = vm.put_function("@main")
argc = vm.put_int(len(sys.argv))
argv = vm.new_hybrid(len(sys.argv))

... # populate argv here

stack = vm.new_stack(func, [argc, argv])
thread = vm.new_thread(stack)

There are other ways to do things in addition to the Mu-Client API: Things can be done via function calls to Mu functions. In this sense, part of the API can be implemented as pre-exposed Mu functions. Those Mu functions are first-class Mu citizens and can do anything in Mu.

For example, the following Mu function can be used to allocate char arrays, which used to be done by the new_hybrid message:

.typedef @char_array = hybrid <@i64 @i8>

.funcsig @my_new_hybrid_sig = @handle_t (@i64)
.funcdef @my_new_hybrid VERSION ... WITH_FP @my_new_hybrid.native (%len) {
  %entry:
    %ary = NEWHYBRID <@char_array @i64> %len
    STORE ... // store length to the fixed part
    %handle = CALL <...> @save_to_some_table (%ary)
    RET <@handle_t> %handle
}

This call-return interface is not allowed to pass (traced) references across the C-Mu boundary. But since the Mu Spec does not define a handle as any particular type, simple data types, such as integers, can be returned to the C program to refer to this heap object later.
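A possible shape for such a handle table, modelled in C. In a real system the table would live on the Mu side and hold traced references, so it would presumably also keep the objects alive; here void * merely stands in for a reference. The names lookup_handle and release_handle are assumptions of this sketch, and save_to_some_table mirrors the placeholder used in the Mu IR above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HANDLES 16

typedef int64_t handle_t;                /* plain integer, safe to pass to C */

static void *handle_table[MAX_HANDLES];  /* NULL marks a free entry */

/* Stores obj and returns a handle >= 0, or -1 if the table is full. */
handle_t save_to_some_table(void *obj) {
    assert(obj != NULL);
    for (handle_t h = 0; h < MAX_HANDLES; h++) {
        if (handle_table[h] == NULL) {
            handle_table[h] = obj;
            return h;
        }
    }
    return -1;
}

/* Returns the object behind h, or NULL for an invalid or released handle. */
void *lookup_handle(handle_t h) {
    if (h < 0 || h >= MAX_HANDLES) return NULL;
    return handle_table[h];
}

/* Frees the entry so the handle value can be reused. */
void release_handle(handle_t h) {
    if (h >= 0 && h < MAX_HANDLES) handle_table[h] = NULL;
}
```

The C program only ever holds the integer, so the GC remains free to move the object; every access goes back through an exposed Mu function that does the lookup.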

When working with the heap, it is probably easier to do things in the Mu IR than in C: For example, when copying a C char buffer to a Mu int<8> buffer in the heap, there are now two ways:

Pin the object so we get an address. Then do memcpy.
Write strcpy in Mu IR. It has access to the heap, and also has access to the C buffers and pointers via the (unsafe) native interface. C programs simply call this Mu function to get around all heap-related limitations. No more copying byte-by-byte through the (supposedly slow, but more general and safer) Mu-Client interface API messages.

Calling convention

Sometimes it is necessary to specify the calling convention (e.g. for Windows API functions). The exposed function pointer must then use the appropriate calling convention, too.

.funcdef @foo VERSION ... HAS_NATIVE(@foo.def @CONSTI64_COOKIE1 DEFAULT) HAS_NATIVE(@foo.stdcall @CONSTI64_COOKIE2 STDCALL) <@sig> (%arg) { body }

%fp1 = EXPOSE_MU_FUNC <@sig> @foo %cookie1 DEFAULT
%fp2 = EXPOSE_MU_FUNC <@sig> @foo %cookie2 STDCALL

p.s. Mu functions still use internal calling conventions when calling each other. That is hidden from (and in fact unrelated to) the native world.

Can this mu->native/native->mu interface implement JNI without also requiring Mu to implement C++-style native stack unwinding?

JNI allows C programs to handle Java exceptions via ExceptionOccurred(), ExceptionClear() and other functions. It also allows throwing exceptions to Java via Throw() or ThrowNew() functions.

It should be sufficient to only support simple native-to-mu and mu-to-native function calls in order to support this JNI-style mixed native-managed exception handling.

The main reason is that JNI is not "light-weight" or "simple". There can be an intermediate layer between Java and C in JNI which handles all the exception-related things, such as how a Java exception can pass through native code and reach the next Java frame below. In other words, Mu can provide a minimum interface while the client provides much more.

Take Call<Type>Method for example. Instead of directly exposing a Mu function for C to call, the intermediate layer (JNI) exposes a wrapper function which catches all Mu exceptions above it. Then it stores the exception somewhere the JNI function ExceptionOccurred() can get. ExceptionOccurred() can be a directly exposed Mu function that returns the saved exception.

As another example, when calling from Java to C, Mu-based JNI never calls the C function directly, but uses a wrapper which, after the C function returns, retrieves the exception saved for ExceptionOccurred() and throws it to Mu (using an ordinary THROW instruction in Mu).
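The two wrappers can be modelled in C, with "throwing" simulated by returning a non-NULL exception pointer. This is only a sketch of the control flow, not of JNI itself: call_int_method, exception_occurred, exception_clear and call_native_checked are hypothetical names loosely mirroring the JNI functions, and mu_divide_100 and native_caller are toy participants. It assumes C11 _Thread_local for the per-thread pending exception.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *message; } mu_exception;

/* Simulated Mu method: returns NULL normally, or the exception it "throws". */
typedef mu_exception *(*mu_method)(int arg, int *result);
typedef int (*native_fn)(int);

static _Thread_local mu_exception *pending_exception;

/* Wrapper that C calls instead of the Mu method (role of Call<Type>Method). */
int call_int_method(mu_method m, int arg) {
    int result = 0;
    mu_exception *thrown = m(arg, &result);
    if (thrown != NULL) {
        pending_exception = thrown;   /* catch and remember */
        return 0;                     /* C only sees a dummy value */
    }
    return result;
}

mu_exception *exception_occurred(void) { return pending_exception; }
void exception_clear(void) { pending_exception = NULL; }

/* Wrapper around a Mu-to-C call. A non-NULL return stands in for THROW. */
mu_exception *call_native_checked(native_fn f, int arg, int *result) {
    assert(pending_exception == NULL);
    *result = f(arg);
    mu_exception *thrown = pending_exception;
    pending_exception = NULL;
    return thrown;
}

/* Toy participants. */
static mu_exception div_by_zero = { "division by zero" };

mu_exception *mu_divide_100(int arg, int *result) {
    if (arg == 0) return &div_by_zero;
    *result = 100 / arg;
    return NULL;
}

int native_caller(int arg) {
    /* Native code may check exception_occurred() here, or simply carry on. */
    return call_int_method(mu_divide_100, arg) + 1;
}
```

No native frame is ever unwound: the C code runs to completion with a dummy value, and the exception only resurfaces once control is back in the lower Mu frame.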

Should Mu exceptions go through C frames if not caught above the C frames?

It should be easy to brutally destroy C frames (without respecting C++-level destructors. Sorry, C++ and embedded SpiderMonkey or V8). The worst consequence is the CCALL instruction has to save callee-saved registers if it does not trust the native program. Mu also needs to link the two Mu frames around the native frames in order to "jump over" all native frames. But this should be the least expensive part of the native interaction.
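The "brutal" variant can be demonstrated in standard C with setjmp/longjmp, which is used here purely as a stand-in for whatever unwinder Mu would actually have. The function names are hypothetical. The lower "Mu" frame records a catch point, a native frame sits in the middle, and the upper "Mu" frame jumps straight over it.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf mu_catch_point;   /* handler in the lower "Mu" frame */
static int native_cleanup_ran;   /* set only if the C frame returns normally */

/* Upper "Mu" frame, called back from C: throws. Only valid while
   mu_call_native() is active further down the same stack. */
void mu_callback_that_throws(void) {
    longjmp(mu_catch_point, 1);  /* discards every frame above the catch point */
}

/* Upper "Mu" frame that returns normally. */
void mu_callback_that_returns(void) {
}

/* The native frame in the middle. */
void native_middle(void (*callback)(void)) {
    callback();
    native_cleanup_ran = 1;      /* skipped when the call-back throws */
}

/* Lower "Mu" frame: returns 1 if an exception arrived, 0 otherwise. */
int mu_call_native(void (*callback)(void)) {
    native_cleanup_ran = 0;
    if (setjmp(mu_catch_point) != 0) {
        return 1;                /* reached via longjmp */
    }
    native_middle(callback);
    return 0;
}
```

With mu_callback_that_throws the cleanup line in native_middle never runs, which is the C analogue of skipped C++ destructors: cheap for Mu, but the native code gets no chance to release anything.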

If the native program is required to be aware of Mu exceptions, then there should be wrappers, but I assume it will be rare. Though the Itanium C++ exception ABI is "zero-cost", cross-language calls are already non-zero-cost. I don't think cross-language exceptions can be.

However, I haven't found existing runtimes that do such "brute-force" unwinding. (See below) LuaJIT claims that it may do this, but I have not observed such behaviour.

Can OSR pop out C frames, too?

As long as it is brutal, it should be easy. It can use the same mechanism as exception handling to unwind the stack.

Existing languages/runtimes

JVM: JNI: Throw() and ThrowNew() return 0 on success, but do not unwind the stack. Java exceptions from Call<type>Method will remain pending and pass through C back to lower Java frames.

CPython: ctypes: Basically the same approach as JNI. Python exceptions are stored in a global variable (like GIL). If Python calls C calls Python, and Python raises an exception, then it does not unwind the stack, but returns to C as if the call had just returned (with a garbage return value). The control flow goes on in C until returning to Python, when the exception in the global variable is discovered by the CPython interpreter.

In the previous two approaches, C++ destructors are still executed because the runtime does not treat any Java/Python exception as thrown, so stack unwinding never occurs. They pretend to be in the normal control flow (I think this may be dangerous because errors silently flow into C++ as a meaningless garbage return value).

LuaJIT: The documentation mentioned the possibility of forced stack unwinding, but when I tried, if Lua calls C calls Lua, and Lua calls error("msg"), then C++ destructors are still executed.

JVM: JNA (Java Native Access): When Java calls C calls Java, and Java throws an exception, then C receives 0 as the return value and continues normally. JNA logs the exception thrown from Java, but does not propagate it back to the lower Java frame.

JVM: JNR (Java Native Runtime): Like JNA, JNR does not unwind the stack, but returns 0 to C and continues normally. But JNR propagates the Java exception to the lower Java frame after returning from the native function.

According to JNA's documentation: "A callback (the upper Java frame, written in Java, called by C) should generally never throw an exception, since it doesn't necessarily have an encompassing Java environment to catch it. Any exceptions thrown will be passed to the default callback exception handler."

Julia: Copies the whole stack and (ab)uses setjmp/longjmp to set the stack pointer.

microvm / microvm-meta

Call-back from native to Mu #39