Dynamic memory allocation and sandboxing

mbillingr commented 7 years ago

Hi there,

thank you for sharing your WebAssembly interpreter. I'm planning to embed WebAssembly in a C++ program so your implementation seems like an excellent starting point.

If I understood the approach correctly, you directly expose C symbols as host imports to the WebAssembly client. This is great for quickly exposing third-party libraries like SDL, but what about functions like malloc? Won't these allocate memory in the host's address space rather than the client's linear memory? This does not go well with my idea of isolating the client from the host :)

To completely sandbox the client, I guess it would be necessary to wrap any exported host functions, mapping client to host pointers, and write custom memory management routines.

Do you have any thoughts on this topic? Do you have further plans with this project, and would you be interested in pull requests in case my fiddling produces something useful?

Cheers, Martin

kanaka commented 7 years ago

> To completely sandbox the client, I guess it would be necessary to wrap any exported host functions, mapping client to host pointers, and write custom memory management routines.

That's correct. Here is some more detailed background:

There are two different top-level programs (wac and wace) that set up memory/interfaces differently.

wac sets things up like normal WebAssembly (WA) in a JavaScript context. If the WA code needs heap memory, then the host needs to allocate that memory and pass it in as an import. The WA code itself then needs to implement memory management. If you want to call imported functions, then those external functions will need to be aware of the memory space of the WA code and agree on the memory layout/management with the WA code. WA code running in wac has bounds-checked memory reads/writes, so it won't be able to access memory outside the imported memory (don't count on it for security, there are almost certainly ways around it). This means that any pointers (even strings) passed from the host environment into the WA code will not be accessible unless the host side has mechanisms to allocate memory from the WA memory and then translate every pointer into an offset into the WA memory for every call into the WA code.
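To make the translation step concrete, here is a minimal sketch of bounds-checked conversion between WA offsets and host pointers, plus one wrapped import. The names `LinearMemory`, `wa_to_host`, `host_to_wa`, and `wrapped_strlen` are hypothetical and chosen for illustration; they are not wac's actual structures or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a module's linear memory (not wac's struct). */
typedef struct {
    uint8_t *bytes;  /* host allocation backing the WA memory */
    uint32_t size;   /* size of that allocation in bytes */
} LinearMemory;

/* Translate a WA offset into a host pointer.
 * Returns NULL if [offset, offset + len) is not fully inside the memory. */
static void *wa_to_host(const LinearMemory *mem, uint32_t offset, uint32_t len) {
    assert(mem != NULL);
    /* Written this way so offset + len cannot overflow. */
    if (offset > mem->size || len > mem->size - offset) return NULL;
    return mem->bytes + offset;
}

/* Translate a host pointer back into a WA offset.
 * Returns 0 on success, -1 if the pointer is outside the WA memory. */
static int host_to_wa(const LinearMemory *mem, const void *ptr, uint32_t *offset_out) {
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t base = (uintptr_t)mem->bytes;
    if (p < base || p - base >= mem->size) return -1;
    *offset_out = (uint32_t)(p - base);
    return 0;
}

/* Example of a wrapped import: the WA code passes an offset, and the host
 * only scans for the terminator inside the WA memory.
 * Returns the string length, or -1 if no NUL is found in bounds. */
static int64_t wrapped_strlen(const LinearMemory *mem, uint32_t offset) {
    if (offset >= mem->size) return -1;
    const uint8_t *start = mem->bytes + offset;
    const uint8_t *end = memchr(start, 0, mem->size - offset);
    return end ? (int64_t)(end - start) : -1;
}
```

Every host function exposed to the WA code would need a wrapper of this kind for each pointer argument, which is the cost of keeping the client confined to its own memory.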

However, wac isn't really designed to run C code that was compiled to WebAssembly. That's where wace comes in. wace is designed to run C code compiled by emscripten (specifically `emcc -O2 -s WASM=1 -s SIDE_MODULE=1 -s LEGALIZE_JS_FFI=0`). wace allocates memory for the WA code, but instead of passing that address in the memory import, it sets memory to point to 0 and sets memoryBase to point to the WA-owned memory. As a result, the WA code can use host memory pointers without translation, so normal C code (even code that calls back out to host functions) can be compiled with emscripten and run under wace. You can call host malloc and then use the memory, you can call host functions using pointers into host or WA memory, etc. The downside of course is that WA code running under wace is completely unsandboxed. Whatever the original C code could do, the WA compiled code can also do when run in wace.
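The difference between the two setups can be modelled with a small sketch of how a WA address turns into a host address. This is an illustration under assumptions, not wac's actual code: `AddrSpace` and `resolve` are hypothetical names, and the sketch uses `uintptr_t` for WA addresses so it sidesteps the question of whether host pointers fit in the WA address type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of address resolution for the two front ends. */
typedef struct {
    uintptr_t memory;  /* host address that WA address 0 maps to */
    uintptr_t size;    /* bytes reachable in sandboxed mode */
    int flat;          /* nonzero: wace-style, memory starts at 0 */
} AddrSpace;

/* Resolve a WA address to a host pointer.
 * Sandboxed mode: bounds check, then add the base (wac-style).
 * Flat mode: the WA address already is a host address, so nothing
 * stops the client from touching any host memory (assumed here to
 * mean no bounds check at all). */
static uint8_t *resolve(const AddrSpace *as, uintptr_t addr, uintptr_t len) {
    assert(as != NULL);
    if (as->flat) return (uint8_t *)addr;
    if (addr > as->size || len > as->size - addr) return NULL;
    return (uint8_t *)(as->memory + addr);
}
```

In flat mode a pointer returned by host malloc resolves to itself, which is why unmodified C code works, and also why nothing is isolated.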

So if you want to sandbox code that is intended to run under WebAssembly (as opposed to arbitrary C code compiled to WebAssembly), then you would probably want to target wac. As you noted, you will have to do some sort of WA/host pointer translation to give WA access to the host. And if you want to do this for security reasons, you will probably want to do a fairly in-depth audit and testing (maybe with code fuzzing) to give confidence. The goal of wac/wace is to be a minimal C interpreter for WebAssembly with good host interop (i.e. security is not a focus/goal). However, that being said, I would be happy to take pull requests that don't balloon the core implementation. I.e. if you have a new front-end driver (sibling to wac/wace) that does what you need, plus some patches to the core to support it, I would be happy to consider merging it.
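As a rough illustration of the host-side memory management such a front end would need, here is a sketch of a bump allocator that hands out offsets inside the WA memory and copies a host string into it. All names (`Arena`, `arena_init`, `arena_alloc`, `arena_put_string`) are hypothetical, and a real driver would also have to agree with the WA code on which region the host may use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-managed region inside a module's linear memory. */
typedef struct {
    uint8_t *bytes;  /* host allocation backing the WA memory */
    uint32_t size;   /* size of that allocation in bytes */
    uint32_t next;   /* offset of the first free byte */
} Arena;

static void arena_init(Arena *a, uint8_t *bytes, uint32_t size) {
    assert(a != NULL && bytes != NULL && size >= 8);
    a->bytes = bytes;
    a->size = size;
    a->next = 8;  /* skip offset 0 so it can serve as a "null" offset */
}

/* Reserve len bytes at an 8-byte-aligned offset.
 * Returns the WA offset, or 0 if the request does not fit. */
static uint32_t arena_alloc(Arena *a, uint32_t len) {
    uint32_t start = (a->next + 7u) & ~7u;
    /* First test catches wrap-around of the alignment arithmetic. */
    if (start < a->next || start > a->size || len > a->size - start) return 0;
    a->next = start + len;
    return start;
}

/* Copy a host string (with its terminator) into WA memory and return
 * the offset that can be passed to the WA code, or 0 on failure. */
static uint32_t arena_put_string(Arena *a, const char *s) {
    size_t n = strlen(s) + 1;
    if (n > UINT32_MAX) return 0;
    uint32_t off = arena_alloc(a, (uint32_t)n);
    if (off != 0) memcpy(a->bytes + off, s, n);
    return off;
}
```

A bump allocator never frees individual blocks; it is used here only because it is the shortest allocator that shows the offset-based interface.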

mbillingr commented 7 years ago

Thank you for the detailed explanation - things are starting to make more sense to me now.

Since emscripten appears to be the easiest way to build wasm code, I've looked primarily into wace, which exports the relevant stuff. However, it seems like wac will be more relevant for me.

My goal is to run WebAssembly as sandboxed as possible, without caring if it was compiled from C, Rust, or written by hand. Ideally, the front end would only expose a generic interface through syscalls - I think this is also where emscripten is heading. For now, I guess, my best strategy will be to start with wac and implement the interface required by emscripten builds. Security is only a minor concern as this is a hobby project and there is only so much spare time :)

mbillingr commented 7 years ago

In addition to the exports in #3, I'm going to make some more substantial changes to the core to support freeing modules, not killing the main process on errors, and maybe running multiple WebAssembly instances.

I realize that these may not fit into the scope of wac or may cause unacceptable bloat so I'll create separate pull requests for these features and you can decide if you want them. Otherwise they'll live in my fork.
