execution model

hawkw commented 4 years ago

or, "how is wasm formed? how user get executable?"

are we...

...running a wasm VM in the kernel?

if so, does it really make sense to talk about "user space" and "kernel space" any more?
e.g. can we just run everything in ring 0 and have the "system calls" just be host fns the wasm VM calls into?
- there's almost certainly a perf benefit from eliminating CPU context switches this way...
- but there's also almost certainly a perf disadvantage from "running everything in a VM"
- what is the security model for this?
  - are we relying on the wasm VM for isolation? then we need to be able to trust it
  - or, do we use the wasm VM and the CPU's privilege rings for defence-in-depth? then we have to do system calls via interrupts like "normal people" do
- are we JITting code?

...pre-compiling code w/ lucet?

if we do the compilation in the kernel and trust the lucet we have in the kernel, can we still run the lucet-compiled code in ring 0 to avoid context switches?
- is this a terrible idea?
when does the compilation happen? do users invoke lucet or does the OS?
- if the OS is responsible for compiling, does that mean we recompile a wasm into a "real binary" every time a program runs? that seems Bad?
- does the kernel cache compiled "real binaries"?
  - if we rely on the wasm model for isolation, we would need to verify that the cache wasn't tampered with? how do we do that?
  - the kernel could sign code it compiles?
  - or we could implement the caching in a way that is inaccessible to userspace...but then you could tamper with the block storage device under a different OS.
  - maybe the kernel signs the whole cache somehow?
  - there's probably "some kind of SGX bullshit" we could use for this, maybe?
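
To make the "system calls are just host fns the wasm VM calls into" option concrete, here is a rough sketch in Rust. Everything in it (`HostFn`, `HostTable`, `sys_getpid`, `sys_add`, the placeholder pid) is made up for illustration and is not the API of any real wasm runtime; it only shows that a "syscall" in this model is a table lookup plus an ordinary function call.

```rust
use std::collections::HashMap;

/// A hypothetical host function: takes integer arguments from the guest and
/// returns a value or an error. Not the API of any real wasm runtime.
type HostFn = fn(&[i64]) -> Result<i64, SysError>;

#[derive(Debug, PartialEq)]
enum SysError {
    /// The module asked for an import the kernel does not provide.
    UnknownCall,
    /// The call was made with the wrong number of arguments.
    BadArgs,
}

/// Table of "system calls" the kernel would expose to wasm modules as imports.
struct HostTable {
    fns: HashMap<&'static str, HostFn>,
}

impl HostTable {
    fn new() -> Self {
        let mut fns: HashMap<&'static str, HostFn> = HashMap::new();
        fns.insert("getpid", sys_getpid);
        fns.insert("add", sys_add);
        HostTable { fns }
    }

    /// What the VM would do when guest code calls an import: a plain function
    /// call, with no interrupt and no privilege-level switch.
    fn call(&self, name: &str, args: &[i64]) -> Result<i64, SysError> {
        let f = self.fns.get(name).ok_or(SysError::UnknownCall)?;
        f(args)
    }
}

fn sys_getpid(args: &[i64]) -> Result<i64, SysError> {
    if !args.is_empty() {
        return Err(SysError::BadArgs);
    }
    Ok(1) // placeholder pid, assumed for the example
}

fn sys_add(args: &[i64]) -> Result<i64, SysError> {
    match args {
        [a, b] => Ok(a.wrapping_add(*b)),
        _ => Err(SysError::BadArgs),
    }
}
```

With this shape, `HostTable::new().call("add", &[2, 3])` goes straight to `sys_add`; the isolation question raised above is then entirely about whether the VM can be trusted to only let guests reach this table.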

hawkw commented 4 years ago

@iximeow undoubtedly knows more about and/or has more opinions about this than i do

iximeow commented 4 years ago

> has more opinions about this than i do

i extremely have opinions, hello

i generally don't like the idea of jitting particularly because i'd love system-wide w^x and jits generally don't do w^x. there's also Strange performance implications for jits since binary layout might change and if you run out of memory while jitting a function, the error mode is.. not good. "sorry, this function call failed because you're oom" is weird, but maybe less weird than i initially thought (stack overflows also can happen, and introduce the same error. hmm.)

i was thinking that AOT compiling and caching that would be very promising, particularly because there's a lot more flexibility post-compilation (and with ABIs!) when you're guaranteed to have something you can reason about and modify. vDSO-style hacks to make getpid fast could literally be an inlined constant! that's wild!

caching the artifacts from that compilation and parameters that were used would be a fair next step, and i'd want to sign those for the same error/tamper-detection reasons you mentioned. signing the whole cache at once wouldn't be great, but signing individual artifacts seems like it would work well as it wouldn't have a linearly-scaling penalty to add or remove objects.
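
A minimal sketch of per-artifact tagging, assuming a cache keyed by module hash plus compilation flags. All names here (`ArtifactKey`, `CodeCache`, `secret`) are hypothetical, and the tag is computed with `DefaultHasher` purely as a stand-in: it is NOT cryptographically secure, and a real design would need an actual MAC or signature scheme. The point is only the shape: each entry carries its own tag, so adding, removing, or checking one artifact never touches the others.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Identifies one compiled artifact: which module, built with which flags.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ArtifactKey {
    module_hash: u64,
    flags: String,
}

/// Compiled bytes plus the tag computed when they were cached.
struct Entry {
    code: Vec<u8>,
    tag: u64,
}

struct CodeCache {
    /// Kernel-held secret mixed into every tag (hypothetical).
    secret: u64,
    entries: HashMap<ArtifactKey, Entry>,
}

impl CodeCache {
    fn new(secret: u64) -> Self {
        CodeCache { secret, entries: HashMap::new() }
    }

    /// Toy keyed tag over (secret, key, code). ASSUMPTION: this stands in for
    /// a real MAC or signature; `DefaultHasher` is not a secure primitive.
    fn tag(&self, key: &ArtifactKey, code: &[u8]) -> u64 {
        let mut h = DefaultHasher::new();
        self.secret.hash(&mut h);
        key.hash(&mut h);
        code.hash(&mut h);
        h.finish()
    }

    /// Tags and stores one artifact; other entries are not touched.
    fn insert(&mut self, key: ArtifactKey, code: Vec<u8>) {
        let tag = self.tag(&key, &code);
        self.entries.insert(key, Entry { code, tag });
    }

    /// Returns the code only if it is present for these exact flags and its
    /// tag still matches; a failed check is treated as a cache miss.
    fn load(&self, key: &ArtifactKey) -> Option<&[u8]> {
        let entry = self.entries.get(key)?;
        if self.tag(key, &entry.code) == entry.tag {
            Some(entry.code.as_slice())
        } else {
            None
        }
    }
}
```

Because the flags are part of the key, a module compiled with different parameters is simply a miss rather than a mismatch that has to be detected separately.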

FURTHER: if you particularly trust a program (eg if eliza wrote it) there's no reason you couldn't load it right into ring 0, trusting the code generation/wasm validation/optimizations to be correct. on the other hand if you don't trust the program (eg i wrote it), you could load it into a more restricted space with an explicit transition to do kernel tasks (spelled "ring 3" on x86, BUT this makes features like kpti, or smep, or other anti-speculation features also optional on trust levels! if codegen is deferred until load-time, you can just flip the 'this is scary please isolate as much as possible' knob to yes.)

the extremely problematic suggestion follows: if startup time is a concern, we could also have a tiered approach where a vm is used while a module is compiled iff the module is not already present with the right compilation flags in our code cache. (cranelift takes its sweet time doing codegen right now, and will probably get better soon enough for this to not matter, but it would be cool)
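
The tiering decision described above could look roughly like this. It is a sketch under assumptions: `CacheKey`, `Tier`, and `Loader` are invented names, and the actual compilation and the handoff from interpreted to native code are left out entirely.

```rust
use std::collections::HashSet;

/// A module identified together with the flags it was compiled with.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct CacheKey {
    module_hash: u64,
    flags: u32,
}

/// Which execution tier a freshly loaded module starts in.
#[derive(Debug, PartialEq, Eq)]
enum Tier {
    /// Compiled code for these exact flags is already in the code cache.
    Native,
    /// Run in the VM for now, while compilation happens in the background.
    Interpreted,
}

/// Hypothetical loader state: the set of artifacts that finished compiling.
struct Loader {
    compiled: HashSet<CacheKey>,
}

impl Loader {
    fn new() -> Self {
        Loader { compiled: HashSet::new() }
    }

    /// Use the VM iff the module is not cached with the right flags.
    fn tier_for(&self, key: &CacheKey) -> Tier {
        if self.compiled.contains(key) {
            Tier::Native
        } else {
            Tier::Interpreted
        }
    }

    /// Called once background compilation for `key` has produced an artifact.
    fn compile_finished(&mut self, key: CacheKey) {
        self.compiled.insert(key);
    }
}
```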

hawkw commented 4 years ago

fwiw, I also think JITting is not great and would prefer to avoid. in general, while thinking through this i also thought that making the kernel solely responsible for managing the compilation of wasms into realcodes and for deciding what hardware protection ring to run it in was probably the best design; imo, the main downside is "it involves a whole buncha additional stuff we don't have to think about otherwise"

hawkw commented 4 years ago

> if you particularly trust a program (eg if eliza wrote it)

you are making a big mistake :) :) :) :) :)

hawkw commented 4 years ago

> signing individual artifacts

im assuming that we would need to use like "some kinda sgx bullshit" to generate the kernel signing key and keep it secure and i absolutely do not know about this stuff

hawkw commented 4 years ago

> on the other hand if you don't trust the program (eg i wrote it), you could load it into a more restricted space with an explicit transition to do kernel tasks (spelled "ring 3" on x86, BUT this makes features like kpti, or smep, or other anti-speculation features also optional on trust levels! if codegen is deferred until load-time, you can just flip the 'this is scary please isolate as much as possible' knob to yes.)

strong agree that there should be (eventually) different modes for "untrusted" vs "less untrusted" (note i never said "trusted" :wink:) programs and that it's fine to accept even significantly worse performance in "random executable i received in an email from a stranger" mode. presumably this would tie in with kernel-level permissions as well (e.g. code in ring 3 maximal distrust mode cannot spawn processes that are not also maximum distrust).

i am most interested in figuring out "what is execution model for applications i actually intend to run"; locked down modes for extremely untrusted code can be added easily on top of that imo
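
For reference, the trust-level "knob" and the spawn rule discussed in this exchange could be modelled along these lines. This is a sketch only: the names `Trust`, `Isolation`, `isolation_for`, and `may_spawn` are hypothetical, and the exact set of mitigations tied to each level is an assumption, not something decided in this thread.

```rust
/// How much the kernel trusts a program. Names are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Trust {
    LessUntrusted,
    MaxDistrust,
}

/// Isolation settings picked at load time, when codegen happens.
#[derive(Debug, PartialEq, Eq)]
struct Isolation {
    /// x86 privilege ring the generated code runs in.
    ring: u8,
    /// Whether kernel page-table isolation is enabled for this program.
    kpti: bool,
    /// Whether other anti-speculation mitigations are enabled.
    spec_mitigations: bool,
}

/// Flip the "isolate as much as possible" knob based on trust level.
/// ASSUMPTION: which features go with which level is made up for the example.
fn isolation_for(trust: Trust) -> Isolation {
    match trust {
        Trust::LessUntrusted => Isolation { ring: 0, kpti: false, spec_mitigations: false },
        Trust::MaxDistrust => Isolation { ring: 3, kpti: true, spec_mitigations: true },
    }
}

/// Maximum-distrust code may only spawn maximum-distrust children;
/// less-untrusted code may spawn either kind.
fn may_spawn(parent: Trust, child: Trust) -> bool {
    parent != Trust::MaxDistrust || child == Trust::MaxDistrust
}
```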

iximeow commented 4 years ago

> "some kinda sgx bullshit" to generate the kernel signing key

For Maximal Security we could have the kernel, which holds the signing key, encrypt it with a key derived from a passphrase or smth such that tampering would be obvious (and brick the system,,,,). i just kinda don't know sgx and it won't be present on a lot of systems so eeewwww

otherwise yes, 'just run programs and introduce trust levels later' seems like a good plan. i'm just excited

there is an important difference that will come up eventually in that x86 'syscalls' made from ring 0 need different entrypoints than ones from ring 3, so we'll probably want to just macroify wrappers or something.
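
One possible reading of "macroify wrappers", sketched in userspace Rust. Nothing here is real kernel code: `dispatch`, `trap_entry`, the call numbers, and the `TRAPS` counter are all hypothetical, and `trap_entry` is just a normal function standing in for whatever trap or `syscall`-instruction path ring 3 code would actually take. The macro generates one wrapper per call for each ring from a single list.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts simulated privilege transitions, only so the two paths can be told
/// apart in this userspace sketch.
static TRAPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical kernel-side implementations, selected by call number.
fn dispatch(nr: u32, arg: u64) -> u64 {
    match nr {
        0 => 1,                   // "getpid": placeholder pid
        1 => arg.wrapping_add(1), // "bump": toy call that uses its argument
        _ => u64::MAX,            // unknown call number
    }
}

/// Stand-in for the ring 3 entry path. A real kernel would enter through a
/// trap or `syscall` instruction; here we only record that it happened.
fn trap_entry(nr: u32, arg: u64) -> u64 {
    TRAPS.fetch_add(1, Ordering::Relaxed);
    dispatch(nr, arg)
}

/// Generates a ring 0 wrapper (direct call) and a ring 3 wrapper (via the
/// simulated trap path) for each listed call.
macro_rules! syscall_wrappers {
    ($($name:ident = $nr:expr;)*) => {
        mod ring0 {
            $(pub fn $name(arg: u64) -> u64 { super::dispatch($nr, arg) })*
        }
        mod ring3 {
            $(pub fn $name(arg: u64) -> u64 { super::trap_entry($nr, arg) })*
        }
    };
}

syscall_wrappers! {
    getpid = 0;
    bump = 1;
}
```

Callers then pick `ring0::getpid` or `ring3::getpid` depending on where the code was loaded, while the list of calls is written once.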

hawkw commented 4 years ago

> otherwise yes, 'just run programs and introduce trust levels later' seems like a good plan. i'm just excited

same! i just think a good starting place is "what is the model for applications i trust?" since that will be the scenario where i most care about things like "performance". in use-cases like "i'm poking at some software to determine if it's evil or not", i think it's reasonable to assume willingness to take arbitrary perf hits; people run untrusted software in VMs etc all the time.

hawkw commented 4 years ago

> people run untrusted software in VMs etc all the time.

(of course, people also run trusted software in VMs all the time...i just think they shouldn't have to)

hawkw commented 4 years ago

So, a thought: it definitely seems like having the kernel manage AOT compilation of wasms is the correct approach. However, it also has perhaps the most moving parts, and introduces some more subsystems (managing the compiled artefact cache + ensuring its integrity).

Would it be worthwhile to consider starting with a wasm interpreter to run wasms & adding in AOT compilation transparently? That way, we can start "running code" sooner & iterate on the userland as well. I guess my question is whether this would make our lives easier or harder?

mystor commented 4 years ago

A well-designed interface should make it easy-enough to switch between different wasm backends. It should be reasonable to start with an interpreter, and move on from there. That should help with getting pieces put together before we get AOT compilation, caching, etc. fully set-up.
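
As a rough illustration of such an interface, here is a sketch with two interchangeable backends behind one trait. It is a toy under loud assumptions: the "module" is just a byte slice whose bytes get summed, not real wasm, and `WasmBackend`, `ToyInterpreter`, `ToyAot`, and `exec` are invented names. The only thing it is meant to show is that kernel-side code can be written against the trait and not care which backend is plugged in.

```rust
#[derive(Debug, PartialEq)]
enum LoadError {
    /// The toy validation step rejects empty modules.
    EmptyModule,
}

/// Hypothetical interface the kernel would program against.
trait WasmBackend {
    fn name(&self) -> &'static str;
    /// Loads and runs a module, returning its result.
    fn run(&mut self, module: &[u8]) -> Result<i64, LoadError>;
}

/// Walks the "program" one byte at a time, as an interpreter would.
struct ToyInterpreter;

/// "Compiles" the whole program up front, then runs the compiled form.
struct ToyAot;

impl WasmBackend for ToyInterpreter {
    fn name(&self) -> &'static str {
        "interpreter"
    }

    fn run(&mut self, module: &[u8]) -> Result<i64, LoadError> {
        if module.is_empty() {
            return Err(LoadError::EmptyModule);
        }
        let mut acc = 0i64;
        for &byte in module {
            acc += i64::from(byte);
        }
        Ok(acc)
    }
}

impl WasmBackend for ToyAot {
    fn name(&self) -> &'static str {
        "aot"
    }

    fn run(&mut self, module: &[u8]) -> Result<i64, LoadError> {
        if module.is_empty() {
            return Err(LoadError::EmptyModule);
        }
        // Toy "compile" step: fold the program into a single constant,
        // which is then all there is left to "execute".
        let compiled: i64 = module.iter().map(|&byte| i64::from(byte)).sum();
        Ok(compiled)
    }
}

/// Kernel-side code only sees the trait, so backends can be swapped freely.
fn exec(backend: &mut dyn WasmBackend, module: &[u8]) -> Result<i64, LoadError> {
    backend.run(module)
}
```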

hawkw commented 4 years ago

Yeah, and there may also be a use-case for both, as well — e.g., if I'm running mycelium and I want to, say, test some code I'm writing, I might want to take the perf hit of the wasm interpreter in exchange for not having to spend extra time compiling my wasm into realcode. On the other hand, when the OS is fully featured enough to support doing software development on it (read: "nebulous future"), that wasm interpreter could also be a userspace thing...

iximeow commented 4 years ago

> However, it also has perhaps the most moving parts, and introduces some more subsystems (managing the compiled artefact cache + ensuring its integrity).

we can totally just AOT every time we load a wasm and worry about caching later. the "AOT" bit can just write to memory and we can copy that somewhere appropriate in the kernel and run it

this reminds me that lucet has some opinions about relocations (namely that they can be applied) so maybe using wasmtime to interpret modules would be easier to get running.

i have no idea what the landscape of wasm interpreters is, whatever's there it probably would involve even fewer moving pieces to start with.

between AOT or JIT, we can probably have the same interface from the kernel, so :woman_shrugging:

hawkw commented 4 years ago

> between AOT or JIT, we can probably have the same interface from the kernel, so 🤷‍♀

yeah, that was what I thought, we can swap the execution model out transparently.

> we can totally just AOT every time we load a wasm and worry about caching later. the "AOT" bit can just write to memory and we can copy that somewhere appropriate in the kernel and run it

👍

mystor commented 4 years ago

The simplest wasm interpreter I know of is probably watt's. It's pretty allocation-heavy, has a weird API, and contains no unsafe code. If it's easy enough to use a proper interpreter like wasmtime's, we should go for it, but we might be able to hack something small together with watt's interpreter relatively easily. I think we'd only need Vec (+ String), Rc, and (maybe) HashMap.
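
On the allocation types: `Vec`, `String`, and `Rc` are all available from Rust's `alloc` crate once the kernel has a global allocator, but `HashMap` lives in `std` only. A sketch of what kernel-side data could look like with `BTreeMap` from `alloc` as the stand-in follows; swapping in `BTreeMap` is a choice made for this example (an external crate such as `hashbrown` is another commonly used option), and `Module` and `load` are hypothetical names unrelated to watt's actual code.

```rust
extern crate alloc;

use alloc::collections::BTreeMap;
use alloc::rc::Rc;
use alloc::string::String;
use alloc::vec::Vec;

/// A loaded module, described using only types from `alloc`.
struct Module {
    name: String,
    /// Export name -> function index. `BTreeMap` stands in for `HashMap`.
    exports: BTreeMap<String, u32>,
    /// Code bytes, shared between instances without copying.
    code: Rc<Vec<u8>>,
}

/// Builds a `Module` from a name, shared code, and a list of exports.
fn load(name: &str, code: Rc<Vec<u8>>, exports: &[(&str, u32)]) -> Module {
    let mut map = BTreeMap::new();
    for (export, index) in exports {
        map.insert(String::from(*export), *index);
    }
    Module { name: String::from(name), exports: map, code }
}
```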

hawkw / mycelium

execution model #4
