oxidecomputer / helios


wasmtime slow due to lock contention in kernel during munmap() #177

Open sunshowers opened 3 weeks ago

sunshowers commented 3 weeks ago

Encountered an interesting bug while trying to port wasmtime to illumos, as documented in https://github.com/bytecodealliance/wasmtime/pull/9535.

STR (note the +beta below; compiling wasmtime on illumos requires Rust 1.83 or above):

```sh
git clone https://github.com/bytecodealliance/wasmtime
cd wasmtime
git checkout 44da05665466edb301558aa617d9a7bff295c461
git submodule init
git submodule update --recursive

cargo +beta test --test wast -- --test-threads 1 Cranelift/pooling/tests/spec_testsuite/load.wast
```

This takes around 0.07 seconds on Linux but around 5-6 seconds on illumos.

DTrace samples:

From my naive reading, particularly of the kernel stacks, it seems like most of the time is being spent waiting on locks to various degrees.

Per Alex Crichton in this comment:

Whoa! It looks like the pooling allocator is the part that's slow here and that, by default, has a large number of virtual memory mappings associated with it. For example it'll allocate terabytes of virtual memory and then within that giant chunk it'll slice up roughly 10_000 linear memories (each with guard regions between them). These are prepared with a MemoryImageSlot each.

My guess is that the way things are managed is tuned to "this is acceptable due to some fast path in Linux we're hitting" which we didn't really design for and just happened to run across.

This corresponds to PoolingInstanceAllocator in wasmtime. Alex suggests possibly tweaking how the allocator works, either on illumos or generally, but given the performance difference between illumos and Linux, it seems that a kernel-level improvement might help.
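For a concrete picture of the pattern Alex describes, here is a minimal sketch using the libc crate; the sizes and names are illustrative, not wasmtime's actual code:

```rust
use std::io;

const SLOT_COUNT: usize = 10_000;
const SLOT_SIZE: usize = 6 << 30; // e.g. 4 GiB linear memory + 2 GiB guard per slot

/// Reserve the whole pool up front: one giant PROT_NONE mapping that nothing
/// else can land inside. MAP_NORESERVE keeps swap from being charged.
fn reserve_pool() -> io::Result<*mut u8> {
    let len = SLOT_COUNT * SLOT_SIZE; // tens of terabytes of address space
    let ptr = unsafe {
        libc::mmap(
            std::ptr::null_mut(),
            len,
            libc::PROT_NONE,
            libc::MAP_PRIVATE | libc::MAP_ANON | libc::MAP_NORESERVE,
            -1,
            0,
        )
    };
    if ptr == libc::MAP_FAILED {
        return Err(io::Error::last_os_error());
    }
    Ok(ptr.cast())
}

/// Commit one slot for an instance: flip just its used prefix to read/write.
unsafe fn commit_slot(pool: *mut u8, index: usize, committed: usize) -> io::Result<()> {
    let slot = pool.add(index * SLOT_SIZE);
    if libc::mprotect(slot.cast(), committed, libc::PROT_READ | libc::PROT_WRITE) != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```

Under this scheme a single reservation covers every slot, so the address-space layout is fixed at startup and individual instances only ever adjust protections within it.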

cc @iximeow, @rmustacc who I briefly chatted with about this.

jclulow commented 3 weeks ago

We did a bit of digging and it seems like the crux of the matter is that wasmtime is making an extremely large mapping to reserve address space, so that other things don't end up surprise-mapped halfway through one of the slots that is set aside for a potential WASM instance. Today it appears that mapping is MAP_ANON | MAP_NORESERVE with PROT_NONE. It would probably be cheaper/faster to open /dev/null and use MAP_FILE | MAP_SHARED | MAP_NORESERVE, still with PROT_NONE, instead. I think it would otherwise be functionally compatible with what's already going on, in that you would have to have already adjusted (a subset of) the mappings to make them useful (as they were PROT_NONE anyway) before touching the pages.
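A minimal sketch of that alternative, again with the libc crate; note that MAP_FILE is historically defined as 0 (file mappings are the default), so passing the /dev/null descriptor with MAP_SHARED should be equivalent:

```rust
use std::fs::File;
use std::io;
use std::os::fd::AsRawFd;

/// Reserve address space backed by /dev/null instead of anonymous memory.
/// MAP_FILE is traditionally 0, so it is implied by mapping a real fd.
fn reserve_pool_devnull(len: usize) -> io::Result<(*mut u8, File)> {
    let devnull = File::open("/dev/null")?;
    let ptr = unsafe {
        libc::mmap(
            std::ptr::null_mut(),
            len,
            libc::PROT_NONE,
            libc::MAP_SHARED | libc::MAP_NORESERVE,
            devnull.as_raw_fd(),
            0,
        )
    };
    if ptr == libc::MAP_FAILED {
        return Err(io::Error::last_os_error());
    }
    // Return the File too, to make the mapping's backing object explicit.
    Ok((ptr.cast(), devnull))
}
```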

I would also note, if useful, that while we strongly encourage people not to depend on the Linux-specific MADV_DONTNEED behaviour, we appear to have a compatible implementation under a separate name: MADV_PURGE.
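As a hedged sketch (assuming the libc crate exposes MADV_PURGE on illumos targets), a slot reset could pick the advice per platform:

```rust
use std::io;

/// Return a slot's committed pages to zero-fill without unmapping them.
/// MADV_PURGE is the illumos counterpart to the Linux-specific eager
/// behaviour that callers usually reach for with MADV_DONTNEED.
fn reset_slot(slot: *mut u8, len: usize) -> io::Result<()> {
    #[cfg(target_os = "illumos")]
    let advice = libc::MADV_PURGE;
    #[cfg(not(target_os = "illumos"))]
    let advice = libc::MADV_DONTNEED;

    if unsafe { libc::madvise(slot.cast(), len, advice) } != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```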

sunshowers commented 3 weeks ago

It would probably be cheaper/faster to open /dev/null and use MAP_FILE | MAP_SHARED | MAP_NORESERVE, still with PROT_NONE

I gave this a shot but ran into the fact that wasmtime doesn't really centralize its memory allocation tracking anywhere -- several independent components all call mmap directly. AFAICT this isn't great design on wasmtime's part, but it's also a fairly big lift to fix.

https://github.com/sunshowers/wasmtime/pull/1 is my WIP (see mmap.rs for the /dev/null + interval tree-based impl, which is the only meaty part -- the rest is just type tetris). If we do want to invest in this, it would probably be good to establish the performance benefits of the /dev/null-based mmap using my impl before continuing.
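For illustration of the tracking half of that WIP (hypothetical code; the actual mmap.rs differs in detail), an ordered map keyed by start offset is enough to hand out non-overlapping sub-ranges of one big reservation:

```rust
use std::collections::BTreeMap;

/// Tracks which byte ranges of a single large reservation are handed out.
struct RangeTracker {
    /// start offset -> length, for every live sub-range.
    live: BTreeMap<usize, usize>,
    total: usize,
}

impl RangeTracker {
    fn new(total: usize) -> Self {
        Self { live: BTreeMap::new(), total }
    }

    /// Record a sub-range as in use, refusing overlaps with live ranges.
    fn insert(&mut self, start: usize, len: usize) -> Result<(), &'static str> {
        if start + len > self.total {
            return Err("range past end of reservation");
        }
        // The nearest range at or left of `start` must end before it.
        if let Some((&s, &l)) = self.live.range(..=start).next_back() {
            if s + l > start {
                return Err("overlaps existing range");
            }
        }
        // The nearest range right of `start` must begin at or after the end.
        if let Some((&s, _)) = self.live.range(start..).next() {
            if s < start + len {
                return Err("overlaps existing range");
            }
        }
        self.live.insert(start, len);
        Ok(())
    }

    /// Release a previously inserted range, returning its length.
    fn remove(&mut self, start: usize) -> Option<usize> {
        self.live.remove(&start)
    }
}
```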

sunshowers commented 3 weeks ago

Ended up deciding to use unsafe code to store a *const Mmap, and it works and is extremely fast! Wow. This is far from shippable but is definitely a prototype worth presenting upstream.

https://github.com/sunshowers/wasmtime/pull/1
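The general shape of that approach, with hypothetical names rather than the actual patch: sub-allocations keep a raw pointer back to the parent mapping instead of a lifetime-bound reference, trading static checking for a simpler API.

```rust
/// A large reservation (hypothetical; stands in for the real Mmap type).
struct Mmap {
    base: *mut u8,
    len: usize,
}

/// A carved-out view into a parent Mmap.
struct MmapSlice {
    // SAFETY invariant: the parent `Mmap` must outlive every `MmapSlice`
    // carved from it; nothing here enforces that statically.
    parent: *const Mmap,
    offset: usize,
    len: usize,
}

impl MmapSlice {
    /// Caller must uphold the invariant that the parent is still alive.
    unsafe fn as_ptr(&self) -> *mut u8 {
        (*self.parent).base.add(self.offset)
    }
}
```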

sunshowers commented 3 weeks ago

Filed an issue on the wasmtime repo talking about my prototype, the issues I ran into while writing it, and possible solutions: bytecodealliance/wasmtime#9544