Open sunshowers opened 3 weeks ago
We did a bit of digging and it seems like the crux of the matter is that wasmtime is making an extremely large mapping to reserve address space, so that other things don't end up surprise-mapped halfway through one of the slots that is set aside for a potential WASM instance. Today it appears that mapping is `MAP_ANON | MAP_NORESERVE` with `PROT_NONE`. It would probably be cheaper/faster to open `/dev/null` and use `MAP_FILE | MAP_SHARED | MAP_NORESERVE`, still with `PROT_NONE`, instead. I think it would otherwise be functionally compatible with what's already going on, in that you would have to have already adjusted (a subset of) the mappings to make them useful (as they were `PROT_NONE` anyway) before touching the pages.
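For concreteness, here is a minimal sketch of the two flag combinations being compared, written against the `libc` crate. It is not wasmtime's actual reservation code, the function names are made up, and whether mapping `/dev/null` this way works at all is the illumos-specific behaviour assumed in the suggestion above.

```rust
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;
use std::ptr;

/// Today's approach: a huge anonymous, PROT_NONE, no-swap-reservation mapping.
unsafe fn reserve_anon(len: usize) -> io::Result<*mut libc::c_void> {
    let p = libc::mmap(
        ptr::null_mut(),
        len,
        libc::PROT_NONE,
        libc::MAP_PRIVATE | libc::MAP_ANON | libc::MAP_NORESERVE,
        -1,
        0,
    );
    if p == libc::MAP_FAILED {
        return Err(io::Error::last_os_error());
    }
    Ok(p)
}

/// The proposed alternative: the same PROT_NONE reservation, but backed by a
/// shared mapping of /dev/null. MAP_FILE is historically 0, so omitting
/// MAP_ANON has the same effect here. Closing the fd afterwards is fine; the
/// mapping outlives it.
unsafe fn reserve_dev_null(len: usize) -> io::Result<*mut libc::c_void> {
    let dev_null = File::open("/dev/null")?;
    let p = libc::mmap(
        ptr::null_mut(),
        len,
        libc::PROT_NONE,
        libc::MAP_SHARED | libc::MAP_NORESERVE,
        dev_null.as_raw_fd(),
        0,
    );
    if p == libc::MAP_FAILED {
        return Err(io::Error::last_os_error());
    }
    Ok(p)
}
```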
I would also note, if useful, that while we strongly encourage people not to depend on the Linux-specific `MADV_DONTNEED` behaviour, we appear to have a compatible implementation under a separate name: `MADV_PURGE`.
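As a small, hedged illustration of that note, the sketch below resets an already-committed slot back to zero-fill pages without unmapping it. The illumos branch defines `MADV_PURGE` locally on the assumption that the `libc` crate may not export it; the value mirrors illumos `<sys/mman.h>` / madvise(3C) and should be double-checked against the header.

```rust
/// Reset a slot's pages so the next touch sees zero-filled memory, without
/// tearing down the mapping. Linux gets this from MADV_DONTNEED; illumos
/// documents the equivalent behaviour as MADV_PURGE in madvise(3C).
unsafe fn reset_slot(addr: *mut libc::c_void, len: usize) -> std::io::Result<()> {
    // Assumption: MADV_PURGE may not be exported by the libc crate, so it is
    // defined locally here; verify the value against illumos <sys/mman.h>.
    #[cfg(target_os = "illumos")]
    const ADVICE: libc::c_int = 9; // MADV_PURGE
    #[cfg(not(target_os = "illumos"))]
    const ADVICE: libc::c_int = libc::MADV_DONTNEED;

    if libc::madvise(addr, len, ADVICE) != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}
```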
> It would probably be cheaper/faster to open `/dev/null` and use `MAP_FILE | MAP_SHARED | MAP_NORESERVE`, still with `PROT_NONE`
I gave this a shot but ran into the fact that wasmtime doesn't really centralize its memory allocation tracking anywhere -- several independent components all call `mmap` directly. AFAICT that's not great design on wasmtime's part, but it's also a fairly big lift to fix.

https://github.com/sunshowers/wasmtime/pull/1 is my WIP (see `mmap.rs` for the `/dev/null` + interval tree-based impl, which is the only meaty part -- the rest is just type tetris). If we do want to invest in this, it would probably be good to establish the performance benefits of the `/dev/null`-based mmap using my impl before continuing.
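For anyone skimming the WIP, this is roughly the shape of the centralized tracking it needs, sketched from the description above rather than lifted from the branch: a sorted map from start address to reserved range, standing in for a real interval tree (which works because reservations never overlap). The type and method names here (`ReservationIndex`, `RegionId`, `lookup`) are illustrative, not the prototype's.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct RegionId(usize);

#[derive(Default)]
struct ReservationIndex {
    // Keyed by start address: start -> (end, owning region).
    by_start: BTreeMap<usize, (usize, RegionId)>,
}

impl ReservationIndex {
    fn insert(&mut self, range: Range<usize>, owner: RegionId) {
        self.by_start.insert(range.start, (range.end, owner));
    }

    /// Find the reservation containing `addr`, if any: take the last
    /// reservation starting at or below `addr` and check it covers `addr`.
    fn lookup(&self, addr: usize) -> Option<(Range<usize>, RegionId)> {
        let (&start, &(end, owner)) = self.by_start.range(..=addr).next_back()?;
        (addr < end).then(|| (start..end, owner))
    }

    fn remove(&mut self, start: usize) -> Option<(usize, RegionId)> {
        self.by_start.remove(&start)
    }
}
```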
Ended up deciding to use unsafe code to store a `*const Mmap`, and it works and is extremely fast! Wow. This is far from shippable but is definitely a prototype worth presenting upstream.
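A rough sketch of what that unsafe shortcut amounts to (the `Mmap` and `Slot` types below are stand-ins, not the prototype's definitions): each slot carved out of a reservation keeps a raw pointer back to its parent mapping, and the allocator is responsible for keeping the parent alive for as long as any slot exists.

```rust
/// Stand-in for a large reserved mapping.
struct Mmap {
    base: *mut u8,
    len: usize,
}

/// A sub-range of a reservation, holding a raw pointer to its parent to
/// sidestep lifetime/borrow plumbing.
struct Slot {
    // SAFETY invariant: the parent `Mmap` must outlive every `Slot` carved
    // from it; the allocator handing out slots has to guarantee that.
    parent: *const Mmap,
    offset: usize,
    len: usize,
}

impl Slot {
    /// # Safety
    /// The caller must guarantee the parent mapping is still alive.
    unsafe fn as_ptr(&self) -> *mut u8 {
        (*self.parent).base.add(self.offset)
    }
}
```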
Filed an issue on the wasmtime repo talking about my prototype, the issues I ran into while writing it, and possible solutions: bytecodealliance/wasmtime#9544
Encountered an interesting bug while trying to port wasmtime to illumos, as documented in https://github.com/bytecodealliance/wasmtime/pull/9535.
STR (note `+beta`: compiling wasmtime on illumos requires Rust 1.83 or above):

This takes around 0.07 seconds on Linux but around 5-6 seconds on illumos.
DTrace samples:
From my naive reading, particularly of the kernel stacks, it seems like most of the time is being spent waiting on locks to various degrees.
Per Alex Crichton in this comment:
This corresponds to `PoolingInstanceAllocator` in wasmtime. Alex suggests possibly tweaking how the allocator works, either on illumos or generally, but given the performance difference between illumos and Linux it seems that a kernel-level improvement might help.

cc @iximeow, @rmustacc, who I briefly chatted with about this.