wasmerio / wasmer

🚀 The leading Wasm Runtime supporting WASIX, WASI and Emscripten
https://wasmer.io
MIT License

Loading NativeArtifact can be faster #2180

Closed ailisp closed 3 years ago

ailisp commented 3 years ago


Motivation

Currently, loading a NativeArtifact is a two-step process:

  1. Load the ELF shared object. Internally this is an mmap, so it is very fast.
  2. Deserialize ModuleMetadata from part of the ELF data. This is slow.

Simple timing with Instant::now()/elapsed() shows that step 2 takes roughly 90% (3.5 ms of 4 ms) of the total time spent in NativeArtifact::deserialize_from_file_unchecked, and it takes longer as the Wasm file gets bigger. This is suboptimal.
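The measurement above can be reproduced with a small helper around std::time::Instant. This is a minimal sketch: timed and share_percent are hypothetical helper names, and the two Wasmer load steps would be passed in as closures.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the wall-clock time it took,
/// i.e. the same Instant::now() / elapsed() pattern described above.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Percentage of `total` spent in `part`; returns 0.0 if `total` is zero.
fn share_percent(part: Duration, total: Duration) -> f64 {
    if total.is_zero() {
        return 0.0;
    }
    part.as_secs_f64() / total.as_secs_f64() * 100.0
}
```

For the numbers quoted above, share_percent(3.5 ms, 4 ms) gives 87.5%, which is where the "roughly 90%" figure comes from.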

Proposed solution

ModuleMetadata could also be replaced with a memory-mapped data structure, since it consists of structs, or structs of vectors (PrimaryMap). For example, I tried this library: https://github.com/heremaps/flatdata/tree/master/flatdata-rs on an artificially constructed struct with a structure and size similar to the Wasmer Artifact. It took only 0.2 ms to load, and the load time did not increase when I enlarged the vectors in the struct 100x.
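To illustrate why a memory-mapped layout can load in roughly constant time, here is a minimal, std-only sketch of the general idea. It is not the flatdata or rkyv API; U32View and encode are hypothetical names. "Loading" only validates a small header and returns a view that borrows the bytes, and elements are decoded lazily on access. In an mmap-based design, the buffer would be a slice of the mapped file.

```rust
use std::convert::TryFrom;

/// Hypothetical read-only view over a buffer laid out as
/// [u32 count][count x u32 values], all little-endian.
struct U32View<'a> {
    bytes: &'a [u8], // element bytes only; the 4-byte header is stripped
}

impl<'a> U32View<'a> {
    /// Reads the header and checks the buffer length; the element bytes are
    /// never touched, so the cost does not depend on how many there are.
    fn load(buf: &'a [u8]) -> Option<Self> {
        let header = <[u8; 4]>::try_from(buf.get(..4)?).ok()?;
        let len = u32::from_le_bytes(header) as usize;
        let end = len.checked_mul(4)?.checked_add(4)?;
        Some(U32View { bytes: buf.get(4..end)? })
    }

    fn len(&self) -> usize {
        self.bytes.len() / 4
    }

    /// Decodes element `i` on demand; from_le_bytes has no alignment
    /// requirement on the underlying buffer.
    fn get(&self, i: usize) -> Option<u32> {
        let start = i.checked_mul(4)?;
        let chunk = self.bytes.get(start..start.checked_add(4)?)?;
        Some(u32::from_le_bytes(<[u8; 4]>::try_from(chunk).ok()?))
    }
}

/// Builds a buffer in the layout that U32View::load expects.
fn encode(values: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + values.len() * 4);
    out.extend_from_slice(&(values.len() as u32).to_le_bytes());
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

Because load never iterates over the elements, its cost stays flat as the array grows. A bincode-style deserializer, by contrast, has to decode every element before the struct can be used.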

Alternatives

Other mmap-based crates may also be considered.

Additional context

ModuleInfo contains some HashMaps and IndexMaps. However, they all have fewer than 1000 elements, even in a 2 MB contract (lib/tests/assets/qjs.wasm). It should be fine to replace them with sorted vectors, and at this number of elements a binary-search lookup should be faster than a HashMap lookup.
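A sorted-vector replacement for a small map could look like the following minimal sketch. SortedVecMap is a hypothetical name, not a Wasmer or indexmap type, and the sketch does not measure whether binary search actually beats hashing at this size.

```rust
use std::borrow::Borrow;

/// Hypothetical read-only map stored as (key, value) pairs sorted by key,
/// sketching a replacement for a small HashMap/IndexMap.
struct SortedVecMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: Ord, V> SortedVecMap<K, V> {
    /// Sorts once at build time. sort_by is stable, so for duplicate keys
    /// the first pair in the input is kept and later ones are dropped.
    fn from_pairs(mut entries: Vec<(K, V)>) -> Self {
        entries.sort_by(|a, b| a.0.cmp(&b.0));
        entries.dedup_by(|later, earlier| later.0 == earlier.0);
        SortedVecMap { entries }
    }

    /// O(log n) lookup by binary search; accepts borrowed keys (e.g. &str for String).
    fn get<Q>(&self, key: &Q) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Ord + ?Sized,
    {
        self.entries
            .binary_search_by(|(k, _)| Borrow::<Q>::borrow(k).cmp(key))
            .ok()
            .map(|i| &self.entries[i].1)
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

One design point to keep in mind: unlike IndexMap, a key-sorted vector does not preserve insertion order, so any code that relies on IndexMap's ordering would need that order stored separately.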

Hywan commented 3 years ago

Thank you for the proposal. That's an interesting thing to try.

For the moment, we use libloading to load the library. Behind the scenes, it calls dlopen to open the library with the flags RTLD_LAZY | RTLD_LOCAL. Maybe we can do better. I know RTLD_NOLOAD and RTLD_NODELETE can improve things quite a lot, but I'm not sure they will have any impact on our generated shared objects, since those don't have dependencies. Need to test!

Would you be willing to try something in a PR? :-)

syrusakbary commented 3 years ago

@Hywan I believe @ailisp is not referring to dlopen loading time, but to the bincode deserialization of ModuleMetadata once the artifact is partially loaded (already dlopen-ed).

@ailisp, I think your suggestion is really great. Once PR #2183 is merged, it should be trivial to add a much faster deserialization mechanism. I'm happy with using flatdata-rs, as you suggested, but I'm also curious how borsh-rs would perform.

ailisp commented 3 years ago

@Hywan I meant what @syrusakbary said. The current dlopen is fast enough, but thanks for giving a lot more insight into dlopen; that's quite advanced :)

@syrusakbary flatdata (and other mmap-based, raw-pointer-based formats) usually reconstruct the entire struct in a single operation, which is significantly faster than borsh. Borsh is 30%-40% faster than bincode at deserializing the Artifact of qjs.wasm, but it still has to iterate over all elements to reconstruct the PrimaryMaps/Vecs.
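The per-element cost mentioned above can be seen in a minimal sketch of an eager, length-prefixed decoder for nested data. The layout (a little-endian u32 count before each vector) follows borsh's convention for Vec, but read_u32 and decode_nested are hypothetical helpers written for illustration, not borsh's implementation.

```rust
use std::convert::TryFrom;

/// Reads a little-endian u32 at `*pos` and advances the cursor past it.
fn read_u32(buf: &[u8], pos: &mut usize) -> Option<u32> {
    let end = (*pos).checked_add(4)?;
    let bytes = <[u8; 4]>::try_from(buf.get(*pos..end)?).ok()?;
    *pos = end;
    Some(u32::from_le_bytes(bytes))
}

/// Eagerly decodes a Vec<Vec<u32>> encoded as: outer count, then for each
/// inner vector its count followed by its elements (all u32, little-endian).
/// Every element is read and every inner Vec is allocated, so decoding time
/// grows linearly with the amount of data.
fn decode_nested(buf: &[u8]) -> Option<Vec<Vec<u32>>> {
    let mut pos = 0;
    let outer = read_u32(buf, &mut pos)? as usize;
    // Cap pre-allocation so a corrupt count cannot trigger a huge allocation.
    let mut result = Vec::with_capacity(outer.min(buf.len() / 4));
    for _ in 0..outer {
        let inner = read_u32(buf, &mut pos)? as usize;
        let mut row = Vec::with_capacity(inner.min(buf.len() / 4));
        for _ in 0..inner {
            row.push(read_u32(buf, &mut pos)?);
        }
        result.push(row);
    }
    Some(result)
}
```

However fast each read_u32 is, the nested loops touch every element before the result can be used. That is the work a single-operation, mmap-based load avoids.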

ailisp commented 3 years ago

Some update on this: flatdata actually doesn't support Vec<Vec<_>>-like structs. My benchmark above was on CompiledFunctionFrameInfo, which is rather flat, but the entire struct contains a PrimaryMap<String,>>, which is effectively a Vec<Vec<_>> in RAM.

So we found and benchmarked another zero-cost loading crate, https://github.com/djkoloski/rkyv, which also gives a 0.2 ms load time on a 100x-enlarged struct (borsh takes 140 ms, and bincode takes even longer), and this crate supports HashMap. We would need to implement the rkyv traits for IndexMap and PrimaryMap. Are you interested, and okay with that?
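One common way to make Vec<Vec<_>>-shaped data fit a flat, mmap-friendly layout is to concatenate all inner elements into one array and keep a separate array of row offsets. The sketch below is a hypothetical illustration of that technique: Flattened is not a type from rkyv, flatdata, or Wasmer, and this is not necessarily how rkyv lays out its archives. It uses owned Vecs for brevity, whereas in an mmap design both arrays would be slices of the mapped file.

```rust
/// Hypothetical flat representation of a Vec<Vec<T>>: all inner elements are
/// concatenated in `values`, and row i is values[offsets[i]..offsets[i + 1]].
struct Flattened<T> {
    offsets: Vec<usize>, // starts with 0; has one more entry than there are rows
    values: Vec<T>,
}

impl<T: Clone> Flattened<T> {
    /// Converts nested vectors into the two flat arrays.
    fn from_nested(nested: &[Vec<T>]) -> Self {
        let mut offsets = Vec::with_capacity(nested.len() + 1);
        let mut values = Vec::new();
        offsets.push(0);
        for row in nested {
            values.extend_from_slice(row);
            offsets.push(values.len());
        }
        Flattened { offsets, values }
    }
}

impl<T> Flattened<T> {
    fn rows(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns row i as a borrowed slice; no per-row allocation or copying.
    fn row(&self, i: usize) -> Option<&[T]> {
        let start = *self.offsets.get(i)?;
        let end = *self.offsets.get(i.checked_add(1)?)?;
        self.values.get(start..end)
    }
}
```

With this layout, the whole nested structure is two contiguous arrays, so it can be validated and viewed in place rather than rebuilt row by row.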

Hywan commented 3 years ago

Closed by https://github.com/wasmerio/wasmer/pull/2190.