Closed: bjorn3 closed this issue 2 years ago
By the way, when trying to verify the "Lightweight headless mode" claim, I'm getting 4.8MB (2.8MB with thin LTO) for the engine-headless example, adapted to skip the compilation step and then compiled with `cargo build --no-default-features --example engine-headless --release --features "wasmer/sys wasmer-workspace/wasmer-engine-dylib"`.
I'm getting 5.2MB (3.4MB with LTO) for the serialize example, adapted to skip the compilation step and then compiled with `cargo run --example serialize --no-default-features --release`.
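For reference, a sketch of how such binary sizes can be checked. The cargo invocations are the ones quoted above; the `size_mb` helper and the synthetic demo file are purely illustrative and not part of either project:

```shell
# Print a file's size in MB with one decimal place (illustrative helper).
size_mb() {
  awk -v b="$(stat -c %s "$1")" 'BEGIN { printf "%.1fMB\n", b / (1024 * 1024) }'
}

# In a wasmer checkout, the build command from this thread would be something like:
#   cargo build --no-default-features --example engine-headless --release \
#     --features "wasmer/sys wasmer-workspace/wasmer-engine-dylib"
#   size_mb target/release/examples/engine-headless

# Demonstration on a synthetic 3 MB file:
dd if=/dev/zero of=/tmp/demo.bin bs=1M count=3 2>/dev/null
size_mb /tmp/demo.bin
```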
While Wasmer produces smaller headless binaries, the difference is significantly smaller than the comparison page claims. I suspect the page has never been updated since Wasmtime added headless support. I also noticed that the headless binary I produced using Wasmer was almost 3x as big as the claimed size; I'm not sure why that is.
(Note: everything was compiled using the same rustc version, rustc 1.61.0 (fe5b13d68 2022-05-18).)
Thanks for opening the issue!
The startup speed was measured with Wasmtime 0.2X I believe (it was about a year and a half ago so unfortunately I don't really remember the exact version used).
> My best guess is that it compares pre-compiled object files for Wasmer with just-in-time compiled code for Wasmtime. This would not be a fair comparison given that Wasmtime also supports just-in-time compiled code.
Indeed, that would not be fair. To give you more context, let me confirm that we compared just startup speed, in an apples-to-apples comparison. In general, both wasmer and wasmtime cached the compiled objects at the time of testing (which I assume still holds true today).
We measured something similar to the following (note that each command was run twice, to let the runtime cache the artifact and not need to recompile it again):
```
wasmer run xyz.wasm --llvm --native # today it would have been --llvm --dylib
wasmtime run xyz.wasm
```
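A minimal sketch of that run-twice measurement. Here `/bin/true` stands in for `wasmer run xyz.wasm --llvm --dylib` or `wasmtime run xyz.wasm`; the `time_ms` helper is illustrative and not part of either CLI:

```shell
# Time a command's wall-clock runtime in milliseconds (GNU date).
time_ms() {
  local start end
  start=$(date +%s%N)            # nanoseconds since epoch
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

time_ms /bin/true >/dev/null     # first run: lets a real runtime compile and cache the module
warm=$(time_ms /bin/true)        # second run: measures warm (cached) startup only
echo "warm startup: ${warm}ms"
```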
In the case of wasmer, the startup difference was mainly caused by using the native `dlopen` under the hood, vs the custom artifact format that wasmtime used (a similar strategy to what Lucet used to do, and why Lucet was way faster at starting than wasmtime, on par with Wasmer). wasmtime's strategy was way slower to load at the time of measurement (not sure about the latest version, since we haven't measured it again).
> By the way when trying to verify the "Lightweight headless mode" claim
About the headless mode: Wasmer headless with only the native/dylib engine was just 800Kb at the time of measurement.
Right now, the "non-optimized version" of Wasmer headless is 1.6Mb (you can download it from https://github.com/wasmerio/wasmer/releases/download/2.3.0/wasmer-linux-amd64.tar.gz as bin/wasmer-headless, or build it by running `make build-wasmer-headless-minimal` from the makefile). Note that the file size difference (1.6Mb vs 800Kb) is due to having to include the custom "universal" engine. Once that engine is not included, sizes should be on the order of Kbs (~800Kb).
As for Wasmtime, they didn't provide any headless binary, nor did they support the `--no-default-features` flags that you used today, so we simply used what was available at the time.
Hope this clarifies your questions! Closing the ticket :)
Thanks for the reply.
> Indeed, that would not be fair. To give you more context, let me confirm that we compared just startup speed, in an apples-to-apples comparison.
:+1:
> We measured something similar to the following (note that each command was run twice, to let the runtime cache the artifact and not need to recompile it again):
Was this a small or a big wasm module? And do you happen to remember which wasm module was used exactly? I want to try and see how much wasmtime has improved since.
> As for Wasmtime, they didn't provide any headless binary, nor did they support the `--no-default-features` flags that you used today, so we simply used what was available at the time.
Makes sense.
Would you accept a PR updating the results at https://wasmer.io/wasmer-vs-wasmtime for the latest Wasmer and Wasmtime versions? I will mention the exact benchmarks I have used in that case.
I'd love to see the page updated with the benchmarks used and more information. Stating 2x speed and 1000x startup without any context is a little vague and difficult to believe.
https://wasmer.io/wasmer-vs-wasmtime lists a couple of claims about why wasmer is better than wasmtime. The "Flexible compiler support" and "Favorite language integration" claims are easy to verify as true. However, the "Startup speed" and "Execution speed" claims are not verifiable without pointing to the benchmarks that gave those results. The "Execution speed" claim I can believe when comparing LLVM for Wasmer with Cranelift for Wasmtime. The "Startup speed" claim, however, asserts such a huge perf difference that I want to see the benchmark it was based on to verify it for myself. I have a feeling it is an apples vs pears comparison. My best guess is that it compares pre-compiled object files for Wasmer with just-in-time compiled code for Wasmtime. This would not be a fair comparison given that Wasmtime also supports just-in-time compiled code.