Closed almann closed 2 years ago
I've added preliminary benchmark integrations in this commit.
The approach is basically what I posited in the TODO above--I cross compile a small C-ABI shim crate targeting wasm32-unknown-unknown
into a cdylib
and then embed the compiled WASM module into the executable. At runtime, the benchmark compiles the WASM module and copies the benchmark string into the instance's linear memory, and then proceeds to benchmark calling into the appropriate shim function through the WASM runtime's interface (which also has the minor downside of benchmarking the overhead of calling a function through the WASM runtime). This is good enough to make sure that the relative usage of SIMD is actually doing something and to measure the relative speed up on a given WASM runtime (mileage may vary in different VMs on different host architectures).
On my NUC (Intel i7-10710U running Ubuntu 20.04 with performance governor on) I got some promising benchmarks that demonstrate that the WASM SIMD is definitely performing better.
$ for X in std compat basic; do (cargo bench --features=simdutf8_wasm --bench="throughput_wasm_$X" -- --save-baseline wasm-$X); done
Do you want me to add this feature to this PR, or do you want me to break them up (e.g., have this PR stand on its own, and have follow up PRs and make the task list above issues)? I am very sympathetic large PRs so please let me know how you'd like me to proceed.
First of all: Great work, thank you so much! I will do a first review shortly.
Do you want me to add this feature to this PR, or do you want me to break them up (e.g., have this PR stand on its own, and have follow up PRs and make the task list above issues)? I am very sympathetic large PRs so please let me know how you'd like me to proceed.
You can put it in this PR, just don't rebase for now, so I know which commits I have already looked at.
You can put it in this PR, just don't rebase for now, so I know which commits I have already looked at.
Excellent, I have pushed that commit to the PR.
What I am not 100% clear on is, if we want to provide a knob so that the library user can turn on the simd implementation at runtime without having to compile with the
+simd128
target feature, because for browsers there is wasm-feature-detect which could be used to check for simd availability. What do you think?
My current understanding (which could be wrong) of WASM ISA extensions such as SIMD is is that there is no way within WASM for us to detect this, so the code targeting WASM itself cannot detect SIMD or not--in the host system (e.g., a Javascript host) there may or may not be such a facility, but it is not clear to me how we could reliably interact with it from the library in a portable (as in non-JS engines) way.
I do think from the library's perspective, we should definitely allow static flexibility at a minimum because ultimately we want to be able target any WASM runtime irrespective of its native capabilities.
Added the CI including inlining tests, I had it configured in another branch to run with that branch, here was a run of the workflow:
Looking good! It will be a few days until I have time to review the rest though.
@almann Are you sure there is no way around the shim for benchmarking? It looks like there is some basic support for wasi benchmarking in the criterion master branch: https://github.com/bheisler/criterion.rs/issues/461#issuecomment-886634148.
@almann Are you sure there is no way around the shim for benchmarking? It looks like there is some basic support for wasi benchmarking in the criterion master branch: bheisler/criterion.rs#461 (comment).
Diving into that issue and the associated PR in the downstream dependency (plotters
), is that both are still open (and I think I saw this when I was trying to target the benchmarks to wasm32-wasi
). The problem I had run into against released criterion.rs was that the downstream dependency for wasm32
targetted code assumes wasm-pack
which leads to a bunch of undefined symbols. Looking more closely at the PR files, the patch looks pretty trivial and explains my experience, with the build.
@hkratz, did I maybe miss something here?
I also added Wasmtime support to the benchmarks in https://github.com/rusticstuff/simdutf8/pull/56/commits/48eacc43460074f7b6fe2b6a00147bbe6fa0ab93.
I found actually benchmarks on the different WASM runtimes to be of interest, so assuming we keep the shim approach, I think these benchmarks are useful and a user can select whichever one (or all of them) as they see fit.
For example here is the comparison of Wasmer Cranelift/LLVM back-ends vs. Wasmtime (which only has a Cranelift backend) for the basic validator (on Ubuntu 20.04, Ryzen 3950X [Zen2]):
The caveat of the shim benchmarks apply, but it does suggest interesting differences between runtime and this code.
Diving into that issue and the associated PR in the downstream dependency (
plotters
), is that both are still open (and I think I saw this when I was trying to target the benchmarks towasm32-wasi
). The problem I had run into against released criterion.rs was that the downstream dependency forwasm32
targetted code assumeswasm-pack
which leads to a bunch of undefined symbols. Looking more closely at the PR files, the patch looks pretty trivial and explains my experience, with the build.@hkratz, did I maybe miss something here?
FWIW, I tried playing around with this. I can definitely patch around plotters
(check it out in a submodule, patch it with the PR above and make sure the version lines up), but I run into problems with rayon
needing a thread-pool. A comment in the PR above alludes to rayon
/plotters
being optional, but I see no such evidence of this in criterion
's TOML file:
RUSTFLAGS="-C target-feature=+simd128" CARGO_TARGET_WASM32_WASI_RUNNER="wasm-runner wasmer" cargo bench --no-default-features --target=wasm32-wasi --bench=throughput_basic
...
thread 'main' panicked at 'The global thread pool has not been initialized.: ThreadPoolBuildError { kind: IOError(Error { kind: Unsupported, message: "operation not supported on this platform" }) }', /home/almann/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.9.1/src/registry.rs:170:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: failed to run `/home/almann/CLionProjects/simdutf8/bench/target/wasm32-wasi/release/deps/throughput_basic-716ee0bc7dd8261c.wasm`
│ 1: RuntimeError: unreachable
at __rust_start_panic (throughput_basic-716ee0bc7dd8261c.wasm[2750]:0x1dd651)
at rust_panic (throughput_basic-716ee0bc7dd8261c.wasm[2728]:0x1dcd76)
at std::panicking::rust_panic_with_hook::heefe90fce7240a03 (throughput_basic-716ee0bc7dd8261c.wasm[2721]:0x1dc9e3)
at std::panicking::begin_panic_handler::{{closure}}::h4f5bcef500c8e46d (throughput_basic-716ee0bc7dd8261c.wasm[2702]:0x1dbab7)
at std::sys_common::backtrace::__rust_end_short_backtrace::h53dbd65c57a9d0e6 (throughput_basic-716ee0bc7dd8261c.wasm[2701]:0x1dba06)
at rust_begin_unwind (throughput_basic-716ee0bc7dd8261c.wasm[2720]:0x1dc531)
at core::panicking::panic_fmt::h92e73467e6a2c091 (throughput_basic-716ee0bc7dd8261c.wasm[2894]:0x1eb224)
at core::result::unwrap_failed::h03dbf875e0072fe2 (throughput_basic-716ee0bc7dd8261c.wasm[2921]:0x1ec477)
at rayon_core::registry::global_registry::hb1f293f2c8dfd05b (throughput_basic-716ee0bc7dd8261c.wasm[1292]:0x12d0e8)
at rayon_core::current_num_threads::h35bc7a6b8dbf8c63 (throughput_basic-716ee0bc7dd8261c.wasm[1303]:0x12e058)
at rayon::iter::plumbing::bridge::ha8b8b23fcd9b38a6 (throughput_basic-716ee0bc7dd8261c.wasm[1021]:0x1058ae)
at criterion::analysis::estimates::h6521ee9def717bc3 (throughput_basic-716ee0bc7dd8261c.wasm[659]:0xb4089)
at criterion::analysis::common::h567d2c8475d0f44a (throughput_basic-716ee0bc7dd8261c.wasm[244]:0x22eb7)
at criterion::benchmark_group::BenchmarkGroup<M>::bench_with_input::hb0a4d9a4d0aa4552 (throughput_basic-716ee0bc7dd8261c.wasm[254]:0x258b5)
at simdutf8_bench::bench_input::hd027083377a798b4 (throughput_basic-716ee0bc7dd8261c.wasm[115]:0x11047)
at simdutf8_bench::criterion_benchmark::hf2c8a1c63f585fec (throughput_basic-716ee0bc7dd8261c.wasm[116]:0x1143d)
at throughput_basic::main::h9dc93b5f5255a651 (throughput_basic-716ee0bc7dd8261c.wasm[245]:0x24030)
at std::sys_common::backtrace::__rust_begin_short_backtrace::h2c5b7aaf52faf128 (throughput_basic-716ee0bc7dd8261c.wasm[127]:0x1234b)
at std::rt::lang_start::{{closure}}::h557e866ad9f65275 (throughput_basic-716ee0bc7dd8261c.wasm[147]:0x13ecd)
at std::rt::lang_start_internal::h44883fd0d3691d00 (throughput_basic-716ee0bc7dd8261c.wasm[2697]:0x1db731)
at __original_main (throughput_basic-716ee0bc7dd8261c.wasm[246]:0x240dd)
at _start (throughput_basic-716ee0bc7dd8261c.wasm[20]:0x18e8)
at _start.command_export (throughput_basic-716ee0bc7dd8261c.wasm[3016]:0x1f37d1)
FWIW, The CI failure appears to be transient, I pushed to main
on my fork and it ran okay.
https://github.com/almann/simdutf8/actions/runs/1674697191
Looking at the error--there was a problem downloading Wasmer, which didn't fail that step (but failed when attempting to run the tests as the runtime was missing):
(node:1899) UnhandledPromiseRejectionWarning: RequestError: connect ECONNREFUSED 52.204.121.99:443
at ClientRequest.<anonymous> (/Users/runner/work/_actions/wasmerio/setup-wasmer/v1/dist/index.js:1:104424)
at Object.onceWrapper (events.js:300:26)
at ClientRequest.emit (events.js:215:7)
at ClientRequest.e.emit (/Users/runner/work/_actions/wasmerio/setup-wasmer/v1/dist/index.js:1:38640)
at TLSSocket.socketErrorListener (_http_client.js:406:9)
at TLSSocket.emit (events.js:210:5)
at emitErrorNT (internal/streams/destroy.js:92:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
at processTicksAndRejections (internal/process/task_queues.js:80:21)
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1129:14)
(node:1899) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:1899) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
There are some TODOs (see below), but the implementation and test suite integration has been done.
wasm32_simd128
consistent withaarch64_neon
as there is no auto-detection for SIMD in WASM.README.md
with WASM build instructions.should_panic
test case will be ignored due to limitations of the test driver withwasm32-wasi
.TODO
wasm32-wasi
for the benchmarks have issues with bothwasmer
andwasmtime
. I suspect this is partially because of criterion expectingwasm32
targets to run to havewasm-bindgen
. One option would be to compile a shim of the library as astaticlib
crate targetingwasm32-unknown-unknown
and usingwasmer
orwasmtime
to compile/link the function and benchmark it in the current framework.Implementation for #36