wasmerio / wasmer

🚀 The leading Wasm Runtime supporting WASIX, WASI and Emscripten
https://wasmer.io
MIT License
18.28k stars 775 forks source link

Performance Issue in the fd_write Implementation #3784

Open hungryzzz opened 1 year ago

hungryzzz commented 1 year ago

Summary

Hi, I run the following case in different Wasm runtimes(after being compiled by Emscripten), and I find some performance differences between wasmer and other 3 runtimes: the execution time(collected by perf-tool, probe begins when starting to execute the wasm code(inner_module_run in wasmer) and end in sched:sched_process_exit) in wasmer is 4x slower than which in wasmtime.

#include <stdio.h>
#include <sys/time.h>

typedef struct timeval timeval;
timeval tv;

static void repeat(int count) {
  int len = 50;
  do {
    gettimeofday(&tv, NULL);
    count -= len;
    printf("%d\n", tv.tv_usec);
  } while (count >= 0);
}

int main() {
  repeat(500000);
  return 0;
}

Hardware & OS

Emscripten

Wasm runtime version

Additional details

I find that if I comment the IO(printf) in source code, the execution time of wasmer will be faster than wasmtime, so I try to replace the fd_write function to an empty function in the wasm binary just as following.

截屏2023-04-19 15 22 40

The execution times are totally changed. Because there is an another wasi call (clock_time_get) in this loop, I think the bug doesn't exist in the mechanism of calling a wasi call.

Then, I try to comment the following code(change the written size to a fix value according to my case) and rebuild wasmer, the execution time of the original case is 8675.32 us. https://github.com/wasmerio/wasmer/blob/8a343d0f9a1d3d4b4bfe622ed345253d780a3d76/lib/wasi/src/syscalls/wasi/fd_write.rs#L132-L147

Therefore, I think maybe there are some performance bugs in the implementation of fd_write, especially in the above code.

ptitSeb commented 1 year ago

Thank you for this detailled ticket. Interresting findings. Can you please try again with the just released 3.2.0 version of wasmer? Also, it would be interresting to see if building the same program with the wasi sdk instead of emscripten change performances somehow.

hungryzzz commented 1 year ago

Hi, I try again in the 3.2.0 version of wasmer and the 3 execution times are as follow:

Almost the same result as the above report.

ptitSeb commented 1 year ago

Ok, so no changes basicaly.

hungryzzz commented 1 year ago

Yes, and I compile the same program using wasi-sdk-20.0, the result is still the same.

ptitSeb commented 1 year ago

ok, thanks for all those tests. So, it's slow access to stdout/stderr? Strange. @john-sharratt is that a normal behaviour?

theduke commented 1 year ago

We don't have a fast path for vectorized IO in the VirtualFile implementation, which might be part of the slowness.

But it's more likely that the asyncify wrapper and the async runtime are a significant contributor.

It would probably be helpful to run a stress test with cargo-flamegraph to get some insight.

ptitSeb commented 1 year ago

I'm not so sure, because the slowiness was there with 3.2.0-alpha.1, and I don't think async with present at this time.

stale[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 month ago

Feel free to reopen the issue if it has been closed by mistake.