davidmdm commented 7 months ago

Is your feature request related to a problem? Please describe.
When running large Wasm files, like the ones generated by the Go toolchain (which embed the Go runtime), compiling a module now takes at least 7 times longer on my machine.

For one program I need to compile and execute, it takes:

v1.6.0 ~= 4s
v1.7.0 ~= 30s+

Although I expect the executables produced by v1.7.0 to be much more optimized and efficient, this tradeoff is not worth it for programs that need to run one-off Wasm programs.

Describe the solution you'd like
Ideally, when compiling a module, I should be able to set an optimization level in the runtime config to choose between fast compilation with slower execution and slow compilation with faster execution.

ncruces commented 7 months ago

wazero is maintained by a small team, so when the new compiler was introduced, it was decided to remove the old compiler (which was a totally different code base).

The new compiler is more modular, so it may be possible to disable certain optimization passes (I'll leave it to @mathetake to comment on that). It's also a recent codebase, and there might be some opportunity to optimize it. Having said that, it's probably unrealistic to expect it to become as fast as the previous compiler was.
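As a rough illustration of what "disabling certain optimization passes" could mean, here is a minimal sketch of a pass pipeline where optional optimization passes are skipped at a low optimization level, while lowering passes that are needed to produce correct code always run. `OptLevel`, `pass`, and the pass names are hypothetical and are not part of wazero's API.

```go
package main

import "fmt"

// OptLevel is a hypothetical knob; it is not a wazero API.
type OptLevel int

const (
	OptNone OptLevel = iota // fastest compile: skip optional passes
	OptFull                 // slowest compile: run every pass
)

// pass is a hypothetical compiler pass over a toy IR (a list of instructions).
type pass struct {
	name     string
	optional bool // optimization passes may be skipped; lowering passes may not
	run      func(ir []string) []string
}

func defaultPasses() []pass {
	return []pass{
		{name: "lower", run: func(ir []string) []string { return ir }},
		{name: "dead-code-elim", optional: true, run: func(ir []string) []string {
			out := make([]string, 0, len(ir))
			for _, ins := range ir {
				if ins != "nop" { // drop instructions with no effect
					out = append(out, ins)
				}
			}
			return out
		}},
		{name: "regalloc", run: func(ir []string) []string { return ir }},
	}
}

// pipeline runs the passes allowed at the given level and reports which ran.
func pipeline(level OptLevel, passes []pass, ir []string) ([]string, []string) {
	var ran []string
	for _, p := range passes {
		if p.optional && level == OptNone {
			continue
		}
		ir = p.run(ir)
		ran = append(ran, p.name)
	}
	return ir, ran
}

func main() {
	for _, lvl := range []OptLevel{OptNone, OptFull} {
		ir, ran := pipeline(lvl, defaultPasses(), []string{"add", "nop", "ret"})
		fmt.Println(ran, ir)
	}
	// [lower regalloc] [add nop ret]
	// [lower dead-code-elim regalloc] [add ret]
}
```

The trade-off is the one requested in the issue: fewer passes means faster compilation but less optimized output.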

You have two mitigation strategies at your disposal:

cache the compilation result
use the interpreter
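To make the first mitigation concrete, here is a minimal, standard-library-only sketch of the idea behind caching: key the compiled output by a hash of the Wasm bytes, so the expensive compile runs only once per distinct module. The names `compileCache` and `expensiveCompile` are hypothetical. In wazero itself, as far as I know, the equivalents are a `wazero.CompilationCache` (for example from `wazero.NewCompilationCacheWithDir`) passed via `RuntimeConfig.WithCompilationCache`, and the interpreter is selected with `wazero.NewRuntimeConfigInterpreter()`; please check the wazero docs for the exact signatures.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// compiledModule stands in for the machine code a compiler would produce (hypothetical).
type compiledModule struct {
	key string
}

// compileCache memoizes compilation results keyed by the SHA-256 of the Wasm bytes.
type compileCache struct {
	mu       sync.Mutex
	entries  map[string]*compiledModule
	compiles int // how many times the expensive compile actually ran
}

func newCompileCache() *compileCache {
	return &compileCache{entries: make(map[string]*compiledModule)}
}

// expensiveCompile is a hypothetical placeholder for the slow optimizing compiler.
func (c *compileCache) expensiveCompile(key string) *compiledModule {
	c.compiles++
	return &compiledModule{key: key}
}

// Get returns the cached result if these exact bytes were compiled before.
func (c *compileCache) Get(wasm []byte) *compiledModule {
	sum := sha256.Sum256(wasm)
	key := hex.EncodeToString(sum[:])
	c.mu.Lock()
	defer c.mu.Unlock()
	if m, ok := c.entries[key]; ok {
		return m
	}
	m := c.expensiveCompile(key)
	c.entries[key] = m
	return m
}

func main() {
	cache := newCompileCache()
	wasm := []byte{0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00} // Wasm magic + version
	a := cache.Get(wasm)
	b := cache.Get(wasm)
	fmt.Println(a == b, cache.compiles) // true 1
}
```

Holding the mutex during the compile keeps the sketch simple but serializes concurrent compiles. An in-memory cache only helps within one process; a cache persisted to disk is what avoids recompiling across runs.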

Other than that, if you can find (or fix!) a bottleneck in the compiler (pprof is highly recommended), we're enthusiastic about any improvements.

davidmdm commented 7 months ago

Thanks for your response. First, I want to say how much I appreciate the wazero project. I understand the constraints it is under, and I think y'all are doing a fantastic job.

Hopefully, given the modularity of the new compiler, this feature could be feasible.

I'm opening this issue not as a bug report, but as a mark of interest in this aspect of the compiler.

Given the varied use cases of wasm, I hope in the future wazero can provide an option for use cases that prefer quick compilation over quick runtime performance.

I will experiment with the interpreter and report back with hard numbers later, but my impression was that the interpreter took about as much time as the v1.7.0 compiler.

If the interpreter can work for fast startup then this is fine for me!

davidmdm commented 6 months ago

As promised, here are my findings running against a 65 MB Wasm file on my MacBook Air M2 (arm64). These results include both compiling and executing the Wasm; I suspect execution time is negligible.

v1.6.0 compiler   : 2.49s
v1.6.0 interpreter: 1.866s

v1.7.0 compiler   : N/A
v1.7.0 interpreter: 2.087s

v1.7.1 compiler   : 26.134s
v1.7.1 interpreter: 1.967s

What we can conclude is that the v1.7.1 compiler is about an order of magnitude (10x) slower than the v1.6.0 compiler.

That being said, I was wrong when I created the issue and must have had a misconfiguration on my end: the interpreter is only marginally (arguably negligibly) slower than in v1.6.0.

Prior to v1.7.x, there was little reason to use the interpreter over the compiler except to support more architectures. Now I think it would be reasonable to document the difference in startup time, and to promote the interpreter as the solution for programs that need fast... well, interpretation times.

This advice could be revisited if and when optimization levels become a thing.

In the meantime, I am satisfied using the interpreter.

davidmdm commented 6 months ago

Turns out the interpreter does not scale very well. For programs that I can compile and run quickly with v1.6.0 (~5s), with the interpreter it takes 30+ seconds on both v1.6.0 and v1.7.x of wazero.

An option similar to Zig's build modes (e.g. choosing Debug instead of ReleaseFast), where we could disable many of the optimizations and get closer to the compilation speed of v1.6.0, would be beneficial.

TL;DR: Contrary to what I believed before, the interpreter is not a silver bullet, as it does not scale to complex tasks.

mathetake commented 5 months ago

So basically, we have neither the resources nor a plan to introduce additional complexity into the compiler implementation. In fact, as you can see in https://github.com/tetratelabs/wazero/pull/2214, there's plenty of room to make the current compiler faster. You could try to find where the compilation bottleneck is, and contribute if you can. At the very least, we should be able to make our current compiler as fast as wasmtime in terms of compilation performance (not runtime perf!).

Given that, I am changing the title of this issue to something like optimizing compilation perf

mathetake commented 5 months ago

#2226

davidmdm commented 5 months ago

Love the amount of PRs and energy going into this and wanted to drop my appreciation here. 🚀 🚀 🚀

mathetake commented 5 months ago

@davidmdm mind trying out the main branch and sharing the result with us when you get a chance? 🙏

davidmdm commented 5 months ago

Absolutely! Here are the results from running time wazero compile binary.wasm:

v1.6.0 -> 2.656s
v1.7.2 -> 28.759s
main   -> 17.45s

If it would help to profile the application, the wasm program I am using is publicly available and can be downloaded here as a gzip.

It is essentially a program that embeds the ArgoCD Helm Chart, executes it, and performs a couple patches to some internal resources before spitting it back out again.

So some characteristics:

embeds large assets
uses a lot of marshalling/unmarshalling

Great work! The next wazero release will compile this binary roughly 1.65x as fast as v1.7.2!

EDIT: adding wasmtime as reference for the same wasm binary:

wasmtime -> 5.917s
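For reference, the speedup ratios implied by these timings can be computed directly; the numbers in the map are the ones reported in this comment, and nothing else is assumed.

```go
package main

import "fmt"

// Compile times in seconds reported for `time wazero compile binary.wasm`
// (and for wasmtime on the same binary).
var timings = map[string]float64{
	"v1.6.0":   2.656,
	"v1.7.2":   28.759,
	"main":     17.45,
	"wasmtime": 5.917,
}

// ratio returns how many times longer `slow` took than `fast`.
func ratio(slow, fast string) float64 {
	return timings[slow] / timings[fast]
}

func main() {
	fmt.Printf("v1.7.2 vs v1.6.0: %.2fx\n", ratio("v1.7.2", "v1.6.0")) // 10.83x
	fmt.Printf("v1.7.2 vs main:   %.2fx\n", ratio("v1.7.2", "main"))   // 1.65x
	fmt.Printf("main vs wasmtime: %.2fx\n", ratio("main", "wasmtime")) // 2.95x
}
```

Keep in mind that wasmtime may compile functions in parallel, so the last ratio is not necessarily a single-threaded comparison.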

mathetake commented 5 months ago

let's keep this open until the perf becomes comparable to wasmtime. Thank you for the testing @davidmdm !

mathetake commented 5 months ago

Oh wait, I remember that wasmtime does parallel compilation (using multiple workers to compile multiple functions simultaneously), whereas wazero's compilation runs in a single thread/goroutine. I wonder if it's possible to make wasmtime compile in a single thread for a fair comparison.

davidmdm commented 5 months ago

Perhaps there are opportunities to make certain parts of the wazero compiler concurrent? If so that may bridge the gap considerably!

mathetake commented 5 months ago

yeah that's one thing we should consider, but for now I think I would like to focus on the single thread perf and then we can return to the parallelization (which I think shouldn't be that hard)
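A minimal sketch of per-function parallel compilation with a fixed pool of goroutines, assuming each function body can be compiled independently (a real compiler would likely still need a serial step afterwards, e.g. to lay out and link the generated code). `funcBody` and `compileFunc` are hypothetical placeholders, not wazero internals.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// funcBody is a hypothetical stand-in for one Wasm function body.
type funcBody struct {
	code []byte
}

// compileFunc is a hypothetical per-function compile step. It touches no
// shared mutable state, so it can safely run on several goroutines at once.
func compileFunc(f funcBody) []byte {
	out := make([]byte, len(f.code))
	for i, b := range f.code {
		out[i] = b ^ 0xFF // placeholder "code generation"
	}
	return out
}

// compileAll compiles every function using `workers` goroutines (workers >= 1).
// Each result is stored at its function's index, so output order is deterministic.
func compileAll(funcs []funcBody, workers int) [][]byte {
	results := make([][]byte, len(funcs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = compileFunc(funcs[i]) // distinct indices: no data race
			}
		}()
	}
	for i := range funcs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	funcs := []funcBody{{code: []byte{1, 2}}, {code: []byte{3}}, {code: nil}}
	out := compileAll(funcs, runtime.NumCPU())
	fmt.Println(len(out), out[0], out[1]) // 3 [254 253] [252]
}
```

Writing results by index avoids a mutex and keeps the output independent of scheduling order.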

ncruces commented 5 months ago

If we're going for that kind of sophistication (compile functions in parallel), I wonder if we could also compile functions incrementally (on first use).
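One way compile-on-first-use could look in Go, as a sketch under simplifying assumptions: each function gets a `sync.Once`, so the first call compiles it and concurrent callers wait for that single compilation. `compile`, `lazyFunc`, and `lazyModule` are hypothetical; real machine code would also need executable memory and call trampolines, which this ignores.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// compile is a hypothetical placeholder for an expensive per-function compiler.
func compile(source string) func(int) int {
	switch source {
	case "double":
		return func(x int) int { return x * 2 }
	default:
		return func(x int) int { return x + 1 }
	}
}

// lazyFunc defers compilation until its first call. sync.Once guarantees a
// single compilation even if several goroutines call it concurrently.
type lazyFunc struct {
	once     sync.Once
	source   string
	compiled func(int) int
}

// lazyModule is a hypothetical module whose functions compile on first use.
type lazyModule struct {
	funcs    []*lazyFunc
	compiles int32 // number of functions compiled so far
}

func (m *lazyModule) call(idx, arg int) int {
	f := m.funcs[idx]
	f.once.Do(func() {
		f.compiled = compile(f.source)
		atomic.AddInt32(&m.compiles, 1)
	})
	return f.compiled(arg)
}

func (m *lazyModule) compiledCount() int {
	return int(atomic.LoadInt32(&m.compiles))
}

func main() {
	m := &lazyModule{funcs: []*lazyFunc{{source: "double"}, {source: "inc"}, {source: "never-called"}}}
	fmt.Println(m.call(0, 21))     // 42
	fmt.Println(m.call(0, 5))      // 10
	fmt.Println(m.compiledCount()) // 1
}
```

Only the functions that are actually called pay the compilation cost, which is the appeal for one-off programs that touch a small part of a large binary.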

inliquid commented 5 months ago

compile functions incrementally (on first use).

Sounds like introducing a JIT, or am I missing something?

ncruces commented 5 months ago

The difference between JIT and AOT compilation is blurry, IMO.

But if we can compile functions one-by-one in parallel, maybe we can compile functions one-by-one on first use?

To avoid making things much harder, I guess we'd need to know the call dependency tree (i.e. compile all the functions a given function can possibly call). But maybe that's not useful, since with indirect calls that could be every function in the Wasm module?

TBH, I don't know.
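The "call dependency tree" idea amounts to a reachability search over the call graph. Here is a minimal sketch, assuming the direct callees of each function and the set of functions placed in tables are known ahead of time (with the reference-types proposal, tables can also change at runtime, so a sound analysis would have to be more conservative). Any `call_indirect` conservatively pulls in every table entry, which is why the result may degrade to "all functions" in practice. `callGraph` is a hypothetical type.

```go
package main

import (
	"fmt"
	"sort"
)

// callGraph is a hypothetical summary of a module's calls.
type callGraph struct {
	direct      map[int][]int // function index -> direct callees
	hasIndirect map[int]bool  // function contains at least one call_indirect
	tableFuncs  []int         // functions that may be reached through a table
}

// reachable returns every function that entry may call, directly or
// transitively, including entry itself, in ascending order.
func (g callGraph) reachable(entry int) []int {
	seen := map[int]bool{entry: true}
	stack := []int{entry}
	push := func(f int) {
		if !seen[f] {
			seen[f] = true
			stack = append(stack, f)
		}
	}
	for len(stack) > 0 {
		f := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for _, c := range g.direct[f] {
			push(c)
		}
		if g.hasIndirect[f] { // conservatively assume any table entry is called
			for _, t := range g.tableFuncs {
				push(t)
			}
		}
	}
	out := make([]int, 0, len(seen))
	for f := range seen {
		out = append(out, f)
	}
	sort.Ints(out)
	return out
}

func main() {
	g := callGraph{
		direct:      map[int][]int{0: {1}, 1: {2}, 3: {4}},
		hasIndirect: map[int]bool{2: true},
		tableFuncs:  []int{5, 6},
	}
	fmt.Println(g.reachable(0)) // [0 1 2 5 6]
	fmt.Println(g.reachable(3)) // [3 4]
}
```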

mathetake commented 4 months ago

With the current main branch, it seems to take 14-15s (previously 30s+) to compile your binary on my local machine @davidmdm 😎

davidmdm commented 4 months ago

@mathetake Oh I know! I haven't been saying much, but I promise I am lurking and doing little victory dances every time a PR comes through that shaves off another second!

I ran it myself and also got ~14 seconds! We've (well, you've!) crossed the 50% improvement mark since v1.7.0 🎉

Thanks so much for this effort.

davidmdm commented 4 months ago

Dropping another ❤️.

My binary, which I realized only needs to import the Kubernetes client-go package to take 30+ seconds to compile with v1.7.2, now compiles in approximately 13 seconds on master.

❤️ ❤️ 🚀 🚀 🎸 🎸 🥳 🥳

evacchi commented 4 months ago

Awesome!!! @mathetake the real mvp 😎

davidmdm commented 1 month ago

Hello wazero team!

I'm wondering what the status of this effort is, and whether there are still plans to make the wazero compiler concurrent, etc.

Totally understand the constraints the team is under, and do not assume for a second that this takes priority over all other things.

For my project I am still using 1.6.0 since 1.8.x is still 5x slower. However I would love to update. Just want to gauge/feel out the timeline. I understand if one cannot be provided.

mathetake commented 1 month ago

Unfortunately, we don't have any cycles to dedicate to wazero at all at the moment😞

tetratelabs / wazero

Make compilation faster #2182
