near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0

Idea: ponder making wasmtime the primary backend #11660

Open nagisa opened 3 months ago

nagisa commented 3 months ago

We have been thinking that the focus will soon shift back to the quality of the codegen, especially once memtries are enabled and the resulting performance improvement is evaluated in https://github.com/near/nearcore/issues/10877.

If we lay out the three backends well known to us (singlepass, which we use today; cranelift; and LLVM), each step is expected to improve the throughput of the generated code by an integer multiple, similar to what we saw 2-3 years ago.

The most notable downside of cranelift, and especially LLVM, has so far been that their compilation pipelines have no predictable linear bound on execution time. Since then wasmtime has improved quite a bit in this respect, and with the rewrite of its register allocation to regalloc2, compilation time should be closer to linear than to quadratic. So it wouldn't be a terrible option as a production backend after some changes.
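One way to check the "closer to linear than to quadratic" expectation empirically is to compile contracts of increasing size, record the compile times, and fit the exponent `k` in `time ≈ c * size^k`. The following is only a minimal sketch under that assumption: `scaling_exponent` is a hypothetical helper, not part of nearcore or wasmtime, and the measurements themselves would have to come from a real benchmark.

```rust
/// Estimate the exponent `k` in `time ≈ c * size^k` from `(size, time)`
/// samples by fitting a least-squares line in log-log space.
///
/// Hypothetical helper for illustration only. Returns `None` when there are
/// fewer than two usable samples or when all sizes are identical.
pub fn scaling_exponent(samples: &[(f64, f64)]) -> Option<f64> {
    // Only strictly positive values have a logarithm.
    let logs: Vec<(f64, f64)> = samples
        .iter()
        .filter(|(size, time)| *size > 0.0 && *time > 0.0)
        .map(|(size, time)| (size.ln(), time.ln()))
        .collect();
    if logs.len() < 2 {
        return None;
    }

    let n = logs.len() as f64;
    let mean_x = logs.iter().map(|(x, _)| x).sum::<f64>() / n;
    let mean_y = logs.iter().map(|(_, y)| y).sum::<f64>() / n;

    // Slope of the regression line = covariance(x, y) / variance(x).
    let cov: f64 = logs.iter().map(|(x, y)| (x - mean_x) * (y - mean_y)).sum();
    let var: f64 = logs.iter().map(|(x, _)| (x - mean_x).powi(2)).sum();
    if var == 0.0 {
        return None;
    }
    Some(cov / var)
}
```

A result near 1 would suggest roughly linear compilation time over the measured range, and a result near 2 roughly quadratic. Such a fit says nothing about worst-case inputs, so adversarially shaped contracts would still need to be part of the sample set.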

Another reason not to worry about compilation time as much is the upcoming pipelined compilation and contract preparation in https://github.com/near/nearcore/issues/11319, which will, to an extent, reduce the impact of non-linearity in the compilation flow, though most likely not enough to let us consider the super-super-linear LLVM as a backend.

Today's implementation of wasmtime, as of https://github.com/near/nearcore/pull/11532, already has all the infrastructure on par with our production backend, including caching and such. This is a recent change, and it should greatly reduce the huge hit to transaction throughput that we have come to expect from this backend in the past. It would be a good idea to experiment with running a node on wasmtime (with or without memtries) to see how it performs in practice or on toy test cases.

We can then identify what problems remain (if any) and whether it would be worthwhile to expend effort on resolving them.


I anticipate that the performance of wasmtime might still not be all that great, because gas accounting is implemented today in terms of host function calls rather than intrinsics. That said, wasmtime also has a native fuel mechanism we could explore (and/or we could implement gas accounting via e.g. imported globals and such).
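To make the structural difference concrete, here is a toy model in plain Rust, not nearcore's or wasmtime's actual code. All names (`Host`, `HostCounter`, `run_with_host_calls`, `run_with_inline_counter`) are hypothetical. The first style charges gas through an opaque call per basic block, which is roughly what a host function import amounts to. The second decrements a counter that the executing code can reach directly, which is roughly the shape a fuel counter or an imported global would take.

```rust
/// Error raised when a contract runs out of gas.
#[derive(Debug, PartialEq, Eq)]
pub struct GasExceeded;

/// Style 1: metering through a host call. A trait object stands in for an
/// imported host function, so every charge crosses an opaque call boundary.
pub trait Host {
    fn charge_gas(&mut self, amount: u64) -> Result<(), GasExceeded>;
}

pub struct HostCounter {
    pub remaining: u64,
}

impl Host for HostCounter {
    fn charge_gas(&mut self, amount: u64) -> Result<(), GasExceeded> {
        // The counter is left unchanged if the charge does not fit.
        self.remaining = self.remaining.checked_sub(amount).ok_or(GasExceeded)?;
        Ok(())
    }
}

/// Charge each basic block's cost via a call into the host.
pub fn run_with_host_calls(
    block_costs: &[u64],
    host: &mut dyn Host,
) -> Result<(), GasExceeded> {
    for &cost in block_costs {
        host.charge_gas(cost)?;
    }
    Ok(())
}

/// Style 2: metering through a counter the code decrements directly,
/// with no call boundary between the metered code and the counter.
pub fn run_with_inline_counter(
    block_costs: &[u64],
    remaining: &mut u64,
) -> Result<(), GasExceeded> {
    for &cost in block_costs {
        *remaining = remaining.checked_sub(cost).ok_or(GasExceeded)?;
    }
    Ok(())
}
```

Both styles charge the same amounts and fail at the same point, so the choice is about cost per charge, not semantics. The model does not measure that cost; whether the inline form is faster in practice is exactly what the proposed experiment would have to show.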