koute opened 2 years ago
> Now, we could probably just optimize this on `wasmi`'s side so that it doesn't preallocate and clear a 1MB buffer on each invocation (assuming this hasn't been done already; we're still using quite an old version of `wasmi`, and I haven't checked the newest version).

The newer `wasmi` versions allow setting the initial and maximum value stack length in the `Config`, similar to how Wasmtime's `Config` works. Therefore this `wasmi`-specific problem should be resolved once we upgrade to `wasmi` 0.16.0 or above.
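To illustrate the idea (a hedged sketch with made-up types, not `wasmi`'s actual implementation): a value stack with configurable initial and maximum lengths only pays for what a call actually uses, instead of allocating and zeroing a large buffer on every invocation.

```rust
// Illustrative sketch only (hypothetical type, not wasmi's real API):
// a value stack that reserves a small initial capacity and grows on
// demand up to a configured maximum, avoiding an eager 1 MiB clear.
pub struct ValueStack {
    values: Vec<u64>,
    maximum: usize,
}

impl ValueStack {
    /// `initial` entries are reserved up front; the stack may grow to `maximum`.
    pub fn new(initial: usize, maximum: usize) -> Self {
        Self { values: Vec::with_capacity(initial), maximum }
    }

    pub fn push(&mut self, value: u64) -> Result<(), &'static str> {
        if self.values.len() >= self.maximum {
            return Err("value stack limit exceeded");
        }
        self.values.push(value); // grows capacity on demand, no eager zeroing
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }
}
```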
> Add support for the bulk memory ops to `wasmi`, `wasmi-validation` and `wasm-instrument`, if it hasn't been done yet.

We probably do not need support for the Wasm `bulk-memory` proposal in `wasmi`, since this is only relevant if we want to keep `wasmi` as a Substrate runtime execution engine, and I honestly see no reason why we should keep it. In the past it was useful when Wasmtime was not as stable as it is nowadays (or maybe for some other reason). That said, support for the Wasm `bulk-memory` proposal is planned for `wasmi`, since certain smart contracts could potentially benefit from it.
Citing the parent issue:
- It would be great to have data on how exactly performance is improved. This would help to evaluate how much priority it has for implementation, and how much priority runtime writers should give to upgrading.
- wasmi does not support most of the newest features. wasmtime is the primary engine for now, but it would still be good to have a second engine.
- The polkadot side should be taken into account. How would PVF execution be migrated? Or will those features not hit PVFs?
Pt. 1 is clear. Re pt. 2, I think either way is fine. (UPD: Robin beat me to it, and I agree; I think getting rid of the wasmi executor is on the table.)
The PVF is a bit more complicated.
We need to consider that if we unconditionally enable `wasm_bulk_memory`, it will be enabled for PVFs. There are two problems with that:
There is a problem with the upgrade. If we just YOLO-enable it, an adversary could take advantage of that. Unfortunately, until https://github.com/paritytech/polkadot-sdk/issues/917 lands, our hands are tied.
That implies that the executor configuration should allow disabling bulk mem ops. It will be disabled for PVF execution.
Since the Cumulus PDK uses the same binary for the Runtime and PVF, the parachains won't be able to take advantage of bulk mem ops. Parachains are the overwhelming majority of the users of Substrate, so the impact would be limited until we upgrade PVF.
Then, the blocker for enabling bulk mem ops for PVFs is the question of metering. Metering is likely coming, at least for PVFs. That means we have to address this concern, which was touched on in the parent issue.
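A minimal sketch of what such an executor switch could look like, assuming the `wasmtime` crate is used directly (the function and flag here are hypothetical plumbing; only `Config::wasm_bulk_memory` is a real wasmtime API):

```rust
// Hypothetical configuration sketch: the caller decides per execution
// context whether bulk memory ops are allowed (e.g. `false` for PVF
// execution until the upgrade path is sorted out).
use wasmtime::{Config, Engine};

fn make_engine(allow_bulk_memory: bool) -> wasmtime::Result<Engine> {
    let mut config = Config::new();
    // `Config::wasm_bulk_memory` toggles the Wasm bulk-memory proposal.
    config.wasm_bulk_memory(allow_bulk_memory);
    Engine::new(&config)
}
```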
> (UPD: Robin beat me to it, and I agree, I think getting rid of the wasmi executor is on the table)
For the foreseeable future we will run contracts with an in-runtime wasmi.
Tangential? The wasmi executor refers to `sc-executor-wasmi` and not the sandbox backend.
Should be sanity-checked with something outside of pallet benchmarks, but the `seal_return_per_kb` numbers sound great.
Probably historic import times; there is `benchmark block` to measure re-import times of old blocks.
I will do so once you put up an MR :smile:
(This is a subissue of https://github.com/paritytech/substrate/issues/10707; I'm creating a new issue to focus on just the bulk memory operations.)
We should seriously consider enabling the bulk memory operations feature in our WASM executor and our runtimes. We have recently discovered that the contracts' benchmarks can currently spend up to 75% of their time within the WASM calling `memset`; not only that, since `wasmtime` doesn't cache-align loops, the performance of `memset`/`memcpy`/etc. can vary very widely depending on how the instructions are laid out in memory, and can become up to ~40% slower compared to the cache-aligned case.

I've done a quick test to compare the performance of the `pallet_contracts/seal_return_per_kb` benchmark as it currently stands today; here's its execution time in each case:

Not only does it cut down the benchmark's execution time by half, but it should also prevent `wasmtime`'s codegen roulette from regressing the performance.

Now, we could probably just optimize this on `wasmi`'s side so that it doesn't preallocate and clear a 1MB buffer on each invocation (assuming this hasn't been done already; we're still using quite an old version of `wasmi`, and I haven't checked the newest version). Nevertheless, I think this has demonstrated that there's concrete value in enabling this extension - even if `wasmi` is fixed, something else could conceivably allocate large buffers and tank the performance, and then we'll be back to square one; there's also potential for this extension to speed things up in general, considering how widely `memset`/`memcpy` are used under the hood. I don't think it's worth holding back on this extension anymore, and we should just pull the trigger and enable it. In the worst case it won't make any difference; in the best case it can significantly speed things up.

What needs to be done? (high level plan)
- Add support for the bulk memory ops to `wasmi`, `wasmi-validation` and `wasm-instrument`, if it hasn't been done yet.
- Enable the `bulk` feature on the `parity-wasm`.
- Set `config.wasm_bulk_memory(true)` when initializing wasmtime.
- Compile the runtimes with `-C target-feature=+bulk-memory`.
- Do the same for `polkadot`.
- Pass the `-C target-feature=+bulk-memory` flag when building runtimes.

Anything else I'm missing?
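For a sense of where the wins come from (an illustrative sketch, not part of the plan above): ordinary Rust patterns like zeroed allocations, `fill`, and `copy_from_slice` compile down to `memset`/`memcpy` in the produced WASM, and with `-C target-feature=+bulk-memory` the compiler can emit single `memory.fill`/`memory.copy` instructions for them instead of byte-by-byte loops.

```rust
// Illustrative only: typical sources of memset/memcpy in runtime code.
fn zero_and_copy() -> (Vec<u8>, Vec<u8>) {
    // A zeroed allocation lowers to a memset-style fill.
    let mut scratch = vec![0u8; 64 * 1024];
    // Overwriting a region is a fill as well.
    scratch[..1024].fill(0xAA);
    // Copying between buffers lowers to a memcpy-style copy.
    let mut out = vec![0u8; 1024];
    out.copy_from_slice(&scratch[..1024]);
    (scratch, out)
}
```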
cc @pepyakin @athei @Robbepop