Open appetrosyan opened 1 year ago
So here is a flamegraph for the sumeragi main loop (I am using cargo flamegraph):
It looks like instantiate takes a lot of time; however, it turns out that the profile doesn't take wasm code into account.
Here are measurements of how much time some functions take for a single transaction (1600 tps load with config as here). All values are in microseconds; the values for try_create_block and categorize_transactions are normalized (divided by the block size of 20).
try_create_block 399
categorize_transactions 371
TransactionExecutor::validate 370
TypedFunc::call 328
Runtime::instantiate_module 12
So we can see that instantiate takes less than 5% of the time, and most of the time is spent on the actual execution of wasm code. We need to investigate how to speed up the wasm code, in particular #4803.
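As a quick check of the percentages, here is a minimal sketch that recomputes the shares from the table above. It assumes try_create_block (399 µs per transaction) is the right baseline to compare against; the constant and function names are made up for illustration.

```rust
/// Per-transaction timings in microseconds, copied from the table above.
const TRY_CREATE_BLOCK_US: f64 = 399.0;
const TYPED_FUNC_CALL_US: f64 = 328.0;
const INSTANTIATE_MODULE_US: f64 = 12.0;

/// Percentage of `total` taken by `part` (hypothetical helper).
fn share_percent(part: f64, total: f64) -> f64 {
    100.0 * part / total
}

fn main() {
    // Assumption: try_create_block is used as the 100% baseline.
    let instantiate = share_percent(INSTANTIATE_MODULE_US, TRY_CREATE_BLOCK_US);
    let call = share_percent(TYPED_FUNC_CALL_US, TRY_CREATE_BLOCK_US);
    println!("Runtime::instantiate_module: {instantiate:.1}% of try_create_block");
    println!("TypedFunc::call: {call:.1}% of try_create_block");
}
```

With these numbers, module instantiation is about 3% of the per-transaction cost and the wasm call itself is about 82%.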
Related: #4727
related #4914
After #5048 was merged, executor-related work (linker initialization, module instantiation, memory free) started to take up a noticeable share of the flamegraph. We could potentially get a good tps improvement with a persistent executor, but its implementation has problems related to lifetimes.
Using Linker::instantiate_pre might help performance without the downsides of re-use since, AFAIU, it has all the imports already resolved and can be instantiated multiple times.
Feature request
The executor should persist between execution of different transactions. Ideally it should be brought up either after an upgrade or after a quick recovery (from a power outage). It should not have significant persistent storage (caching is fine, storing information that can affect the next verdict is not).
@mversic suspects that wasmtime could hit an out-of-memory condition if WASM is not periodically blanked, so the suggestion is to start with a block scope and then extend it to longer periods of time. I suggest making it a configuration parameter. In detail we need the following:
A parameter in configs/peer/config.json that can be Every n transactions, every n blocks, until peer crashes, defaulting to every 1 transaction.
n of x entity.
Motivation
The operation of loading and unloading an executor affects the performance of regular transaction processing, so it makes sense to optimise the process and avoid unnecessary loading and blanking of memory if the old memory does a good-enough job.
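To make the requested parameter concrete, here is a rough sketch of how the reset policy could be modelled. None of this is existing Iroha code: ExecutorResetPolicy, ResetTracker and count_resets are hypothetical names, and treating n = 0 as "never reset" is an assumption of the sketch.

```rust
/// Hypothetical policy type: how long one executor instance is kept alive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecutorResetPolicy {
    EveryNTransactions(u64),
    EveryNBlocks(u64),
    UntilPeerCrashes,
}

impl Default for ExecutorResetPolicy {
    /// The request asks for "every 1 transaction" as the default.
    fn default() -> Self {
        Self::EveryNTransactions(1)
    }
}

/// Counts work done since the last reset and decides when to blank the executor.
#[derive(Debug)]
struct ResetTracker {
    policy: ExecutorResetPolicy,
    transactions: u64,
    blocks: u64,
}

impl ResetTracker {
    fn new(policy: ExecutorResetPolicy) -> Self {
        Self { policy, transactions: 0, blocks: 0 }
    }

    /// Call after each executed transaction; `true` means "blank the executor now".
    fn on_transaction(&mut self) -> bool {
        self.transactions += 1;
        match self.policy {
            // Assumption: n = 0 never triggers a reset.
            ExecutorResetPolicy::EveryNTransactions(n) if n > 0 && self.transactions >= n => {
                self.transactions = 0;
                true
            }
            _ => false,
        }
    }

    /// Call after each committed block; `true` means "blank the executor now".
    fn on_block(&mut self) -> bool {
        self.blocks += 1;
        match self.policy {
            ExecutorResetPolicy::EveryNBlocks(n) if n > 0 && self.blocks >= n => {
                self.blocks = 0;
                true
            }
            _ => false,
        }
    }
}

/// Simulates `blocks` blocks of `txs_per_block` transactions and counts resets.
fn count_resets(policy: ExecutorResetPolicy, blocks: u64, txs_per_block: u64) -> u64 {
    let mut tracker = ResetTracker::new(policy);
    let mut resets = 0;
    for _ in 0..blocks {
        for _ in 0..txs_per_block {
            if tracker.on_transaction() {
                resets += 1;
            }
        }
        if tracker.on_block() {
            resets += 1;
        }
    }
    resets
}

fn main() {
    let policies = [
        ExecutorResetPolicy::default(),
        ExecutorResetPolicy::EveryNBlocks(1),
        ExecutorResetPolicy::UntilPeerCrashes,
    ];
    for policy in policies {
        // Block size of 20 matches the measurements earlier in the thread.
        println!("{policy:?}: {} resets over 3 blocks", count_resets(policy, 3, 20));
    }
}
```

The point of the simulation is only to show how much executor loading each setting would save relative to the per-transaction default; the real cost per reset would still have to be measured.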
Who can help?
@mversic @appetrosyan