paritytech / substrate


Improve the throughput of wasm contracts on parachains #9354

Closed · athei closed this issue 1 year ago

athei commented 3 years ago

We want to improve the throughput that can be achieved when pallet_contracts is run on a parachain. It is a cross-team, long-term effort and this is the meta issue to coordinate said effort.

Problem statement

More complex ink!-generated smart contracts currently compile to large Wasm binaries, which causes parachain transaction throughput to be limited by bandwidth rather than by computational complexity. Read on to learn why that is.

Parachain Validation

During parachain development it came to light that transaction throughput will be bottlenecked by validator bandwidth rather than by computational power (the block weight limit). In order to understand why, we need to look at how parachain validation works:

All blocks submitted for validation by the parachain collators are validated by a shared pool of relay chain validators that are not specific to any parachain. Those validators therefore do not hold any state (storage, database) of any parachain connected to the relay chain. To allow them to validate a block without any state (of the parachain), the collators are required to send the values of all storage items accessed in the submitted block together with a Merkle proof (witness). We call the entity that contains said information (among other things) the PoV (Proof of Validity). Validators effectively function like light clients with regard to the parachains.

This means that every storage item accessed in a block must be distributed over the network to the assigned validators as part of the PoV the first time it is accessed within that block. The parachain team currently targets a size of 2-4 MiB for the PoV.

One can easily see that transactions which access a large amount of storage (in bytes) are expensive to execute in this scheme, because they make us reach the PoV size limit much more quickly than the block weight limit.

Smart Contracts

With the above knowledge we can now understand why this is especially bad for smart contracts: the contract itself (the wasm blob) is a storage item and needs to be included in the PoV (and therefore sent over the network) whenever it is called, so that validators are able to execute the contract. On a standalone chain the binary only needs to be sent over the network when it is uploaded, which is a much rarer occurrence.

We can do some napkin math to show that this does in fact hinder our throughput: the ink! ERC20 example compiles to a 32 KiB wasm binary. Using our 4 MiB PoV limit we can only fit 128 such transactions into a block, which amounts to 21 tps assuming a 6 second block time. This limit applies no matter how cheap the called message inside the contract is to execute, because the whole contract needs to be pulled from storage and hence included in the PoV.
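For concreteness, here is the arithmetic as a minimal Rust sketch (the constants are just the numbers from the paragraph above):

```rust
// Napkin math: how many calls to a unique 32 KiB contract fit into
// a 4 MiB PoV, and what throughput that allows at a 6 s block time.
fn main() {
    let pov_limit: u64 = 4 * 1024 * 1024; // 4 MiB PoV size target
    let contract_size: u64 = 32 * 1024;   // ink! ERC20 example wasm blob
    let block_time_secs: u64 = 6;

    let txs_per_block = pov_limit / contract_size; // 128
    let tps = txs_per_block / block_time_secs;     // ~21

    println!("{txs_per_block} txs per block => ~{tps} tps");
}
```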

The ERC20 contract is considered a small-to-medium-sized contract. However, our example assumes 128 unique contracts being called, which is the worst case. The size limit on Ethereum for EVM contracts is 24 KiB, so wasm contracts are at a clear disadvantage right now in terms of size. Using pallet_evm will most likely yield a higher throughput (than pallet_contracts) just because the contracts are smaller. The OpenZeppelin ERC20 contract compiles down to 3 KiB of EVM byte code. This is not a fair comparison, though: the ink! version does more, since ink! tries to play well in a world with storage fees. Still, it shows how much bigger ink! contracts are right now.

What about solang?

Solang can compile Solidity to wasm contracts executable by pallet_contracts. It is therefore an attractive tool for measuring how much of the additional size is due to wasm and how much is due to Rust (ink!). We compiled the OpenZeppelin ERC20 contract with solc and with solang and compressed both with zstd. This is the result:

```
solidity/ERC20.raw : 51.77%   (3214 =>   1664 bytes)
solang/ERC20.wasm  : 37.09%   (9666 =>   3585 bytes)
```

You can see that the increase is less dramatic but still 3x uncompressed. Part of it is due to the lower entropy (simpler instructions) of wasm code, which can be observed in its better compressibility. The rest is that solang simply produces more code. It is not because of the wasm module structure itself (types, import declarations), which contributes less than 1% in this specific case. The reasons are unclear and could maybe be attributed to solang being less optimized than solc.

Update

I ran an experiment with the solang compiler: I set the address type to 4 bytes (instead of 32) in the solang compiler source, and I also removed all wide integer types from the Solidity ERC20 source.

With these changes the size of the wasm-compiled ERC20 contract was reduced from 3x (as mentioned above) to 1.3x relative to the EVM one (after compression). I think we can pin the size explosion on the wide integers and not on wasm itself.

Given that no arithmetic happens on the address type, that overhead comes from merely passing this large value around (as a value on the stack). Every access to this type (and it is accessed a lot) generates 4 stack accesses (4x32bit). The lesson here is that we should not treat the address as a plain value type but pass it by reference instead.
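To illustrate the difference, a hypothetical Rust sketch (not ink! code): passing a 32-byte id by value forces the full value to be copied at every call site, while a reference is a single 32-bit pointer on wasm32:

```rust
/// Illustrative stand-in for the 256-bit AccountId used by ink!.
#[derive(Clone, Copy)]
struct AccountId([u8; 32]);

/// By value: all 32 bytes are copied into the callee on every call,
/// bloating every call site.
fn by_value(_from: AccountId, _to: AccountId) {}

/// By reference: only a 32-bit pointer crosses the call boundary
/// on wasm32, keeping call sites small.
fn by_ref(_from: &AccountId, _to: &AccountId) {}

fn main() {
    let (a, b) = (AccountId([0; 32]), AccountId([1; 32]));
    by_value(a, b);
    by_ref(&a, &b);
}
```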

The repository where this experimentation happens can be found here. The plan is to gather statistics over multiple Solidity contracts with the changes regarding large data types applied. However, most contracts won't compile with the address type changed in solang, so this is on hold until the issue in solang is resolved.

Proposed Solutions

Clearly there is some work to be done to increase transaction throughput. When designing pallet_contracts several years ago we thought that the faster execution of wasm contracts gave us an edge over EVM contracts, but it turned out that the more important metric when a contract is executed on a parachain is contract size. We need to shift our optimization focus. Once we overcome the size issue, the faster execution (using a JIT) will enable much richer contracts than are possible on the EVM.

In the following we list all the proposed solutions which could help to increase our throughput. In addition we link all the issues that work towards accomplishing those solutions:

Reduce ink! contract size

This one is quite simple to understand: ink!, the de facto standard language for producing binaries for pallet_contracts, produces rather big artifacts. Reasons for this include but are not limited to:

Other possible reasons are rooted in Rust (rather than ink!):

We need to work hard to get this size down. This will be an incremental, long-term process, but there are probably some low-hanging fruit which will dramatically decrease size in the short term. The ink! team will analyze the currently produced contracts and bring down their sizes, whether the bloat is caused by ink! or by Rust itself (for example by preventing monomorphization with trait objects).

Progress

For an overview of what the ink! team is working on, have a look at their size reduction project.

Code Merkleization

Code Merkleization will be important in order to make bigger contracts economically viable.
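The rough idea: commit to the contract code as a Merkle tree of fixed-size chunks, so the PoV only has to carry the chunks an execution actually touched plus their proof, instead of the whole blob. A toy sketch (std-only; `DefaultHasher` is a non-cryptographic stand-in for a real hash such as blake2):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for a cryptographic hash (illustration only).
fn hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

fn hash_pair(a: u64, b: u64) -> u64 {
    let mut h = DefaultHasher::new();
    a.hash(&mut h);
    b.hash(&mut h);
    h.finish()
}

/// Commit to contract code chunk-by-chunk instead of as one blob.
fn merkle_root(code: &[u8], chunk_size: usize) -> u64 {
    let mut layer: Vec<u64> = code.chunks(chunk_size).map(hash).collect();
    while layer.len() > 1 {
        layer = layer
            .chunks(2)
            .map(|p| if p.len() == 2 { hash_pair(p[0], p[1]) } else { p[0] })
            .collect();
    }
    layer[0]
}

fn main() {
    let code = vec![0u8; 32 * 1024]; // a 32 KiB "contract"
    let root = merkle_root(&code, 1024);
    // A PoV would now only need the handful of 1 KiB chunks that the
    // executed code path touched, plus sibling hashes up to `root`,
    // instead of the full 32 KiB blob.
    println!("code root: {root:x}");
}
```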

Huffman Coding

Huffman coding of instructions can yield some substantial size savings when compared to applying zstd over the whole PoV because we can tailor a fixed encoding table specifically for our use case. We should look into what can be gained there.
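A sketch of what that could look like: measure opcode frequencies once over a large corpus of contracts, bake the resulting Huffman table into the client, and encode instruction bytes with it, so no table has to be shipped per PoV. A toy tree construction (the byte values stand in for wasm opcodes; the frequencies are made up):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

enum Node {
    Leaf(u8), // an instruction byte
    Branch(Box<Node>, Box<Node>),
}

/// Build a Huffman tree from (opcode, frequency) pairs.
fn build_tree(freqs: &[(u8, u64)]) -> Node {
    let mut nodes: Vec<Option<Node>> = Vec::new();
    let mut heap = BinaryHeap::new(); // min-heap via Reverse
    for &(op, f) in freqs {
        heap.push(Reverse((f, nodes.len())));
        nodes.push(Some(Node::Leaf(op)));
    }
    while heap.len() > 1 {
        // Repeatedly merge the two least frequent subtrees.
        let Reverse((fa, a)) = heap.pop().unwrap();
        let Reverse((fb, b)) = heap.pop().unwrap();
        let branch = Node::Branch(
            Box::new(nodes[a].take().unwrap()),
            Box::new(nodes[b].take().unwrap()),
        );
        heap.push(Reverse((fa + fb, nodes.len())));
        nodes.push(Some(branch));
    }
    let Reverse((_, root)) = heap.pop().unwrap();
    nodes[root].take().unwrap()
}

/// Walk the tree to assign a bit string to every opcode:
/// frequent opcodes end up with short codes.
fn codes(node: &Node, prefix: String, out: &mut Vec<(u8, String)>) {
    match node {
        Node::Leaf(op) => out.push((*op, prefix)),
        Node::Branch(l, r) => {
            codes(l, format!("{prefix}0"), out);
            codes(r, format!("{prefix}1"), out);
        }
    }
}

fn main() {
    // Hypothetical corpus statistics.
    let freqs = [(0x20, 900), (0x21, 700), (0x6a, 300), (0x0b, 100)];
    let tree = build_tree(&freqs);
    let mut table = Vec::new();
    codes(&tree, String::new(), &mut table);
    for (op, code) in table {
        println!("opcode 0x{op:02x} -> {code}");
    }
}
```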

@0x7CFE

Teams involved

ink!

Led by @Robbepop and responsible for optimizing ink! codegen (generating Rust) with regard to code size. The ink! team is also in charge of cargo contract, which plays a vital role in enforcing the correct compiler flags for optimal codegen.

Compiler

Led by @testhound. These are our experts for low-level codegen (generating wasm). All things LLVM and JIT.

Contracts Pallet

Led by @athei. Sits in the middle and connects the different teams. Also implements functionality in pallet_contracts that enables further size savings in ink!.

Parachains

Led by @rphmeier. Advises the other teams on how to best optimize for the parachain protocol.

burdges commented 3 years ago

You'll find numerous major optimization directions here if you look closely.

We bloat PoV blocks and CPU time by 4x in dense trees due to using radix-16 hashing. I've told at least four people how to fix this, but they always come back saying the relay chain trees have mostly sparse subtrees. Actually, dense trees pop up, especially if you're working to minimize state.
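For intuition, a back-of-the-envelope sketch of where a factor of that size can come from (assuming 32-byte hashes and a fully dense tree; actual node encodings differ):

```rust
// Proof size for a single key in a dense Merkle trie of `n` leaves:
// each radix-16 level covers 4 key bits but contributes up to 15
// sibling hashes, while a binary level covers 1 bit for 1 hash.
fn main() {
    let n: f64 = 1_000_000.0; // number of leaves
    let hash_len = 32.0;      // bytes per hash

    let bits = n.log2();
    let radix16 = (bits / 4.0).ceil() * 15.0 * hash_len;
    let binary = bits.ceil() * hash_len;

    // ~2400 bytes vs ~640 bytes: roughly the 4x bloat mentioned above.
    println!("radix-16: {radix16} B, binary: {binary} B");
}
```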

We serialize stuff crossing the WASM boundary to native. Servo avoids this by just passing references, which brings some complexity, like using SpiderMonkey's allocator outside JS/WASM. I'd expect calls between WASM blobs to face similar concerns.

h4x3rotab commented 3 years ago

Not sure how the availability protocol runs, but I think another direction is to distribute the contract code to the validators on the relay chain (just like the runtime WASM file?); then we only need to include the hash of the contract code in the PoV block.

burdges commented 3 years ago

It's tricky..

We're scalable for transactions because we do not distribute all state or parablock data to all validators. Any pre-distributed code limits our total code size across the system, but we could simply charge much higher rent for pre-distributed code of course.

We've a similar problem with parathread code..

I've always favored parachain code being left as whole blocks, both erasure-coded in the availability store and, when optimal, downloaded and precompiled, but never moved into state. We'd then do parathreads by making approval checkers reconstruct and build the parathread's code block in addition to reconstructing the candidate itself. It's basically a storage chain where you can access whole recent blocks for some price.

Intuitively, all parathreads pay not merely for reconstructing and running their candidate, but also for reconstructing and building their code, which takes far longer than merely running the block. We'd cache builds somewhat here too, but @rphmeier observed that builds turn out very costly, so if your parathreads make many blocks then pre-distributing and pre-building is optimal. We'll likely want swarms of identical parathreads and SPREE modules, which both favor pre-distributing and pre-building too.

Could contracts use this solution?

In principle, we could tweak the erasure coding and reconstruction logic to support reconstruction on 2 * num_validator byte boundaries, so likely 3 kilobyte regions; contract code could then be referenced by a position in a previous block. We eventually prune old blocks from availability though, which parachains/parathreads could solve by re-upping their code blocks. We'd need contract code to similarly rent this alternative storage space together with other contracts, which risks a tragedy of the commons.

In brief, we've trouble applying our availability tricks to code, but sometimes doing so likely does make sense.

athei commented 3 years ago

> We bloat PoV blocks and CPU time by 4x in dense trees due to using radix-16 hashing. I've told at least four people how to fix this, but they always come back saying the relay chain trees have mostly sparse subtrees. Actually, dense trees pop up, especially if you're working to minimize state.

> We serialize stuff crossing the WASM boundary to native. Servo avoids this by just passing references, which brings some complexity, like using SpiderMonkey's allocator outside JS/WASM. I'd expect calls between WASM blobs to face similar concerns.

I fail to understand how that would decrease the size of the PoV, which in this scenario is dominated by the wasm blob size of the contract. Can you elaborate? Please keep in mind that, like most people working on this issue right now, I am no deep expert in the parachain code.

> Intuitively, all parathreads pay not merely for reconstructing and running their candidate, but also for reconstructing and building their code, which takes far longer than merely running the block. We'd cache builds somewhat here too, but @rphmeier observed that builds turn out very costly, so if your parathreads make many blocks then pre-distributing and pre-building is optimal. We'll likely want swarms of identical parathreads and SPREE modules, which both favor pre-distributing and pre-building too.

> Could contracts use this solution?

Again, you assume too much knowledge about parachain details here. I don't understand what you mean. I would need some background (maybe you can link me a relevant doc or explain) in order to answer this question.

burdges commented 3 years ago

Apologies. I wandered off thinking about the problem some.. ;)

Availability could be made contract-friendly, but it does not provide any silver bullet. We're inherently less scalable for code than for data. Yes, we could consider mechanisms for pre-building contracts on all validators. It's ugly, but it'll be fastest if the contract gets used a lot. I'll caution that pre-builds make contracts compete more directly for resources used by parachains and especially parathreads.

> I fail to understand how that would decrease the size of the PoV, which in this scenario is dominated by the wasm blob size of the contract. Can you elaborate?

You're correct. If the WASM blob is the problem then this won't help much. Initially my comments were more general.

athei commented 3 years ago

> Apologies. I wandered off thinking about the problem some.. ;)

No worries, anything that helps is welcome. However, I try to keep this issue focused on the bandwidth issue rather than general performance.

> Availability could be made contract-friendly, but it does not provide any silver bullet. We're inherently less scalable for code than for data. Yes, we could consider mechanisms for pre-building contracts on all validators. It's ugly, but it'll be fastest if the contract gets used a lot. I'll caution that pre-builds make contracts compete more directly for resources used by parachains and especially parathreads.

Moving common functionality from contracts into the runtime (maybe additionally with the help of the module linking proposal) is something that the ink! team will be looking into. Maybe that makes "precompiled contracts" as Ethereum has them obsolete. Additionally, chains could move this core functionality into a pallet and make it accessible via chain extensions (extending the host functions a contract can call).

> You're correct. If the WASM blob is the problem then this won't help much. Initially my comments were more general.

While improving execution speed and hence transaction weight is always appreciated, this is not the point here. We are bottlenecked on bandwidth (PoV size), so speed improvements do not buy us anything right now. It is a fair point, though; it is just the wrong place to discuss it.

athei commented 3 years ago

I continue to work on analyzing where the bloat of our ink! contracts stems from. I figured out a workaround that allows twiggy to open our contracts even with an intact name section. I am trying to upstream this fix, but unfortunately the project seems to be abandoned. Please install my fork for a working twiggy. If I need further improvements I will update the master branch there.

In order to have analyzable contracts I added a new flag to cargo contract that allows us to keep the debug symbols in optimized builds. This allows the usage of twiggy monos to detect bloat generated by monomorphization and makes working with the binary a more pleasant experience in general.

burdges commented 3 years ago

I just learned from https://github.com/johnthagen/min-sized-rust that some binary size reductions occur with

```toml
[profile.release]
# compile the crate as a single codegen unit: slower to build, but it
# enables more cross-function optimization and smaller binaries
codegen-units = 1
```

so maybe web3 could finance rustc work on communication between codegen units in the most relevant passes?

testhound commented 2 years ago

The compiler team is working on adding procedure outlining as an optimization to the LLVM backend. Some background on procedure outlining:

https://mnt.io/2016/12/06/reducing-code-size-using-outlining/

https://www.linaro.org/blog/reducing-code-size-with-llvm-machine-outliner-on-32-bit-arm-targets/

https://llvm.org/doxygen/MachineOutliner_8cpp_source.html

This will be paired with function merging, which achieved an 8% reduction in code size.
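For intuition, the kind of duplication these passes target (an illustrative Rust sketch, not the compiler team's actual test case): two functions that compile to near-identical machine code can be merged into one body, and repeated instruction sequences across otherwise different functions can be outlined into a shared helper:

```rust
// After codegen, these two functions produce nearly identical machine
// code (only the sign interpretation differs, which is irrelevant for
// `to_le_bytes`), so a function-merging pass can collapse them; an
// outliner could likewise extract the common "write bytes + trailer"
// sequence if it appeared inside larger, distinct functions.
fn store_u32(buf: &mut Vec<u8>, v: u32) {
    buf.extend_from_slice(&v.to_le_bytes());
    buf.push(0xff); // trailer
}

fn store_i32(buf: &mut Vec<u8>, v: i32) {
    buf.extend_from_slice(&v.to_le_bytes());
    buf.push(0xff); // trailer
}

fn main() {
    let mut buf = Vec::new();
    store_u32(&mut buf, 7);
    store_i32(&mut buf, -7);
    assert_eq!(buf.len(), 10);
}
```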

athei commented 2 years ago

Added a link to the solc vs. solang comparison repository. Please note that this effort is currently on hold until solang is changed so that it reliably allows compiling with different address types. However, the initial results with the ERC20 contract (see top post) make me confident that the size issues in wasm contracts can be overcome by passing large data types by reference.

Robbepop commented 2 years ago

ink! Update: Wasm File Sizes

After implementing

It was finally possible to properly implement the Mapping type efficiently, which was first introduced by @HCastano in:

I took that and further optimized it using the above mentioned PRs:

Some further research into reusing the Mapping internal index key with RefCell and UnsafeCell has not been successful, but can still be inspected in these ink! branches:

While implementing #979 it was discovered that emitting events in the ERC-20 example contract was especially costly. We also ran experiments replacing the 256-bit wide AccountId with a 32-bit one in order to emulate passing all AccountIds by reference (references are 32 bits wide on Wasm).

Using a 32-bit AccountId we can decrease the uncompressed Wasm file size from 10.8kB down to 8.7kB. If we additionally remove all emit_event operations we get down to 6.0kB of uncompressed Wasm file size. After compression with zstd we get down from 8.7kB to 4.9kB and from 6.0kB to 2.8kB.

With both #946 and #979 merged we should look into how we can further optimize the way ink! emits events; that is where the largest gains with respect to our ERC-20 example contract benchmark lie.

Note that those 6.0kB are not a realistic goal for now since they lack event emission entirely. However, I am optimistic that we can get the ERC-20 example contract down to 9.0kB or maybe even under 8.0kB of uncompressed Wasm file size.

There are still a lot of improvements that can be done in ink! with respect to Wasm file sizes:

athei commented 1 year ago

We continue to improve contract sizes and throughput, but we are at a point where we are largely OK with where they are. Closing here.