paritytech / polkadot-sdk


Wasm Contracts: Implement Code Merkleization #122

Open athei opened 2 years ago

athei commented 2 years ago

Glossary

Motivation

In the current pallet-contracts implementation, the whole wasm binary of a contract needs to be loaded from storage into memory before it can be executed. Therefore any call transaction causes all of the code to be included in the proof of validity (PoV) when executed on a parachain. This is because the merkle proof that is sent from the collator to the validator is currently recorded automatically from the accessed storage items.

One way of reducing the PoV size we want to explore is to opt out of this automatic recording for the (big) storage item that contains the contract code and instead provide a custom merkle proof containing only the accessed parts of the contract.

Description

We leave the actual storage as-is (contract stays in a single storage item) but record which parts of a contract are executed when running a contract call in the substrate runtime (as opposed to PVF). This information can then be used to stitch together a partial wasm module in the PVF (run by the validator). Note that this partial wasm module would be constructed in a way that it can be executed by any execution engine.
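A minimal sketch of the recording and stitching idea, not the proposed implementation. It assumes the contract is modeled as a list of function bodies (without their size prefix) and that unexecuted functions are replaced by a trapping stub so that function indices stay unchanged. The names ExecutionRecorder, stitch_partial and TRAP_STUB are hypothetical and do not exist in pallet-contracts.

```python
from dataclasses import dataclass, field

# Hypothetical stub body: zero local declarations (0x00),
# the `unreachable` opcode (0x00) and the `end` opcode (0x0B).
TRAP_STUB = bytes([0x00, 0x00, 0x0B])


@dataclass
class ExecutionRecorder:
    """Collects the indices of functions entered during one call (assumption)."""
    executed: set = field(default_factory=set)

    def on_enter(self, func_index: int) -> None:
        self.executed.add(func_index)


def stitch_partial(bodies, executed):
    """Keep executed bodies and replace every other body with a trapping stub."""
    return [body if i in executed else TRAP_STUB for i, body in enumerate(bodies)]


# Three toy bodies: no locals, `i32.const n`, `end`.
bodies = [bytes([0x00, 0x41, n, 0x0B]) for n in (1, 2, 3)]

recorder = ExecutionRecorder()
for called in (0, 2, 0):  # function 1 is never entered
    recorder.on_enter(called)

partial = stitch_partial(bodies, recorder.executed)
```

Because every function keeps its index, calls between the retained functions still resolve in the partial module, and a call into a function that was not recorded traps instead of running unverified code.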

The code would be merkelized at deploy time by the substrate runtime according to our chunking strategy (one chunk per function to keep it simple for now). The root of that merkle tree would be put in regular storage and a proof against that root would be put into the PoV for every call of that contract. The code is still stored as a single storage item. The merkle tree is only for the proof that is received by the PVF.
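The scheme described above can be sketched as follows, with one leaf per function chunk. This is an illustration under assumptions, not the design from the issue: SHA-256, the domain-separation prefixes, and padding to a power of two with a fixed empty hash are choices made here for simplicity, and all function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(chunk: bytes) -> bytes:
    return h(b"\x00" + chunk)  # prefix keeps leaves distinct from inner nodes


def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"\x01" + left + right)


EMPTY = h(b"\x02")  # padding leaf, assumed distinct from any real chunk hash


def build_levels(chunks):
    """Return all tree levels, from the padded leaves up to the root."""
    leaves = [leaf_hash(c) for c in chunks]
    width = 1
    while width < len(leaves):
        width *= 2
    level = leaves + [EMPTY] * (width - len(leaves))
    levels = [level]
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify(root, chunk, index, proof):
    """Recompute the root from one chunk and its proof."""
    acc = leaf_hash(chunk)
    for sibling in proof:
        acc = node_hash(acc, sibling) if index % 2 == 0 else node_hash(sibling, acc)
        index //= 2
    return index == 0 and acc == root


# Deploy time: one chunk per function, only the root goes to regular storage.
chunks = [b"func0", b"func1", b"func2"]
levels = build_levels(chunks)
root = levels[-1][0]

# Call time: the PoV carries the executed chunk plus its proof.
proof = prove(levels, 2)
ok = verify(root, chunks[2], 2, proof)
```

A call that touches several functions would need one such proof per chunk, or a combined multi-proof that shares the common sibling hashes; that optimization is left out of this sketch.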

This approach has several advantages:

However, this can't be implemented with the current state of substrate/cumulus because any storage access of the substrate runtime is recorded and put into the PoV. One idea to resolve that is to give the runtime logic more control over the storage contents of the PoV: We think that we need to exclude the access to the original monolithic wasm code entry from the automatically generated storage proof and instead include our custom proof. In order to do so, cumulus would need to be modified to allow its users (the actual runtime, the pallets, etc.) to put custom data into the ParachainBlockData (which is essentially the PoV).

This is a departure from the current design where the runtime logic is oblivious to the fact that it is running as a parachain. This might sound bad but it will enable use cases where pallets can provide data much more efficiently to the validators:

Progress

athei commented 2 years ago

I had a talk with @bkchr and we came to the conclusion that this should be possible. However, what was less clear is how that would be implemented in terms of APIs. I gave this some thought and wrote down a proposal on how this could be integrated into substrate. It also has a usage example which shows how I intend to use this from pallet_contracts.

Please make sure to also read the doc comments where I put down some additional thoughts: https://gist.github.com/athei/5df72bc02c44f342338fdb66b2269619

athei commented 2 years ago

After a chat with @gavofyork it became clear that runtime code should never have the power to introduce consensus errors. Allowing the runtime to include custom data into the PoV would introduce this new class of errors to the runtime.

Instead, we should come up with a data structure implemented by the client that achieves our goal of not including all the data accessed by the collator into the PoV.

athei commented 1 year ago

Following up on my last comment: It is true that a collator can produce a block that is valid from its own point of view but not from the point of view of the PVF. This is one of Polkadot's main benefits: You don't need to put a lot of economic value behind collators when launching a chain.

That said, this is generally not possible while using Cumulus (modulo bugs in its implementation). The point of using it is to make the split between block production and block validation transparent to the runtime author. Or in other words: While writing your runtime you get the PVF for free. They are symmetrical. That is the whole point of Cumulus.

Code merkelization would require the introduction of asymmetry. In order to implement it we would need to do one of the following:

1) Modify Cumulus to allow for asymmetry. For example, allow custom data to be added to the PoV during block production and processed during validation. Right now, differences between production and validation are handled transparently by the client.
2) Abstract that asymmetry away through host functions (doubling down on the Cumulus approach).
3) Don't use FRAME or Cumulus for pallet-contracts but build directly on top of the Polkadot protocol.

3) is a non-starter because integration into FRAME is one of the big selling points of pallet-contracts. That leaves us with 1) and 2). My stance on this is that if we can accomplish a feature in user space (runtime) we should do that, because adding a new host function adds an additional burden for each and every new client implementation.

The downside of 1) is that it introduces a potential footgun into FRAME + Cumulus: Users might use the feature incorrectly and produce invalid blocks by accident. However, I think this is largely okay and doesn't impact the overall FRAME developer experience:

1) It will be an isolated and opt-in feature only to be used when necessary. We will offer proper abstractions in FRAME/Cumulus for it that limit the asymmetric code to well-defined blocks (akin to Rust's unsafe).
2) While potentially buggy, the user-written PVF code will still be deterministic over its inputs because this is how we will design our abstractions.
3) It will open the door for experimentation with other asymmetric solutions on top of it (like zero knowledge proofs). If we don't allow for asymmetry on top of FRAME, this requires writing a runtime from scratch or forking Cumulus. This is hard, and builders that want to do something like this might turn elsewhere. It is great that the Polkadot protocol is flexible enough to allow for all kinds of parachains. However, the hurdles to actually leverage that are quite high as of today. Allowing this with FRAME would make it drastically more approachable.

Blocked

Moving into blocked because more design discussion is needed. Also, there are alternatives for dealing with code size problems. Maybe we don't even need merkelization: