Support custom storage proof from the relay-chain

bkchr commented 3 years ago

The collator is already suitable to generate a storage proof of some specific keys and pass it to the runtime as inherent: https://github.com/paritytech/cumulus/blob/master/collator/src/lib.rs#L203

Currently this is mainly used to pass information about the state of message passing into the parachain. However, we want to open this for parachains to read whatever storage item they are interested in. This would enable parachains to read every item of the relay chain, for example the BABE randomness to use this in their runtime.

The foundation is there, we can read a given key and put it into a storage proof that is passed to the runtime. The following is still missing:

Make it more simple to access the storage proof from the runtime side. We probably want some functionality in cumulus-system to signal that the storage proof is now available (it is passed as inherent, so it comes after on_initialize and some way to read values from this proof.
Open up the client side to parachain developers to pass keys they are interested in into the collator. From the beginning we should make sure that this list isn't static, as I would assume that we will may require different keys for different blocks (as you can not just put the full state into a storage poof :D).
We need a way to make it relative easy to get storage keys for certain storage values in the relay chain state. Currently we have hard-coded these keys. There was also the idea to use well_known_keys for this, but this would not work as we can not foresee from the relay chain side which items will be required by the parachain. We also need a solution that enables us to change some storage key/move a storage item without waiting X months until the full ecosystem is moved (just assume we want to change some key in staking in Substrate).

Number 3. will be clearly the hardest part of this issue. All these issues are solvable on its own.

For 3. I could imagine that we do the storage key collection in some wasm blob. This wasm-blob would be build against the current relay-chain runtime blob. It would only be required to expose one function to collect the keys and return it as a Vec<Vec<u8>>. Inside this blob we would have access to all storage items, of all pallets being used by the relay chain runtime (This would require that we change the pallet macros to expose the storage keys of all storage items of a pallet). If we would have such a WASM blob and there is a pending relay-chain runtime upgrade, it would still require some manual intervention as all parachains would need to update their WASM blob and issue an update of this (a hash of this WASM blob would probably be stored on chain and fetched from "somewhere" or it would be stored directly in the state :see_no_evil: like the runtime wasm blob). After all parachains have done this/we waited long enough, the council could issue the runtime upgrade. This requires quite a lot of coordination, but is way faster than doing it in some more hard-coded way where we would need to wait until all nodes of all parachains are upgraded.

pepyakin commented 3 years ago

I remember a discussion between me, @rphmeier and you, @bkchr, where we thought about introducing a special host function that would provide the contents of the given key of the relay-chain storage of the relay parent. A rough idea was something like this:

During the block authoring it would go directly into the polkadot storage backend collecting the witness. The aggregated witness would go into the PoV.
To make substrate block import work we should attach these proofs to the block
During PVF execution this hypothetical function will be replaced by one that looks the data directly from PoV.

The advantages of that approach is that the keys are available for querying at on_initialize already and there is no need to expose the client for fiddling to the chain writers.

As per number 3. I agree that this is the most annoying one. I also had some thoughts about it and they also involved a wasm blob. In my thoughts it was a bit different angle though. Roughly It goes like this.

The relay-chain does not provide any guarantees on the storage layout. This applies to both keys and the structures.
The relay-chain publishes a wasm blob. This blob defines a set of functions that answer some predefined queries. These queries could be like "what is the MQC head of this para?" or "what is the balance of this account?". That's a public API.
These APIs are really serious about providing backwards compatibility. They should not break or change semantics in any way. I see it purely additive. Removing APIs (e.g. when we sunset a significant part of the system) should happen with a significant heads up delay and most likely should involve the governance.
Adding a new API should be approved by the governance.
Relay-chain upgrades should be coupled with the corresponding upgrade of the wasm blob that hosts these API, if needed.
Under the hood these API may or may not be implemented with the mechanism outlined in the beginning of the post. From an PDK side, this should ideally look like as a simple call to a host function.

This way the relay-chain controls explicitly what it exposes and can move forward independently. The downside is that there is delay involved when negotiating with the governance. However, I think that it is a good thing: I think it's better that each dependency on the relay-chain structure is explicit and each particular case is carefully assessed with respect to the long-term maintenance burden. In contrast, just opening up any storage storage a parachain can reach is just a recipe for a long-term maintenance nightmare.

bkchr commented 3 years ago

I remember a discussion between me, @rphmeier and you, @bkchr, where we thought about introducing a special host function that would provide the contents of the given key of the relay-chain storage of the relay parent. A rough idea was something like this:
* During the block authoring it would go directly into the polkadot storage backend collecting the witness. The aggregated witness would go into the PoV.

* To make substrate block import work we should attach these proofs to the block

* During PVF execution this hypothetical function will be replaced by one that looks the data directly from PoV.

Totally forgot about this one! Still a good idea :)

As per number 3. I agree that this is the most annoying one. I also had some thoughts about it and they also involved a wasm blob. In my thoughts it was a bit different angle though. Roughly It goes like this.

* The relay-chain does not provide any guarantees on the storage layout. This applies to both keys and the structures.

* The relay-chain publishes a wasm blob. This blob defines a set of functions that answer some predefined queries. These queries could be like "what is the MQC head of this para?" or "what is the balance of this account?". That's a public API.

* These APIs are really serious about providing backwards compatibility. They should not break or change semantics in any way. I see it purely additive. Removing APIs (e.g. when we sunset a significant part of the system) should happen with a significant heads up delay and most likely should involve the governance.

* Adding a new API should be approved by the governance.

* Relay-chain upgrades should be coupled with the corresponding upgrade of the wasm blob that hosts these API, if needed.

* Under the hood these API may or may not be implemented with the mechanism outlined in the beginning of the post. From an PDK side, this should ideally look like as a simple call to a host function.

I like this idea! Would this extra wasm blob be dynamically linked to the parachain runtime? So that the parachain can call one of these functions?

pepyakin commented 3 years ago

That's a good question, I wasn't thinking through the details. I think it's a possibility, although the Rust/Wasm dynamic linking story is a bit fuzzy though. But whatever is easy to use and performant enough.

rphmeier commented 3 years ago

How often can the wasm blob change? We will have to be careful that it is still easy to execute parablocks after-the-fact even when state pruned. This wasm blob could be in the AvailableData but that would be very redundant.

pepyakin commented 3 years ago

Store it in the state? I think this shim wasm blob should be way slimmer than the runtime itself.

burdges commented 3 years ago

If I understand, this moves us much of the way to contextual execution too. We could also announce these more than 48 hours in advance, which helps address their availability problem.

olanod commented 3 years ago

Is this only about collators reading the state of the relay chain? Could this be related or is it a complete different beast to read storage from other parachains? https://github.com/paritytech/xcm-format/issues/16

pepyakin commented 3 years ago

Well, yes, this is about reading of the relay chain state. I think this is at least somewhat related to reading the storage of other parachains. Reading the storage of other parachains is very specific on the parachain itself, but at least one part is shared: you typically need to get some sort of commitment like a merkle root, and that should be stored in a head data of the parachain which is stored in the relay chain state.

rphmeier commented 2 years ago

During the block authoring it would go directly into the polkadot storage backend collecting the witness. The aggregated witness would go into the PoV.

To make substrate block import work we should attach these proofs to the block

During PVF execution this hypothetical function will be replaced by one that looks the data directly from PoV.

Step 2 is definitely the complicating factor. I think it would be better to try to do this entirely on the runtime side. What we could do is set up the runtime so that execute_block will read the data from the block directly (in the form of an inherent) and finalize_block generates the inherent. This will avoid any changes in Substrate's block import code, which is already messy enough.

The good thing about this approach is that PVF execution and regular substrate import follows the exact same code-path (no relay chain state required) and only block authorship needs the host function to actually query data from the relay-chain state.

In the beginning of execute_block, we would check that the last extrinsic is a special RelayChainTrieNodes(Vec<Vec<u8>>) and replace the host function using those nodes as the backing store. We'd also remove the extrinsic from the extrinsics list before proceeding.

In the beginning of initialize_block (when called directly for block building) we would replace the host function with an implementation that actually queries the relay-chain state.

At the end of finalize_block we'd produce the transaction that contains all read trie nodes again.

This will require some modification of the executive to pull off. I am also not sure what will happen with other runtime APIs that invoke initialize_block, but it should not matter as long as they don't actually invoke the host function.

bkchr commented 2 years ago

As long as we have the native runtime, this will not work. Host function overriding is currently not supported on the native side. But yeah, we want to get rid of the native runtime anyway. The idea in general is nice. However, I don't know why you want to override something in on_initialize?

Something that I don't like is that you could not read data in on_finalize, because you don't know if your pallet on_finalize is called before or after the creation of the RelayTrieNodes transaction. Another problem is that the runtime can not add transactions to the block currently.

Polkadot-Forum commented 1 year ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/state-proof-based-parachain-parachain-messaging/2214/3

seunlanlege commented 1 year ago

This would be solved in ISMP by: https://github.com/polytope-labs/substrate-ismp/pull/62

bkchr commented 1 year ago

This will not solve this issue. What you are having there is an async way of requesting data.

seunlanlege commented 1 year ago

What's wrong with async?

bkchr commented 1 year ago

That it doesn't solve issues that are currently being solved by the proof inside inherent.

seunlanlege commented 1 year ago

That it doesn't solve issues that are currently being solved by the proof inside inherent.

Not sure what you mean here, ISMP uses inherents to respond to GET requests. If you're envisioning a solution that leverages inherents, then your design is async as well.

paritytech / polkadot-sdk

Support custom storage proof from the relay-chain #82