polkadot-fellows / RFCs

Proposals for change to standards administered by the Fellowship.
https://polkadot-fellows.github.io/RFCs/
Creative Commons Zero v1.0 Universal

Introduce `storage_proof_size` Host Function for Improved Parachain Block Utilization #43

Closed. skunert closed this issue 5 months ago.

skunert commented 9 months ago

https://github.com/paritytech/polkadot-sdk/pull/1462 has been open for a while and is now well-reviewed. Opening an RFC since it introduces a new host function that is relevant for parachains and light-clients.
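
For orientation, here is a minimal, purely illustrative model of what the proposed host function exposes (the names below are hypothetical, not the actual polkadot-sdk API): the host keeps a recorder of the trie nodes pulled into the storage proof so far, and the runtime can query the accumulated size to see how much of its proof-size budget is actually left.

```rust
// Illustrative sketch only; the real implementation lives in the linked PR.
struct ProofRecorder {
    recorded_nodes_size: u64,
}

impl ProofRecorder {
    /// Called by the host whenever a new trie node is pulled into the storage proof.
    fn on_node_recorded(&mut self, encoded_len: u64) {
        self.recorded_nodes_size += encoded_len;
    }

    /// What a `storage_proof_size` host call would report to the runtime at this point.
    fn storage_proof_size(&self) -> u64 {
        self.recorded_nodes_size
    }
}

fn main() {
    let mut recorder = ProofRecorder { recorded_nodes_size: 0 };
    recorder.on_node_recorded(512); // e.g. a branch node read during execution
    recorder.on_node_recorded(128); // e.g. a leaf value
    assert_eq!(recorder.storage_proof_size(), 640);
}
```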

tomaka commented 9 months ago

I also would like to raise the concern that the algorithm used to generate a proof must now be precisely spec'ced (at least on paper).

I'm not completely sure of what I'm saying because it's been a long time since I've looked into this, but if I remember correctly Substrate sometimes generates sub-optimal proofs (i.e. some proofs include slightly too much information), for example IIRC when deleting a prefix. Adding this host function means that other host implementers must now reproduce the exact same undocumented optimizations and flaws as Substrate, but also that adding these optimizations later would be difficult as it would break backwards-compatibility.

burdges commented 9 months ago

I also would like to raise the concern that the algorithm used to generate a proof must now be precisely spec'ced (at least on paper).

Parachains are necessarily free to choose their own storage proof algorithms.

if I remember correctly Substrate sometimes generates sub-optimal proofs

Yeah, there is not necessarily a well-defined storage proof size until the collator produces the PoV block; for example, adjusting the relay parent could change the size even if the block does not otherwise change. Any given PoV block does, however, have some well-defined storage proof size, so this makes sense to the collator and to Polkadot, but not before.

tomaka commented 9 months ago

Parachains are necessarily free to choose their own storage proof algorithms.

Is the new host function only accessible to parachain runtimes when executed by parachain hosts? Or must the relay chain host also implement it? If it's the former, then this change doesn't even need an RFC, as it would be an implementation detail of parachains. If it's the latter, then there's a contradiction here.

bkchr commented 9 months ago

Is the new host function only accessible to parachain runtimes when executed by parachain hosts? Or must the relay chain host also implement it?

Only parachain hosts need to implement it. The RC isn't involved in this. However, light clients (like smoldot) may need it for re-executing blocks or similar.

skunert commented 9 months ago

However, light clients (like smoldot) may need it for re-executing blocks or similar.

This was my main motivator for opening this RFC. Even though this will not be included on the relay chain, light clients will need to have the host function.

burdges commented 9 months ago

If you run PoV blocks then you'd want this host function. Could we maybe disable this host function when not running PoV blocks though?

If smoldot runs a PoV block then yes, sure, it should expose this, but then it knows the size. If it runs a lone tx without full context, then it has nothing to expose here. In practice, this means validate_block can call this host function, but the parachain authors fucked up if it gets invoked elsewhere under execute_block.

Amusingly collators would not be validating one another's weight reclamation, but depending upon the relay chain for this. lol

skunert commented 9 months ago

Amusingly collators would not be validating one another's weight reclamation, but depending upon the relay chain for this. lol

What do you mean by this? If a collator built a block reclaiming too much weight, the block would fail to import on other collators. And fail validate_block too, of course.

skunert commented 9 months ago

Parachains are necessarily free to choose their own storage proof algorithms.

Is the new host function only accessible to parachain runtimes when executed by parachain hosts? Or must the relay chain host also implement it? If it's the former, then this change doesn't even need an RFC, as it would be an implementation detail of parachains. If it's the latter, then there's a contradiction here.

So yeah, maybe an RFC was not necessary after all. If nothing changes here by next week, I will close this and merge the PR.

tomaka commented 8 months ago

I said that this doesn't need an RFC because parachains are in principle free to do whatever they want here. However, there's an argument to be made that system parachains should also be precisely specced.

bkchr commented 8 months ago

I said that this doesn't need an RFC because parachains are in principle free to do whatever they want here.

Yes they are, but smoldot would also need to support "whatever they are doing". We can also close this, but I think it is worth having it here.

skunert commented 8 months ago

/rfc propose

paritytech-rfc-bot[bot] commented 8 months ago

Hey @skunert, here is a link you can use to create the referendum aiming to approve this RFC number 0043.

Instructions 1. Open the [link](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fpolkadot-collectives-rpc.polkadot.io#/extrinsics/decode/0x3d003e01015901000049015246435f415050524f564528303034332c62366361383864613664383837613435616463393931643638616130386532393238306562393064346165346337656163653361623864626566373833613766290100000000). 2. Switch to the `Submission` tab. 3. Adjust the transaction if needed (for example, the proposal Origin). 4. Submit the Transaction

It is based on commit hash 30fa392f64d9942a5d0e089dfc938de05de34073.

The proposed remark text is: RFC_APPROVE(0043,b6ca88da6d887a45adc991d68aa08e29280eb90d4ae4c7eace3ab8dbef783a7f).

tomaka commented 8 months ago

My concern hasn't really been addressed:

I also would like to raise the concern that the algorithm used to generate a proof must now be precisely spec'ced (at least on paper).

I'm not completely sure of what I'm saying because it's been a long time since I've looked into this, but if I remember correctly Substrate sometimes generates sub-optimal proofs (i.e. some proofs include slightly too much information), for example IIRC when deleting a prefix. Adding this host function means that other host implementers must now reproduce the exact same undocumented optimizations and flaws as Substrate, but also that adding these optimizations later would be difficult as it would break backwards-compatibility.

This change would make it very hard (hard enough that it will likely never happen) to later fix the sub-optimal proof generation and to write an alternative implementation of a system parachain collator.

bkchr commented 8 months ago

Adding this host function means that other host implementers must now reproduce the exact same undocumented optimizations and flaws as Substrate, but also that adding these optimizations later would be difficult as it would break backwards-compatibility.

The same proof generation is already used for the light client proofs. Thus, it is already "fixed" and I don't get your comment here. Or do you want to argue that you can change a networking version and give up on the old version faster?

tomaka commented 8 months ago

In some situations, proof generation can be really tricky. For example, if the runtime calls ext_storage_clear_prefix_version_1, the proof doesn't need to include the storage entries that were deleted, but if the runtime calls ext_storage_clear_prefix_version_2, then you do need to include in the proof the storage entries that were deleted, plus an additional non-deleted entry if there is any, so that the function can return whether there is any key left.

At the moment, you can generate a proof that contains unnecessary trie nodes and everything works fine. So it's fine to just include too many things in proofs "just in case", and as far as I understand (by looking at proofs while debugging smoldot) that's what Substrate sometimes does.

But if we add a host function that returns the current size of a proof, then these unnecessary entries do matter when it comes to determinism. An implementation that includes too many things in proofs might treat some blocks as valid while an implementation that has optimized its proof generation will refuse these blocks. That seems not great to me, because it locks our proof generation code to "whatever it is right now" and makes it extremely difficult to optimize it later. Plus any alternative implementation will have to exactly mimic Substrate's behavior to the smallest detail, which seems incredibly difficult to me.
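
To make the concern concrete, here is a tiny sketch (made-up numbers, not Substrate code) of how two implementations that record proofs slightly differently can disagree on whether the same block fits the size budget once the runtime acts on storage_proof_size:

```rust
// Two nodes record a proof for the *same* block; one prunes redundant trie
// nodes, the other keeps them "just in case".
fn within_pov_limit(recorded_proof_size: u64, limit: u64) -> bool {
    recorded_proof_size <= limit
}

fn main() {
    let limit = 5 * 1024 * 1024; // hypothetical PoV size budget in bytes
    let pruned_proof = 5_242_000; // optimized recorder
    let padded_proof = 5_244_000; // recorder that includes redundant nodes
    assert!(within_pov_limit(pruned_proof, limit));
    assert!(!within_pov_limit(padded_proof, limit)); // same block, now rejected: netsplit risk
}
```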

skunert commented 8 months ago

At the moment, you can generate a proof that contains unnecessary trie nodes and everything works fine. So it's fine to just include too many things in proofs "just in case", and as far as I understand (by looking at proofs while debugging smoldot) that's what Substrate sometimes does.

Okay, this indeed would be problematic. I was not aware of this behaviour. We should spec the proofs properly then.

tomaka commented 8 months ago

Again maybe I'm wrong, or maybe it's been optimized since then. My concern mostly comes from the fact that this is insanely complicated.

Just from the top of my head:

  • If the runtime calls storage_exists or storage_read with a size of 0, then you don't need to include the unhashed storage value when the trie version is V1.
  • If the runtime calls storage_append but never reads back the value afterwards, the value that was there doesn't need to be included in the proof, but I imagine that this is really hard to implement.
  • If you call storage_clear you don't need to include the deleted keys in the proof, except that if afterwards you call storage_next_key where the parameter is before the ones you've deleted and the next key is after the ones you've deleted, then you need to include in the proof the keys that you've deleted.
  • Calling storage_root means that you have to include in the proof all the modifications to the storage you've previously done. So for example if you call storage_set but not storage_root then the modified entry doesn't need to be included, but if you do call storage_root later then the modified entry does need to be included.
  • In a trie V1, if you call storage_set with a value identical to the one that was already there, then storage_root, then the proof doesn't need to include the unhashed storage value of the node that was set, since the receiver of the proof could compare the hash of the value being set with the existing hash.

In general trying to figure out what should be included in the case of storage_root makes my brain explode.

burdges commented 8 months ago

Another approach would be opaque newtype(s) vaguely resembling std::time::{Duration, Instant, SystemTime}.

storage_proof_size returns a `pub struct ProofSizeGuard<'a>(u32, PhantomData<&'a u32>);`, which satisfies PartialOrd<u32>, and maybe other traits, but does not satisfy PartialEq<_>. We require the ProofSizeGuard to be dropped before invoking storage_proof_size again. Alternatively, storage_proof_size always returns monotonically increasing values until you've dropped all ProofSizeGuards.

After collation, storage_proof_size returns the actual proof size from the PoV block, but before collation it returns whatever approximation the system currently believes. Any parachain could alter its internal storage_proof_size implementation, just like it could add custom storage, but we require storage_proof_size(parablock+PoV) < storage_proof_size(parablock) or else collators could produce invalid blocks (which backers reject, so no slashing).

In principle, storage_proof_size both increases and decreases unpredictably while building blocks, aka non-determinism, but only decreases could depend upon the block so far, while increases should be bounded by some function of whatever you just added.

A parachain could break these invariants using unsafe code, but they could do much worse anyways. storage_proof_size is deterministic from polkadot's perspective.
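
A rough Rust sketch of the guard idea (illustrative only; note that in Rust PartialOrd<u32> has PartialEq<u32> as a supertrait, so this sketch exposes an explicit comparison method instead of the trait):

```rust
use core::marker::PhantomData;

/// Opaque wrapper around the current proof size: callers can ask ordering
/// questions against a limit but never observe an exact, equality-comparable
/// number, which discourages code from depending on the precise value.
pub struct ProofSizeGuard<'a>(u32, PhantomData<&'a u32>);

impl<'a> ProofSizeGuard<'a> {
    /// True if the recorded proof size still fits within `limit` bytes.
    pub fn fits_within(&self, limit: u32) -> bool {
        self.0 <= limit
    }
}

fn main() {
    let guard = ProofSizeGuard(1_024, PhantomData);
    assert!(guard.fits_within(4 * 1024));
}
```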

skunert commented 8 months ago

Again maybe I'm wrong, or maybe it's been optimized since then. My concern mostly comes from the fact that this is insanely complicated.

Just from the top of my head:

  • If the runtime calls storage_exists or storage_read with a size of 0, then you don't need to include the unhashed storage value when the trie version is V1.
  • If the runtime calls storage_append but never reads back the value afterwards, the value that was there doesn't need to be included in the proof, but I imagine that this is really hard to implement.
  • If you call storage_clear you don't need to include the deleted keys in the proof, except that if afterwards you call storage_next_key where the parameter is before the ones you've deleted and the next key is after the ones you've deleted, then you need to include in the proof the keys that you've deleted.
  • Calling storage_root means that you have to include in the proof all the modifications to the storage you've previously done. So for example if you call storage_set but not storage_root then the modified entry doesn't need to be included, but if you do call storage_root later then the modified entry does need to be included.
  • In a trie V1, if you call storage_set with a value identical to the one that was already there, then storage_root, then the proof doesn't need to include the unhashed storage value of the node that was set, since the receiver of the proof could compare the hash of the value being set with the existing hash.

In general trying to figure out what should be included in the case of storage_root makes my brain explode.

I thought about this some more and do not think that this is actually a blocker for this RFC.

Initially, I was under the impression that we were talking about a suboptimal storage-proof algorithm in Substrate. But the cases you are referencing are host-function implementation related: if a host function reads more than strictly needed, we include extra items in the proof.

However, that is separate from the changes proposed here. If you want to implement a custom collator node today, you already need to be super careful. Since the parachain runtime contains the validate_block function, which has externalities and host-function implementations baked in, implementors need to mimic the externalities and host-function behavior anyway. If a custom collator generates a smaller proof than what we expect in the PVF, it would be a problem. In other words, you already have to follow the quirks of the Substrate implementation today. Otherwise, the PVF might reject your block.

Not a great state to be in, but I believe the host function proposed here does not change much compared to the current state. We should still correctly specify the host functions, but this sounds like a separate effort to me.

tomaka commented 8 months ago

However, that is separate from the changes proposed here. If you want to implement a custom collator node today, you already need to be super careful. Since the parachain runtime contains the validate_block function, which has externalities and host-function implementations baked in, implementors need to mimic the externalities and host-function behavior anyway. If a custom collator generates a smaller proof than what we expect in the PVF, it would be a problem. In other words, you already have to follow the quirks of the Substrate implementation today. Otherwise, the PVF might reject your block.

Not a great state to be in, but I believe the host function proposed here does not change much compared to the current state. We should still correctly specify the host functions, but this sounds like a separate effort to me.

I don't think that's accurate.

Yes, implementers need to mimic the externalities and host function behavior anyway, but all the existing host functions are very clearly defined (maybe not explicitly in a document, but they're all pretty straightforward and their corner cases are easily figured out). In other words, mimicking the host function behavior is relatively easy. Smoldot does it, for example, and it wasn't that hard.

The one introduced in this RFC is the first host function that has tons of obscure corner cases, with what seems to me to be through-the-roof difficulty in mimicking the implementation.

Your counter-argument also doesn't address the backwards compatibility concern. Introducing this host function makes it impossible to modify the proof generation code without risking a netsplit. You'd have to do it in a centralized way by stopping all your collators at the same block, upgrading all of them simultaneously, then restarting all of them.

burdges commented 8 months ago

We definitely want parachains to ship their own custom storage designs while using Substrate. As a rule, they must be able to iterate on their exact custom storage scheme.

We should iterate on the existing storage used in polkadot & system parachains too. Although painful, I suppose migrations should be possible, but we've good reasons to be flexible before collation anyways: You cannot know PoV size until you've built the full block, because you cannot know what merkle paths unify.

It's worse: if you've some KZG batching which does inner pairing products, then your PoV proof becomes very expensive in CPU time, so only the one collator ever computes it. It's also plausible that other collators must learn the block first. We then have a situation where the parachain itself cannot know what storage_proof_size should return on the relay chain. It's also possible collators behave differently depending upon whether they have ASICs or FPGA cards.

Again, afaik there is no problem here with doing a storage_proof_size host call, but it should either manage the inherent non-determinism correctly, or else it should declare itself to be only an approximation, an overestimate, or whatever, likely a field in the block header.

skunert commented 8 months ago

The one introduced in this RFC is the first host function that has tons of obscure corner cases, with what seems to me to be through-the-roof difficulty in mimicking the implementation.

Where do you see these corner cases/what is "proof generation code" to you?

If I understand correctly, you are suggesting that it is easy to mimic the host function functionality, but a custom implementation would not necessarily need to arrive at exactly the same proof layout. This is what I don't understand. If your proof layout is any different from what substrate currently generates, your submitted block might fail.

Taking your first example:

If the runtime calls storage_exists or storage_read with a size of 0, then you don't need to include the unhashed storage value when the trie version is V1.

Whether or not you technically need to include this in the proof, as an implementor you need to do exactly what Substrate does. If you produce a correct but smaller proof while Substrate would include more nodes for whatever reason, the PVF will still try to call storage_exists with the baked-in Substrate stack. So what freedom do implementors lose with this host function? I don't yet see the wiggle room you are seeing (that is, if you want to produce a block).

Your counter-argument also doesn't address the backwards compatibility concern.

The above means to me that we are already in this situation. How would you modify the storage proof algorithm today? Since the parachain runtime essentially contains a whole trie/ext/host-function stack, you need to upgrade them in sync (and keep the old version around). At least this is how my mental model works.

librelois commented 8 months ago

I agree that imposing determinism on such a complex algorithm poses problems, at the very least upgradeability problems.

One way of getting around the problem of the non-determinism of the proof generation algorithm is to ask the collator to inject the result of each call to this host function into the block, in the form of a vector of numbers, and to reuse this vector directly for importing and validating the block. This can be integrated into a new storage proof format, or injected into a special unsigned extrinsic at the end of the block (which would mean that the host function could not be used in on_finalize).
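
A hedged sketch of what embedding the recorded values could look like (hypothetical names; the actual encoding, whether a new storage proof format or an unsigned extrinsic as suggested above, is exactly what would need to be decided):

```rust
/// The block author records the proof size observed at each host-function
/// call and ships the vector with the block; importers and validators replay
/// these values instead of measuring their own, possibly different, proof.
#[derive(Clone, Debug, Default)]
struct RecordedProofSizes {
    values: Vec<u64>,
    cursor: usize,
}

impl RecordedProofSizes {
    /// During block authorship: push whatever the local recorder reports.
    fn record(&mut self, current_size: u64) {
        self.values.push(current_size);
    }

    /// During import/validation: hand back the author's value for this call,
    /// keeping every node on identical numbers regardless of local proof layout.
    fn replay(&mut self) -> Option<u64> {
        let value = self.values.get(self.cursor).copied();
        self.cursor += 1;
        value
    }
}

fn main() {
    let mut authored = RecordedProofSizes::default();
    authored.record(2_048);
    authored.record(3_072);

    let mut imported = RecordedProofSizes { values: authored.values.clone(), cursor: 0 };
    assert_eq!(imported.replay(), Some(2_048));
    assert_eq!(imported.replay(), Some(3_072));
    assert_eq!(imported.replay(), None);
}
```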

tomaka commented 8 months ago

If I understand correctly, you are suggesting that it is easy to mimic the host function functionality, but a custom implementation would not necessarily need to arrive at exactly the same proof layout. This is what I don't understand. If your proof layout is any different from what substrate currently generates, your submitted block might fail.

What I'm saying is that it is easy to mimic every host function other than ext_storage_proof_size_version_1, but that mimicking ext_storage_proof_size_version_1 is too complicated.

I am not suggesting that implementations should have freedom to generate different proofs. I'm actually saying the exact opposite: all implementations must generate the exact same proofs (apart from the ordering of items within the proof, which doesn't matter for the size), and it is too difficult to guarantee this.

Since "all implementations" also includes "multiple versions of Substrate", it blocks Substrate from ever improving the size of the proofs it generates.

What makes this dangerous is that the code is at a very high level of abstraction compared to the actual proof being generated. It can therefore be really subtle: you can modify two small lines of code in the trie crate and accidentally break backwards compatibility. And you will find out about the problem only when you want to author a block whose proof size is within the difference between implementations, which I guess will happen rarely, making this even more subtle.

Basically, this looks to me like a recipe for disaster. Either every full node uses the same version of Substrate, or one day some nodes, but not all, will refuse a block and someone will need to do a very long and painful cowboy-investigation of the problem.

librelois commented 8 months ago

Either every full node uses the same version of Substrate

This is not the case in production, and it never will be: it's far too restrictive. Think also about collators and full nodes for parachains, not only for the relay chain. It's hard enough to adopt new Substrate versions quickly, and it takes even longer for collators or full nodes run by external partners (like block explorers). In practice, on the Moonbeam side, for example, we currently have 4 different versions of full nodes in production!

More generally, I think that the storage proof generation algorithm should not be part of the consensus; the only requirements should be the validity of the proof and a size limit. If the host function is integrated as it stands, the proof generation algorithm will be part of the consensus rules.

It would be great to take more time to evaluate alternative solutions, such as injecting the results of the host function into the block, or finding a way to satisfy the need without a host function.

burdges commented 8 months ago

One way of getting around the problem of the non-determinism of the proof generation algorithm is to ask the collator to inject the result of each call to this host function into the block, in the form of a vector of numbers, and to reuse this vector directly for importing and validating the block.

At some level, our whole goal here is to "reclaim" weight not really used by tx, as well as to have separate space and cpu weights. If we could do what you say then we could just adjust the weights in roughly the current weight system, but this obviously does not work correctly.

Instead, we want the space weights to more accurately reflect the size in the final PoV block. If someone ships an optimization that improves space under niche conditions then the blocks from collators who upgraded should benefit, while the blocks from other collators should not benefit.

This is inherently non-deterministic in several ways: Some collators do not have the new code. Some collators have more CPU time to spend compressing the proof. Some parachain designs will result in some collators not even being able to build the proof. And all collators must consider the txs before they finish building the block.

Importantly, there are different classes of non-determinism: (1) Any post-collation non-determinism is a threat to Polkadot consensus, and must never be allowed. (2) Pre-collation non-determinism in parablocks feels really weird, and complicates some things parachains might do, but should basically be fine for typical parachains who simply follow relay chain consensus. (3) Partial-block non-determinism is annoying for UX, but it's also unavoidable for smart contracts, ala MEV. We'll inherently increase our existing non-determinism of type (3) and add some new non-determinism of type (2). Yes, (2) requires some discussion, but if performance requires this then we should accept that and separately solve the issues it creates for pre-finality grandpa.

librelois commented 8 months ago

our whole goal here is to "reclaim" weight not really used by tx

Yes, I know, and this will be the case: if collators have the economic incentive to fill the block as much as possible, they have an interest in providing the smallest possible proof size to the runtime so they are allowed to include more transactions and earn more fees.

If we could do what you say then we could just adjust the weights in roughly the current weight system, but this obviously does not work correctly.

Maybe I'm missing something, but I don't understand why it wouldn't work. It seems to me that the purpose of this RFC is only to optimize the use of the PoV/proof_size resource, not the cpu_time.

With my proposal, if a collator generates more compact storage proofs, it's not a problem, because the relay and other collators and para full nodes won't need to regenerate the storage proof, they just need to read it and check its validity (as it is now).

burdges commented 8 months ago

Incentives are irrelevant here: hosts are upgraded at different speeds, hosts have different hardware, etc. Adversaries can do whatever they can do with whatever non-determinism one permits, ala MEV.

With my proposal, if a collator generates more compact storage proofs, it's not a problem, because the relay and other collators and para full nodes won't need to regenerate the storage proof, they just need to read it and check its validity (as it is now).

A relay chain validator already sees whatever proof the collator built, so they've no need for extra numbers, unless those numbers wind up being pessimistic overestimations, akin to our current weights system.

Parachain nodes cannot necessarily either see the full PoV block or compute the same storage proof, so they cannot produce correct values for the host calls, nor validate the numbers being inserted.

We could simply accept the non-determinism at the parachain level since everything remains deterministic after collation at the relay chain level. We'd encapsulate this non-determinism behind opaque types which discourage misuse. We'd warn parachains with unusual pre-collation consensus phases to either not use this mechanism, or else ensure it cannot really break their consensus.

skunert commented 8 months ago

One way of getting around the problem of the non-determinism of the proof generation algorithm is to ask the collator to inject the result of each call to this host function into the block, in the form of a vector of numbers, and to reuse this vector directly for importing and validating the block.

This is very similar to what was originally proposed in the issue https://github.com/paritytech/polkadot-sdk/issues/209#issuecomment-1691687910 before we switched focus to the more "simple" solution proposed here. I guess it is an option to reevaluate the original approach.

Parachain nodes cannot necessarily either see the full PoV block or compute the same storage proof, so they cannot produce correct values for the host calls, nor validate the numbers being inserted.

The values delivered in the block would be used instead of the host calls. It's basically just the collator saying, "this is how much proof size was consumed at this point during block production." Not sure how a collator could cheat by manipulating these numbers while going undetected. In the end, the weight during block import has to be correct and the PoV size has to be small enough for the relay to accept it.

burdges commented 8 months ago

Alright, Basti's comments there make more sense. Yes, this complicates determinism, and fancy parachain consensus, but acceptably I guess. If I understand...

We roughly need this host call in block building so that block producers know how much they've used. It'll often return nonsensical and/or non-deterministic values, but that's unavoidable, and acceptable in block production.

We do not need this host call after collation, aka in validate_block, because the relay chain enforces these limits elsewhere. It's fine if the host call panics, returns None, or whatever here, so long as it's deterministic.

We're kinda stuck on what happens during parachain block imports. As @tomaka says, we do not have deterministic execution if we return meaningful values computed locally. Yet, we could theoretically make this deterministic for some storage types in the distant future, or maybe even in the near future if we handled upgrade regimes more strictly. We should not impose such discipline upon parachain teams though.

We've seemingly three choices for import:

  1. We make this host call panic, or return None, in both validation and import. A parachain node cannot learn if the PoV block consumes too much space this way, but they could optimistically accept this, and adapt if rejected by the relay chain. We do learn estimates during block production, so honest collators obey this.
  2. We make this host call return local estimates during import. It breaks determinism for the parachain, which complicates fancy parachain consensus, but it's fine for standard parachains who obey the relay chain. In validation, we'd either return the amount read so far from the PoV, or else panic or return None, both of which give determinism.
  3. We encode the block builder estimates somehow, which saves determinism during import by parachain nodes, and simplifies parachain debugging, but the value enforces nothing. Ideally, these should be stripped when preparing the PoV block, since the relay chain has no use for them, so the host call then panics or returns None in the relay chain.

We dislike complicating fancy parachain consensus, which rules out 2 probably. Ain't much difference between 1 and 3, but 1 looks simplest, and avoids confusion.
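
For readers skimming, the three options above might be summarized roughly as follows (hypothetical names, orientation only):

```rust
/// What the host call could do outside of block production, per the list above.
#[allow(dead_code)]
enum ProofSizeOnImport {
    /// 1. Panic or return nothing; parachain nodes accept optimistically and
    ///    rely on the relay chain to reject oversized PoVs.
    Unavailable,
    /// 2. Return a locally measured estimate, which may differ between nodes
    ///    and so breaks deterministic import.
    LocalEstimate,
    /// 3. Replay values the block author embedded in the block, which stay
    ///    deterministic but enforce nothing by themselves.
    AuthorEmbedded,
}

fn main() {}
```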

skunert commented 8 months ago

We make this host call panic, or return None, in both validation and import. A parachain node cannot learn if the PoV block consumes too much space this way, but they could optimistically accept this, and adapt if rejected by the relay chain. We do learn estimates during block production, so honest collators obey this.

In your choices, 1 is just 3 without including the numbers in the block, right? I don't think 1 can currently work. During import, nodes would need a way to calculate the overall consumed storage weight that goes into storage. If we don't deliver any values with the block, they will only have the benchmarked weights. This would also be a problem for the on_idle hook, which can dynamically use the rest of the available weight. Nodes need to know how much can be reclaimed to arrive at the same result the collator did.

If we don't want 2 we should go with 3 IMO.

burdges commented 8 months ago

In principle, there may be advantages in leaving this up to the parachain team. I suppose 2 works fine for simple parachains. It's also Basti's preference, or did he change his mind?

If the host call returns None then any dependent code does nothing, meaning we forgo size weight checking, but yes, alright, on_idle requires the block creator to state the remainder. It risks spam within the parachain too, I guess.

If you do embed the values, then you could compare them with the amount read thus far during validation, so they must be correct or else the block becomes invalid. In principle, parachain nodes could simply trust one another, which risks spam, but the relay chain would never approve unless correct.

Approach: You embed the values, do the comparison against the amount read thus far in validate_block, and do no local comparisons on the parachain side. As this risks spam, we document how parachain teams could compare against local storage to thwart attacks, but suggest removing bad nodes instead. Aka 3 with an option for 2-ish.

It's kinda annoying to embed a bunch of intermediate values when the relay chain only cares about the final total.

bkchr commented 8 months ago

Introducing this host function makes it impossible to modify the proof generation code without risking a netsplit. You'd have to do it in a centralized way by stopping all your collators at the same block, upgrading all of them simultaneously, then restarting all of them.

This is not a valid argument. You could say the same about adding any new host function or whatever, that they risk a netsplit. However, we still handle them gracefully. The thing that we need to change is that we make the "proof generation" of each host function part of the specification of each host function. This means that if you want to roll out any kind of optimization, you would need to roll out a new version of the host function, and then the proof generation is versioned in the same way as the host function itself. This means that as long as the runtime uses the old host function, the proof generation doesn't change. I don't see any problem in doing so.
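
A sketch of that versioning idea (hypothetical names): the proof-recording rules are pinned to the host-function version a runtime imports, so an optimization ships as a hypothetical version 2 rather than silently changing what version 1 produces.

```rust
/// Illustrative only; not the sp-runtime-interface versioning machinery itself.
enum ProofRecordingRules {
    /// Exactly today's behaviour, frozen: ext_storage_proof_size_version_1
    /// must always be backed by these rules.
    V1,
    /// A future, leaner recording scheme that only runtimes importing a
    /// hypothetical ext_storage_proof_size_version_2 would see.
    V2Optimized,
}

fn rules_for_host_function(version: u32) -> ProofRecordingRules {
    match version {
        1 => ProofRecordingRules::V1,
        _ => ProofRecordingRules::V2Optimized,
    }
}

fn main() {
    assert!(matches!(rules_for_host_function(1), ProofRecordingRules::V1));
    assert!(matches!(rules_for_host_function(2), ProofRecordingRules::V2Optimized));
}
```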

What makes this dangerous is that the code is at a very high level of abstraction compared to the actual proof being generated. It can therefore be really subtle: you can modify two small lines of code in the trie crate and accidentally break backwards compatibility.

This problem already exists right now and it doesn't change with this proposal. The only thing I'm seeing here is to double down on testing to ensure that this doesn't happen in production (or better, to reduce the possibility). And as @skunert already said, the Cumulus-based PVF implementation already shares the same implementation details as the node implementation. This means that if you do some cowboy change, the PVF would stop accepting these blocks on the relay chain because a small detail changed. This also means that we already have this requirement of keeping this stuff in sync/upgradeable.

More generally, I think that the storage proof generation algorithm should not be part of the consensus; the only requirements should be the validity of the proof and a size limit. If the host function is integrated as it stands, the proof generation algorithm will be part of the consensus rules.

While we don't need it for this RFC and could work around it with the proposal mentioned above, there will be a point in the future when the "proof generation" will be part of the consensus. When we bring in "runtime weight metering", we will need to use the exact same proof generation everywhere to know if a transaction is running over the announced limits. Thus, I don't see any reason why we should go some other way here, if we already know that we will need this in the future anyway. With runtime weight metering we will also need to track the proof size to abort a transaction if it uses too much proof size. This needs to be deterministic, as otherwise collators will be able to "reject" transactions but still get paid for them because they are in the block.

We could argue that you don't need this at block import and you use the recorded proof size from the block producer, but you would still need to run the same logic in validate_block, as validate_block is the instance that ensures that the state transition is valid. This would remove some "headache" for historic blocks, but IMO we should stick to having the proof generation specced properly. I'm also down to help on this in @tomaka's new markdown spec.

It's kinda annoying to embed a bunch of intermediate values when the relay chain only cares about the final total.

To make it clear again, this RFC is not touching any of the logic that is relevant to the relay chain.

skunert commented 8 months ago

I agree with Basti's points; the current proof algorithm is already deeply integrated and needs to stay.

I think the original approach is still feasible, and it's up to the parachain to decide whether to ever call this function or not. I will open this up for voting tomorrow if no blockers are raised.

burdges commented 8 months ago

Afaik there is no problem here if doing 3, but I think @tomaka's point is that our "current proof algorithm" is not sufficiently deterministic for 2. That's not a show stopper: either fix the algorithm by eliminating the redundant copath elements, or do 3.

In general, if you do 3 then you can adapt this to more storage schemes more easily. And our current trie should hopefully be completely replaced in the future.

tomaka commented 8 months ago

This would remove some "headache" for historic blocks, but IMO we should stick to having the proof generation specced properly.

I'm personally not willing to reverse-engineer Substrate in order to spec proof generation. I think it's insanely complicated, and I don't think people who have commented realize how complicated it is. IMO for the sake of this RFC it should be assumed that speccing proof generation isn't possible.

skunert commented 8 months ago

IMO for the sake of this RFC it should be assumed that speccing proof generation isn't possible.

I understand that it is complicated. But I don't think it is impossible. We could for example write some compliance tests in WASM that would ensure a host implementation matches what is expected. But that is just an idea on how to make it easier to test implementations.
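
As a very rough illustration of what such a compliance test could look like (made-up operations and numbers, nothing here is an existing test harness):

```rust
/// Replay a fixed sequence of storage accesses and compare the proof size the
/// implementation under test reports against known-good expectations.
struct ComplianceStep {
    op: &'static str,
    expected_proof_size: u64,
}

fn run_compliance_suite(report_proof_size_after: impl Fn(&str) -> u64) -> Result<(), String> {
    let steps = [
        ComplianceStep { op: "read :code", expected_proof_size: 4_096 },
        ComplianceStep { op: "clear_prefix 0x1234", expected_proof_size: 6_144 },
    ];
    for step in steps {
        let got = report_proof_size_after(step.op);
        if got != step.expected_proof_size {
            return Err(format!("{}: expected {}, got {}", step.op, step.expected_proof_size, got));
        }
    }
    Ok(())
}

fn main() {
    // Stand-in for a real implementation under test.
    let fake_impl = |op: &str| if op.starts_with("read") { 4_096 } else { 6_144 };
    assert!(run_compliance_suite(fake_impl).is_ok());
}
```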

The goal of this RFC was to raise awareness that such a host function might come and to gather feedback. We got feedback and the proposed host function has been improved as a result. In the end, we already established that parachain teams are free to do as they like. If they have a fancy custom storage scheme, they might decide not to call the host function or not even include it. If they don't like the approach as described in https://github.com/paritytech/polkadot-sdk/issues/209, they could even implement variant 3 (which would still require the presence of this host function). The solution is not perfect, but for the majority of parachains I expect this to be a fitting mechanism.

Up for vote: https://collectives.polkassembly.io/member-referenda/45

xlc commented 7 months ago

While I really need this RFC so that proof weight is not completely useless for parachains, I am also unsure if this RFC should be accepted as it is. I still don't know how much technical detail is needed for an RFC, but I am sure people won't be able to read this RFC, understand exactly the change, and update an alternative client to correctly implement it.

xlc commented 7 months ago

We also clearly don't have enough participation on this RFC (and most other RFCs). I also expect approval from alternative node implementation teams. If they find that this RFC contains enough detail, well, sure, that means it is enough.

bkchr commented 7 months ago

I also expect approval from alternative node implementation teams.

You would need to specify what kind of alternative node implementations you mean. The relay chain? If yes, the relay chain node is never aware of this host function, as already discussed here.

We also clearly don't have enough participation on this RFC (and most other RFCs).

Yeah, and thus we are waiting until the participation magically appears? I mean, we could then also just shut down everything and move on.

xlc commented 7 months ago

This RFC has already identified stakeholders, so we should be able to ask them to take a read, but yeah, I agree we will need to just continue the work after we have tried to get feedback and failed.

skunert commented 7 months ago

This RFC has already identified stakeholders, so we should be able to ask them to take a read

Who do you mean by this? I mean, you are right, this RFC alone does not contain all the details about the storage proof needed to implement it in a non-Substrate client.

But IMO the speccing of the proof generation does not need to be part of this RFC, it can be done in a separate document.

As you said, the proof weight currently has some limitations that should be addressed. Since all collator implementations are currently Substrate-based anyway, we should move forward in my opinion.

tomaka commented 7 months ago

Since all collator implementations are currently Substrate-based anyway, we should move forward in my opinion.

This is a snake biting its tail. This kind of RFC is exactly why it won't be possible to create another implementation.

Please look at the low quality of the source code of Substrate. Do you think that Polkadot has a long term future if it keeps using this code? Every little change to the core takes years because there are literally less than 5 people in the world who are capable of working efficiently on Substrate. Do you not think that it should be possible to at least leave the possibility to phase out Substrate and let other people build alternative, more clean, implementations in order to give a future to Polkadot?

But IMO the speccing of the proof generation does not need to be part of this RFC, it can be done in a separate document.

I would agree if this separate document had been written before this RFC. This RFC is based on the assumption that it's even possible to write such a document, which, I repeat, I'm strongly skeptical about.

We all know that nobody is going to make the effort speccing the proof generation. The idea of "speccing the proof generation in a separate document" is just a way to sweep the problem under the carpet and delegate it to "someone else", while we all know that there won't be a "someone else".

skunert commented 7 months ago

Please look at the low quality of the source code of Substrate. Do you think that Polkadot has a long term future if it keeps using this code? Every little change to the core takes years because there are literally less than 5 people in the world who are capable of working efficiently on Substrate.

This view seems overly pessimistic. Substrate has many people contributing and is solving non-trivial problems. For a big project, I would say it's in pretty normal shape.

We all know that nobody is going to make the effort speccing the proof generation. The idea of "speccing the proof generation in a separate document" is just a way to sweep the problem under the carpet and delegate it to "someone else", while we all know that there won't be a "someone else".

To me it's pretty clear that it's on me and @bkchr to deliver this. I agree with you, no one is going to jump out of the bushes to pick this up randomly.

skunert commented 5 months ago

/rfc process 0x6386c7812bbed74cde165d0a074e69fa1b5d6845bf8aa914276305baf6f3b240

paritytech-rfc-bot[bot] commented 5 months ago

@skunert Handling the RFC command failed :( You can open an issue here. See the logs here.

rzadp commented 5 months ago

/rfc process 0x6386c7812bbed74cde165d0a074e69fa1b5d6845bf8aa914276305baf6f3b240

paritytech-rfc-bot[bot] commented 5 months ago

The on-chain referendum has approved the RFC.

rzadp commented 5 months ago

/rfc propose

paritytech-rfc-bot[bot] commented 5 months ago

Hey @rzadp, here is a link you can use to create the referendum aiming to approve this RFC number 0043.

Instructions 1. Open the [link](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fpolkadot-collectives-rpc.polkadot.io#/extrinsics/decode/0x3d003e02015901000049015246435f415050524f564528303034332c62366361383864613664383837613435616463393931643638616130386532393238306562393064346165346337656163653361623864626566373833613766290100000000). 2. Switch to the `Submission` tab. 3. Adjust the transaction if needed (for example, the proposal Origin). 4. Submit the Transaction

It is based on commit hash 30fa392f64d9942a5d0e089dfc938de05de34073.

The proposed remark text is: RFC_APPROVE(0043,b6ca88da6d887a45adc991d68aa08e29280eb90d4ae4c7eace3ab8dbef783a7f).