paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

PoV reclaim underestimates proof size #5229

Open s0me0ne-unkn0wn opened 1 month ago

s0me0ne-unkn0wn commented 1 month ago

The PoV reclaim signed extension calculates the actual storage proof size as the difference between the proof sizes recorded in the extension's pre_dispatch and post_dispatch calls. The delta between that real storage proof size and the benchmarked size is then computed, and the runtime-registered weight is corrected by that delta.
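
For readers unfamiliar with the mechanism, a minimal sketch of the logic described above (illustrative names, not the actual StorageWeightReclaim code):

// Minimal sketch of the reclaim logic described above; names are illustrative
// and not taken from the actual StorageWeightReclaim implementation.
fn reclaim_sketch(
    pre_dispatch_proof_size: u64,  // recorder value at the extension's pre_dispatch
    post_dispatch_proof_size: u64, // recorder value at the extension's post_dispatch
    benchmarked_proof_size: u64,   // proof size the runtime registered up front
    block_proof_size: &mut u64,    // runtime-side BlockWeight proof-size component
) {
    // "Actual" consumption as seen by the extension: only what happened between
    // its own pre_dispatch and post_dispatch calls.
    let measured = post_dispatch_proof_size.saturating_sub(pre_dispatch_proof_size);
    if benchmarked_proof_size > measured {
        // Reclaim the overestimate from the runtime-registered weight.
        *block_proof_size =
            block_proof_size.saturating_sub(benchmarked_proof_size - measured);
    }
}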

The problem is that after Executive calls into Applyable::apply, but before the PoV reclaim's pre_dispatch, many things happen that also affect the recorded proof size. Namely, the pre_dispatch hooks of the other signed extensions get called.

The CheckNonce extension, which updates the nonce and stores the account data in the Account map, had the most significant influence on proof size that I observed. Other things that are less influential but worth mentioning are the writes to AllExtrinsicsLen and to BlockWeight itself by the CheckWeight extension.

The PoV reclaim, not being aware of those storage proof size increments, decreases the proof size in the runtime too much. To give a concrete example: for a transaction that actually consumed 3769 bytes of proof (measured as the difference between the recorded proof sizes before and after calling into apply), the runtime incremented its proof size by the benchmarked value of 3731 bytes, and then 1758 of those were reclaimed by the PoV reclaim, so only a 1973-byte proof increase was registered by the runtime. So 3769 bytes were registered by the recorder and 1973 by the runtime, a huge difference.
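
To make the arithmetic explicit (the split between the reclaim's window and the rest is an inference from the mechanism above, not a separate measurement):

fn main() {
    let recorded_total = 3769u64; // proof bytes consumed across the whole apply()
    let benchmarked = 3731u64;    // registered up front by the runtime
    let reclaimed = 1758u64;      // subtracted again by the PoV reclaim

    // What the runtime ends up registering for this transaction.
    let registered = benchmarked - reclaimed;
    assert_eq!(registered, 1973);

    // Bytes consumed outside the reclaim's pre/post window, i.e. by hooks of
    // the other signed extensions that the reclaim never saw.
    let unseen = recorded_total - registered;
    assert_eq!(unseen, 1796);
}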

The result is the wrong proof size registered by the runtime, which may, above all else, be a security problem.

The following graph shows how the proof size registered by the node and by the runtime grows as the number of transactions increases.

[graph: node-registered vs. runtime-registered proof size]

CC @skunert @bkchr @eskimor

bkchr commented 1 month ago

Nice find!

https://github.com/paritytech/polkadot-sdk/blob/master/cumulus/primitives/storage-weight-reclaim/src/lib.rs#L193

Here we need to add current.proof_size = max(current.proof_size, post_dispatch_proof_size). This should prevent this issue.
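
A minimal sketch of that guard (illustrative names; the actual change would go into the reclaim's post_dispatch around the linked line):

// Never let the runtime-side proof size drop below what the node-side proof
// recorder has already measured at post_dispatch time.
fn clamp_to_recorded(current_proof_size: &mut u64, post_dispatch_proof_size: u64) {
    *current_proof_size = (*current_proof_size).max(post_dispatch_proof_size);
}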

s0me0ne-unkn0wn commented 1 month ago

I'm also wondering whether just moving the StorageWeightReclaim to the top of the extensions list is a mitigation. After some tests, it seems it is. I see an opposite bias, though not that big, and it's safer to be biased in that direction anyway. Namely, a random block stops authoring with 3_510_188 bytes recorded by the node and 3_929_255 registered by the runtime. This most probably comes from storage operations in the post_dispatch of other extensions.

s0me0ne-unkn0wn commented 1 month ago

Here we need to add current.proof_size = max(current.proof_size, post_dispatch_proof_size). This should prevent this issue.

Hmmm, so you propose to just stick to the recorded weight, effectively discarding the runtime's own calculations? Interesting. I wonder whether it would interfere with the unspent weight that is refunded by CheckWeight later (just thinking aloud; I don't know how unspent works, and I've never seen it be non-zero).

skunert commented 1 month ago

Here we need to add current.proof_size = max(current.proof_size, post_dispatch_proof_size). This should prevent this issue.

This is a good way to alleviate the issue. However, we should also add the BlockLength that has been recorded up to this point.

The only problem with this patch is that if we run into this case, we lose track of the weight that has been reserved for on_finalize. It should for sure be integrated as a safeguard, but it's not the end of the story.

I'm also wondering whether just moving the StorageWeightReclaim to the top of the extensions list is a mitigation. After some tests, it seems it is. I see an opposite bias, though not that big, and it's safer to be biased in that direction anyway. Namely, a random block stops authoring with 3_510_188 bytes recorded by the node and 3_929_255 registered by the runtime. This most probably comes from storage operations in the post_dispatch of other extensions.

While this is true for the specific case we were testing, I don't think it's safe to make the general assumption that the post_dispatch of SignedExtensions consumes less proof size.

so you propose to just stick to the recorded weight

Only for the case where we run into that condition, though. For the general case, we should not just set the runtime storage item to the node-recorded proof size, because on_initialize returns the weight for itself plus the weight for on_finalize.

To fix this problem properly I can currently think of two ways:

  1. Integrate the reclaim into frame-executive, making sure that the proof size respects the whole execution, including all SignedExtensions. I don't think much of this approach because the reclaim is parachain-specific and optional.
  2. Add an option for SignedExtensions that act as a "wrapper". So that their pre_dispatch is executed first and their post_dispatch is executed last. This would require some research on what would be the best way to implement.
s0me0ne-unkn0wn commented 1 month ago

Add an option for SignedExtensions that act as a "wrapper". So that their pre_dispatch is executed first and their post_dispatch is executed last. This would require some research on what would be the best way to implement.

One more option is to have two paired extensions: one implementing pre_dispatch and residing at the top of the list, and another with post_dispatch at the bottom. That design has the advantage of not affecting the current architecture in any way, but off the top of my head I cannot think of any means for them to share state. AFAIK, signed extensions are not supposed to talk to each other.

bkchr commented 1 month ago

Add an option for SignedExtensions that act as a "wrapper". So that their pre_dispatch is executed first and their post_dispatch is executed last. This would require some research on what would be the best way to implement.

No, this is not what I meant. What I meant is that storage reclaim wraps all other extensions. This way it has control over pre and post dispatch.
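
A rough conceptual sketch of that wrapping idea, using an illustrative stand-in trait rather than the actual SignedExtension API:

// Illustrative stand-in for the signed-extension interface.
trait ExtensionLike {
    type Pre;
    fn pre_dispatch(&self) -> Self::Pre;
    fn post_dispatch(&self, pre: Self::Pre);
}

// The reclaim wraps all other extensions, so its measurement brackets their
// pre_dispatch and post_dispatch hooks as well.
struct ReclaimWrapping<Inner>(Inner);

impl<Inner: ExtensionLike> ExtensionLike for ReclaimWrapping<Inner> {
    // Carry the proof size captured at the very start plus the inner pre-state.
    type Pre = (u64, Inner::Pre);

    fn pre_dispatch(&self) -> Self::Pre {
        let start = current_proof_size(); // measured before any wrapped extension runs
        (start, self.0.pre_dispatch())
    }

    fn post_dispatch(&self, pre: Self::Pre) {
        let (start, inner_pre) = pre;
        self.0.post_dispatch(inner_pre); // wrapped post_dispatch hooks run first
        let consumed = current_proof_size().saturating_sub(start);
        let _ = consumed; // reclaim/clamp against `consumed` would happen here
    }
}

// Placeholder for reading the host-side proof recorder.
fn current_proof_size() -> u64 {
    0
}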

BTW, while thinking about this: the costs for the extensions should be included in the base extrinsic weight (cc @ggwpez). Maybe this wasn't updated for the chain you tested this with, @s0me0ne-unkn0wn? Or maybe it is just not added to the extrinsic weight directly. In the future we will have each extension benchmarked separately, which should help here as well.

gui1117 commented 1 month ago

Indeed, the signed extensions' weight should in theory be taken into account by the base extrinsic weight.

For future transaction extensions with weight, we can do either a wrapping extension or two paired extensions. Both are doable and fine to me. I drafted an implementation of a wrapping extension here: https://github.com/paritytech/polkadot-sdk/pull/5234

skunert commented 1 month ago

Indeed, the signed extensions' weight should in theory be taken into account by the base extrinsic weight.

Do we assume a certain number of accounts for this? Afaik @s0me0ne-unkn0wn was filling the state with tons of accounts for these tests; maybe a recalculation of this weight would be necessary?

EDIT:

Only in theory; it seems like the base extrinsic weight is

pub const ExtrinsicBaseWeight: Weight = Weight::from_parts(constants::WEIGHT_REF_TIME_PER_NANOS, 0)
        .saturating_mul(125_000);

everywhere.

s0me0ne-unkn0wn commented 1 month ago

Yes, indeed, this test is done with 256k accounts in genesis to check, among other things, how the state size affects throughput. As far as I can see, the rococo-parachain used in this test uses no-op extrinsic base weight from frame-support with a proof size of zero. But nevertheless, having it as a constant doesn't look right to me. It means we're not ready to scale the network. Should every parachain hold a referendum to update that constant every time the number of existing accounts grows significantly? And somebody should notice that increase, make the change, and propose a referendum. Sounds like manual-mode scaling. I believe it should be much more automagical.

s0me0ne-unkn0wn commented 1 month ago

@gui1117 oh, I absolutely love how you first say "yep, it's doable" and then pull a 1200-line PR out of your pocket :rofl:

skunert commented 1 month ago

Yes, indeed, this test is done with 256k accounts in genesis to check, among other things, how the state size affects throughput. As far as I can see, the rococo-parachain used in this test uses no-op extrinsic base weight from frame-support with a proof size of zero. But nevertheless, having it as a constant doesn't look right to me. It means we're not ready to scale the network. Should every parachain hold a referendum to update that constant every time the number of existing accounts grows significantly? And somebody should notice that increase, make the change, and propose a referendum. Sounds like manual-mode scaling. I believe it should be much more automagical.

Absolutely should be automatic, just trying to identify the involved parts. I think the wrapper draft will be the way to go.

gui1117 commented 1 month ago

@gui1117 oh, I absolutely love how you first say "yep, it's doable" and then pull a 1200-line PR out of your pocket 🤣

I was already working on it coincidentally for transaction extensions PR :-)

kianenigma commented 1 month ago

very interesting discovery @s0me0ne-unkn0wn!

Prelude: let's take a moment and appreciate in advance that with PVM, someday, we can get rid of this.

If the solution to this lies in the ordering of signed extensions, or in them somehow being wrapped: I have proposed in general moving some signed extensions that we think are "mandatory" to a fixed list, and using it in templates. This will be a strong signal from us that we think this is how one should configure their runtime.

https://github.com/paritytech/polkadot-sdk/blob/11fdff18e8cddcdee90b0e2c3e35a2b0124de4de/substrate/frame/src/lib.rs#L293-L304

to scale the network. Should every parachain hold a referendum to update that constant every time the number of existing accounts grows significantly?

Yes, this is the vision, but it is a pretty cumbersome process as you can imagine, and worst of all it is not well documented.

Truth is, benchmarking is a race that we are losing right now, or are barely holding onto. As in, even on RC, I believe the state is bigger (don't quote me on this) than our benchmarking "assumption". As noted in https://github.com/paritytech/polkadot-sdk/issues/5223#issuecomment-2271450355, I think moving towards PVM is hopefully the right direction.

There is another issue with increasing the assumption as well: you pessimistically reduce throughput, especially if the benchmarking code is written poorly and there are not enough manual refunds in the code. With a good PoV reclaim we can be a bit bolder here and confidently increase the static weights, as they will get reclaimed.

As I commented in

gui1117 commented 1 month ago

There might be a bug in reclaim which explains this situation: https://github.com/paritytech/polkadot-sdk/pull/5273

Basically, the proof size is underestimated if the pre-dispatch info overestimates the proof size and the post-dispatch info underestimates it at the same time.

ggwpez commented 1 month ago

The extrinsic base weight benchmarking is indeed not looking at the PoV weight. The logic currently is to build an empty block and measure that. The problem is that the block is built in the client, not the runtime. Therefore it is not compatible with the omni-bencher.

But it could be possible to extend it at least for the polkadot non-omni node.

In the future we will have each extension being benchmarked separately, which should help here as well.

Yeah, the TE merge request adds this; then we could just add that here:

https://github.com/paritytech/polkadot-sdk/blob/a633e954f3b88697aa797d9792e8a5b5cf310b7e/bridges/chains/chain-bridge-hub-cumulus/src/lib.rs#L93

ggwpez commented 1 month ago

I changed the overhead benchmarking to also include the proof when building the block until we have the TE benchmarks: https://github.com/paritytech/polkadot-sdk/pull/5283

It could help to ensure that we don't underestimate the proof size upfront.

skunert commented 4 weeks ago

Small write-up for future reference and for others who want to reason about this bug. Tests by @s0me0ne-unkn0wn have shown that #5281 and #5273 have indeed improved the situation significantly. The runtime now correctly rejects transactions very close to the specified limit. However, there are still some considerations to keep in mind.

Impact of #5281

In general, when we detect that the node-side storage proof is already larger than what we have recorded in the runtime, we set it to the node-side proof. At some point, the runtime BlockWeight will reach its limit. After that, we will reject the next transaction due to insufficient proof weight. However, since we set the runtime-side proof weight once to the node value, the runtime loses track of how much weight was reserved for on_finalize. The weight consumed by on_finalize will be added on top, potentially exceeding the size limit.
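
An illustrative numeric example of that edge case (made-up numbers, not from a real run):

fn main() {
    let limit: u64 = 5_000_000;           // runtime-side proof-size limit for the block
    let on_finalize_reserved: u64 = 200_000; // reserved up front by on_initialize
    let runtime_tracked: u64 = 4_300_000;    // runtime value, includes the reservation
    let node_recorded: u64 = 4_900_000;      // node-side recorder is already higher

    // #5281 clamps the runtime value up to the node-side measurement...
    let clamped = runtime_tracked.max(node_recorded);
    // ...but the clamped value no longer carries the on_finalize reservation,
    // so when on_finalize actually runs, its weight is added on top:
    let final_total = clamped + on_finalize_reserved;
    assert!(final_total > limit); // 5_100_000 > 5_000_000: a slight overshoot
}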

There are safeguards in place that make a real-world impact of this unlikely:

  1. We still have the conservative limit of 2.5 MB set on the node side.
  2. Every proof-size related result from the node side is uncompressed. So, even if we slightly exceed the limit, the resulting PoV will be compressed afterward.

Impact of the Extrinsic Composition

I was initially surprised by the significant impact shown in the initial graph. On further thought, it makes sense. The tests were TPS style, with lots of balance transfers. With a small extrinsic that only reads around two storage items, the impact of a missed CheckNonce::pre_dispatch is high. If you have many extrinsics that touch more storage items, the discrepancy will be lower.

Impact of SignedExtension Ordering

In the PoV-reclaim enablement guide, we recommend placing the StorageWeightReclaim SignedExtension last in the list. It sounds counterintuitive at first, but perhaps we should recommend placing it first instead. Storage is typically read for the first time in pre_dispatch, whereas the post_dispatch hooks of most SEs modify items that have already been read. It should therefore be better to include the pre_dispatch hooks in the reclaim scope than the post_dispatch hooks. I think of this as just a rule of thumb, because it does not need to hold for all SEs and is not enforceable anyway. So, low impact.
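
For illustration, a SignedExtra-style tuple with the reclaim moved to the front (the type names are the usual ones from Cumulus-based runtimes, but the exact set, and the final recommendation, depend on the chain):

// With StorageWeightReclaim first, the other extensions' pre_dispatch hooks run
// inside its measuring window, while their post_dispatch hooks fall outside it
// (tuple hooks run in declaration order).
pub type SignedExtra = (
    cumulus_primitives_storage_weight_reclaim::StorageWeightReclaim<Runtime>,
    frame_system::CheckNonZeroSender<Runtime>,
    frame_system::CheckSpecVersion<Runtime>,
    frame_system::CheckTxVersion<Runtime>,
    frame_system::CheckGenesis<Runtime>,
    frame_system::CheckEra<Runtime>,
    frame_system::CheckNonce<Runtime>,
    frame_system::CheckWeight<Runtime>,
    pallet_transaction_payment::ChargeTransactionPayment<Runtime>,
);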