Closed: ggwpez closed this issue 2 months ago
I think probably we can solve this with a weight refund. My guess is this overestimate comes from the fact that we must assume a lot of things about how many parachains there are, how many are upgrading, how many are sending messages, etc...
If we can track the real numbers and feed them back into the benchmark function, then we can refund any unused weight.
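The refund idea amounts to: charge the worst case up front, measure the real counts during execution, and hand back the difference (in FRAME, this is what returning an `actual_weight` in `PostDispatchInfo` achieves). A toy model of that idea, with made-up constants and names rather than actual runtime code:

```python
# Toy model of pre-dispatch worst-case weight vs. post-dispatch refund.
# All constants and names here are illustrative, not real runtime values.

MAX_PARAS = 100            # worst-case parachain count assumed by the benchmark
PER_PARA_WEIGHT = 250_000  # made-up weight units charged per parachain

def pre_dispatch_weight() -> int:
    """Weight charged before execution: must cover the worst case."""
    return MAX_PARAS * PER_PARA_WEIGHT

def unused_weight(actual_paras: int) -> int:
    """After execution the real counts are known; the surplus can be refunded."""
    actual = actual_paras * PER_PARA_WEIGHT
    return pre_dispatch_weight() - actual
```

With, say, only 20 parachains actually active in a block, 80% of the charged weight would be handed back.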
It depends on what weight means, too: does off-chain work count? We only do availability when backing a parachain block, but `paraInclusion` triggers approval checks, with something like 40-50 nodes downloading and checking the parachain block. Approval checks are also expected to become the dominant rewards source for validators.
I'm not sure if I understand your question. The weight should reflect the time that it takes validators to import the block.
If you want to specifically incentivize something, you can modify the fee per weight for that.
Otherwise we end up with the problem we have right now: blocks are 25% full without much in them. @burdges
Alright cool. We de facto rate limit approvals, etc. by the number of availability cores anyways, which makes sense.
Yeah @burdges this only concerns the on-chain weight, not the implicit weight of coordinated off-chain stuff.
I think this should be tackled in the not too distant future. Perhaps someone from the FRAME team can give you a hand (as Zeke did last year), especially if someone knowledgeable is willing to help out.
Indeed. We need a better way of tracking weight. The current top-level benchmarking of `enter` does not work at all: there are too many parameters to consider, including internal state and effects, and the benchmarking system works poorly with many parameters.
We might need to go back to (semi-)manual tracking, e.g. benchmarking individual building blocks and combining them. This would significantly reduce the number of parameters needed for any single benchmark and might make "proper" weight tracking actually feasible and manageable.
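As a sketch of what "benchmark building blocks and combine them" could look like: each component gets its own single-parameter benchmark, and the inherent's weight is their sum, so no benchmark has to explore the full cross-product of parameters. The per-item microsecond figures below are the ref-time numbers quoted later in this thread; treat them as placeholders:

```python
# Combine independently-benchmarked building blocks into one inherent weight.
# Per-item ref times (microseconds) are placeholders from this thread.

BASE_US = 121          # empty ParaInherent
PER_BITFIELD_US = 359  # marginal cost per bitfield
PER_BACKED_US = 1288   # marginal cost per backed candidate

def para_inherent_ref_time_us(n_bitfields: int, n_backed: int) -> int:
    """Total ref time: fixed base plus the sum of the per-item pieces."""
    return BASE_US + n_bitfields * PER_BITFIELD_US + n_backed * PER_BACKED_US
```

Each of the three constants comes from its own benchmark, so adding a new parameter to one component does not blow up the others.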
Why was this so heavy in the first place? Just on-chain messages? If so, we could bill for inclusion (cheap) and message count (expensive).
The way it is calculated is just severely messed up, combined with the benchmarking machinery doing weird things when you have many parameters. We can chat if you are interested.
This needs to be fixed soon, at 500 parachain validators on Kusama we see more than 70% weight usage.
@ordian you have been looking into this and related issues already, right?
Looking at https://www.polkadot-weigher.com/history, we see the current block weight numbers for Kusama are around 75%:
A typical block on Kusama has around 500 bitfields (with around 20 candidates included) and around 20 candidates backed in its ParaInherent, which consumes almost all of the weight.
Similarly, Polkadot's weight is around 42% with 300 bitfields and 20 backed candidates per block.
Importing these blocks in practice takes much less time, suggesting the benchmarks overestimate by more than 10x.
Looking at the bench numbers for kusama, we have:
type | weight | rocksdb | paritydb |
---|---|---|---|
bitfield | 480us + 27r + 17w | 2.855ms | 1.546ms |
backed | 1409us + 30r + 16w | 3.759ms | 2.449ms |
The weights differ significantly based on whether the weight is set to RocksDB or ParityDB backend (currently, it's rocksdb).
Given that a typical block will contain `n_validators` bitfields and `n_cores` backed candidates, the bottleneck seems to be the bitfields (we expect 1k `n_validators` and 200 `n_cores` in the not-so-distant future). Also, as we can see from the db backend difference, the bottleneck is in db ops, not CPU (ParityDB weights are lower than RocksDB's).
Currently, the way we calculate the weight of processing a bitfield and a backed candidate is by running the whole `enter` function.
There are 2 main problems with this approach:
1. The fixed overhead of processing an empty `ParaInherent` is charged again for every single bitfield and backed candidate.
2. Benchmarking the whole `enter` function does not let us attribute weight to the individual processing steps (disputes, bitfields, backed candidates).
A proposed short-term fix is to address problem 1 and switch to ParityDB weights, since ParityDB should be the default.
The estimated weights with the proposed fix should be (note that the bench weights would remain the same, but we subtract the weight of the empty parainherent when processing `enter`):
type | weight | rocksdb | paritydb |
---|---|---|---|
empty | 121us + 15r + 5w | 0.996ms | 0.491ms |
bitfield | 359us + 12r + 12w | 1.859ms | 1.058ms |
backed | 1288us + 15r + 11w | 2.763ms | 1.958ms |
Given these numbers, current Kusama blocks should be estimated at 1.859ms × 500 + 2.763ms × 20 = 0.985s = 49%, i.e. we shaved off 25% of the total weight. With the switch to ParityDB db weights it would be 1.058ms × 500 + 1.958ms × 20 = 0.568s = 28%, another 21% shaved off.
With these (projected) numbers we should be able to scale to 1k validators with 200 (async backing) cores, with the para inherent weight being around 1.5s = 75%.
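Spelling out the arithmetic behind these percentages, using the per-item marginal weights from the fixed-weight table and assuming the 2s block weight budget these percentages are quoted against:

```python
# Projected ParaInherent weight from per-item marginal costs (ms),
# taken from the table above; block weight budget assumed to be 2s.
BITFIELD_ROCKS_MS, BACKED_ROCKS_MS = 1.859, 2.763
BITFIELD_PARITY_MS, BACKED_PARITY_MS = 1.058, 1.958
BLOCK_BUDGET_MS = 2_000.0

def inherent_ms(n_bitfields, n_backed, per_bitfield_ms, per_backed_ms):
    return n_bitfields * per_bitfield_ms + n_backed * per_backed_ms

def pct_of_block(ms):
    return 100.0 * ms / BLOCK_BUDGET_MS

kusama_now = inherent_ms(500, 20, BITFIELD_ROCKS_MS, BACKED_ROCKS_MS)       # ~984.8ms -> ~49%
kusama_parity = inherent_ms(500, 20, BITFIELD_PARITY_MS, BACKED_PARITY_MS)  # ~568.2ms -> ~28%
scaled = inherent_ms(1000, 200, BITFIELD_PARITY_MS, BACKED_PARITY_MS)       # ~1449.6ms -> ~72%
```

The 1k-validator/200-core projection comes out at roughly 1.45s, i.e. the "around 1.5s = 75%" quoted above after rounding.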
See https://github.com/paritytech/polkadot-sdk/pull/5082 which addresses point 1.
We can optimize this further by addressing point 2 in a follow-up PR (being worked on).
Instead of running the whole `enter` function to estimate the cost of processing individual bitfields/backed candidates, it would be better to refactor the code into sections that process disputes, bitfields and backed candidates, and benchmark each subsection/function on its own. Luckily such a refactoring was deemed necessary for other reasons and is also being worked on.
We are not blocked on `ParaInherent` weight overestimation for scaling to 1k validators and 200 cores, given the simple fix (#5082).
Hi,
I just observed that on Kusama the paraInherent weight halved after we enacted "Update Kusama v1.3.0": it went from around 73% before the enactment to around 35% after, while the number of candidates does not seem to have changed, so I guess the weights changed.
That's good news, but was that expected?
Yes, that's expected. See https://github.com/paritytech/polkadot-sdk/pull/5082#issuecomment-2317095079
Take a look at some recent Polkadot blocks. Even empty blocks consume ~500ms of weight.
Benchmarking such blocks shows that only a fraction (~4%) of the weight is actually needed.
That means `paraInclusion` is massively overestimating its weight cost. @shawntabrizi noted that we could think about refunding weight here.
Output from benchmark-block on reference hardware with production profile and wasm executor: