Closed: ggwpez closed this issue 2 months ago
I think probably we can solve this with a weight refund. My guess is this overestimate comes from the fact that we must assume a lot of things about how many parachains there are, how many are upgrading, how many are sending messages, etc...
If we can track the real numbers and feed them back into the benchmark function, then we can refund any unused weight.
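The refund idea amounts to: charge the worst case up front, measure the real counts during execution, and hand back the difference (in FRAME, this is what returning an `actual_weight` in `PostDispatchInfo` achieves). A toy model of that idea, with made-up constants and names rather than actual runtime code:

```python
# Toy model of pre-dispatch worst-case weight vs. post-dispatch refund.
# All constants and names here are illustrative, not real runtime values.

MAX_PARAS = 100            # worst-case parachain count assumed by the benchmark
PER_PARA_WEIGHT = 250_000  # made-up weight units charged per parachain

def pre_dispatch_weight() -> int:
    """Weight charged before execution: must cover the worst case."""
    return MAX_PARAS * PER_PARA_WEIGHT

def unused_weight(actual_paras: int) -> int:
    """After execution the real counts are known; the surplus can be refunded."""
    actual = actual_paras * PER_PARA_WEIGHT
    return pre_dispatch_weight() - actual
```

With, say, only 20 parachains actually active in a block, 80% of the charged weight would be handed back.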
It depends on what weight means, too: does off-chain work count? We only do availability when backing a parachain block, but `paraInclusion` triggers approval checks, with something like 40-50 nodes downloading and checking the parachain block. Approval checks are also expected to become the dominant rewards source for validators.
I'm not sure if I understand your question. The weight should reflect the time that it takes validators to import the block.
If you want to specifically incentivize something, you can modify the fee per weight for that.
Otherwise we end up with the problem we have right now: blocks are 25% full without much in them. @burdges
Alright cool. We de facto rate limit approvals, etc. by the number of availability cores anyways, which makes sense.
Yeah @burdges this only concerns the on-chain weight, not the implicit weight of coordinated off-chain stuff.
I think this should be tackled in the not too distant future. Perhaps someone from the FRAME team can give you a hand (as Zeke did last year), especially if someone knowledgeable is willing to help out.
Indeed. We need a better way of tracking weight. The current top-level benchmarking of `enter` does not work at all: there are too many parameters to consider, including internal state and effects, and the benchmarking system works poorly with many parameters.
We might need to go back to (semi-)manual tracking, e.g. benchmarking individual building blocks and combining them. This would significantly reduce the number of parameters needed for any single benchmark and might make "proper" weight tracking actually feasible and manageable.
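As a sketch of what "benchmark building blocks and combine them" could look like: each component gets its own single-parameter benchmark, and the inherent's weight is their sum, so no benchmark has to explore the full cross-product of parameters. The per-item microsecond figures below are the ref-time numbers quoted later in this thread; treat them as placeholders:

```python
# Combine independently-benchmarked building blocks into one inherent weight.
# Per-item ref times (microseconds) are placeholders from this thread.

BASE_US = 121          # empty ParaInherent
PER_BITFIELD_US = 359  # marginal cost per bitfield
PER_BACKED_US = 1288   # marginal cost per backed candidate

def para_inherent_ref_time_us(n_bitfields: int, n_backed: int) -> int:
    """Total ref time: fixed base plus the sum of the per-item pieces."""
    return BASE_US + n_bitfields * PER_BITFIELD_US + n_backed * PER_BACKED_US
```

Each of the three constants comes from its own benchmark, so adding a new parameter to one component does not blow up the others.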
Why was this so heavy in the first place? Just on-chain messages? If so, we could bill for inclusion (cheap) and message count (expensive).
The way it is calculated is just severely messed up, combined with the benchmarking machinery doing weird things when you have many parameters. We can chat if you are interested.
This needs to be fixed soon, at 500 parachain validators on Kusama we see more than 70% weight usage.
@ordian you have been looking into this and related issues already, right?
Looking at https://www.polkadot-weigher.com/history, we see the current block weight numbers for Kusama are around 75%:
A typical block on Kusama has around 500 bitfields (with around 20 candidates included) and around 20 candidates backed in its ParaInherent, which consumes almost all of the weight.
Similarly, Polkadot's weight is around 42% with 300 bitfields and 20 backed candidates per block.
Importing these blocks in practice takes much less time, suggesting the benchmarks overestimate by more than 10x.
Looking at the bench numbers for kusama, we have:
type | weight | rocksdb | paritydb |
---|---|---|---|
bitfield | 480us + 27r + 17w | 2.855ms | 1.546ms |
backed | 1409us + 30r + 16w | 3.759ms | 2.449ms |
The weights differ significantly based on whether the weight is set to RocksDB or ParityDB backend (currently, it's rocksdb).
Given that a typical block will contain `n_validators` bitfields and `n_cores` backed candidates, the bottleneck seems to be the bitfields (we expect 1k `n_validators` and 200 `n_cores` in the not-so-distant future). Also, as we can see from the db backend difference, the bottleneck is in db ops, not CPU (ParityDB weights are lower than RocksDB's).
Currently, the way we calculate the weight of processing a bitfield and a backed candidate is by running the whole `enter` function.
There are 2 main problems with this approach:
1. The fixed overhead of processing an empty `ParaInherent` is charged again for every single bitfield and backed candidate.
2. Benchmarking the whole `enter` function does not let us attribute weight to the individual processing steps (disputes, bitfields, backed candidates).
A proposed short-term fix is to address problem 1 and switch to ParityDB weights, since ParityDB should be the default.
The estimated weights with the proposed fix should be (note that the bench weights would remain the same, but we subtract the weight of the empty parainherent when processing `enter`):
type | weight | rocksdb | paritydb |
---|---|---|---|
empty | 121us + 15r + 5w | 0.996ms | 0.491ms |
bitfield | 359us + 12r + 12w | 1.859ms | 1.058ms |
backed | 1288us + 15r + 11w | 2.763ms | 1.958ms |
Given these numbers, current Kusama blocks should be estimated at 1.859ms × 500 + 2.763ms × 20 = 0.985s = 49%, i.e. we shaved off 25% of the total weight. With the switch to ParityDB db weights it would be 1.058ms × 500 + 1.958ms × 20 = 0.568s = 28%, another 21% shaved off.
With these (projected) numbers we should be able to scale to 1k validators with 200 (async backing) cores, with the para inherent weight being around 1.5s = 75%.
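Spelling out the arithmetic behind these percentages, using the per-item marginal weights from the fixed-weight table and assuming the 2s block weight budget these percentages are quoted against:

```python
# Projected ParaInherent weight from per-item marginal costs (ms),
# taken from the table above; block weight budget assumed to be 2s.
BITFIELD_ROCKS_MS, BACKED_ROCKS_MS = 1.859, 2.763
BITFIELD_PARITY_MS, BACKED_PARITY_MS = 1.058, 1.958
BLOCK_BUDGET_MS = 2_000.0

def inherent_ms(n_bitfields, n_backed, per_bitfield_ms, per_backed_ms):
    return n_bitfields * per_bitfield_ms + n_backed * per_backed_ms

def pct_of_block(ms):
    return 100.0 * ms / BLOCK_BUDGET_MS

kusama_now = inherent_ms(500, 20, BITFIELD_ROCKS_MS, BACKED_ROCKS_MS)       # ~984.8ms -> ~49%
kusama_parity = inherent_ms(500, 20, BITFIELD_PARITY_MS, BACKED_PARITY_MS)  # ~568.2ms -> ~28%
scaled = inherent_ms(1000, 200, BITFIELD_PARITY_MS, BACKED_PARITY_MS)       # ~1449.6ms -> ~72%
```

The 1k-validator/200-core projection comes out at roughly 1.45s, i.e. the "around 1.5s = 75%" quoted above after rounding.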
See https://github.com/paritytech/polkadot-sdk/pull/5082 which addresses point 1.
We can optimize this further by addressing point 2 in a follow-up PR (being worked on).
Instead of running the whole `enter` function to estimate the cost of processing individual bitfields/backed candidates, it would be better to refactor the code into sections that process disputes, bitfields and backed candidates, and benchmark each subsection/function on its own. Luckily such a refactoring was deemed necessary for other reasons and is also being worked on.
We are not blocked on `ParaInherent` weight overestimation for scaling to 1k validators and 200 cores, given the simple fix (#5082).
Hi,
I just observed that on Kusama the paraInherent weight halved after we enacted "Update Kusama v1.3.0": it went from around 73% before the enactment to around 35% after, while the number of candidates does not seem to have changed, so I guess the weights changed.
That's good news, but was that expected?
Yes, that's expected. See https://github.com/paritytech/polkadot-sdk/pull/5082#issuecomment-2317095079
Take a look at some recent Polkadot blocks. Even empty blocks consume ~500ms of weight.
Benchmarking such blocks shows that only a fraction (~4%) of the weight is actually needed.
That means `paraInclusion` is massively overestimating its weight cost. @shawntabrizi noted that we could think about refunding weight here.
Output from benchmark-block on reference hardware with production profile and wasm executor: