paritytech / json-rpc-interface-spec

30 stars 3 forks source link

chainHead: Clarify reported order of pruned blocks #143

Closed lexnv closed 1 week ago

lexnv commented 6 months ago

Substrate chains report pruned blocks from level N (height) when the level N + 1 is finalized.

Case 1

In this case the block 6 (height 3) is finalized. Substrate will try to prune blocks on height 2, and will not report any blocks (because block 2 is not the leaf of fork).

However, we know conceptually that [block 2, block 3] and [block 2, block 4, block 5] are blocks on forks that will never get finalized.

     finalized -> block 1 -> block 2 -> block 3

                          -> block 2 -> block 4 -> block 5

               -> block 1 -> block 2'-> block 6 

Case 2

In this case the block 7 (height 5) is finalized. Substrate will try to prune blocks on height 3, in this case block 3 (height 3) is a leaf of a fork. Block 2 and block 3 are reported as pruned with the finalized event. However, [block 2, block 4, block 5] are not reported yet.

     finalized -> block 1 -> block 2 -> block 3

                          -> block 2 -> block 4 -> block 5

               -> block 1 -> block 2'-> block 6 -> block 7

Case 3

In this case the block 8 (height 6) is finalized. Substrate will try to prune blocks on height 4, in this case block 5 (height 4) is a leaf of a fork. Block 2, block 4 and block 5 are on a fork. However, only block 4 and block 5 should be reported. This is because we reported block 2 in the previous step.

     finalized -> block 1 -> block 2 -> block 3

                          -> block 2 -> block 4 -> block 5

               -> block 1 -> block 2'-> block 6 -> block 7 -> block 8

Questions

Do you see a problem with delaying the announcement of the branch as pruned? (for example, case 1 and case 2).

Should we enforce the order of the pruned blocks? Meaning, for case 1, should we declare at this step block 2, block 3, block 4 and block 5 as pruned?

What would the lightclient do in those scenarios? (cc @tomaka) 🙏 Thanks @josepot for raising this in https://github.com/paritytech/json-rpc-interface-spec/issues/142.

cc Substrate fix to report unique branches as pruned: https://github.com/paritytech/polkadot-sdk/pull/3667 cc @paritytech/subxt-team

josepot commented 6 months ago

Substrate will try to prune blocks on height 2

Probably because since those blocks are the root nodes of the pruned branches, then it's trivial to get all the other ones that are pruned... no? I mean, from those ones you just need to traverse their descendants to find the other pruned blocks, correct? If you want, we could change the spec so that only the roots of the pruned branches are reported... I don't care, really.

Do you see a problem with delaying the announcement of the branch as pruned?

I mean, yes and no.

Basically, in polkadot-api we are currently relying on the prunnedBlocks property because it's convenient, but it would be relatively easy for us to detect which block must be pruned every time that we receive a finalized event. So, if the spec allowed reporting the pruned blocks in a "delayed fashion", then we would stop relying on that property and we would compute those blocks ourselves (it's pretty much trivial for us to do so TBH).

In other words, the prunedBlocks blocks property is nice-to-have sugar. If we can't have that info in a timely manner, then I rather to just remove the property.

Should we enforce the order of the pruned blocks? Meaning, for case 1, should we declare at this step block 2, block 3, block 4 and block 5 as pruned?

IMO the order is completely irrelevant when dealing with pruned blocks.

tomaka commented 6 months ago

I fail to understand anything about the example cases and schemas.

To me, delaying the moment when the finalization is reported is acceptable. Delaying when pruned blocks are reported is not acceptable. The list of pruned blocks is not a list of "blocks removed from the database" or something like that, which I guess is what Substrate reports to you, but a list of blocks that are now no longer part of the finalized chain. It doesn't make sense for a block to be reported later as no longer part of the finalized chain when it was already the case at the previous event.

jsdw commented 6 months ago

In other words, the prunedBlocks blocks property is nice-to-have sugar. If we can't have that info in a timely manner, then I rather to just remove the property.

Just to note that in Subxt the prunedBlocks list is handy to have, because we delay the unpinning of blocks until they show up as being either finalized or pruned (else we might unpin a new block and then it shows up as best or finalized and suddenly we are interested in it again)

We don't care about the order of blocks mentioned in the pruned blocks list though.

lexnv commented 1 week ago

After the latest changes from substrate pruning logic https://github.com/paritytech/polkadot-sdk/pull/3962, the substrate implementation is now returning all pruned blocks in one go.

Previously, the pruning logic for this small fork would result in:

b1 -> b2 -> b3
     -> b2' (f) -> b3' (f)

Before: pruning logic was reporting blocks on height N-1, where N is the height of the finalized block. This led to us reporting no pruned blocks B2' was finalized. After: We are now reporting in one go b2 and b3 as pruned when we finalized B2'.