paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/
1.76k stars 631 forks source link

Gap Synced blocks are kept even though they are out of pruning window #5119

Open skunert opened 1 month ago

skunert commented 1 month ago

While reviewing https://github.com/paritytech/polkadot-sdk/pull/5103 I was wondering how we handle gap synced blocks together with a block pruning setting.

After warp sync reached the top and we have downloaded the state, the tip of the chain continues with normal sync operations. In the meantime, the gap between genesis and the warp target block is being downloaded from the other peers. We issue the peer gap block requests here: https://github.com/paritytech/polkadot-sdk/blob/2bd187f85b57ccbcddc1041dacea4be80a901edd/substrate/client/network/sync/src/strategy/chain_sync.rs#L2042-L2043

Attributes to fetch are defined here and show that we fetch bodies too: https://github.com/paritytech/polkadot-sdk/blob/2bd187f85b57ccbcddc1041dacea4be80a901edd/substrate/client/network/sync/src/strategy/chain_sync.rs#L1203-L1215

However the pruning of block bodies only happens by pruning the current finalized block - pruning window. https://github.com/paritytech/polkadot-sdk/blob/2bd187f85b57ccbcddc1041dacea4be80a901edd/substrate/client/db/src/lib.rs#L1826-L1827

So my suspicion is that the gap blocks are currently all kept (but should be verified). Another question for me currently is why we even download block bodies for gap sync.

tmpolaczyk commented 1 month ago

So my suspicion is that the gap blocks are currently all kept

That's correct.

Another question for me currently is why we even download block bodies for gap sync.

There is some related discussion in https://github.com/paritytech/polkadot-sdk/pull/2710, where I tried to add a flag to disable the block history download, but then I realized that simply enabling block prunning should already skip the block download. But that doesn't work.

I would be happy if this is fixed, warp sync would use much fewer resources if it didn't have to download the block history when it is not needed.

lexnv commented 1 month ago

cc https://github.com/paritytech/polkadot-sdk/issues/2738

nazar-pc commented 1 month ago

I'd love to see gap sync decoupled somehow as in our blockchain we don't even attempt to sync the gap at all and had to prevent gap sync with a hack: https://github.com/subspace/polkadot-sdk/commit/799ca7c0dbca8ffc8133e6548f14c74bd581375f

Ideally gap sync would not be initiated at all, but right now Substrate assumes gap means warp sync, which is not necessarily true for projects that do much deeper customizations than parachains typically do.

bkchr commented 1 month ago

I'd love to see gap sync decoupled somehow as in our blockchain we don't even attempt to sync the gap

Yeah. We need to improve this. We probably need to keep the Gap in the db. However, we should be more smart when it comes to closing this Gap.