paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

Shorter availability data retention period for testnets #3270

Open sandreim opened 7 months ago

sandreim commented 7 months ago

Currently we keep the finalized data for 25h. This raises testnet disk space requirements needlessly, especially if huge PoVs are being used.

The way this parameter is passed should have some checks to prevent misuse, especially in non-test chains where we actually do need 25h retention.
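
One way to picture the kind of check meant here is the minimal sketch below. It is not the polkadot-sdk implementation: `ChainKind`, `RetentionError`, `resolve_retention`, and the 60-second testnet floor are all hypothetical names and values chosen for illustration. Only the 25h default comes from this issue.

```rust
use std::time::Duration;

/// Hypothetical chain classification; not an actual polkadot-sdk type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChainKind {
    Production,
    Testnet,
}

/// The 25h retention of finalized data mentioned in this issue.
pub const DEFAULT_RETENTION: Duration = Duration::from_secs(25 * 60 * 60);

/// Assumed lower bound for testnet overrides (illustrative value only).
pub const MIN_TESTNET_RETENTION: Duration = Duration::from_secs(60);

#[derive(Debug, PartialEq, Eq)]
pub enum RetentionError {
    /// Production chains must keep the default retention.
    OverrideNotAllowedOnProduction,
    /// The requested value is below the assumed testnet floor.
    TooShort { min: Duration },
}

/// Resolves the retention period, rejecting overrides that look like misuse.
pub fn resolve_retention(
    chain: ChainKind,
    requested: Option<Duration>,
) -> Result<Duration, RetentionError> {
    match (chain, requested) {
        // No override requested: every chain gets the default.
        (_, None) => Ok(DEFAULT_RETENTION),
        // Any override on a production chain is refused outright.
        (ChainKind::Production, Some(_)) => {
            Err(RetentionError::OverrideNotAllowedOnProduction)
        }
        // Testnet overrides must not go below the floor.
        (ChainKind::Testnet, Some(d)) if d < MIN_TESTNET_RETENTION => {
            Err(RetentionError::TooShort { min: MIN_TESTNET_RETENTION })
        }
        (ChainKind::Testnet, Some(d)) => Ok(d),
    }
}
```

Under these assumptions, a 10-minute override is accepted on a testnet and rejected on a production chain. Returning an error rather than silently falling back to the default makes a misconfigured node fail loudly at startup.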

eskimor commented 6 months ago

Also for production we should consider only preserving the last 10 blocks or something for each parachain.

pepyakin commented 6 months ago

For production, maybe it's worth erring on the safer side though. A parachain could be frozen if a collator manages to withhold the data from other parachain nodes. Therefore 60s sounds a bit too cowboy.

burdges commented 6 months ago

> Also for production we should consider only preserving the last 10 blocks or something for each parachain.

If we tune down availability, then some validators will tune down their storage. This matters little for operators' physical nodes, since drives are cheap, but it matters a lot for cloud validators, since cloud providers charge a lot for the very reliable storage they provide. I'm almost always pro-efficiency, but in this case it may be better to artificially waste cloud validators' money, since we're not big fans of them. Also, cores are underutilized, so we can tune this parameter down quickly later if we get a burst of usage.

Also, we'll hopefully soon have other availability data that lasts longer than a few minutes, like off-chain messages. So either we block that future work on adding multiple availability segments with different retentions, or we keep some longer retention period now and do multiple availability segments later as a storage optimization.

eskimor commented 6 months ago

> For production, maybe it's worth erring on the safer side though. A parachain could be frozen if a collator manages to withhold the data from other parachain nodes. Therefore 60s sounds a bit too cowboy.

Yeah. I said "or something": it does not need to be 10, and even 100 would be much less data than what we have now. Either way, 10 blocks should mean 10 different collators, which would all need to collude. I would certainly recommend some proper reasoning before going for a particular number ;-)
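
For a rough sense of scale, the arithmetic behind the "10 blocks is about 60s" and "even 100 would be much less" remarks can be sketched as follows. The 6-second block time is an assumption inferred from the 60s figure quoted above, and the function names are made up for this example.

```rust
use std::time::Duration;

/// Assumed block time; 6 s is what "10 blocks is about 60s" implies.
pub const ASSUMED_BLOCK_TIME: Duration = Duration::from_secs(6);

/// The current 25h retention of finalized data.
pub const CURRENT_RETENTION: Duration = Duration::from_secs(25 * 60 * 60);

/// Wall-clock time covered by keeping only the last `blocks` blocks.
pub fn retention_for_blocks(blocks: u32) -> Duration {
    ASSUMED_BLOCK_TIME * blocks
}

/// Number of whole blocks that fit into `window`.
pub fn blocks_in(window: Duration) -> u64 {
    window.as_secs() / ASSUMED_BLOCK_TIME.as_secs()
}

/// How many times fewer blocks are kept compared with the 25h window.
/// Panics if `blocks` is zero.
pub fn reduction_factor(blocks: u32) -> u64 {
    blocks_in(CURRENT_RETENTION) / u64::from(blocks)
}
```

With a 6-second block time, 25h covers 15,000 blocks, so keeping 100 blocks (10 minutes) retains 150 times fewer blocks per parachain, and keeping 10 blocks (60 seconds) retains 1,500 times fewer.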

burdges commented 6 months ago

We should definitely shorten the testnet availability duration anyway. We reduced the validator count and soundness for the community testnet, so its validators will be hammered pretty seriously if a bunch of parachains show up. 10 min is definitely reasonable there. I guess 1 min works too, but maybe it'll need bumping for XCMP.

We could discuss the number of sessions for production, but two full sessions would still be less than 24h. We could also reserve a fixed amount of space and recycle old stuff when the space is needed, but this could create strange bugs when people expect it to be longer.
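
The "reserve a fixed amount of space and recycle old stuff" idea can be illustrated with the minimal sketch below. `BoundedStore` is a hypothetical type made up for this example, not the availability store: it tracks only block numbers and entry sizes and evicts the oldest entries once a byte budget is exceeded.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-budget store; not the actual availability store.
pub struct BoundedStore {
    capacity_bytes: usize,
    used_bytes: usize,
    /// Oldest entry at the front: (block number, size in bytes).
    entries: VecDeque<(u32, usize)>,
}

impl BoundedStore {
    pub fn new(capacity_bytes: usize) -> Self {
        Self { capacity_bytes, used_bytes: 0, entries: VecDeque::new() }
    }

    /// Inserts an entry, evicting the oldest entries until it fits.
    /// Returns the evicted block numbers, or `None` if the entry is
    /// larger than the whole budget and therefore cannot be stored.
    pub fn insert(&mut self, block: u32, size: usize) -> Option<Vec<u32>> {
        if size > self.capacity_bytes {
            return None;
        }
        let mut evicted = Vec::new();
        while self.used_bytes + size > self.capacity_bytes {
            // The loop condition implies used_bytes > 0, so an entry exists.
            let (old_block, old_size) = self.entries.pop_front()?;
            self.used_bytes -= old_size;
            evicted.push(old_block);
        }
        self.entries.push_back((block, size));
        self.used_bytes += size;
        Some(evicted)
    }

    /// Block number of the oldest entry still retained, if any.
    pub fn oldest(&self) -> Option<u32> {
        self.entries.front().map(|(block, _)| *block)
    }
}
```

The sketch also shows where the "strange bugs" could come from: how long an entry survives depends on how much data arrived after it, not on elapsed time, so a burst of large entries shortens the effective retention window without warning.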

As for duration arguments, we should price storage using commercial drive prices, but not anything fancier like RAID arrays or cloud storage.