paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/
1.78k stars 635 forks source link

Radix-16 issue: Storage proof size ballooning from adding new genesis accounts #1498

Closed Sophia-Gold closed 1 week ago

Sophia-Gold commented 1 year ago

In running sTPS for one parachain after async backing, I've tried generating many more genesis accounts (500k, vs. ~16k used previously) in order to leave it running for several minutes and take the average TPS. This has resulted in a large increase in storage proof size from ~800kb to ~2.1mb despite blocks only including half as many extrinsics. This persists despite bumping the parachain runtime state version to 1.

To replicate:

ggwpez commented 1 year ago

The proof for each element grows with the number of nodes in the trie. AFAIK it should be a log growth for the number of accounts.
What is the average PoV per TX for both configurations? If the increase is about 500 byte, then it should surely be because of that. Maybe a bit less than 500 because of the node reusing in the proof.
Also if the transfer ie endowing, then the PoV should increase with time.

Sophia-Gold commented 1 year ago

Yes, it's about a 500 byte increase in compressed PoV per extrinsic. You're saying this is not a bug?

It's still using less than half the max PoV size in my example, so I can likely reach the collator's authorship duration by removing some check or another.

My question would just be whether the larger storage proofs are affecting performance in any way, which I can test. It at least raises questions when translating this into a multi-parachain network using synthetic load from gluttons. If I'm approaching the max PoV size for a single parachain only because I've generated tons of genesis accounts to measure sustained load, is it fair to have the gluttons use a PoV size equal to if I had added only one full block's worth of accounts? I suppose so.

burdges commented 1 year ago

It's only log size, but we've this stupid radix 16 trie from ETH, which quadruples the size of dense PoVs. And non-dense PoVs should be worse than dense ones anyways.

An extra 500 bytes per tx should represent almost 50,000 times as many entries, but it only represents 15 times as many entries because our trie is stupid, like I've been telling everyone for maybe 4 years now.

I guess Sophia gets a 30 fold increase by paying only 500 bytes per tx because she halved the extrinsics count.

rphmeier commented 1 year ago

yes, it's been known for years that radix-16 is a much worse choice than radix-2 for stateless-client proofs (I remember Martin Becze mentioning this in 2017). Substrate is designed to support an arbitrary storage backend, just nobody has picked this up yet.

eskimor commented 7 months ago

Relevant article: https://www.rob.tech/blog/nearly-optimal-polkadot-merklization/

Polkadot-Forum commented 5 months ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/overcoming-sea-network-shaping/7401/1