paritytech / parity-db

Experimental blockchain database
Apache License 2.0
263 stars 59 forks source link

Upper bound on database size #205

Closed nazar-pc closed 1 year ago

nazar-pc commented 1 year ago

I'm wondering if there is a way to estimate upper bound for database size.

Let's say if we are storing values of the same size, is there a way to predict how many keys we can insert/update to never exceed X mount of space on disk?

At least having worse-case scenario would be nice.

arkpar commented 1 year ago

Depends on the column configuration. For default (hash) columnsN values V byte each shoud not take no more than N 32 + N (V+32) 1.03 + (16 + 256 +8) 1024 bytes at rest. Some additional space (up to 1Gb) may be required for the write-ahead log. This is assuming V < 32724 bytes and compression is disabled. This is probabilistic however. The first component here: (N 32) is the hash table index size. Each entry in the index takes 8 bytes. If there's a collision in one of the buckets, a reindex process is triggered, which creates a new index file twice as big, and moves all data there from the old file. Usually this happens when the index is about 50% full. (N 32) is a safe estimation with the index 25% full.

nazar-pc commented 1 year ago

50% full is relatively to what?

arkpar commented 1 year ago

Relative to the index capacity. Index starts at 16 Mbytes with the capacity for 256k entries, 8 byte each. Once it is about 50% full (128k entries), collsion happens. Capacity (and file size) doubles.

nazar-pc commented 1 year ago

So if I add/delete entries, but don't exceed 128k entries, it shouldn't double, is that correct assumption, or deleted entries cause some complications?

arkpar commented 1 year ago

with 128k entries there's a high chance it will double, but with 64k the chance is low enough. That's why I'm suggesting etimating index size at 25% occupancy. I.e. index size is 25% full taking N*4 entries, 8 byte each. Deleting entries is fine, as deletions are applied immediatelly. Deleting won't ever decrease file size. But as long as you are deleting as much as inserting, it should be fine.

nazar-pc commented 1 year ago

Thanks for those details, very helpful!