yahoo / HaloDB

A fast, log structured key-value store.
https://yahoodevelopers.tumblr.com/post/178250134648/introducing-halodb-a-fast-embedded-key-value
Apache License 2.0

Supporting keys > 127 bytes long #42

Open scottcarey opened 5 years ago

scottcarey commented 5 years ago

HaloDB encodes the key length as a signed byte, both on disk and in memory, and it rejects keys longer than 127 bytes.

Although most of my string keys across different databases are between 10 and 80 bytes, I have some rare outliers as large as 780 bytes. Every other K/V store I work with (I have an API abstraction that wraps about 10) can support large keys; only one other breaks, at 512 bytes.

There are some options for longer keys:

Easy: read the length byte as an unsigned value, so sizes from 0 to 255 are supported (see the sketch after these options). However, this would make it even harder to support larger keys later.

Harder: Allow for larger sizes, perhaps up to 2KB.
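Here's a minimal sketch of the 'easy' option -- illustrative only, it doesn't use HaloDB's actual record classes and the buffer layout is invented:

```java
import java.nio.ByteBuffer;

// Illustrative only -- not HaloDB's actual record format. It shows how the
// same single length byte can cover keys of 0..255 bytes once it is read
// back as an unsigned value instead of a signed one.
public class UnsignedKeyLength {

    static void writeKey(ByteBuffer buf, byte[] key) {
        if (key.length > 255) {
            throw new IllegalArgumentException("key longer than 255 bytes");
        }
        buf.put((byte) key.length); // lengths 128..255 are stored as negative bytes
        buf.put(key);
    }

    static byte[] readKey(ByteBuffer buf) {
        int length = Byte.toUnsignedInt(buf.get()); // reinterpret as 0..255
        byte[] key = new byte[length];
        buf.get(key);
        return key;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(512);
        writeKey(buf, new byte[200]); // rejected today, since 200 > 127 as a signed byte
        buf.flip();
        System.out.println(readKey(buf).length); // prints 200
    }
}
```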

scottcarey commented 4 years ago

I have a PR available now (PR #49).

The solution I arrived at for SegmentWithMemoryPool differs from my ideas above: I used the existing chain structure within the memory pool slots to store additional 'overflow' key data.
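As a rough illustration of what I mean (the slot layout and the 8-byte capacity are invented, the real first slot and overflow slots may hold different amounts of key data, and this is not the actual SegmentWithMemoryPool code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Invented layout for illustration only. A key longer than fixedKeyLength is
// split across the entry's own slot plus one or more 'overflow' slots that
// are linked into the same hash chain.
public class OverflowKeySketch {

    static final int FIXED_KEY_LENGTH = 8; // hypothetical per-slot key capacity

    // Split a key into per-slot fragments, one fragment per link in the chain.
    static List<byte[]> toSlotFragments(byte[] key) {
        List<byte[]> fragments = new ArrayList<>();
        for (int offset = 0; offset < key.length; offset += FIXED_KEY_LENGTH) {
            int end = Math.min(key.length, offset + FIXED_KEY_LENGTH);
            fragments.add(Arrays.copyOfRange(key, offset, end));
        }
        return fragments;
    }

    public static void main(String[] args) {
        byte[] key = new byte[20];                       // 20-byte key, 8-byte slots
        System.out.println(toSlotFragments(key).size()); // 3 links in the chain
    }
}
```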

An alternate solution might be to 'branch' the chain so that the additional key data sits in a branch off of the main linked list instead of being inlined into it. This, however, requires that 'fixedKeyLength' be at least 4, or else there will be no room in the pool slot for the additional pointer.
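A purely illustrative sketch of that constraint -- the 4 bytes are assumed to be the size of a pool slot address, and none of this is actual HaloDB code:

```java
// Illustrative only -- an invented layout, not HaloDB's slot format. In the
// branch variant the main chain keeps one link per key; a long key stores a
// 4-byte overflow-slot address inside its fixed key area instead of inlining
// the extra key bytes into the chain itself.
public class BranchedSlotSketch {

    static final int SLOT_ADDRESS_SIZE = 4; // assumed size of a pool slot address

    static boolean canBranch(int fixedKeyLength) {
        // the overflow pointer has to fit inside the fixed key area
        return fixedKeyLength >= SLOT_ADDRESS_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(canBranch(2)); // false: no room for the pointer
        System.out.println(canBranch(8)); // true
    }
}
```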

The drawback is that if significantly large keys (> 66 bytes) are in use together with a high load factor, lookups will regularly have to skip past existing keys that take up 3 or more links in the chain, slowing lookups down. But if load factors are 'normal' -- below 0.75 -- it probably won't make much of a difference. At load factor 0.75, the average chain length of a filled table is about 1.5, so about half the lookups will have an additional key to skip over when searching, and half of those times it will be the second key, so about 25% of the time a successful lookup will need to follow 2 more linked-list links. Theoretically that is about a 12% increase in slots scanned; avoiding it with the branch variant would come at the cost of not being able to support large keys unless fixedKeySize is at least 4.
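Restating that arithmetic under the same assumptions (a back-of-the-envelope check, not a measurement):

```java
// Back-of-the-envelope restatement of the numbers above; not a benchmark.
public class ChainSkipEstimate {
    public static void main(String[] args) {
        double pSharedChain = 0.5;  // ~half of lookups land in a chain with another key
        double pOursIsSecond = 0.5; // of those, ~half the time our key is the later one
        double pSkipLargeKey = pSharedChain * pOursIsSecond;           // ~= 0.25
        double extraLinksPerSkip = 2.0; // a large (> 66 byte) key spans 3 links, not 1
        double expectedExtraLinks = pSkipLargeKey * extraLinksPerSkip; // ~= 0.5 per lookup
        System.out.printf("P(skip) = %.2f, expected extra links = %.2f%n",
                pSkipLargeKey, expectedExtraLinks);
    }
}
```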

So I think that is open to discussion. I could also attempt such a variant and see how much better it performs on my large-key workload, but that will take some time to build. (The large-key data set I've been benchmarking with is 7.2GB, using 60 to 600 byte keys, most under 90 bytes; my other datasets are an 8.3GB one with 8 to 12 byte keys and a 2.1GB one with 4 to 70 byte keys, mostly under 20.)

Overall query throughput with my data sets is about 3% (non-pooled) to 6% (pooled) better than before.