yahoo / HaloDB

A fast, log structured key-value store.
https://yahoodevelopers.tumblr.com/post/178250134648/introducing-halodb-a-fast-embedded-key-value
Apache License 2.0

Data Compression #20

Open JamieCressey opened 5 years ago

JamieCressey commented 5 years ago

LZ4 is included as a dependency, but it doesn't appear to be used. Is there a reason for this?

Would you be open to a PR that optionally enables LZ4 compression of keys and/or values?

amannaly commented 5 years ago

LZ4 is actually a dependency of OHC, a modified version of which we use for managing the in-memory index. We don't actually need LZ4 and we probably should remove it.

Are you proposing compressing individual keys and values before they are written out to disk? This is something that we currently do on the client side; we compress all values using Snappy before calling put(key, value). Therefore, I didn't add support for compression to HaloDB, and leaving compression of individual key-value pairs to the client is probably the simplest and most flexible approach.
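
For illustration, a minimal sketch of that client-side approach, assuming the xerial snappy-java library and HaloDB's byte[]-based put/get API; the directory, key, and value below are placeholders.

```java
// Sketch of client-side value compression with Snappy, as described above.
// Assumes the xerial snappy-java library; directory, key, and value are placeholders.
import com.oath.halodb.HaloDB;
import com.oath.halodb.HaloDBOptions;
import org.xerial.snappy.Snappy;

import java.nio.charset.StandardCharsets;

public class SnappyClientExample {
    public static void main(String[] args) throws Exception {
        HaloDB db = HaloDB.open("/tmp/halodb-snappy-example", new HaloDBOptions());
        try {
            byte[] key = "user:42".getBytes(StandardCharsets.UTF_8);
            byte[] value = "some compressible payload".getBytes(StandardCharsets.UTF_8);

            // Compress before writing; HaloDB just stores the bytes it is given.
            db.put(key, Snappy.compress(value));

            // Decompress after reading.
            byte[] original = Snappy.uncompress(db.get(key));
            System.out.println(new String(original, StandardCharsets.UTF_8));
        } finally {
            db.close();
        }
    }
}
```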

If you are referring to compressing bigger blocks of data within HaloDB, say at the block level, then this is probably a big change.

scottcarey commented 5 years ago

What about compressing the .index and tombstone files? If I understand correctly, these are not read randomly by queries, only sequentially during compaction and initialization. Index data is a large space overhead compared to other stores I'm testing, since the key data is stored in both the index and the data file.

Compressing the index with, say, Zstandard at a low-ish level (which can compress in the 300MB/sec range and decompress in the 100MB/sec+ range) would likely be both a performance win and a space-saving win.

If I'm wrong, and index/tombstone files are accessed randomly, then it would probably be too big a change to be worth it.

As for compressing values implicitly, the problem with the client being responsible for this is that it is much harder to use something like zstd dictionary compression. HaloDB can sample values during compaction, and generate a dictionary on the fly that is optimal for the data being compacted. The user has no way to know when compaction is happening.
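
As a rough sketch of how the dictionary idea could work (this is not HaloDB code), assuming the zstd-jni library: a compactor could train a dictionary from sampled values and then compress each value against it. How samples are gathered during compaction is left out.

```java
// Sketch: training a zstd dictionary from sampled values and using it for
// compression/decompression. Assumes the zstd-jni library; sample collection
// during compaction is omitted. Sizes are illustrative only.
import com.github.luben.zstd.Zstd;
import com.github.luben.zstd.ZstdDictCompress;
import com.github.luben.zstd.ZstdDictDecompress;
import com.github.luben.zstd.ZstdDictTrainer;

import java.util.List;

public class DictionaryCompressionSketch {
    public static byte[] trainDictionary(List<byte[]> sampledValues) {
        // Up to 16 MB of samples, 64 KB dictionary.
        ZstdDictTrainer trainer = new ZstdDictTrainer(16 * 1024 * 1024, 64 * 1024);
        for (byte[] sample : sampledValues) {
            trainer.addSample(sample);
        }
        return trainer.trainSamples();
    }

    public static byte[] compress(byte[] value, byte[] dictionary, int level) {
        return Zstd.compress(value, new ZstdDictCompress(dictionary, level));
    }

    public static byte[] decompress(byte[] compressed, byte[] dictionary, int originalSize) {
        return Zstd.decompress(compressed, new ZstdDictDecompress(dictionary), originalSize);
    }
}
```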

wangtao724 commented 5 years ago

Compared to data files (1 GB per file based on the default config), index and tombstone files are not a major factor in disk space consumption. To maximize the performance of opening the DB, we would recommend not compressing index and tombstone files.

scottcarey commented 5 years ago

index and tombstone files are not a major factor in disk space consumption.

Ok, well I've got one data set right here with 870MB of index files and 904MB of data files (keys = variable-length strings, values = 8-byte longs).

I have several data sets where the index file is close to 1/3 the size of the data, e.g. mapping UUID keys to UUID values.

To maximize the performance of opening the DB, we would recommend not compressing index and tombstone files.

I would be surprised if it changed the open time by more than 10%, and I wouldn't be surprised if it were faster when the disk is busy. Roughly speaking, the size will be between 25% and 65% of the original index size with Zstandard, with proportionately less I/O. CPU use when decompressing is low; a single CPU should decompress at close to 1GB/sec (output speed).

I just tested compression with Zstandard on two of my index files from different data sets. One compressed to 39% of the original size, the other to 29%. Decompressing was close to 1000MB/sec on an old processor: Intel(R) Core(TM) i5-7300U CPU @ 2.60GHz (boosts to 3.5GHz).
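
For reference, a measurement like that can be reproduced with something along these lines, assuming the zstd-jni streaming API; the index file path and compression level are placeholders.

```java
// Sketch: compress an existing .index file with zstd and report the ratio.
// Assumes the zstd-jni library; the paths and level are placeholders.
import com.github.luben.zstd.ZstdOutputStream;

import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IndexCompressionRatio {
    public static void main(String[] args) throws Exception {
        Path index = Paths.get("/data/halodb/0.index");   // placeholder path
        Path compressed = Paths.get("/tmp/0.index.zst");
        int level = 3;                                     // a "low-ish" zstd level

        try (OutputStream out = new ZstdOutputStream(Files.newOutputStream(compressed), level)) {
            Files.copy(index, out);
        }

        long before = Files.size(index);
        long after = Files.size(compressed);
        System.out.printf("%d -> %d bytes (%.1f%% of original)%n",
                before, after, 100.0 * after / before);
    }
}
```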

In some of my larger data sets, compressing indexes to a 3:1 ratio would save me between 10% and 33% disk space.

1 GB per file based on the default config

The default is 1MB (https://github.com/yahoo/HaloDB/blob/master/src/main/java/com/oath/halodb/HaloDBOptions.java#L15), though this doesn't affect the index/data ratio.

wangtao724 commented 5 years ago

@scottcarey Thank you for the valuable feedback. Forgive me if my previous statement was not accurate. What I meant was the absolute disk space consumed by index and tombstone files. In an index file, each entry's length is the header length (22 bytes) plus the key length (which varies from 1 to 127 bytes). In our production deployment, our keys are 8 bytes and each DB instance stores around 400 million records, so the index files total (22 + 8) * 400M = 12GB. I think that is not a big deal for today's servers.

Specifically for your case, what I can say is that each DB engine has its tradeoffs. We made these tradeoffs based on our use case, so they may not suit yours.

Thanks again for your input.

scottcarey commented 5 years ago

Would you accept a new feature that helps other use cases if it doesn't help yours that much?

FWIW, HaloDB is killing every other DB engine except in:

  1. disk usage -- on data sets with larger keys due to index size.
  2. any situation where keys need to be larger than 127 bytes. The limit seems completely artificial and arbitrary. (ok, perhaps some native allocation would need to be slightly trickier).
  3. when keys can't fit in memory.

The first two seem fairly solvable without radical changes.

wangtao724 commented 5 years ago

You are welcome to make contributions. Could you make the change to support compression of index and tombstone files? We can help with code review. For this change, please add an option to control whether compression is enabled, with no compression as the default.
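
A hypothetical sketch of how such an opt-in flag might look to callers; setCompressIndexFiles is an illustrative name, not an existing HaloDB option.

```java
// Hypothetical usage sketch -- compressIndexFiles is a proposed option,
// not part of HaloDB's current API.
import com.oath.halodb.HaloDB;
import com.oath.halodb.HaloDBOptions;

public class CompressionOptionSketch {
    public static void main(String[] args) throws Exception {
        HaloDBOptions options = new HaloDBOptions();
        // Proposed: opt-in compression of .index and tombstone files,
        // disabled by default so existing deployments are unaffected.
        // options.setCompressIndexFiles(true);
        HaloDB db = HaloDB.open("/path/to/db", options);
        db.close();
    }
}
```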

Regarding key length, we limit it to 127 bytes because we want to minimize the memory usage of the index. Currently, 1 byte is reserved for the key size in IndexFileEntry.java. It is the client's responsibility to keep key lengths within this limit.
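
To illustrate where the 127 comes from (a sketch, not HaloDB's actual serialization code): the key length has to fit in a single signed byte, whose maximum positive value is 127.

```java
// Illustration (not HaloDB source): the key length is written into a single
// signed byte, and Byte.MAX_VALUE == 127, which matches the documented limit.
public class KeySizeLimit {
    static byte encodeKeySize(byte[] key) {
        if (key.length > Byte.MAX_VALUE) {
            throw new IllegalArgumentException(
                    "key length " + key.length + " exceeds the 127-byte limit");
        }
        return (byte) key.length;   // fits the 1-byte size field in the entry header
    }

    public static void main(String[] args) {
        System.out.println(encodeKeySize(new byte[64]));   // prints 64
        // encodeKeySize(new byte[200]) would throw IllegalArgumentException
    }
}
```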

scottcarey commented 4 years ago

@wangtao724 After a more in-depth review of the code, I have several optimizations I'll discuss in other issues.