scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/

RFE: compress backups with gzip - with Huffman encoding only #3660

Open mykaul opened 9 months ago

mykaul commented 9 months ago

Backups consume space, which is costly. They also take time to upload (and certainly to download). Regretfully, rclone only supports gzip[1] (and not zstd) for compression, which is somewhat unfortunate. Luckily, it seems decompression is inline and straightforward.

I suggest we try it with Huffman encoding[2] only, for 2 reasons:

  1. It's the fastest, least memory-consuming compression.
  2. Most, if not all, sstables are already compressed with LZ4, which is OK, but lacks entropy encoding. We've seen cases (for example, JSON payloads as content) where entropy encoding could give 20-30% better compression.

[1] https://rclone.org/compress/
[2] https://rclone.org/compress/#compress-level
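
To illustrate the second point, here is a minimal sketch of what Huffman-only gzip does to data with a skewed byte distribution and no repeated sequences to match. It uses Python's `zlib` with the `Z_HUFFMAN_ONLY` strategy as a local stand-in; it does not invoke rclone, and the synthetic `sample` below is an assumption for illustration, not real sstable content.

```python
import random
import zlib


def gzip_huffman_only(data: bytes) -> bytes:
    """Compress with gzip framing using Huffman coding only (no LZ77 matching)."""
    compressor = zlib.compressobj(
        level=6,                       # string matching is disabled by the strategy below
        method=zlib.DEFLATED,
        wbits=31,                      # 16 + 15: write a gzip header and trailer
        memLevel=8,
        strategy=zlib.Z_HUFFMAN_ONLY,  # entropy coding only
    )
    return compressor.compress(data) + compressor.flush()


# Synthetic stand-in (an assumption): 8 byte values drawn independently with
# uneven frequencies, so there is little to gain from matching but a lot to
# gain from entropy coding.
rng = random.Random(0)
sample = bytes(rng.choices(range(8), weights=[50, 20, 10, 8, 5, 4, 2, 1], k=200_000))

packed = gzip_huffman_only(sample)
ratio = len(packed) / len(sample)
print(f"original={len(sample)} bytes, huffman-only gzip={len(packed)} bytes, ratio={ratio:.2f}")

# Decompression is ordinary gzip decoding; no special handling is needed.
restored = zlib.decompress(packed, 31)
```

The actual gain on LZ4-compressed sstables depends on how skewed the remaining literal bytes are, so the ratio printed here says nothing about real backups.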

tzach commented 9 months ago

Good idea. Let's start with a quick PoC, even outside the Manager (just with rclone), to understand the impact on stored sstable size and on the time it takes to upload/download a backup.
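
Before involving rclone at all, a rough local measurement could look like the sketch below. It compares Huffman-only gzip against default gzip on an in-memory payload and records size plus compression and decompression time. The helper names `measure` and `compare` are hypothetical, and the timings cover CPU work only, not network transfer, so this is a first approximation rather than the PoC itself.

```python
import time
import zlib


def measure(data: bytes, strategy: int, level: int = 6) -> dict:
    """Gzip-compress data with the given zlib strategy and time both directions."""
    start = time.perf_counter()
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31, 8, strategy)
    packed = compressor.compress(data) + compressor.flush()
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(packed, 31)  # 31: expect gzip framing
    decompress_s = time.perf_counter() - start

    if restored != data:
        raise ValueError("round trip did not reproduce the input")
    return {
        "size": len(packed),
        "ratio": len(packed) / len(data),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


def compare(data: bytes) -> dict:
    """Run the same payload through Huffman-only and default gzip."""
    return {
        "huffman_only": measure(data, zlib.Z_HUFFMAN_ONLY),
        "default": measure(data, zlib.Z_DEFAULT_STRATEGY),
    }


if __name__ == "__main__":
    # Placeholder payload (an assumption); replace with the bytes of a real
    # sstable data file to get meaningful numbers.
    payload = bytes(range(256)) * 4096
    for name, result in compare(payload).items():
        print(name, result)
```

Running it against real sstable files of different compression settings would show whether the size reduction justifies the extra CPU time.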