justinethier / keyva

:key: A distributed key-value store
MIT License
26 stars 2 forks source link

Compaction strategies #3

Open justinethier opened 2 years ago

justinethier commented 2 years ago

From: https://dev.to/creativcoder/what-is-a-lsm-tree-3d75

A compaction strategy must choose which files to compact, and when to compact them. There are a lot of different compaction strategies that impact the read/write performance and the space utilization in the LSM Tree. Some of the popular compaction strategies that are used in production systems are:

  1. Size-tiered compaction stratey - This is write optimized compaction strategy and is the default one used in ScyllaDB and Cassandra. But it has high space amplification, so other compaction strategies such as leveled compaction were developed.

  2. Leveled compaction strategy - Improves over size-tiered and reduces read amplification as well as space amplification. This is the default in LevelDB.

  3. Time windowed compaction strategy - Used for time series databases. In this strategy, a time window is used post which compaction is triggered. The above are not the only strategies and various systems have developed their own such as the hybrid compaction strategy in ScyllaDB.

Some background on LevelDB:

When the number of files at level 0 reaches a threshold (around 10 files), a compaction is triggered. Compaction will choose a set of overlapping files at level 0 and a set of files they overlap at the next level (level 1) and will merge the data in those files to create a new set of SST files at level 1. The chosen SST files are then discarded.

LevelDB continuously inspects files at each level and triggers compactions when the number of files or the total size at a given level goes beyond a set threshold. LevelDB manages 7 levels of files. The list of current SST files is kept in a MANIFEST file. The id of the current MANIFEST file is stored in the CURRENT file.

justinethier commented 2 years ago

Suggest we define a configuration that allows a program to select any one of these compaction strategies and customize settings as needed. The tree will then be compacted based on this strategy.