simerplaha / SwayDB

Persistent and in-memory key-value storage engine for JVM that scales on a single machine.
https://swaydb.simer.au
Apache License 2.0

A Stream like approach to Compaction #301

Closed · simerplaha closed this 3 years ago

simerplaha commented 3 years ago

Problem

Parallel compactions in some cases lead to more in-memory byte arrays being created than the available heap space can hold, resulting in longer garbage collection times.

Current implementation

The current implementation is aimed at high parallelism. We can run compactions concurrently.

Problem with current implementation

The parallelism is not aware of the available heap space. Large byte arrays slow down compaction during garbage collection and sometimes lead to memory overflow.

Ideas

simerplaha commented 3 years ago

Managing compaction concurrency while accounting for available heap size results in designs that sacrifice performance and the quality of the Segment files created (they end up small), which is not something we want. We don't want to bake these sacrifices into our implementation, but instead provide configurations where we can choose our trade-offs.

Our implementation should provide high concurrency for all operations that are CPU bound (like assignment, merge, defragmentation etc.) and controlled concurrency for IO bound operations (like reading and creating Segment files during compaction).
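A rough sketch of how that split could look, assuming two separate execution contexts. The names cpuBoundEC and ioBoundEC, the thread counts and the compact signature below are illustrative, not SwayDB's actual internals: CPU bound steps run on a work-stealing pool sized to the available cores, while IO bound steps run on a small fixed pool so that only a bounded number of Segment files are held in memory at once.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object CompactionConcurrency {

  // CPU bound work (assignment, merge, defragmentation) can use all cores.
  val cpuBoundEC: ExecutionContext =
    ExecutionContext.fromExecutorService(
      Executors.newWorkStealingPool(Runtime.getRuntime.availableProcessors())
    )

  // IO bound work (reading & creating Segment files) is capped so that only a
  // bounded number of Segments are held in memory at any one time.
  val ioBoundEC: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))

  // Hypothetical compaction step: read on the IO pool, merge on the CPU pool,
  // write back on the IO pool.
  def compact(readSegment: () => Array[Byte],
              merge: Array[Byte] => Array[Byte],
              writeSegment: Array[Byte] => Unit): Future[Unit] =
    Future(readSegment())(ioBoundEC)
      .map(merge)(cpuBoundEC)
      .map(writeSegment)(ioBoundEC)
}
```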

Solution

Smaller machines should use smaller file sizes and larger machines can use larger file sizes.

How to calculate optimal file sizes?

We should implement an internal tool that suggests optimal file sizes based on available machine resources. For now, heap space can be inspected via VisualVM or any other JVM monitoring tool.
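A minimal sketch of what such a tool could compute, assuming the heap formula from the worked example below (6 * (mapSize + 4 * segmentSize)) and a budget expressed as a fraction of the JVM's max heap. The object name, the candidate size pairs and the 50% default are assumptions for illustration only.

```scala
object FileSizeSuggester {

  // Assumed heap model, taken from the worked example below:
  // requiredHeap = 6 * (mapSize + 4 * segmentSize)
  def requiredHeap(mapSize: Long, segmentSize: Long): Long =
    6 * (mapSize + 4 * segmentSize)

  // Suggests a (mapSize, segmentSize) pair whose estimated compaction heap
  // stays within the given fraction of the JVM's max heap. The candidates
  // mirror the two configurations used in this issue's examples.
  def suggest(heapFraction: Double = 0.5): (Long, Long) = {
    val budget = (Runtime.getRuntime.maxMemory() * heapFraction).toLong
    val mb     = 1024L * 1024L

    val candidates: Seq[(Long, Long)] =
      Seq(60 * mb -> 40 * mb, 4 * mb -> 2 * mb)

    candidates
      .find { case (map, segment) => requiredHeap(map, segment) <= budget }
      .getOrElse(candidates.last)
  }
}
```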

File sizes can be set via the following configurations

Example use-case:

Suppose the configurations are set to 60.mb for maps and 40.mb for Segments.

Optimal required heap space for compaction with the above configuration would be at least 1.32 GB plus extra space to account for object creation.

6 * (60.mb + (4 * 40.mb)) = 1.32 GB

Similarly, for smaller file sizes with 4.mb for maps and 2.mb for Segments, the heap space requirement would be at least 72.mb

6 * (4.mb + (4 * 2.mb)) = 72.mb
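As a quick sanity check of the arithmetic above, a tiny runnable snippet using the same formula:

```scala
object HeapEstimate extends App {
  // requiredHeap = 6 * (mapSize + 4 * segmentSize), per the formula above.
  def requiredHeapMb(mapSizeMb: Long, segmentSizeMb: Long): Long =
    6 * (mapSizeMb + 4 * segmentSizeMb)

  println(requiredHeapMb(60, 40)) // 1320 MB ≈ 1.32 GB
  println(requiredHeapMb(4, 2))   // 72 MB
}
```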