Managing compaction concurrency while accounting for available heap size is resulting in designs that sacrifice performance and the quality of the Segment files created (they end up small), which is not something we want. We don't want to make those sacrifices in our implementation; instead we should provide configurations that let us choose our trade-offs.
Our implementation should provide high concurrency for all operations that are CPU bound (like assignment, merge, defragmentation etc.) and controlled concurrency for IO bound operations (like reading and creating Segment files for compaction).
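For illustration, a minimal sketch of this split using plain Scala ExecutionContexts. The object name and the pool size are assumptions for the sketch, not SwayDB's actual implementation:

```scala
import java.util.concurrent.Executors

import scala.concurrent.ExecutionContext

object CompactionExecutionContexts {
  // High concurrency for CPU-bound operations (assignment, merge, defragmentation).
  val cpuBound: ExecutionContext = ExecutionContext.global

  // Controlled concurrency for IO-bound operations (reading and creating Segment
  // files). The pool size of 2 is an arbitrary assumption; it could be made configurable.
  val ioBound: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))
}
```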
Smaller machines should use smaller file sizes and larger machines can use larger file sizes.
We should implement a tool internally that suggests optimal file sizes based on available machine resources. For now, heap space can be inspected via VisualVM or any other JVM monitoring tool.
File sizes can be set via the map file size and Segment file size configurations.
Suppose the configurations are set to 60.mb for maps and 40.mb for Segments.
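A minimal, self-contained sketch of this example configuration. The names mapSize and segmentSize and the .mb helper are assumptions for illustration, not SwayDB's actual configuration API:

```scala
object ExampleFileSizes {

  // Hypothetical .mb helper so the sketch is self-contained.
  implicit class StorageUnits(val size: Int) extends AnyVal {
    def mb: Long = size.toLong * 1024 * 1024 // bytes
  }

  val mapSize: Long     = 60.mb // size at which in-memory maps are persisted
  val segmentSize: Long = 40.mb // target size of persistent Segment files
}
```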
With the above configuration the required heap space for compaction would be at least 1.32 GB, plus extra space to account for object creation.
6 * (60.mb + (4 * 40.mb)) = 1.32 GB
Similarly, for smaller file sizes with 4.mb for maps and 2.mb for Segments, the heap space requirement would be at least 72.mb.
6 * (4.mb + (4 * 2.mb)) = 72.mb
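A worked sketch of the same arithmetic. Treating the multiplier 6 as the number of parallel compactions and 4 as the number of Segments held in memory per compaction is an assumption; the issue does not name these multipliers:

```scala
object CompactionHeapEstimate extends App {

  // parallelCompactions * (mapSize + segmentsPerCompaction * segmentSize), in MB.
  def requiredHeapMb(parallelCompactions: Int,
                     mapSizeMb: Double,
                     segmentsPerCompaction: Int,
                     segmentSizeMb: Double): Double =
    parallelCompactions * (mapSizeMb + segmentsPerCompaction * segmentSizeMb)

  println(requiredHeapMb(6, 60, 4, 40)) // 1320.0 MB, i.e. ~1.32 GB
  println(requiredHeapMb(6, 4, 4, 2))   // 72.0 MB
}
```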
Problem
Parallel compactions in some cases create more in-memory byte arrays than the available heap space can hold, resulting in longer garbage collection times.
Current implementation
The current implementation is aimed at high parallelism: compactions can run concurrently.
Problem with current implementation
The parallelism is not aware of the available heap space. Large byte arrays slow down compaction during garbage collection and sometimes lead to out-of-memory errors.
Ideas
Stream: convert in-memory bytes to persistent Segments ASAP to release heap memory.
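A minimal sketch of this streaming idea (not SwayDB's implementation): instead of building a whole Segment as one in-memory byte array, each chunk is written to the file as soon as it is produced, so the heap only holds one chunk at a time. The object and method names are hypothetical:

```scala
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Path, StandardOpenOption}

object StreamingSegmentWriter {

  // Writes each chunk to disk as soon as it is produced so the heap never
  // holds the whole Segment's bytes at once.
  def write(path: Path, chunks: Iterator[Array[Byte]]): Unit = {
    val channel =
      FileChannel.open(path, StandardOpenOption.CREATE, StandardOpenOption.WRITE)
    try {
      chunks.foreach { chunk =>
        val buffer = ByteBuffer.wrap(chunk)
        while (buffer.hasRemaining) channel.write(buffer)
        // `chunk` is now eligible for garbage collection.
      }
    } finally {
      channel.close()
    }
  }
}
```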