Managing compaction concurrency while accounting for available heap size is resulting in designs that sacrifice performance and the quality of the Segment files created (they end up small), which is not something we want. We don't want to make those sacrifices in our implementation; instead we should provide configurations that let us choose our trade-offs.
Our implementation should provide high concurrency for all operations that are CPU bound (like assignment, merge, defragmentation etc.) and controlled concurrency for IO bound operations (like reading and creating Segment files for compaction).
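For illustration, a minimal sketch of this split using plain Scala ExecutionContexts. The object name and the pool size are assumptions for the sketch, not SwayDB's actual implementation:

```scala
import java.util.concurrent.Executors

import scala.concurrent.ExecutionContext

object CompactionExecutionContexts {
  // High concurrency for CPU-bound operations (assignment, merge, defragmentation).
  val cpuBound: ExecutionContext = ExecutionContext.global

  // Controlled concurrency for IO-bound operations (reading and creating Segment
  // files). The pool size of 2 is an arbitrary assumption; it could be made configurable.
  val ioBound: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))
}
```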
Smaller machines should use smaller file sizes and larger machines can use larger file sizes.
We should implement a tool internally that suggests optimal file sizes based on available machine resources. For now, heap space can be inspected via VisualVM or any other JVM monitoring tool.
File sizes can be set via the map file size and Segment file size configurations.
Suppose the configurations are set to 60.mb for maps and 40.mb for Segments.
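A minimal, self-contained sketch of this example configuration. The names mapSize and segmentSize and the .mb helper are assumptions for illustration, not SwayDB's actual configuration API:

```scala
object ExampleFileSizes {

  // Hypothetical .mb helper so the sketch is self-contained.
  implicit class StorageUnits(val size: Int) extends AnyVal {
    def mb: Long = size.toLong * 1024 * 1024 // bytes
  }

  val mapSize: Long     = 60.mb // size at which in-memory maps are persisted
  val segmentSize: Long = 40.mb // target size of persistent Segment files
}
```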
With the above configuration the required heap space for compaction would be at least 1.32 GB, plus extra space to account for object creation.
6 * (60.mb + (4 * 40.mb)) = 1.32 GB
Similarly, for smaller file sizes with 4.mb for maps and 2.mb for Segments, the heap space requirement would be at least 72.mb.
6 * (4.mb + (4 * 2.mb)) = 72.mb
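A worked sketch of the same arithmetic. Treating the multiplier 6 as the number of parallel compactions and 4 as the number of Segments held in memory per compaction is an assumption; the issue does not name these multipliers:

```scala
object CompactionHeapEstimate extends App {

  // parallelCompactions * (mapSize + segmentsPerCompaction * segmentSize), in MB.
  def requiredHeapMb(parallelCompactions: Int,
                     mapSizeMb: Double,
                     segmentsPerCompaction: Int,
                     segmentSizeMb: Double): Double =
    parallelCompactions * (mapSizeMb + segmentsPerCompaction * segmentSizeMb)

  println(requiredHeapMb(6, 60, 4, 40)) // 1320.0 MB, i.e. ~1.32 GB
  println(requiredHeapMb(6, 4, 4, 2))   // 72.0 MB
}
```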
Problem
Parallel compactions in some cases create more in-memory byte arrays than the available heap space can hold, resulting in longer garbage collection times.
Current implementation
The current implementation is aimed at high parallelism: compactions can run concurrently.
Problem with current implementation
The parallelism is not aware of the available heap space. Large byte arrays slow down compaction during garbage collection and sometimes lead to out-of-memory errors.
Ideas
Stream: convert in-memory bytes to persistent Segments ASAP to release heap memory.
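A minimal sketch of this streaming idea (not SwayDB's implementation): instead of building a whole Segment as one in-memory byte array, each chunk is written to the file as soon as it is produced, so the heap only holds one chunk at a time. The object and method names are hypothetical:

```scala
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Path, StandardOpenOption}

object StreamingSegmentWriter {

  // Writes each chunk to disk as soon as it is produced so the heap never
  // holds the whole Segment's bytes at once.
  def write(path: Path, chunks: Iterator[Array[Byte]]): Unit = {
    val channel =
      FileChannel.open(path, StandardOpenOption.CREATE, StandardOpenOption.WRITE)
    try {
      chunks.foreach { chunk =>
        val buffer = ByteBuffer.wrap(chunk)
        while (buffer.hasRemaining) channel.write(buffer)
        // `chunk` is now eligible for garbage collection.
      }
    } finally {
      channel.close()
    }
  }
}
```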