risingwavelabs / risingwave


storage: trigger compaction when the deleted key / dropped table ratio is high #6918


hzxa21 commented 1 year ago

Currently, the space occupied by deleted keys and dropped tables is reclaimed on compaction, but the deleted/dropped ratio is not considered when deciding whether to trigger a compaction. This can delay space reclamation for a long time, especially when writes are rare.

One way to solve this issue is to add a new compaction picker that triggers on a high deleted-key / dropped-table ratio.

This is not urgent but should be done.
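A picker along these lines could gate on the aggregate stale-key ratio of the candidate files. The sketch below is purely illustrative: the `SstableInfo` fields and the trigger threshold are assumptions, not RisingWave's actual structures.

```rust
/// Illustrative stand-in for per-SST statistics; field names are
/// assumptions, not Hummock's real metadata.
struct SstableInfo {
    total_key_count: u64,
    stale_key_count: u64, // deleted keys + keys belonging to dropped tables
}

/// Pick a reclaim compaction when the stale-key ratio of the candidate
/// files reaches the configured trigger ratio.
fn should_pick_for_reclaim(ssts: &[SstableInfo], trigger_ratio: f64) -> bool {
    let total: u64 = ssts.iter().map(|s| s.total_key_count).sum();
    let stale: u64 = ssts.iter().map(|s| s.stale_key_count).sum();
    total > 0 && stale as f64 / total as f64 >= trigger_ratio
}

fn main() {
    let ssts = vec![
        SstableInfo { total_key_count: 1000, stale_key_count: 450 },
        SstableInfo { total_key_count: 1000, stale_key_count: 150 },
    ];
    // 600 stale keys out of 2000 = 0.3
    assert!(should_pick_for_reclaim(&ssts, 0.3));
    assert!(!should_pick_for_reclaim(&ssts, 0.5));
}
```

Unlike the existing size-based pickers, this one would fire even when the level is otherwise quiet, which addresses the rare-writes case above.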

hzxa21 commented 1 year ago

related: #6765

hzxa21 commented 1 year ago

cc @Little-Wallace @soundOfDestiny @Li0k

hzxa21 commented 1 year ago

@Li0k Feel free to reassign if needed.

Li0k commented 1 year ago

Motivation

There are two general purposes of delete-ratio compaction:

For performance, compaction needs to be triggered in time to reduce multi-version keys and delete keys and thereby improve read performance. For this, we need to introduce the delete ratio as a factor in the existing score calculation rules.

For space reclaim, the goal is to collect and process the affected files so that space is released promptly, reducing storage costs. The conventional approach is to record the delete ratio in each SST and raise its compaction priority so that it gets compacted first, but that approach becomes complicated in the context of TTL and drop table. Since the real-time requirement here is lower than in the performance case, it is acceptable to instead periodically rewrite files with a high delete ratio.
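For the performance case, folding the delete ratio into the score could look like the following sketch. The multiplicative form and the function signature are assumptions for illustration, not the actual scoring rule.

```rust
/// Illustrative scoring: fold the observed delete ratio into the usual
/// size-based level score so that garbage-heavy levels sort earlier in
/// the compaction queue. The multiplicative weighting is an assumption.
fn compaction_score(level_size: u64, target_size: u64, delete_ratio: f64) -> f64 {
    let size_score = level_size as f64 / target_size as f64;
    size_score * (1.0 + delete_ratio)
}

fn main() {
    // Same size pressure, but the dirtier level scores higher.
    let clean = compaction_score(50, 100, 0.0);
    let dirty = compaction_score(50, 100, 0.5);
    assert!(dirty > clean);
    assert!((dirty - 0.75).abs() < 1e-9);
}
```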

Implementation

According to the analysis above, it is enough to trigger a new type of compaction task that rewrites the affected files. To ensure correctness, we only need to handle two cases: cleaning up dropped tables, and cleaning up data that has expired under TTL. Neither kind of data is ever accessed again in our system.

Based on the current compaction implementation, we introduce a new task type that only scans files, cleans up the useless data described above (without merging data across files), and rewrites the SSTs to release space. There are a few restrictions.
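The per-entry filter applied during such a rewrite could be sketched as follows. The `Entry` shape is hypothetical, and a single TTL watermark is used for simplicity even though TTL is configured per table in practice.

```rust
use std::collections::HashSet;

/// Hypothetical per-version entry; `epoch` stands in for the commit
/// epoch stored with each key version.
struct Entry {
    table_id: u32,
    epoch: u64,
}

/// Filter applied while rewriting an SST in the proposed task: drop an
/// entry if its table was dropped, or if it expired under the TTL
/// watermark. No cross-file merge is involved.
fn retain_entry(e: &Entry, dropped_tables: &HashSet<u32>, ttl_watermark: u64) -> bool {
    !dropped_tables.contains(&e.table_id) && e.epoch >= ttl_watermark
}

fn main() {
    let dropped: HashSet<u32> = [7].into_iter().collect();
    assert!(!retain_entry(&Entry { table_id: 7, epoch: 100 }, &dropped, 50)); // dropped table
    assert!(!retain_entry(&Entry { table_id: 1, epoch: 10 }, &dropped, 50)); // expired by TTL
    assert!(retain_entry(&Entry { table_id: 1, epoch: 100 }, &dropped, 50)); // kept
}
```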

Little-Wallace commented 1 year ago

We can split SSTable files by table-id in the bottommost level so that each SST contains at most one table-id. Then we can check whether that table needs to expire. We can also record the minimal epoch in each SST.
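With one table per file and the minimal epoch recorded, the candidate check becomes a metadata-only test, roughly as sketched below (field and function names are assumptions): a file of a dropped table can be deleted outright, and a minimal epoch below the TTL watermark means the file holds at least some expired versions and is worth rewriting.

```rust
use std::collections::HashSet;

/// Assumed per-file metadata enabled by table-id splitting: exactly one
/// table per SST in the bottommost level, plus its minimal epoch.
struct SstMeta {
    table_id: u32,
    min_epoch: u64,
}

/// Dropped table => the whole file is garbage; min_epoch below the TTL
/// watermark => the file contains some expired versions to reclaim.
fn reclaim_candidate(meta: &SstMeta, dropped: &HashSet<u32>, ttl_watermark: u64) -> bool {
    dropped.contains(&meta.table_id) || meta.min_epoch < ttl_watermark
}

fn main() {
    let dropped: HashSet<u32> = [9].into_iter().collect();
    assert!(reclaim_candidate(&SstMeta { table_id: 9, min_epoch: 100 }, &dropped, 50));
    assert!(reclaim_candidate(&SstMeta { table_id: 1, min_epoch: 10 }, &dropped, 50));
    assert!(!reclaim_candidate(&SstMeta { table_id: 1, min_epoch: 100 }, &dropped, 50));
}
```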

hzxa21 commented 1 year ago

> We can split SSTable files by table-id in the bottommost level so that each SST contains at most one table-id. Then we can check whether that table needs to expire. We can also record the minimal epoch in each SST.

Good idea!

soundOfDestiny commented 1 year ago

> We can split SSTable files by table-id in the bottommost level so that each SST contains at most one table-id. Then we can check whether that table needs to expire. We can also record the minimal epoch in each SST.

I prefer this scheme

Li0k commented 1 year ago

Some optional optimizations

  1. Give regular compaction tasks a higher priority than space-clean tasks. When a regular task is triggered, conflicting space-clean tasks are dropped so that normal writes are never blocked.
  2. Generate SST files with table_id as the boundary in the bottommost level. In the target scenario, TTL and drop_table are attributes of a specific table rather than of an SST file, so splitting SSTs by table enables better filtering and reduces write amplification during rewriting.

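Optimization (1) amounts to a simple admission rule in the scheduler; a minimal sketch, with assumed task shapes and names:

```rust
use std::collections::HashSet;

#[derive(PartialEq)]
enum TaskKind {
    Regular,
    SpaceClean,
}

struct Task {
    kind: TaskKind,
    input_ssts: HashSet<u64>, // ids of the files the task reads
}

/// Regular tasks are always admitted; a space-clean task is skipped
/// when a running regular task already touches any of its input files,
/// so reclaim work never blocks normal write-path compaction.
fn admit(pending: &Task, running: &[Task]) -> bool {
    pending.kind == TaskKind::Regular
        || running.iter().all(|t| {
            t.kind != TaskKind::Regular || t.input_ssts.is_disjoint(&pending.input_ssts)
        })
}

fn main() {
    let running = vec![Task {
        kind: TaskKind::Regular,
        input_ssts: [1, 2].into_iter().collect(),
    }];
    let conflicting = Task { kind: TaskKind::SpaceClean, input_ssts: [2, 3].into_iter().collect() };
    let disjoint = Task { kind: TaskKind::SpaceClean, input_ssts: [4].into_iter().collect() };
    assert!(!admit(&conflicting, &running)); // overlaps a regular task: skipped
    assert!(admit(&disjoint, &running));     // no overlap: admitted
}
```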
Li0k commented 1 year ago

We have implemented a cleanup strategy for dropped tables and TTL. Delete-key-ratio compaction is not a scenario we have encountered so far, so we have lowered its priority.