Open hzxa21 opened 1 year ago
related: #6765
cc @Little-Wallace @soundOfDestiny @Li0k
@Li0k Feel free to reassign if needed.
There are two general purposes of delete ratio compaction:
For performance: compaction must be triggered in time to reduce multi-version keys and delete keys, which improves read performance. The delete ratio therefore needs to become a factor in the existing compaction calculation rules.
For space reclaim: the goal is to collect and process the affected files so that space is released in time and storage cost goes down. The conventional approach is to calculate and record the delete ratio in each SST and raise its compaction priority so that it is compacted first, but this is complicated in the context of TTL and drop table. The real-time requirement is also lower than in the performance case, so we expect to periodically rewrite files with a high delete ratio instead.
Based on the above analysis, triggering a new type of compaction task that rewrites the related files is enough to achieve the goal. To ensure correctness, we only need to handle two cases: data that belongs to dropped tables and data that has expired (TTL). Neither kind of data is accessed in our system any more.
Based on the current compaction implementation, we introduce a new task that only scans files and cleans up the useless data described above (without merging data), then rewrites the SST to release space. Here are a few restrictions
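The scan-and-rewrite task described above can be illustrated with a minimal sketch. It is not the actual implementation: `Entry`, `reclaim_sst`, `live_tables`, and `ttl_watermark` are hypothetical names, and the sketch assumes that TTL expiry can be expressed as a per-table epoch watermark.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Entry:
    table_id: int
    key: bytes
    epoch: int  # write epoch; assumed comparable with the TTL watermark


def reclaim_sst(
    entries: List[Entry],
    live_tables: Set[int],
    ttl_watermark: Dict[int, int],
) -> List[Entry]:
    """Rewrite a single SST without merging it with any other file.

    Entries are kept in their original order. An entry is dropped only if
    its table no longer exists or its epoch is older than the table's TTL
    watermark (hypothetical: smallest epoch that has not expired yet).
    """
    kept = []
    for entry in entries:
        if entry.table_id not in live_tables:
            continue  # table was dropped, data is never read again
        watermark = ttl_watermark.get(entry.table_id)
        if watermark is not None and entry.epoch < watermark:
            continue  # expired by TTL
        kept.append(entry)
    return kept
```

Because the task never merges files, the output holds a subset of the input keys in the same order, which keeps the rewrite cheap compared with a regular compaction.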
We can split SSTable files by table-id in the bottommost level so that each SST contains at most one table-id. Then we can check whether that table needs to expire. We can also record the minimal epoch in each SST.
Good idea!
I prefer this scheme
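The per-table split proposed above could look roughly like the sketch below. All names (`SstMeta`, `split_by_table`, `needs_reclaim`) are hypothetical, and it assumes that entries arrive sorted by table-id and that TTL is expressed as a per-table epoch watermark. The recorded minimal epoch is used only as a cheap filter: if even the oldest entry is still within TTL, the file can be skipped without scanning it.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List, Set, Tuple


@dataclass
class SstMeta:
    table_id: int   # exactly one table per SST after the split
    min_epoch: int  # oldest epoch stored in the file
    key_count: int


def split_by_table(entries: List[Tuple[int, bytes, int]]) -> List[SstMeta]:
    """Cut one output SST per table-id.

    `entries` are (table_id, key, epoch) tuples, assumed sorted by table_id.
    """
    metas = []
    for table_id, group in groupby(entries, key=lambda e: e[0]):
        rows = list(group)
        metas.append(SstMeta(table_id, min(r[2] for r in rows), len(rows)))
    return metas


def needs_reclaim(
    meta: SstMeta, live_tables: Set[int], ttl_watermark: Dict[int, int]
) -> bool:
    """Decide from metadata alone whether the file is worth rewriting."""
    if meta.table_id not in live_tables:
        return True  # whole file belongs to a dropped table
    watermark = ttl_watermark.get(meta.table_id)
    # Some data is expired only if the oldest entry is below the watermark.
    return watermark is not None and meta.min_epoch < watermark
```

With one table-id per file, an SST of a dropped table can be identified from its metadata, and files of tables without TTL never need to be inspected by this check.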
We have implemented cleanup strategies for drop table and TTL. Delete-key-ratio compaction is not a scenario we have encountered so far, so we have lowered its priority.
Currently, the space of deleted keys and dropped tables is reclaimed during compaction, but the dropped/deleted ratio is not considered in the compaction trigger. This can delay space reclaim for a long time, especially when writes are rare.
One way to solve this issue is to add a new picker for deleted key / dropped table ratio.
This is not urgent but should be done.
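As a rough illustration of the picker idea above, the sketch below selects files whose share of stale keys exceeds a threshold. `SstStats`, `pick_by_stale_ratio`, the threshold, and the file limit are all hypothetical, and it assumes that per-SST counts of deleted keys and dropped-table keys are available, which the discussion above notes is not trivial for TTL and drop table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SstStats:
    sst_id: int
    total_keys: int
    stale_keys: int  # delete keys plus keys of dropped tables (assumed known)

    @property
    def stale_ratio(self) -> float:
        # An empty file has nothing to reclaim.
        return self.stale_keys / self.total_keys if self.total_keys else 0.0


def pick_by_stale_ratio(
    ssts: List[SstStats], threshold: float = 0.5, max_files: int = 4
) -> List[int]:
    """Return ids of the files with the highest stale ratio, worst first."""
    candidates = [s for s in ssts if s.stale_ratio >= threshold]
    candidates.sort(key=lambda s: s.stale_ratio, reverse=True)
    return [s.sst_id for s in candidates[:max_files]]
```

Since the picker depends only on file statistics and not on incoming writes, it can run periodically and still make progress when writes are rare.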