linkedin / ambry

Distributed object store
https://github.com/linkedin/ambry/wiki
Apache License 2.0

Filter out all valid log segments in full range compaction #2773

Closed justinlin-linkedin closed 6 months ago

justinlin-linkedin commented 6 months ago

Summary

In full range compaction (CompactAllPolicy), we return all log segments that are not in the journal to the compactor, even if some of those log segments contain 100% valid data. This is unnecessary, especially for leading and trailing log segments, since we would have to copy all of their data out to a new log segment file.

What is new

Add a configuration to StoreConfig to control this feature

Added a new configuration to StoreConfig to control this feature. It is turned off by default and has to be enabled explicitly through configuration. The feature will first be turned on for only a few hosts to make sure it works, then propagated to the other hosts.
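As a rough illustration of an off-by-default switch, here is a minimal sketch built on plain `java.util.Properties`. The class name `CompactionFilterConfig`, the field name, and the property key are all hypothetical; the actual name and wiring inside StoreConfig are not shown in this description.

```java
import java.util.Properties;

// Hypothetical stand-in for the new StoreConfig option; not Ambry's actual class.
final class CompactionFilterConfig {
  // Assumed property key, chosen only for this example.
  static final String FILTER_KEY = "store.compact.all.policy.filter.fully.valid.log.segments";

  final boolean filterFullyValidLogSegments;

  CompactionFilterConfig(Properties props) {
    // Defaults to false, so hosts that do not set the key keep the old behavior.
    this.filterFullyValidLogSegments = Boolean.parseBoolean(props.getProperty(FILTER_KEY, "false"));
  }
}
```

With a switch like this, the staged rollout amounts to setting the key to `true` on a few hosts first and leaving it unset everywhere else.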

Add a new interface LogSegmentSizeProvider to expose the data size of each log segment.

The BlobStoreStats object has a reference to PersistentIndex, which knows the data size of each LogSegment. To make the new logic easier to test, we create a new interface, LogSegmentSizeProvider, and let PersistentIndex implement it. In tests, we use a map-based mock implementation.
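The interface and its map-based test double could look roughly like the sketch below. The method name `getLogSegmentSize`, the use of a plain `String` for the log segment name, and the class `MapLogSegmentSizeProvider` are assumptions made for illustration; the signatures in the PR may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape of the interface: given a log segment name, return its data size in bytes.
interface LogSegmentSizeProvider {
  long getLogSegmentSize(String logSegmentName);
}

// Hypothetical map-based implementation for tests, so no real PersistentIndex is needed.
class MapLogSegmentSizeProvider implements LogSegmentSizeProvider {
  private final Map<String, Long> sizes = new HashMap<>();

  // Registers a size for a segment and returns this provider for chaining.
  MapLogSegmentSizeProvider put(String logSegmentName, long size) {
    sizes.put(logSegmentName, size);
    return this;
  }

  @Override
  public long getLogSegmentSize(String logSegmentName) {
    Long size = sizes.get(logSegmentName);
    if (size == null) {
      throw new IllegalArgumentException("Unknown log segment: " + logSegmentName);
    }
    return size;
  }
}
```

Depending on the narrow interface rather than on PersistentIndex directly means the filtering logic can be tested with a few hard-coded sizes.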

Add a filter function to filter out leading and trailing log segments whose data is 100% valid.
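The trimming idea can be sketched as follows. This is not the PR's implementation: the class `FullyValidSegmentFilter`, its `trim` method, and the two size maps are hypothetical stand-ins for whatever BlobStoreStats and LogSegmentSizeProvider actually supply. A segment is treated as 100% valid when its valid data size equals its total data size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the leading/trailing trim; names are assumptions.
final class FullyValidSegmentFilter {
  private FullyValidSegmentFilter() {
  }

  // Drops fully valid segments from both ends of the ordered candidate list.
  // Fully valid segments in the middle are kept so the remaining range stays contiguous.
  static List<String> trim(List<String> candidates, Map<String, Long> validSizes,
      Map<String, Long> totalSizes) {
    int start = 0;
    int end = candidates.size();
    while (start < end && isFullyValid(candidates.get(start), validSizes, totalSizes)) {
      start++;
    }
    while (end > start && isFullyValid(candidates.get(end - 1), validSizes, totalSizes)) {
      end--;
    }
    return new ArrayList<>(candidates.subList(start, end));
  }

  // A segment with unknown sizes is conservatively treated as not fully valid.
  private static boolean isFullyValid(String name, Map<String, Long> validSizes,
      Map<String, Long> totalSizes) {
    Long valid = validSizes.get(name);
    Long total = totalSizes.get(name);
    return valid != null && total != null && valid.longValue() == total.longValue();
  }
}
```

Only the ends are trimmed here because skipping them avoids copying data without breaking the range; whether the PR also handles fully valid segments in the middle is not stated in this description.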

Test

Unit tests

codecov-commenter commented 6 months ago

Codecov Report

Attention: Patch coverage is 5.40541%, with 35 lines in your changes missing coverage. Please review.

Project coverage is 30.08%. Comparing base (52ba813) to head (b3674df). Report is 2 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| ...om/github/ambry/store/CompactAllPolicyFactory.java | 0.00% | 30 Missing :warning: |
| ...n/java/com/github/ambry/store/PersistentIndex.java | 0.00% | 4 Missing :warning: |
| ...in/java/com/github/ambry/store/BlobStoreStats.java | 0.00% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@              Coverage Diff              @@
##             master    #2773       +/-   ##
=============================================
- Coverage     64.24%   30.08%   -34.16%
+ Complexity    10398     4104     -6294
=============================================
  Files           840      840
  Lines         71755    71786       +31
  Branches       8611     8615        +4
=============================================
- Hits          46099    21600    -24499
- Misses        23004    48242    +25238
+ Partials       2652     1944      -708
```
