Closed justinlin-linkedin closed 6 months ago
Attention: Patch coverage is 86.20690%
with 4 lines
in your changes are missing coverage. Please review.
Project coverage is 67.34%. Comparing base (
52ba813
) to head (d9fe138
). Report is 3 commits behind head on master.
Files | Patch % | Lines |
---|---|---|
...ava/com/github/ambry/store/BlobStoreCompactor.java | 76.47% | 3 Missing and 1 partial :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
When the ignoring peer token happens, we need some visibility, either log or metrics are good for me. I'm approving this PR. Please add the metrics either in this PR or next PR.
Added logs in the new commit
Summary
When a peer went offline for a long time (X days), we should not consider its replication token when checking if a delete tombstone is valid or not.
Reason
The way we remove delete tombstone is fast, it keeps all the remote replication tokens from its peers, and compare offsets from tokens with the offset of the delete tombstone. This approach requires only local information so it's very fast, but it has some issues. When a peer went offline for a long time, it's remote replication token would not move at all. After a couple compactions, this peer's token offset would be less than most of delete tombstones' offsets. Thus all the delete tombstones can't compacted any more.
Change
Filter those offline replicas out. Making sure that we enable the remote replication token persistor so we would persist remote tokens every few hours(or minutes). This would give us a good picture on when was the last time remote peer was active.
If this active timestamp is longer than X days(configurable), we just ignore this peer.
Test
Added unit test