linkedin / ambry

Distributed object store
https://github.com/linkedin/ambry/wiki
Apache License 2.0

[DR][Backup-checker] Disregard message_info timestamps #2824

Closed snalli closed 1 month ago

snalli commented 1 month ago

The backup checker compares a cloud replica with a peer server's replica to detect inconsistencies. Because the server replica grows indefinitely, the checker needs a rule for deciding when to stop replicating. Timestamps from the server replica have proven unreliable for this purpose because they do not increase monotonically in log order: the server sends combined messages that cover all state transitions of a blob and carry the timestamp of the latest transition. Using this timestamp to decide when to stop replication often results in missed blobs.

For instance, consider the sequence of operations t1 (put-b1), t2 (put-b2), t3 (update-ttl-b1). The server sends replication messages m1 (b1, t3) and m2 (b2), in that order, because b2 appears after b1 in the log. If replication halts on reaching t3, blob b2 is missed.
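
The example above can be reproduced with a small, self-contained sketch. It does not use Ambry's classes: `Message` and `replicateUntil` are hypothetical names, timestamps are modelled as plain integers, and the timestamp t2 on m2 is an assumption (the description does not state it).

```java
import java.util.ArrayList;
import java.util.List;

public class TimestampCutoffDemo {
    // Hypothetical stand-in for a combined replication message: a blob id plus
    // the timestamp of that blob's latest state transition.
    public static final class Message {
        final String blobId;
        final long latestTransitionTime;

        public Message(String blobId, long latestTransitionTime) {
            this.blobId = blobId;
            this.latestTransitionTime = latestTransitionTime;
        }
    }

    // Timestamp-based stop rule: consume messages in log order and halt as
    // soon as a message's timestamp reaches the cutoff.
    public static List<String> replicateUntil(List<Message> logOrder, long cutoff) {
        List<String> seen = new ArrayList<>();
        for (Message m : logOrder) {
            seen.add(m.blobId);
            if (m.latestTransitionTime >= cutoff) {
                break; // everything later in the log is skipped
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        long t2 = 2, t3 = 3;
        // m1 carries t3 because update-ttl-b1 is b1's latest transition;
        // m2 follows because b2 sits after b1 in the log.
        List<Message> logOrder = List.of(new Message("b1", t3), new Message("b2", t2));
        System.out.println(replicateUntil(logOrder, t3)); // prints [b1]
    }
}
```

Although b2 was written at t2, before the cutoff, it never reaches the checker because m1 already carries t3.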

To address this, we no longer rely on timestamps. Instead, we track the progress of the replication token: if the token does not advance between successive calls to replicate(), we terminate replication.
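
A minimal sketch of this stop rule follows. It is not the code from this change: the token is modelled as a plain log offset, and `replicateUntilStalled` and the lambda standing in for replicate() are hypothetical.

```java
import java.util.function.LongUnaryOperator;

public class TokenProgressDemo {
    // Calls the supplied replicate step repeatedly and stops once the token it
    // returns equals the token passed in, i.e. no progress was made.
    // The step is a hypothetical stand-in for replicate(): it takes the
    // current token and returns the token reached after one round.
    public static long replicateUntilStalled(long startToken, LongUnaryOperator replicate) {
        long token = startToken;
        while (true) {
            long next = replicate.applyAsLong(token);
            if (next == token) {
                return token; // token did not advance: terminate
            }
            token = next;
        }
    }

    public static void main(String[] args) {
        long logEnd = 250;
        // Each round advances by at most 100 and never passes the end of the log.
        LongUnaryOperator replicate = t -> Math.min(t + 100, logEnd);
        System.out.println(replicateUntilStalled(0, replicate)); // prints 250
    }
}
```

In this model the loop ends only because the step eventually stops advancing, so no timestamp comparison is involved.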

The principle behind this approach is that our metadata scans of the remote peer server are consistently faster than full data scans. This lets us advance through the log more quickly, reducing the risk of replicating indefinitely.