rotationalio / honu

Adaptive consistency replication with reinforcement learning for large scale globally distributed storage.
BSD 3-Clause "New" or "Revised" License
12 stars 2 forks source link

Update Version Checks #18

Closed bbengfort closed 2 years ago

bbengfort commented 2 years ago

This PR implements a required change for Trtl Anti-Entropy: Update checks if the version being updated is still later than the local version. I've also added a WithForce() option that bypasses checks if the user wants to force put an object to the database, overriding the namespace and version checks.

This is related to https://github.com/trisacrypto/directory/pull/299

Note this is related to sc-2703 but is not a complete implementation; this PR only does enough to fix sc-2694.

codecov[bot] commented 2 years ago

Codecov Report

Merging #18 (8f39d82) into main (cc53387) will decrease coverage by 0.07%. The diff coverage is 61.90%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #18      +/-   ##
==========================================
- Coverage   55.08%   55.00%   -0.08%     
==========================================
  Files          10       10              
  Lines         492      509      +17     
==========================================
+ Hits          271      280       +9     
- Misses        178      184       +6     
- Partials       43       45       +2     
Impacted Files Coverage Δ
options/options.go 77.77% <0.00%> (-13.53%) :arrow_down:
honu.go 48.20% <76.47%> (+2.16%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update cc53387...8f39d82. Read the comment docs.

bbengfort commented 2 years ago

I know we haven't gone through sc-2694 yet @bbengfort @pdeziel, but I think this makes sense to me. I gather that part of the replication bug we've been experiencing in trtl relates to the need for this subsequent check that object updates sent by the remote are still the latest from the perspective of the local replica? And it looks like you've added the force option so that we can still make manual repairs to an object, even if it means regressing its version?

Yes, this is related to the need to handle Put and Update concurrency correctly while CHECK messages are being sent between replicas. Adding guards like this makes me a little worried sometimes though because it means if things get out of whack, it's tougher to repair - so I'm hoping the force option allows us to do dangerous things when we just need the database to get back to a correct state.

Based on your review of SC-2694, I think we should add one more thing to Update - a return of the UpdateType, e.g. was it a stomp, a skip, a linear version, etc. I'd like to open that in a new PR so that it can be added to SC-2694 to unblock the prometheus metrics.

shortcut-integration[bot] commented 2 years ago

This pull request has been linked to Shortcut Story #2694: Refactor Phase 1/Phase 2 Anti-Entropy and Gossip.

rebeccabilbro commented 2 years ago

Based on your review of SC-2694, I think we should add one more thing to Update - a return of the UpdateType, e.g. was it a stomp, a skip, a linear version, etc. I'd like to open that in a new PR so that it can be added to SC-2694 to unblock the prometheus metrics.

That would be great!