palantir / atlasdb

Transactional Distributed Database Layer
https://palantir.github.io/atlasdb/
Apache License 2.0
46 stars 7 forks source link

[KeyValueSnapshotReader Phase I] Part 2: ReadValidationCommitTimestampLoader #7017

Closed jeremyk-91 closed 4 months ago

jeremyk-91 commented 4 months ago

General

This PR is part of a series: see the prototype https://github.com/palantir/atlasdb/pull/7000 or the internal RFC for what all of the pieces together are expected to look like. I have tried to separate this feature into reasonably sized components as otherwise it'd probably have a 5k delta or so!

Before this PR: In order to validate that our reads are still correct at commit time, we create a fresh transaction at our commit timestamp. This sneakily has some methods overridden - getCache and getCommitTimestamp. It's very unclear how exactly this might interact with other implementations 😓

After this PR: ==COMMIT_MSG== This PR moves the latter to a separate CommitTimestampLoader, which was introduced as a generic concept in #7016. ==COMMIT_MSG==

Priority: High P2. Nothing super urgent, though part of a high priority workstream.

Concerns / possible downsides (what feedback would you like?):

Is documentation needed?: No

Compatibility

Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?: No

Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?: No

The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.): Yes

Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?: No

Does this PR need a schema migration? No

Testing and Correctness

What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?: Nothing in particular

What was existing testing like? What have you done to improve it?:

If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.: N/A

If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?: N/A

Execution

No production changes expected, can be recalled if necessary.

Scale

Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.: No

Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?: No

Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?: I don't think so?

Development Process

Where should we start reviewing?: Probably the minus diff in SnapshotTransaction, honestly.

If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?: It's not

Please tag any other people who should be aware of this PR: @jeremyk-91 @sverma30 @raiju