mongodb / mongo-rust-driver

The official MongoDB Rust Driver
https://www.mongodb.com/docs/drivers/rust/current/
Apache License 2.0

RUST-2046 Fix flaky afterClusterTime test #1209

Closed: abr-egn closed 1 month ago

abr-egn commented 1 month ago

RUST-2046

Rewriting the spec test as a prose test turned out to be very useful, and it's definitely a technique I'm keeping in my pocket for the future. Once I had that done, it was very quick to narrow the problem down.

Caveat: the following is my theory of what's going on. The fix works, but I'm less than 100% confident in my knowledge of how MongoDB deployments behave in complex situations like this.

The core issue is the behavior of the snapshot read concern:

A query with read concern snapshot returns majority-committed data as it appears across shards from a specific single point in time in the recent past.

There are no particular guarantees about it being the most recent (if you want that, use majority) or "at least after the last transaction" (if you want that, use causally consistent sessions). In this particular test, the read isn't even done as part of a transaction or session; it's just a find (which is allowed: find is one of the three read operations that support snapshot outside of transactions).

In practice, what seems to happen is you get the timestamp of the last write acknowledged by the server you happen to be connected to. Where this goes wrong for this particular test:

The fix is to use a second internal client, also pinned to the first mongos, for initial data population, which avoids the first domino in the chain of failure. Hypothetically I could have updated the unified runner to make the internal client always be pinned, but that seemed much more likely to cause surprising and unwanted behavior elsewhere.
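To make the pinning argument concrete, here is a deliberately simplified toy model in Rust. It is not driver or server code: `Shard`, `Router`, and `run_scenario` are hypothetical names, and the model only encodes the theory above, that a snapshot read sees data as of the last write acknowledged by the router (mongos) it goes through. The real failure chain may involve more steps than this sketch captures.

```rust
use std::collections::BTreeMap;

/// Toy shard: each write is stored under the logical timestamp at which it committed.
struct Shard {
    clock: u64,
    versions: BTreeMap<u64, i64>, // timestamp -> value of the single test document
}

impl Shard {
    fn new() -> Self {
        Shard { clock: 0, versions: BTreeMap::new() }
    }

    /// Commit a write and return its timestamp.
    fn write(&mut self, value: i64) -> u64 {
        self.clock += 1;
        self.versions.insert(self.clock, value);
        self.clock
    }

    /// Value visible at `ts`: the newest version committed at or before `ts`.
    fn read_at(&self, ts: u64) -> Option<i64> {
        self.versions.range(..=ts).next_back().map(|(_, v)| *v)
    }
}

/// Toy router (stand-in for a mongos). ASSUMPTION: it only knows the
/// timestamp of the newest write that it acknowledged itself.
struct Router {
    last_acked: u64,
}

impl Router {
    fn write(&mut self, shard: &mut Shard, value: i64) -> u64 {
        let ts = shard.write(value);
        self.last_acked = ts;
        ts
    }

    /// Snapshot read as modeled here: read at this router's last acknowledged write.
    fn snapshot_find(&self, shard: &Shard) -> Option<i64> {
        shard.read_at(self.last_acked)
    }
}

/// Seed data with the internal client, then read through router 0,
/// which is where the test client is pinned.
fn run_scenario(pin_seed_client: bool) -> Option<i64> {
    let mut shard = Shard::new();
    let mut routers = [Router { last_acked: 0 }, Router { last_acked: 0 }];

    // An unpinned internal client may land on any router; this sketch picks
    // router 1 to show the bad case. A pinned one always uses router 0.
    let seed_router = if pin_seed_client { 0 } else { 1 };
    routers[seed_router].write(&mut shard, 42);

    routers[0].snapshot_find(&shard)
}
```

In this model, `run_scenario(false)` reads at a timestamp that predates the seeded document and finds nothing, while `run_scenario(true)` sees it. That is the effect the pinned internal client is meant to have in the real test.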

Sidebar: if my understanding of the meaning of snapshot is correct, this test relies on an awful lot of implicit behavior, both server-side and in driver test runners.

abr-egn commented 1 month ago

I was thinking this had a chance at being the first natural green in a while, but it looks like orchestration's broken on macOS. Ah well :)