Rewriting the spec test as a prose test turned out to be very useful and definitely a technique I'm keeping in my pocket for the future; once I had that done it was very quick to narrow the problem down.
Caveat: the following is my theory of what's going on. The fix works, but I'm less than 100% confident in my understanding of the behavior of MongoDB deployments in complex situations like this.
The core issue is the behavior of the snapshot read concern:
"A query with read concern snapshot returns majority-committed data as it appears across shards from a specific single point in time in the recent past."
There are no particular guarantees about it being the most recent (if you want that, use majority) or "at least after the last transaction" (if you want that, use causally consistent sessions). In the case of this particular test, the read isn't even done as part of a transaction or session; it's just a find, which is allowed, since find is one of the three read operations (along with aggregate and distinct) that support snapshot outside of transactions.
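To make that concrete, here's a minimal sketch of what such a read looks like with the Rust driver (2.x-style API; the URI, database, and collection names are placeholders, not the ones from the actual test):

```rust
use mongodb::{
    bson::{doc, Document},
    options::{CollectionOptions, ReadConcern},
    Client,
};

#[tokio::main]
async fn main() -> mongodb::error::Result<()> {
    let client = Client::with_uri_str("mongodb://localhost:27017").await?;
    // Snapshot reads are allowed outside transactions for find, aggregate,
    // and distinct; this is just a plain find, no session involved.
    let coll = client.database("test").collection_with_options::<Document>(
        "coll",
        CollectionOptions::builder()
            .read_concern(ReadConcern::snapshot())
            .build(),
    );
    // The server picks a recent majority-committed point in time for the
    // snapshot; nothing guarantees it's at or after any particular write.
    let mut cursor = coll.find(doc! {}, None).await?;
    while cursor.advance().await? {
        println!("{:?}", cursor.current());
    }
    Ok(())
}
```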
In practice, what seems to happen is that you get the timestamp of the last write acknowledged by the server you happen to be connected to. Here's where that goes wrong for this particular test:
- The test client entity has useMultipleMongoses: false, so it'll always be connecting to the first (of two) configured mongoses.
- The initial data is populated using the internal client, which is just a normal client, connecting to wherever server selection happens to land.
- The initial-data write uses a majority write concern; in this case, I think the "calculated majority" will simply be 1 (see the sketch below).
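As a sketch of that setup (again hypothetical names and the 2.x-style API, not the runner's actual code), the initial-data write looks roughly like this, with server selection free to pick either mongos:

```rust
use mongodb::{
    bson::{doc, Document},
    options::{Acknowledgment, InsertManyOptions, WriteConcern},
    Client,
};

async fn populate(uri_with_both_mongoses: &str) -> mongodb::error::Result<()> {
    // The internal client is an ordinary client: with both mongoses in the
    // URI, server selection is free to pick either one.
    let internal = Client::with_uri_str(uri_with_both_mongoses).await?;
    let coll = internal.database("test").collection::<Document>("coll");
    coll.insert_many(
        vec![doc! { "x": 1 }, doc! { "x": 2 }],
        InsertManyOptions::builder()
            // w: "majority" -- if the calculated majority is 1, acknowledgment
            // doesn't imply the *other* mongos has observed this write yet.
            .write_concern(WriteConcern::builder().w(Acknowledgment::Majority).build())
            .build(),
    )
    .await?;
    Ok(())
}
```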
So, if...
- the internal client picks the second mongos,
- and the write isn't acknowledged by the first mongos,
- and it hasn't replicated by the time the test runs,
then the timestamp chosen for the snapshot read concern in the test will be before the timestamp of the write, and the test find will return an empty list, causing our flake.
The fix is to use a secondary internal client for initial data population that's pinned to the same first mongos as the test client, which knocks out the first domino in the chain of failure. Hypothetically, I could have updated the unified runner to make the internal client always be pinned, but that seemed much more likely to cause surprising and unwanted behavior elsewhere.
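The shape of the fix is simple; a sketch (host name is a placeholder):

```rust
use mongodb::Client;

// Pinning is just a matter of listing only the first mongos in the URI, so
// the initial-data writes go through the same server the test client (with
// useMultipleMongoses: false) will read from.
async fn pinned_internal_client() -> mongodb::error::Result<Client> {
    Client::with_uri_str("mongodb://mongos1.example.com:27017").await
}
```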
Sidebar: if my understanding of the meaning of snapshot is correct, this test relies on an awful lot of implicit behavior, both server-side and in driver test runners.
RUST-2046