palantir / atlasdb

Transactional Distributed Database Layer
https://palantir.github.io/atlasdb/
Apache License 2.0

improvement: Resolve flakes in AMNPTLSIT (AbstractMultiNodePaxosTimeLockServerIntegrationTest) #7002

Closed: jeremyk-91 closed this 4 months ago

jeremyk-91 commented 4 months ago

General

Before this PR: Two tests in AMNPTLSIT were reported as flaky.

After this PR:

==COMMIT_MSG== One of the flaky tests has been removed, as its behaviour (the leader services requests) is indirectly covered by many other tests. The other was changed to handle a leader election. In general, both flakes stem from check-then-act problems around which TimeLock node is the leader. ==COMMIT_MSG==
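
For reviewers less familiar with this class of flake, here is a minimal, self-contained sketch of the check-then-act problem and one way a test can tolerate a leader election. It is not the code in this PR: `FakeCluster`, `NotCurrentLeaderException`, and `requestFromLeader` are hypothetical names invented for illustration, and the real test uses the TimeLock test harness rather than anything shown here.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LeaderRetrySketch {
    /** Hypothetical: thrown when a node is asked to serve a request while it is not the leader. */
    static class NotCurrentLeaderException extends RuntimeException {}

    /** Hypothetical three-node cluster in which leadership can move between any two calls. */
    static class FakeCluster {
        private final AtomicInteger leader = new AtomicInteger(0);

        int currentLeader() {
            return leader.get();
        }

        void electNewLeader() {
            leader.updateAndGet(i -> (i + 1) % 3);
        }

        long getFreshTimestamp(int node) {
            if (node != leader.get()) {
                throw new NotCurrentLeaderException();
            }
            return 42L;
        }
    }

    /**
     * Looks up the leader (check) and sends it a request (act). If an election
     * happened in between, the leader is looked up again and the request retried.
     */
    static long requestFromLeader(FakeCluster cluster, int maxAttempts, Runnable betweenCheckAndAct) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int node = cluster.currentLeader();         // check
            betweenCheckAndAct.run();                   // window in which an election may occur
            try {
                return cluster.getFreshTimestamp(node); // act
            } catch (NotCurrentLeaderException e) {
                // Leadership moved after the check; fall through and retry.
            }
        }
        throw new IllegalStateException("No stable leader after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        AtomicInteger calls = new AtomicInteger();
        // Force an election inside the check-then-act window on the first attempt only.
        long timestamp = requestFromLeader(cluster, 3, () -> {
            if (calls.getAndIncrement() == 0) {
                cluster.electNewLeader();
            }
        });
        System.out.println(timestamp); // prints 42
    }
}
```

A test that asserts on the result of a single check followed by a single act fails whenever an election lands in that window. Retrying with a fresh leader lookup removes the dependence on timing while keeping the number of attempts bounded.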

Priority: p2

Concerns / possible downsides (what feedback would you like?): Nothing much

Is documentation needed?: No

Compatibility

Only a test change.

Testing and Correctness

What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?: That we have tests that check the timelock leader can serve requests.

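What was existing testing like? What have you done to improve it?: Two tests in AMNPTLSIT were flaky. One has been removed and the other now handles a leader election, so the suite should be less flaky.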

If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.: N/A

If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?: N/A

Execution

Only a test change

Scale

Only a test change

Development Process

Where should we start reviewing?: It's small

If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?: N/A

Please tag any other people who should be aware of this PR: @jeremyk-91 @sverma30 @raiju