palantir / atlasdb

Transactional Distributed Database Layer
https://palantir.github.io/atlasdb/
Apache License 2.0
45 stars 7 forks source link

[Flakes] TimeLockMigrationEteTest #7037

Closed jeremyk-91 closed 3 months ago

jeremyk-91 commented 3 months ago

General

Before this PR: The timelock migration ETE test sometimes fails after going for 15 minutes with no output.

After this PR:

==COMMIT_MSG== The timelock migration ETE test more stably completes within 15 minutes. ==COMMIT_MSG==

Priority: P3

Concerns / possible downsides (what feedback would you like?):

Is documentation needed?: No

Compatibility

Only test changes

Testing and Correctness

What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?: Nothing in particular

What was existing testing like? What have you done to improve it?: They should hopefully flake less

If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.: N/A

If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?: N/A

Execution

Only test changes. But recall if it matters.

Scale

Only test changes

Development Process

Where should we start reviewing?: The one file

If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?:

Please tag any other people who should be aware of this PR: @jeremyk-91 @sverma30 @raiju

barisoyoruk commented 3 months ago

Is the lost coverage important? I'd argue no: the existence of the message already indicates that the node knows that it should not create the TM

Do you mean the following message?

  private static Callable<Boolean> logsContainTransactionManagerCreationFailure() {
      return () -> {
          String serverLogs = CLIENT_ORCHESTRATION_EXTENSION.getClientLogs();
          return serverLogs.contains("IllegalArgumentException trying to convert the stored value to a long.");
      };
  }
jeremyk-91 commented 3 months ago

yep! Technically the tests would allow an implementation that ignores that messages and say assumes it's 0 and just creates the TM or something like that, but I don't imagine we would end up writing that.