Can't always replay individual test cases in JUnit when some of them are ignored.

sageserpent-open commented 11 months ago

See example test here.

The test uses the JUnit integration, using the @TestFactory approach.
The test has a limit of 200 test cases configured via Trials.withLimit.
It also generates a lot of ignored test cases, using the post-hoc rejection approach via Trials.reject.

It is possible to replay any of the test cases that actually run from the first 200 test cases (including the ignored ones); this results in the same outcome, so if the test case was ignored, it is ignored on replay, and if it proceeded, then it will proceed on replay. In both situations a test case is submitted to the test body lambda.

In contrast, attempting to replay a test case whose number is beyond 200 doesn't actually run anything, even if the test case being replayed wasn't rejected to start with.

This doesn't happen if all of the test cases proceed through in the first place - see the prior test in that file for confirmation.

sageserpent-open commented 9 months ago

This is caused by a clash between JUnit5's approach to selecting a test instance to replay, and Americium allowing the outcome of individual trials to influence the pipeline that generates test cases.

So in JUnit5's TestFactoryTestDescriptor we have a loop that ploughs through the test cases (wrapped up in DynamicNode instances), creating a JupiterTestDescriptor for each DynamicNode. That in turn checks the UniqueId associated with the DynamicNode against the one supplied by whatever is replaying the test, be it IntelliJ or whatever; this happens in the filter invocation here.

Now this works fine for trial invocations that are produced deterministically, because the loop will recapitulate the same sequence of unique ids leading up to the one we want to replay and eventually yield a non-empty test descriptor. Note that when this happens, none of the trials corresponding to the unique ids are actually run, apart from the one that matches the unique id we want to replay.
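A rough sketch of that selection behaviour, in plain Java rather than JUnit5's actual code. The class name, the method name and the id format are all hypothetical, and the loop is deliberately simplified: it regenerates every id in sequence and only treats the matching one as runnable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical stand-in for the selection loop; this is not JUnit5 code. */
final class UniqueIdSelectionSketch {
    /**
     * Regenerates the ids for indices 0..count-1 and records the indices of
     * the trials that would actually run: only those whose id equals the
     * id requested for replay.
     */
    static List<Integer> trialsRunOnReplay(int count,
                                           IntFunction<String> idForIndex,
                                           String requestedId) {
        List<Integer> run = new ArrayList<>();
        for (int index = 0; index < count; index++) {
            // Every id leading up to the requested one is recreated...
            String id = idForIndex.apply(index);
            // ...but only a matching id yields something to execute.
            if (id.equals(requestedId)) {
                run.add(index);
            }
        }
        return run;
    }
}
```

This only finds the requested trial if the ids come out in the same order as in the original run, which is the assumption that breaks down in the cases discussed next.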

It is known and documented that this doesn't work for shrinkage, because the pattern of failures seen during shrinkage influences how the test cases are generated - so when a given trial is replayed, it won't have the same context of failing trials already invoked that led up to it, so the wrong test case will be generated.

This is a similar phenomenon: use of Trials.reject (or Trials.whenever) within a test function influences subsequent trials, because it interacts with whatever strategy is controlling test case supply. This is by design: we want post-hoc rejection to emulate filtering, and we expect filtering to play well with any limits imposed by the strategy; so in the above example we expect 200 filtered test cases to be supplied, regardless of whether by upfront filtration or post-hoc rejection.

So again, we're seeing non-determinism in how the test cases are produced - the trial invocations have an effect that is missing when we do a replay.
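A toy model of that missing effect, in plain Java. It is not Americium's implementation; the names are hypothetical and it assumes a candidate counts towards the limit unless its trial body ran and rejected it.

```java
import java.util.function.IntPredicate;

/** Hypothetical model of a limit-based supply with post-hoc rejection. */
final class RejectionFeedbackSketch {
    /**
     * Counts how many candidate test cases are handed out before the limit
     * of accepted cases is reached. Candidates are numbered from 1.
     *
     * @param limit     number of accepted test cases required
     * @param rejects   decides whether a trial body would reject a candidate
     * @param bodiesRun false models a replay, where the trial bodies leading
     *                  up to the requested one are never executed
     */
    static int candidatesSupplied(int limit, IntPredicate rejects, boolean bodiesRun) {
        int accepted = 0;
        int supplied = 0;
        while (accepted < limit) {
            supplied++;
            // A rejection can only be observed if the trial body actually ran.
            boolean rejected = bodiesRun && rejects.test(supplied);
            if (!rejected) {
                accepted++;
            }
        }
        return supplied;
    }
}
```

With a limit of 200 and every odd-numbered candidate rejected, the full run in this model supplies 400 candidates, whereas the replay supplies only 200, so any candidate numbered above 200 is never reached on replay.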

It might be possible to hack TrialsTestExtension to note the presence of a unique id in the ExtensionContext passed to it, so it could be feasible to use a test case recipe to create a unique id, then directly reproduce the test case from that id. That would be nice, in that it would avoid having to plough through all of the irrelevant unique ids that led up to the one we want to replay.

Even if a recipe can't be encoded as a UniqueId, we could use the recipe hash database to associate a recipe with the unique id - perhaps by encoding the recipe hash as a unique id. There might have to be some fancy footwork to keep a meaningful test description, need to check on this.

The problem is that TrialsTestExtension is completely out of the picture for a test that uses @TestFactory.

sageserpent-open commented 9 months ago

As it happens, there is something in the JUnit5 extension API to hook into a @TestFactory - we have InvocationInterceptor.interceptTestFactoryMethod, and that takes an ExtensionContext, which in turn allows access to the unique id. So with a bit of jiggery-pokery (aka a thread context), that would allow junit5.dynamicTests to do something with the unique id ... or worse still, this could be freighted right down into the supply implementation, doing something similar to this.
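A minimal sketch of the thread context idea, in plain Java. The holder class and its methods are hypothetical; neither JUnit5 nor Americium is claimed to provide them. An interceptor would wrap the factory invocation in withUniqueId, and code further down the same thread would call current.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical thread-local holder for the unique id being replayed. */
final class ReplayUniqueIdContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** Runs the factory with the unique id visible further down the same thread. */
    static <T> T withUniqueId(String uniqueId, Supplier<T> factory) {
        String previous = CURRENT.get();
        CURRENT.set(uniqueId);
        try {
            return factory.get();
        } finally {
            // Restore whatever was there before, so nested uses don't leak.
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    /** The unique id for the enclosing invocation, if there is one. */
    static Optional<String> current() {
        return Optional.ofNullable(CURRENT.get());
    }
}
```

One caveat under this sketch: if the dynamic tests come back as a lazily consumed stream, that stream is read after the factory has returned, so the id would have to be captured while the factory is still running.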

That would also permit TrialsTestExtension to be smarter about replaying trials - it might even be able to replay trials invoked as part of shrinkage. Hmm.

Need to think about this...

sageserpent-open commented 9 months ago

It might be simpler to set the recipe hash system property as a temporary resource rather than hack in thread local state - we already have support for this to reproduce tests via system property trials.recipeHash. So as long as we build a mapping between UniqueId and the recipe hash in the JUnit5 extension callback for a test (trial) invocation, we should be able to retrieve the recipe hash later on when calling InvocationInterceptor.interceptTestFactoryMethod. The question is, does the test invocation callback have the ability to find the relevant recipe hash?
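Treating the property as a temporary resource could look something like this sketch. Only the property name trials.recipeHash comes from the discussion above; the class, the method and the recipe hash value used with it are hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical helper that scopes the recipe hash property to one action. */
final class TemporaryRecipeHashSketch {
    static final String PROPERTY = "trials.recipeHash";

    /** Sets the property for the duration of the action, then restores it. */
    static <T> T withRecipeHash(String recipeHash, Supplier<T> action) {
        String previous = System.getProperty(PROPERTY);
        System.setProperty(PROPERTY, recipeHash);
        try {
            return action.get();
        } finally {
            // Put back the earlier value, or clear the property if it was unset.
            if (previous == null) {
                System.clearProperty(PROPERTY);
            } else {
                System.setProperty(PROPERTY, previous);
            }
        }
    }
}
```

System properties are global to the JVM, so this trades thread-local state for global state and would not be safe if tests ran in parallel.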

sageserpent-open commented 9 months ago

Parking this for a while, marking as won't fix.

A clumsy workaround is to use a timed strategy instead of a fixed limit - so presumably supply will keep ploughing away until it hits the required unique id. Experimenting shows this to be a workable solution for handling rejected trials, although it won't allow shrinkage test cases to be reproduced.

sageserpent-open commented 6 months ago

Revisiting, as this also relates to #69 - in both cases the desired result would be to just run a single trial with a specific test case and finish.

From above we have InvocationInterceptor.interceptTestFactoryMethod - that will tell us the UniqueId (the JUnit 5 one, not the one defined by Americium that was introduced after this ticket) via the ExtensionContext.

So each trial could note its unique id and the associated reproduction recipe in the RocksDB database.

There is a TestExecutionListener that can be injected into JUnit5 via Java's ServiceLoader mechanism. This can discover the replayed test's TestIdentifier, which in turn yields the exact UniqueId for the replayed test case.

So as long as the RocksDB database still has the associated recipe, we can jemmy the use of that recipe, just as is done in that link above where a JVM property is used to override the default test case generation.

This means a nasty global coupling of the TestExecutionListener to the core implementation, but there is already the aforementioned jemmy, so let's go with it as a spike...

sageserpent-open commented 6 months ago

That approach isn't going to work - the call to the TestExecutionListener (which cascades off a primary abstraction, an EngineExecutionListener managed by JUnit 5) simply responds to the execution of TestTemplateInvocationTestDescriptor instances that are mapped from the TestTemplateInvocationContext instances that were themselves mapped from (and refer back to) the test cases being generated as part of the shrinkage sequence.

So by the time we know the UniqueId, the test cases have already begun to be generated - we can use this mechanism to discover the unique ids and associate them with a recipe, but not to replay specific test cases.

So how does the unique id for a replayed test case make its way down into JUnit 5 from IntelliJ?

sageserpent-open commented 6 months ago

The point where IntelliJ invokes JUnit5 is SessionPerRequestLauncher.execute - this is passed a LauncherDiscoveryRequest as well as a TestExecutionListener implementation from IntelliJ. The request contains a UniqueIdSelector that picks out the specific test case, actually via an encapsulated UniqueId.

Can we spy on this with some callback?

sageserpent-open commented 6 months ago

The short answer to the above is 'no'. It seems pretty clear that JUnit5 does not want to expose the precise UniqueId to any third-party extension code before it has decided it is good and ready to do so. The closest I got was to observe that the extension context passed to TrialsTestExtension.provideTestTemplateInvocationContexts is actually a TestTemplateExtensionContext, and this squirrels away the full UniqueId for the test case within a DynamicDescendantFilter hiding inside a TestTemplateTestDescriptor, which itself is not accessible.

At this point I'm taking the hint.

sageserpent-open commented 6 months ago

Asked a question about this on Stack Overflow just in case...

sageserpent-open commented 6 months ago

Found a way to get the UniqueId for the replayed test case based on the StackOverflow chat and some further exploration by yours truly, so back on the road again...

sageserpent-open commented 6 months ago

As of Git commit SHA: 583ac36 there is a spike for this that unfortunately doesn’t work. It has become clear that while it is feasible to capture the unique ids for replayed test cases and to generate only those test cases, this still doesn’t play well with JUnit5.

JUnit5 expects to iterate through test cases until it generates a unique id that matches one of the ones slated for replay; this won’t happen if only those are generated. We have to recapitulate the original sequence of unique ids.

Now it may be possible to do this by generating stub test cases until the unique id matches a replay unique id, carrying on until all requested unique ids have been replayed, but this is very brittle. We need to store an association between unique ids and recipes, and if this association is lost, then we just have to fall back on the existing mechanism and hope for the best.

This could be done by using the existing mechanism, but breaking out to using a recipe whenever the current unique id is one of the ones requested for replay and has a recipe.
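In outline, that hybrid could look like the following plain Java sketch. Everything in it is hypothetical: the recipe store is modelled as an in-memory map rather than the RocksDB database, and test cases and recipes are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.IntFunction;

/** Hypothetical model of replay that prefers stored recipes. */
final class RecipeBackedReplaySketch {
    /**
     * Recapitulates the ids for indices 0..count-1. For each id requested for
     * replay, yields the test case rebuilt from its stored recipe when there
     * is one, otherwise whatever ordinary generation produces at that index.
     */
    static List<String> replay(int count,
                               IntFunction<String> idForIndex,
                               IntFunction<String> ordinaryCase,
                               Set<String> requestedIds,
                               Map<String, String> recipesById,
                               Function<String, String> caseFromRecipe) {
        List<String> replayed = new ArrayList<>();
        for (int index = 0; index < count; index++) {
            String id = idForIndex.apply(index);
            if (!requestedIds.contains(id)) {
                continue; // Only the id matters here; no test case is needed.
            }
            String recipe = recipesById.get(id);
            replayed.add(recipe != null
                    ? caseFromRecipe.apply(recipe) // direct reproduction
                    : ordinaryCase.apply(index));  // association lost: fall back
        }
        return replayed;
    }
}
```

If the association between unique ids and recipes is lost, the lookup comes back empty and the sketch degrades to the existing behaviour, which is the "hope for the best" fallback mentioned above.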

An ungainly hack, but feasible…

sageserpent-open commented 6 months ago

... barely feasible, as it turns out. JUnit5 waits until the last possible moment before reluctantly yielding the full unique id associated with a trial (as opposed to the test method executing the trials). Plugging into TestExecutionListener.dynamicTestRegistered is too late, as this happens after the call to TrialsTestExtension.provideTestTemplateInvocationContexts.

However it is possible to pick up contexts later on that do have the full unique id, but these come after the callback that is used to produce the label for the trial. Sacrificing the label leads to the following mess in Git commit SHA: 044a1bd1f9f0245e70f4182ee389c20c9fba55a4.

Manual testing shows that this does allow direct reproduction of trials from a shrinkage sequence as well as 'ordinary' ones, and will fall back to normal behaviour (rebuilding an unrelated 'ordinary' test case) when the RocksDB database is deleted.

The code is a mess, though, and this technique only works for tests annotated with @ConfiguredTrialsTest.

sageserpent-open commented 6 months ago

As of Git commit SHA: e4d3180a9482b728e375249534b5d7e7ff9ee3ae the code is slightly less of a disaster in TrialsTestExtension, and I suspect that this ticket's required functionality has been delivered, as well as that of #69 and also the ability to replay test cases that turned up in a shrinkage cycle that was ditched in #39.

However, when a directly replayed trial throws an exception, this causes warnings from JUnit5 - need to finesse this...

sageserpent-open commented 6 months ago

As of Git commit SHA: 610de2e43be10ae24773c0d7cb8c583043751a0d, there is fairly convincing support for tests using either the @TrialsTest or the @ConfiguredTrialsTest annotations.

Could this approach work for tests annotated with @TestFactory and using <supply syntax>.dynamicTests?

sageserpent-open commented 6 months ago

The answer is a tentative 'yes' - again, the label for a trial of a directly replayed test case is problematic, as is JUnit5's clasping of the full unique id for a trial until the very last moment. However, it looks feasible...

sageserpent-open commented 6 months ago

As of Git commit SHA: 1fa8f802807da2c19a2c36fa35db7d8d7a539d83, junit5.dynamicTests now allows direct replay of test cases.

sageserpent-open commented 6 months ago

Evidence, part 1:

Executing a JUnit5 test with dynamic tests (DemonstrateJUnit5Integration.dynamicTestsExampleWithIgnoredTestCases):

Screenshot 2024-03-07 at 12 37 42

Observe the test case and outcome for trial 22.

sageserpent-open commented 6 months ago

Evidence, part 2:

After performing rm -rf ${TMPDIR}/trialsRunDatabase in a shell, attempting to replay trial 22 does not yield a test case at all:

Screenshot 2024-03-07 at 12 38 00

sageserpent-open commented 6 months ago

Evidence, part 3:

Re-running the full set of trials as per the first part of the evidence regenerates the database, and allows direct replay of trial 22:

Screenshot 2024-03-07 at 12 38 11

Observe the same test case and outcome.

sageserpent-open commented 6 months ago

Evidence, part 4:

Highlighting trial 21 that was rejected in the full set of trials:

Screenshot 2024-03-07 at 12 38 30

Observe the test case and outcome.

sageserpent-open commented 6 months ago

Evidence, part 5:

Direct replay of trial 21 is possible after the database is regenerated:

Screenshot 2024-03-07 at 12 38 41

Observe the test case and outcome.

sageserpent-open commented 6 months ago

This went out in release 1.19.0, Git commit SHA: df58230223ee3cea658b9f29aa3257da9eabf47f.
