Optimize context caching in the TestContext framework

The Spring TestContext Framework (TCF) uses a convenient and flexible approach to create and subsequently reuse application contexts, keyed by the aggregated MergedContextConfiguration. However, it has a drawback: a large test suite can eventually end up with too many simultaneously active contexts, which consume a lot of resources such as large thread pools or Testcontainers beans.

There are several good practices for reducing the number of distinct configurations, such as introducing common test superclasses and reducing the use of @MockBean annotations. We can also reduce the overhead of each new context, for example with statically defined, reusable Testcontainers containers.

Unfortunately, these approaches do not work very well for large projects with many teams contributing independently, so eventually OOM errors and other problems arise.

As a mitigation, there are some quick fixes such as using @DirtiesContext or the spring.test.context.cache.maxSize=1 option, as suggested by @snicoll (https://github.com/spring-projects/spring-boot/issues/15654). The suggested approach fixed the problem, but it has a disadvantage as well: the total test execution time increased due to the larger number of context re-initializations.

I had the same problem while working on the Miro (https://miro.com) monolith server application, and I found two more approaches to reduce the number of active contexts.

Smart (Auto) DirtiesContext

For single-threaded test execution, the sequence (list) of test classes is known at the very beginning of the suite. It's easy to calculate the MergedContextConfiguration for each class, which makes it possible to define a custom SmartDirtiesTestExecutionListener whose afterTestClass implementation is similar to that of the standard DirtiesContextTestExecutionListener, but with simple binary logic: if the current test class is the last test class using a given context configuration, close the context by marking it as dirty.
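To make the binary rule concrete, here is a minimal sketch in plain Java that models only the decision, not the prototype's actual code. It assumes a String key per test class stands in for its MergedContextConfiguration and that the list is the suite's execution order; SmartDirtiesPlan and closeAfter are hypothetical names. In a real listener, a true result would, for example, translate into calling testContext.markApplicationContextDirty(DirtiesContext.HierarchyMode.EXHAUSTIVE) from afterTestClass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the "Smart DirtiesContext" decision.
 * A String key stands in for MergedContextConfiguration, and the list
 * order stands in for the execution order of the suite's test classes.
 */
final class SmartDirtiesPlan {

    /** Returns, per test class position, whether its context should be closed after it runs. */
    static List<Boolean> closeAfter(List<String> configKeysInOrder) {
        // Remember the last position at which each configuration is used.
        Map<String, Integer> lastIndex = new HashMap<>();
        for (int i = 0; i < configKeysInOrder.size(); i++) {
            lastIndex.put(configKeysInOrder.get(i), i);
        }
        // A class closes its context only if no later class shares that configuration.
        List<Boolean> result = new ArrayList<>();
        for (int i = 0; i < configKeysInOrder.size(); i++) {
            result.add(lastIndex.get(configKeysInOrder.get(i)) == i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Test classes A, B, C use configurations c1, c2, c1 respectively.
        System.out.println(closeAfter(List.of("c1", "c2", "c1"))); // [false, true, true]
    }
}
```

Precomputing the last position once keeps each per-class decision a simple lookup, which is why knowing the whole sequence up front is the key requirement.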

This simple approach significantly reduced the number of active contexts and decreased the test execution time (since fewer resources such as CPU were consumed).

The only problem was that the suite is not accessible at the TCF level, so I originally implemented a custom TestNG listener; a JUnit 5 implementation was added later.

Test reordering

We can do even better by reordering the test execution sequence so that test classes sharing the same context configuration run consecutively. Combined with Smart DirtiesContext, the number of active Spring contexts then never exceeds one.
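The reordering step itself can be sketched as a stable grouping. This is an illustration under stated assumptions rather than the SmartDirtiesClassOrderer implementation: the configuration key is again a plain String (a real orderer could use MergedContextConfiguration itself, since the context cache already relies on it as the cache key), and ContextGroupingOrder and groupByConfig are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ContextGroupingOrder {

    /**
     * Stable grouping: classes sharing a configuration key become adjacent,
     * groups appear in the order their key was first seen, and the original
     * relative order is kept inside each group.
     */
    static <T> List<T> groupByConfig(List<T> testClasses, Function<T, String> configKey) {
        // LinkedHashMap preserves the first-seen order of the keys.
        Map<String, List<T>> groups = new LinkedHashMap<>();
        for (T testClass : testClasses) {
            groups.computeIfAbsent(configKey.apply(testClass), k -> new ArrayList<>()).add(testClass);
        }
        List<T> ordered = new ArrayList<>();
        groups.values().forEach(ordered::addAll);
        return ordered;
    }

    public static void main(String[] args) {
        // Hypothetical class names; the prefix before '_' plays the role of the configuration key.
        List<String> classes = List.of("c1_A", "c2_B", "c1_C", "c3_D", "c2_E");
        System.out.println(groupByConfig(classes, name -> name.substring(0, name.indexOf('_'))));
        // [c1_A, c1_C, c2_B, c2_E, c3_D]
    }
}
```

Keeping groups in first-seen order and preserving the original order within each group makes the resulting sequence deterministic for a given input.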

The following chart demonstrates the approach (same color = same MergedContextConfiguration):

It's not possible to reorder tests at the TCF level, but it is possible at the level of:

- JUnit Jupiter: SmartDirtiesClassOrderer, defined via junit-platform.properties
- TestNG: SmartDirtiesSuiteListener, defined via META-INF/services ITestNGListener
- JUnit 4 (vintage-engine): SmartDirtiesPostDiscoveryFilter, defined via META-INF/services PostDiscoveryFilter (workaround approach)

Metrics: fewer contexts and faster

Here is a sample test suite:

The horizontal axis shows the timeline, and the vertical axis shows the number of active Spring contexts (sampled every 10 seconds). As you can see, Smart DirtiesContext combined with test reordering (yellow) is always better: it has fewer active contexts, and its total test execution time is the shortest (because of lower CPU consumption and minimal context re-initialization).
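To show why the strategies trade off the way described, the following toy model counts context initializations and the peak number of simultaneously active contexts for a sequential suite. The five-class sequence is made up for illustration and is not taken from the measured suites; the model assumes a context is created on first use and that the unbounded cache never reaches its size limit. ContextCacheSimulation and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of context-cache strategies for a sequential suite.
 * Each list element is the configuration key of one test class, in execution order.
 * Each method returns {context initializations, peak simultaneously active contexts}.
 */
final class ContextCacheSimulation {

    /** Unbounded cache: contexts are never closed during the suite. */
    static int[] keepAll(List<String> keys) {
        int distinct = new HashSet<>(keys).size();
        return new int[] {distinct, distinct};
    }

    /** maxSize=1: switching to another configuration evicts (closes) the cached one. */
    static int[] maxSizeOne(List<String> keys) {
        int inits = 0;
        String cached = null;
        for (String key : keys) {
            if (!key.equals(cached)) {
                inits++;
                cached = key;
            }
        }
        return new int[] {inits, keys.isEmpty() ? 0 : 1};
    }

    /** Smart DirtiesContext: a context is closed right after the last class that uses it. */
    static int[] smartDirties(List<String> keys) {
        Map<String, Integer> lastIndex = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            lastIndex.put(keys.get(i), i);
        }
        Set<String> active = new HashSet<>();
        int inits = 0;
        int peak = 0;
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            if (active.add(key)) {
                inits++; // first use of this configuration creates the context
            }
            peak = Math.max(peak, active.size());
            if (lastIndex.get(key) == i) {
                active.remove(key); // last user: the context is marked dirty and closed
            }
        }
        return new int[] {inits, peak};
    }

    public static void main(String[] args) {
        List<String> original = List.of("c1", "c2", "c1", "c3", "c2");
        List<String> reordered = List.of("c1", "c1", "c2", "c2", "c3");
        print("keep all", keepAll(original));              // inits=3 peak=3
        print("maxSize=1", maxSizeOne(original));          // inits=5 peak=1
        print("smart dirties", smartDirties(original));    // inits=3 peak=2
        print("smart + reorder", smartDirties(reordered)); // inits=3 peak=1
    }

    private static void print(String label, int[] r) {
        System.out.println(label + ": inits=" + r[0] + " peak=" + r[1]);
    }
}
```

On this made-up sequence, maxSize=1 keeps at most one context alive but pays for five initializations, Smart DirtiesContext keeps three initializations with a peak of two, and adding reordering brings the peak down to one without extra initializations.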

The following chart shows the number of simultaneously active Testcontainers Docker containers (represented as Spring beans) for another test suite, and it is even more representative (unfortunately, I cannot compare it with the cache.maxSize=1 approach):

Prototype

To demonstrate the approach, I've made a library, https://github.com/seregamorph/spring-test-smart-context, that implements it for JUnit Jupiter, TestNG, and even JUnit 4 via the vintage-engine.

@snicoll and @marcphilipp were kind enough to give some initial feedback, and Stéphane then suggested submitting a ticket to continue the discussion here.

I understand that the current implementation of the TCF conceptually does not allow this approach, since it operates at a different level, but this could be a possible direction for the evolution of both spring-framework and junit-platform. The approach has significant advantages, such as flexibility and freedom for engineers: they do not need to worry too much about these optimizations.

Spring team, I'm curious about your opinion. cc @sbrannen

Hi @seregamorph,

Thanks for sharing your ideas as well as your prototype! 👍

Those are indeed very interesting approaches to the task at hand.

I especially like the idea of executing test classes that share the same context configuration sequentially.

Though we might be able to achieve a similar effect by tracking all test classes that use the same context configuration and closing the context after the last one, regardless of the order in which they are executed (for example, by decrementing a counter and eagerly closing the context once it hits 0).
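The counter-based alternative can be sketched as reference counting. This is a hypothetical illustration (ContextRefCounter is not an existing Spring type) with String keys standing in for MergedContextConfiguration; it assumes the full set of test classes is known up front.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical order-independent variant: count the test classes per configuration
 * up front, decrement after each class, and close eagerly once the count reaches zero.
 */
final class ContextRefCounter {

    private final Map<String, Integer> remaining = new HashMap<>();

    ContextRefCounter(List<String> configKeysOfAllClasses) {
        for (String key : configKeysOfAllClasses) {
            remaining.merge(key, 1, Integer::sum);
        }
    }

    /** Called after a test class finishes; returns true if its context can now be closed. */
    synchronized boolean afterTestClass(String configKey) {
        // merge() decrements the counter; an unknown key is stored as -1 and never triggers a close.
        return remaining.merge(configKey, -1, Integer::sum) == 0;
    }

    public static void main(String[] args) {
        ContextRefCounter counter = new ContextRefCounter(List.of("c1", "c2", "c1"));
        System.out.println(counter.afterTestClass("c1")); // false: one more c1 class is pending
        System.out.println(counter.afterTestClass("c2")); // true
        System.out.println(counter.afterTestClass("c1")); // true
    }
}
```

Because the close decision depends only on the counts, it holds for any execution order; the order still determines how long each context stays alive, though.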

I do have some concerns, however, and I'll add a few of them here as "food for thought".

- This would have to be an opt-in feature, since we could never apply these semantics by default. However, making it opt-in may be problematic since some of the techniques used in the prototype require that "services" be automatically registered based on what's in the classpath.
  - So, this may potentially require an additional artifact (JAR) that users can optionally add to the classpath, but we've never done that before in the core Spring Framework.
- I'm a bit hesitant to implement this feature set differently for JUnit Jupiter, JUnit 4, and TestNG.
  - In light of that, we may choose to only support this with JUnit Jupiter.
- Some parts of this won't work if...
  - a different ClassOrderer is used
  - test classes are executed in parallel
    - either by JUnit Platform / JUnit Jupiter
    - or by the IDE or build tool

In any case, I have assigned this to the 6.2.x milestone since I think it's worth investigating what's possible.

> we might be able to achieve a similar effect by tracking all test classes that use the same context configuration and closing after the last one, regardless of the order in which they are executed

This has the downside of keeping the context (and all the resources attached to it) for a longer period of time. I think it's important to take into account what @seregamorph and team have been doing here. Speaking of which, I'd like to see how we can help them submit a PR, given that they already have an implementation.

> making it opt-in may be problematic since some of the techniques used in the prototype require that "services" be automatically registered based on what's in the classpath. So, this may potentially require an additional artifact (JAR) that users can optionally add to the classpath

> Some parts of this won't work if... a different ClassOrderer is used

That's right! Maybe for this reason it makes sense to plan possible changes in junit-platform first? Such changes could unify the approach for all three test frameworks (JUnit 4 via vintage-engine, Jupiter, TestNG via testng-engine, and maybe Spock) as well as for all execution environments (Maven, Gradle, IntelliJ IDEA).

One more possible approach, which would avoid introducing new JAR artifacts in the spring-framework group:

- keep the current behaviour as is, for the sake of backward compatibility
- add SmartDirtiesClassOrderer to the spring-test TCF (not activated by default)
- add SmartDirtiesContextTestExecutionListener to the list of default TestExecutionListeners, and make it active only if SmartDirtiesClassOrderer was executed
- document the new behaviour for projects that have issues with too many active contexts (or would like to optimize them); enabling it would only require defining resources/junit-platform.properties with the following content:
```
junit.jupiter.testclass.order.default = com.github.seregamorph.testsmartcontext.jupiter.SmartDirtiesClassOrderer
```
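The gating described in the third item could look roughly like the following sketch: the orderer publishes the final order, and a listener registered by default stays a no-op until that happens, so suites that don't configure the orderer keep today's behaviour. SmartDirtiesState, publishOrder, and shouldMarkDirty are hypothetical names, and the static field assumes the orderer and the listener run in the same JVM and class loader.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical state shared between an orderer and a default-registered listener.
 * Until publishOrder is called, shouldMarkDirty always returns false.
 */
final class SmartDirtiesState {

    // Classes that are the last user of their configuration; null means the orderer has not run.
    private static volatile Set<String> lastUsers;

    /** Called by the orderer once the final execution order is known. */
    static void publishOrder(List<String> classNamesInOrder, Function<String, String> configKeyOf) {
        Map<String, String> lastClassByConfig = new HashMap<>();
        for (String className : classNamesInOrder) {
            // Later classes overwrite earlier ones, so the last user of each key wins.
            lastClassByConfig.put(configKeyOf.apply(className), className);
        }
        lastUsers = Set.copyOf(lastClassByConfig.values());
    }

    /** Called by the listener from afterTestClass; false means "leave the context cached". */
    static boolean shouldMarkDirty(String className) {
        Set<String> snapshot = lastUsers;
        return snapshot != null && snapshot.contains(className);
    }

    public static void main(String[] args) {
        System.out.println(shouldMarkDirty("UserTest")); // false: orderer not active
        Map<String, String> configs = Map.of("OrderTest", "c1", "UserTest", "c1", "BillingTest", "c2");
        publishOrder(List.of("OrderTest", "UserTest", "BillingTest"), configs::get);
        System.out.println(shouldMarkDirty("OrderTest"));   // false: UserTest still needs c1
        System.out.println(shouldMarkDirty("UserTest"));    // true
        System.out.println(shouldMarkDirty("BillingTest")); // true
    }
}
```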

By the way, OOM is not the only reason why this new behaviour makes sense:

- in a long test suite, it can be confusing why some scheduled jobs are still running: tasks that were submitted for contexts that are no longer needed. This can complicate debugging test issues, as the test log becomes more and more cluttered with confusing output
- ordered test execution is preferable, as it produces (much) more reproducible results than the default, unspecified order
