quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.55k stars 2.62k forks source link

Regression testing & defense against classloader leaks #42510

Open Sanne opened 4 weeks ago

Sanne commented 4 weeks ago

Description

Goal: reduce future regressions in the area of memory leaks, with a special focus on classloader leaks

In its very nature of reloading the whole app, including many 3rd party dependencies of which we might not know the design in great detail, we've found that maintaining extensions, especially for dev-mode, to not leak their classloaders is challenging. Many leaks have been resolved recently, but we need a way to ensure we keep improving over potentially regressing again when people lower their guard on this topic.

How

Memory leaks are actually testable; such a test is not even tricky but it requires a commitment from our side: I'm wondering if we can proof the approach on a tactical selection of extensions - immediately getting some benefits already - but eventually to make it zero-impact for any extension maintainers; this is described in more detail below.

Originally I prototyped the general idea of regression tests against classloader leaks in the Hibernate ORM project, about a year ago; I've left it there to mature for some time as I wasn't sure the timeout-approach wouldn't be potentially too fragile, and flaky CI is unacceptable to me; but it seems actually very robust and reliable: we've had no problems with it and it's been a long time now.

My Hibernate ORM utilities can be found here, and needless to say I'm granting permission to copy them into Quarkus and adapt as necessary, including adapt the license to ASL2.

Example usage:

Testing the test utility to make sure it actually works, but also to see how it's supposed to be used:

Core logic to test for leaks of just any type (can spot a leak of any object, not just classloaders):

Making it nicer to use specifically for ClassLoader leaks, as it's a bit of a tricky special case:

I'm assuming it needs adapting to be integrated within Quarkus; wondering if there's a smart way to integrate it in such a way that all extensions are inherently tested this way? Perhaps it could be integrated in the testing framework, and call into these helpers to ensure that after any testsuite is run, a classloader leak would be spotted?

Perhaps I'm dreaming, as I'm not too familiar with our testing frameworks - but I think it would be really great if somebody could think of such an integration.

Rollout Suggestions

I'm confident that many extensions still have classloader leaks we're not aware of.

Enforcing such a global check on all extensions would make it much harder to introduce such a check, so I'd suggest initially needing an opt-in, so that various extension owners can be encouraged to fix whatever they need fixing at their own pace, and eventually switch to opt-out. This would allow us to benefit from the regression tests in short time, and roll it out gradually.

quarkus-bot[bot] commented 4 weeks ago

/cc @mjurc (qe), @rsvoboda (qe)