projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics
https://projectnessie.org
Apache License 2.0

Investigate resource leak in Server Admin Tool integration tests #8571

Open adutra opened 2 months ago

adutra commented 2 months ago

#8548 introduces integration tests for various backends using MultiEnvTestEngine. Unfortunately, there seems to be a resource leak, which is currently mitigated with the forkEvery = 4 Gradle setting. It could be a consequence of a @QuarkusMainTest being executed by MultiEnvTestEngine. If so, a simple fix could be to migrate to @QuarkusMainIntegrationTest, but that would require changes to the tests.

adutra commented 2 months ago

When the forkEvery setting is removed, the integration tests usually end with an OOM:

[1419.245s][warning][codecache] CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled.
[1419.245s][warning][codecache] Try increasing the code heap size using -XX:NonProfiledCodeHeapSize=
CodeHeap 'non-profiled nmethods': size=119168Kb used=119167Kb max_used=119167Kb free=0Kb
 bounds [0x0000000116934000, 0x000000011dd94000, 0x000000011dd94000]
CodeHeap 'profiled nmethods': size=119152Kb used=116511Kb max_used=119132Kb free=2640Kb
 bounds [0x000000010ed94000, 0x00000001161f0000, 0x00000001161f0000]
CodeHeap 'non-nmethods': size=7440Kb used=1637Kb max_used=2337Kb free=5802Kb
 bounds [0x00000001161f0000, 0x0000000116460000, 0x0000000116934000]
 total_blobs=72306 nmethods=71346 adapters=871
 compilation: disabled (not enough contiguous free space left)
              stopped_count=1, restarted_count=0
OpenJDK 64-Bit Server VM warning: CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled.
OpenJDK 64-Bit Server VM warning: Try increasing the code heap size using -XX:NonProfiledCodeHeapSize=
 full_count=1
java.lang.OutOfMemoryError: Java heap space

dimas-b commented 2 months ago

I vaguely recall Quarkus class loaders holding a lot of stuff in memory across tests (specifically Quarkus CLI tests)... but I could not find anything quickly.

adutra commented 2 months ago

I got a heap dump yesterday. The biggest objects are of these 2 types:

io.quarkus.deployment.dev.RuntimeUpdatesProcessor
io.quarkus.bootstrap.classloading.QuarkusClassLoader

There were 166 instances of QuarkusClassLoader.
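
The thread does not say how the heap dump was taken. As one possible way to reproduce the analysis, the sketch below triggers a dump from inside the test JVM through the HotSpot diagnostic MXBean. The class name `HeapDumper` and the target path are illustrative assumptions, and the approach only works on HotSpot-based JVMs. Running the tests with `-XX:+HeapDumpOnOutOfMemoryError` or using `jcmd <pid> GC.heap_dump <file>` are alternatives.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Nessie or Quarkus.
final class HeapDumper {
  private HeapDumper() {}

  /**
   * Writes a heap dump of the current JVM to the given file, which should end in ".hprof".
   * Only live (reachable) objects are dumped, so leaked class loaders still show up
   * while garbage that merely has not been collected yet does not.
   */
  static Path dump(Path target) throws IOException {
    // dumpHeap refuses to overwrite an existing file.
    Files.deleteIfExists(target);
    HotSpotDiagnosticMXBean bean =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    bean.dumpHeap(target.toAbsolutePath().toString(), true);
    return target;
  }
}
```

The resulting file can be opened in a heap analyzer to count QuarkusClassLoader instances and follow their paths to GC roots.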

dimas-b commented 2 months ago

This looks in line with my previous experience. IIRC, Quarkus does not release class loaders when the CLI application is re-launched by tests.

snazy commented 2 months ago

It's "usually" some MBean that's still registered, or some thread lingering around. Both hold references to the class loader, preventing the class loader from being GC'd. Not directly a Quarkus issue.
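
To check for the two usual suspects named above, a test could inspect the JVM after each application run. This is a minimal sketch using only JDK APIs; the class `ClassLoaderLeakProbe` and its method names are hypothetical, and it assumes the test can get hold of the suspect class loader instance. It compares loaders by identity, so child loaders of the suspect are not reported.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.InstanceNotFoundException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical diagnostic helper, not part of Nessie or Quarkus.
final class ClassLoaderLeakProbe {
  private ClassLoaderLeakProbe() {}

  /** Returns live threads whose context class loader is exactly the given loader. */
  static List<Thread> threadsUsing(ClassLoader loader) {
    List<Thread> result = new ArrayList<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t.isAlive() && t.getContextClassLoader() == loader) {
        result.add(t);
      }
    }
    return result;
  }

  /** Returns platform MBeans whose implementation class was loaded by exactly the given loader. */
  static List<ObjectName> mbeansLoadedBy(ClassLoader loader) {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    List<ObjectName> result = new ArrayList<>();
    for (ObjectName name : server.queryNames(null, null)) {
      try {
        if (server.getClassLoaderFor(name) == loader) {
          result.add(name);
        }
      } catch (InstanceNotFoundException e) {
        // The MBean was unregistered between the query and the lookup; skip it.
      }
    }
    return result;
  }
}
```

A non-empty result from either method after the application has shut down points at a reference that keeps the class loader reachable.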

dimas-b commented 2 months ago

Would it help if we switched to @QuarkusMainIntegrationTest? I suppose it'd use a separate JVM for the tool runtime.

snazy commented 2 months ago

It would slow the tests down, but wouldn't necessarily eliminate the resource issues.