quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.74k stars 2.67k forks source link

OOM in tests (Part 3) #42471

Closed mschorsch closed 1 month ago

mschorsch commented 2 months ago

Describe the bug

As stated in https://github.com/quarkusio/quarkus/issues/42355#issuecomment-2275085456, we continue to receive an OOM in our tests. I was able to create a reproducer for this.

Seems there is another memory/classloader leak.

Expected behavior

No response

Actual behavior

No response

How to Reproduce?

oom-reproducer3.zip

./gradlew test --console=plain

Output of uname -a or ver

Linux

Output of java -version

Java 21

Quarkus version or git rev

Quarkus 3.13.2

Build tool (ie. output of mvnw --version or gradlew --version)

Gradle 8.9

Additional information

No response

gsmet commented 2 months ago

That's awesome, thanks. I will have a closer look soon.

gsmet commented 2 months ago

I have been chasing them for a while but it all depends on the extensions being around. I hope we'll be able to get to a more sustainable situation thanks to you.

gsmet commented 2 months ago

OK so I can reproduce it and this is a bit different. The issue is mostly related to Testcontainers threads and things are not leaking like crazy as they use to do (at least in my experiments).

Now, our CL objects are very large and they also keep open a gazillion of zip file systems, which explains the problem (I think).

mschorsch commented 2 months ago

Unfortunately, it is not easy to create a reproducer. In our real project I can easily produce an OOM (without testcontainers) unfortunately I could not extract a simple reproducer so far. I was hoping that this reproducer would at least partially solve the problem.

Overall, one would expect that if each test class is executed individually and sequentially, the memory requirements would remain the same (at least after a full GC). This is not the case here.

gsmet commented 2 months ago

I pushed a first PR here ^.

It improves things a bit but doesn’t really fix the reproducer.

gsmet commented 2 months ago

Do you think we could maybe organize a Google Meet call together?

If you can get a heap dump and install Eclipse MAT, I could show you how to get some info and we could make progress without you sending me the heap dump?

My guess is that it’s all going to be about class loaders so I won’t see any privileged info on your screen.

Let me know if that could work for you. My email is open at gsmet at redhat dot com.

mschorsch commented 2 months ago

Hi @gsmet , sorry for the delay. Unfortunately this is currently not possible as I am on vacation. I will be back in September.

Nevertheless, thank you for your work and the first improvements in https://github.com/quarkusio/quarkus/pull/42492 👍

mschorsch commented 2 months ago

@gsmet A positive message, I have checked your changes from https://github.com/quarkusio/quarkus/pull/42492 again against our real application and I can confirm that the changes prevent an OOM from occurring 👍 .

gsmet commented 2 months ago

Yeah, it still fails with your reproducer 3 though. There are several other things to fix.

https://github.com/quarkusio/quarkus/pull/42525 should help more and then there's the issue of Testcontainers leaking things like crazy that I'm trying to solve.

gsmet commented 2 months ago

@mschorsch FWIW, I was able to run your reproducer to completion with all these three applied:

Now, just be aware that they will probably require some discussions and I'm not convinced it will be a good move to push them to 3.14. Time will tell.

mschorsch commented 2 months ago

@gsmet Thanks. Looks promising

gsmet commented 2 months ago

Unfortunately the Testcontainers one has little chance to be merged as it comes with some massive problem.

We will have some discussions when @geoand is back from PTO.

afalhambra-hivemq commented 2 months ago

Hey guys, we've been having the same problem in our tests, since the @QuarkusTestResource was deprecated, tons of tests ran into an OOM issue. And as per this comment, setting the restrictToAnnotatedClass = false in the @WithTestResource did work for us as a workaround though.

gsmet commented 2 months ago

Yes I'm not surprised.

gsmet commented 2 months ago

I tried to make the migration guide a lot more explicit: https://github.com/quarkusio/quarkus/wiki/Migration-Guide-3.13#quarkustestresource-replaced-by-withtestresource-gear-white_check_mark .

@afalhambra-hivemq would the new wording have helped you?

gsmet commented 2 months ago

I also created https://github.com/quarkusio/quarkus/pull/42700 to add more info to the Javadoc.

afalhambra-hivemq commented 2 months ago

I tried to make the migration guide a lot more explicit: https://github.com/quarkusio/quarkus/wiki/Migration-Guide-3.13#quarkustestresource-replaced-by-withtestresource-gear-white_check_mark .

@afalhambra-hivemq would the new wording have helped you?

Yes, definitely. Thanks a bunch. Looking forward to the fix.

geoand commented 2 months ago

Once https://github.com/quarkusio/quarkus/pull/42525 is in, we need to recap where we are, ideally with some real world examples

geoand commented 1 month ago

@mschorsch we have changed the behavior of @WithTestResource in main, see https://github.com/quarkusio/quarkus/pull/42852 for more details.

Any chance you can give it a spin?

mschorsch commented 1 month ago

@geoand I tested our application against https://github.com/quarkusio/quarkus/pull/42852 with every option from TestResourceScope and it worked without a problem 🚀 .

In addition, I was even able to reduce the memory requirements for the tests, I assume due to https://github.com/quarkusio/quarkus/pull/42525.

geoand commented 1 month ago

Awesome, thanks!

geoand commented 1 month ago

Let's close this for now and if similar problems occur in the future, we can open a new issue