quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0

Memory usage increased between Quarkus 3.6 and 3.7 #38814

Open holly-cummins opened 5 months ago

holly-cummins commented 5 months ago

Describe the bug

I have an extension test in the quarkus-pact project that started OOMing in the ecosystem CI recently. I've worked around the issue by increasing the memory available to the test, but we may want to investigate why the memory requirements have increased.

I can reproduce just by starting the Quarkus application with constrained memory (it fails to start). The application has two Pact extensions in it, so it does more classloading than a simpler application. With Quarkus 3.6, the app starts, and with 3.7, it fails to start. I can force a failure on Quarkus 3.6 by dropping the available memory down to 110 MB (from 128 MB), so even with Quarkus 3.6 the memory usage was fairly high; 3.7 just happens to be the release that tipped it over the edge into failure.

The ecosystem CI caught the change: https://github.com/quarkiverse/quarkiverse/issues/94#issuecomment-1837901693. This is the CI run where the problem first appeared, which might help us identify a specific change: https://github.com/quarkiverse/quarkus-pact/actions/runs/7082294368

My initial guess is that this is related to classloading, but I don't have firm evidence. It could also be test execution. If I give the application more memory so that it doesn't OOM, and then trigger a dump once startup is finished, the 3.6 and 3.7 dumps both use about 81 MB of memory and look similar.

Expected behavior

No response

Actual behavior

No response

How to Reproduce?

I've attached a reproducer app. Unzip it, and then run

mvn quarkus:dev -Djvm.args="-Xmx128m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=pact-dump.hprof"

For more control, another way to reproduce is

  1. Clone https://github.com/quarkiverse/quarkus-pact
  2. Build with mvn install. It should pass.
  3. Edit https://github.com/quarkiverse/quarkus-pact/blob/e00179e4f5e0ab5c3b434393edd4e80ab3551766/cross-extension-integration-tests/src/test/java/io/quarkiverse/pact/it/DevModeContractTestIT.java#L76 and change 140m to 128m, the default. (I had to override the default memory allocation from the parent test class to get the test to pass.)
  4. cd cross-extension-integration-tests && mvn clean install should now show the failure
  5. Once everything is built, you can also run the test project standalone (attached as happy-everyone-all-together-processed.zip):
cd quarkus-pact/cross-extension-integration-tests/target/test-classes/projects/happy-everyone-all-together-processed
mvn quarkus:dev -Djvm.args="-Xmx110m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=pact-dump.hprof"

Output of uname -a or ver

No response

Output of java -version

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

gsmet commented 5 months ago

If you can publish a 3.6 heap dump and a 3.7 one, it probably would help to all work on the same baseline.

holly-cummins commented 5 months ago

If you can publish a 3.6 heap dump and a 3.7 one, it probably would help to all work on the same baseline.

I've got the dumps, and will attach them, but I don't think I have the 'right' dump. I had two options for collecting the dumps:

Dumps are here: https://drive.google.com/file/d/1gaGHo-qHu2GImdskJMiF-dNCocabGAfG/view?usp=sharing. I've done 3.6 and 3.7, for manually triggered post-startup dumps in an unconstrained heap, and dumps triggered on OOM in a constrained heap. The constrained-heap dumps are bigger.

holly-cummins commented 5 months ago

Some of what I'm seeing in the dumps also looks similar to what Guillaume investigated in https://github.com/quarkusio/quarkus/issues/35280, so I think this issue may be in a similar domain to that one.

holly-cummins commented 1 month ago

See also https://github.com/quarkusio/quarkus/issues/41156. Memory requirements went up again in Quarkus 3.12.

gsmet commented 3 weeks ago

I had a look at the issue and here are my findings for main.

The first thing is that the OOM is triggered by the build process; it has nothing to do with classloader leaks.

Second thing is that there are two main culprits:

[Screenshot from 2024-07-10 10-32-55]

I wonder if we could improve that. I don't fully understand why we would need to keep a list of the full managed dependencies tree once the artifact has been resolved, but I suppose there are very few projects with that many managed dependencies. All of this is in the Maven resolver, so I think we would need to adjust things upstream.

We would probably need to discuss this with @aloubyansky .

That's for the first part of the work. Now I will have a look at a heap dump from an old version to see what could be the actual regression.

gsmet commented 3 weeks ago

So from what I can see, it's just Quarkus getting generally fatter:

The main contributors to the main BOM (I'm assuming most of the managed dependencies are coming from there - I limited myself to the ones with 10+ entries):

    677         <groupId>io.quarkus</groupId>
    245         <groupId>io.opentelemetry.javaagent.instrumentation</groupId>
    182         <groupId>io.vertx</groupId>
    131         <groupId>io.fabric8</groupId>
    103         <groupId>io.smallrye.reactive</groupId>
     73         <groupId>io.opentelemetry.instrumentation</groupId>
     56         <groupId>org.testcontainers</groupId>
     56         <groupId>io.netty</groupId>
     43         <groupId>io.smallrye</groupId>
     42         <groupId>io.grpc</groupId>
     35         <groupId>org.wildfly.security</groupId>
     35         <groupId>io.opentelemetry</groupId>
     33         <groupId>org.jboss.resteasy</groupId>
     30         <groupId>org.mvnpm</groupId>
     30         <groupId>io.micrometer</groupId>
     23         <groupId>org.jetbrains.kotlin</groupId>
     19         <groupId>com.fasterxml.jackson.module</groupId>
     18         <groupId>org.jetbrains.kotlinx</groupId>
     18         <groupId>io.smallrye.stork</groupId>
     16         <groupId>org.infinispan</groupId>
     16         <groupId>org.hibernate.search</groupId>
     16         <groupId>com.fasterxml.jackson.datatype</groupId>
     13         <groupId>io.smallrye.common</groupId>
     12         <groupId>org.junit.platform</groupId>
     12         <groupId>org.apache.maven</groupId>
     12         <groupId>io.dekorate</groupId>
     11         <groupId>com.google.protobuf</groupId>
     10         <groupId>io.quarkus.resteasy.reactive</groupId>
     10         <groupId>io.opentelemetry.javaagent</groupId>
     10         <groupId>com.google.http-client</groupId>
     10         <groupId>com.fasterxml.jackson.dataformat</groupId>

For 3.6.1, that is the other version I tested:

    568         <groupId>io.quarkus</groupId>
    240         <groupId>io.opentelemetry.javaagent.instrumentation</groupId>
    182         <groupId>io.vertx</groupId>
    122         <groupId>io.fabric8</groupId>
     95         <groupId>io.smallrye.reactive</groupId>
     72         <groupId>io.opentelemetry.instrumentation</groupId>
     54         <groupId>io.netty</groupId>
     47         <groupId>org.testcontainers</groupId>
     41         <groupId>io.smallrye</groupId>
     40         <groupId>io.grpc</groupId>
     36         <groupId>io.opentelemetry</groupId>
     35         <groupId>org.wildfly.security</groupId>
     33         <groupId>org.jboss.resteasy</groupId>
     30         <groupId>io.micrometer</groupId>
     23         <groupId>org.jetbrains.kotlin</groupId>
     18         <groupId>org.jetbrains.kotlinx</groupId>
     17         <groupId>io.smallrye.stork</groupId>
     17         <groupId>com.fasterxml.jackson.module</groupId>
     16         <groupId>org.infinispan</groupId>
     16         <groupId>com.fasterxml.jackson.datatype</groupId>
     12         <groupId>org.junit.platform</groupId>
     12         <groupId>org.apache.maven</groupId>
     12         <groupId>io.smallrye.common</groupId>
     11         <groupId>io.dekorate</groupId>
     11         <groupId>com.google.protobuf</groupId>
     10         <groupId>io.quarkus.resteasy.reactive</groupId>
     10         <groupId>io.opentelemetry.javaagent</groupId>
     10         <groupId>com.google.http-client</groupId>
     10         <groupId>com.fasterxml.jackson.dataformat</groupId>

From a quick analysis:

I think one thing we are learning with this is that keeping relocations around for ages has a cost, so we need to drop the relocations at some point.

maxandersen commented 3 weeks ago

I think one thing we are learning with this is that keeping relocations around for ages has a cost, so we need to drop the relocations at some point.

How big is their impact? I would imagine that if we fix/improve how we deal with the full resolved set (which we shouldn't have to keep around forever), then relocations should be quite minimal/not a problem?

gsmet commented 3 weeks ago

fixing/improving how we deal with the full resolved set (which we shouldn't have to keep around forever)

I don't think we keep the Aether model in memory forever, but we need it at some point, and we are using a huge amount of memory for it (at least compared to our footprint: it's 1/3 of the memory footprint when we build/start dev mode).

Now, I will discuss with Alexey whether things can be improved in this area, but I wouldn't be surprised if we were pushing things to the limit with our huge BOM and it wasn't really designed/optimized for such a use case. Maybe we could find ways to improve the Maven resolver; we will see. I wouldn't hold my breath though.

My wild guess is that it's going to get even worse for a large project with a lot of Quarkus extensions but that's just a theory, from the heap dumps I checked.

Now I'm not advocating for us to drop relocations at a fast pace. I'm just saying that we didn't really think they had a cost, and they do.

So I would rather have the following policy:

For instance, we could remove the Quarkus REST/Quarkus Messaging relocations introduced 4 months ago after the 3.15 release.

maxandersen commented 3 weeks ago

I'm definitely curious how/why we would have to retain it all forever; it wasn't designed/intended that a normal build should need to retain everything. Let's see what @aloubyansky sees.

gsmet commented 3 weeks ago

Again, we don't keep all around forever.

But each GoodDescriptor that is about an artifact we build in the Quarkus tree (at least, that seemed to be the common denominator for the huge ones) contains the whole set of managed dependencies of our BOM, and it's in memory for as long as the Aether model is kept in memory.

maxandersen commented 3 weeks ago

Yes I get that - but even that shouldn't be necessary.

holly-cummins commented 3 weeks ago

We should probably distinguish between "high tide" memory usage and "steady state" memory usage. The symptom this work item was raised for was that our high-tide memory usage was going up. Unloading things like the Aether memory model would/does help with steady-state memory usage, but not with the high-tide usage.

A big high-tide memory usage could prevent people from being able to build their Quarkus apps in constrained environments, but steady-state memory usage is probably the metric that gets more attention/slides/blogs, etc.

dmlloyd commented 3 weeks ago

Not sure if this is relevant here, but concurrency (particularly concurrent startup) can affect the "high tide" in ways that can be hard to predict.

gsmet commented 3 weeks ago

but concurrency (particularly concurrent startup) can affect the "high tide" in ways that can be hard to predict

From what I have seen in the heap dumps, the problem we have here has little to do with concurrency.

gsmet commented 3 weeks ago

So what makes the issue so problematic for those of us with huge dependency management is that each descriptor will have its own copy of the whole dependency management tree.

See:

https://github.com/apache/maven/blob/f2a0865c7a66d44e13c88c9ef9146cfdf7024ac0/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/ArtifactDescriptorReaderDelegate.java#L65-L70

And the convert() method:

https://github.com/apache/maven/blob/f2a0865c7a66d44e13c88c9ef9146cfdf7024ac0/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/ArtifactDescriptorReaderDelegate.java#L94-L128

Given our whole dependency management tree is 800 KB, it grows very fast.
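To put rough numbers on that duplication, here is a back-of-the-envelope sketch. The 2300-entry BOM size is a figure from this thread; the descriptor count and per-entry byte cost are hypothetical illustration values, not measurements:

```java
public class DepMgtDuplicationEstimate {
    public static void main(String[] args) {
        int managedEntries = 2_300; // Quarkus BOM entries (figure from this thread)
        int descriptors = 40;       // hypothetical number of descriptors holding a copy
        int bytesPerEntry = 360;    // hypothetical retained size of one converted Dependency

        // Today: every descriptor carries its own copy of the full managed list.
        long duplicated = (long) managedEntries * descriptors * bytesPerEntry;

        // Interned: one shared list, plus one reference (~8 bytes) per descriptor.
        long shared = (long) managedEntries * bytesPerEntry + (long) descriptors * 8;

        System.out.printf("duplicated: %d MB, shared: %d KB%n",
                duplicated / (1024 * 1024), shared / 1024);
    }
}
```

With these illustrative numbers, the duplicated copies already account for tens of megabytes, which is in the same ballpark as the 34 MB figure mentioned later in this thread.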

gsmet commented 3 weeks ago

@cstamas I was wondering if you had an opinion on ^

This other comment is also useful for context: https://github.com/quarkusio/quarkus/issues/38814#issuecomment-2219911522 (the second point about the managed dependencies).

cstamas commented 3 weeks ago

Does this mean that repository system session is kept alive, even after it is not needed? As data pool is stored within session...

gsmet commented 3 weeks ago

So our problem here is about what @holly-cummins called "high tide" memory usage, meaning that during our build we actually need quite a lot of memory. We don't have that problem at runtime, we don't keep the session alive. We are trying to reduce the memory footprint of the Quarkus build itself.

In the case of Quarkus, what I'm mentioning above takes 34 MB of memory, which is not ideal, especially since AFAICS most of it is due to the managed dependencies being duplicated for each descriptor.

cstamas commented 3 weeks ago

Got it. There was a change in the resolver in this respect (more speed at the cost of memory); see https://issues.apache.org/jira/browse/MRESOLVER-320

In short: on this page: https://maven.apache.org/resolver-archives/resolver-1.9.21/configuration.html

Try out these switches:

-Daether.dependencyCollector.pool.descriptor=weak

also, try out the BF collector; it also reduces memory usage, as it does less work for the same result:

-Daether.dependencyCollector.impl=bf
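
For intuition, here is a toy sketch of the idea behind a weak descriptor pool (an illustration of the caching strategy, not the actual resolver code): equal keys resolve to one shared instance, and because both keys and values are weakly referenced, the GC can reclaim pooled entries once nobody else holds them.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakPoolSketch {
    // Values are weak too: a value that strongly referenced its key would
    // keep the WeakHashMap entry alive forever and defeat the pool.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Return the pooled instance for an equal key, creating it on first use.
    public synchronized String intern(String key) {
        WeakReference<String> ref = pool.get(key);
        String pooled = (ref != null) ? ref.get() : null;
        if (pooled == null) {
            pool.put(key, new WeakReference<>(key));
            pooled = key;
        }
        return pooled;
    }

    public static void main(String[] args) {
        WeakPoolSketch pool = new WeakPoolSketch();
        String a = pool.intern(new String("io.netty:netty-buffer"));
        String b = pool.intern(new String("io.netty:netty-buffer"));
        System.out.println(a == b); // equal keys share a single instance
    }
}
```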

gsmet commented 3 weeks ago

Basically, I had two ideas (that might both be really dumb as I'm not familiar with this area at all):

cstamas commented 3 weeks ago

Just to be clear, these should be somehow passed to (and put into config properties of Resolver) around here, I guess: https://github.com/quarkusio/quarkus/blob/main/independent-projects/bootstrap/maven-resolver/src/main/java/io/quarkus/bootstrap/resolver/maven/BootstrapMavenContext.java#L452
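For reference, setting such keys on a resolver session usually looks like the fragment below (a sketch assuming a DefaultRepositorySystemSession is in hand at the linked spot; the actual wiring in BootstrapMavenContext may differ):

```java
// Sketch only: keys from the resolver configuration page above, applied via
// org.eclipse.aether.DefaultRepositorySystemSession. Whether these values are
// the right trade-off for Quarkus is exactly what this thread is discussing.
DefaultRepositorySystemSession session = ...; // obtained from the bootstrap context
session.setConfigProperty("aether.dependencyCollector.pool.descriptor", "weak");
session.setConfigProperty("aether.dependencyCollector.impl", "bf");
```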

gsmet commented 3 weeks ago

Yeah, we might have to give it a try, I'll discuss with @aloubyansky when he's back from PTO, but as you probably suspect... we also want speed, especially in dev mode :).

Interested in your feedback on my comment above ^^.

cstamas commented 3 weeks ago

Re feedback on your comment above, I see a specific "niche" here (am guessing from here, correct me if am wrong):

So what happens here is that, assuming QRoot (the shared parent POM) defines, let's say, 100 depMgt entries, and the Quarkus modules Q1, Q2, etc. all inherit the QRoot parent, then Q1, Q2, Q3 will all have the same 100 depMgt entries copied over and over again?

This may concern the Maven model builder, as the Resolver itself is "Maven-agnostic". Basically, the Q1, Q2... ArtifactDescriptorResult instances will carry the same copies of depMgt, as they all originate from the QRoot POM.

The problem here is that the Resolver deals with "built models", and as we know, a Maven Model, once built (by the model builder), is "flat" (the Model Builder also caches quite extensively, but the notion of "parent POM" and "import POM" is gone once the Model leaves the model builder).

This is an interesting problem, but TBH we never tinkered with it... will ask around other folks for opinions...

gsmet commented 3 weeks ago

@cstamas yes, exactly. Except we have 2300 dependencies in the Quarkus BOM so you can imagine that it grows fast.

What puzzles me a bit is why you would need to copy the whole dependency management tree into each descriptor. I don't see the value of it (but again, I'm really not familiar with this part; it's Alexey who has all the knowledge about this). I could see how having the elements that apply to the dependencies of this particular artifact makes sense, especially since you were saying the descriptors are flat, but the whole thing?

I tried to play with this idea very naively. This approach is obviously extremely slow when you have 2k entries (it's actually quite noticeable...), but I wanted to play with it to see if it would improve our situation. It does.

Now I'm not sure it's correct and things would need to be tweaked for it to not make things unacceptably slow.

diff --git a/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/ArtifactDescriptorReaderDelegate.java b/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/ArtifactDescriptorReaderDelegate.java
index edb417858..5b91f09b2 100644
--- a/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/ArtifactDescriptorReaderDelegate.java
+++ b/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/ArtifactDescriptorReaderDelegate.java
@@ -56,14 +56,17 @@ public void populateResult(RepositorySystemSession session, ArtifactDescriptorRe
             result.addRepository(ArtifactDescriptorUtils.toRemoteRepository(r));
         }

+        DependencyManagement mgmt = model.getDependencyManagement();
+
         for (org.apache.maven.model.Dependency dependency : model.getDependencies()) {
             result.addDependency(convert(dependency, stereotypes));
-        }

-        DependencyManagement mgmt = model.getDependencyManagement();
-        if (mgmt != null) {
-            for (org.apache.maven.model.Dependency dependency : mgmt.getDependencies()) {
-                result.addManagedDependency(convert(dependency, stereotypes));
+            if (mgmt != null) {
+                mgmt.getDependencies().stream()
+                        .filter(d -> d.getGroupId().equals(dependency.getGroupId())
+                                && d.getArtifactId().equals(dependency.getArtifactId()))
+                        .findAny()
+                        .ifPresent(d -> result.addManagedDependency(convert(d, stereotypes)));
             }
         }

cstamas commented 3 weeks ago

The delegate is, by the way, "pluggable": https://github.com/apache/maven/blob/e7e0cbb4053de3d9e83c82048875969d63918d68/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/DefaultArtifactDescriptorReader.java#L185

Just stick an instance into session.configProperties and the resolver will use it.


So, the logic is: if on node N1 D1 was added as a dependency, remove D1 from depMgt? This sounds interesting; let me push this change through the Maven ITs. At first this looks promising, as:

But am afraid this is not so simple, let's see the IT results....

cstamas commented 3 weeks ago

Wait, did I misread the diff...?

cstamas commented 3 weeks ago

So, the "do not add D1 to depMgt if already added as dep" change is here; let's see the ITs: Maven4: https://github.com/apache/maven/pull/1613 Maven3: https://github.com/apache/maven/pull/1614

cstamas commented 3 weeks ago

Removing "unused" depMgt at the POM level is not possible; the entries are like a "future pledge", and you don't know whether and when they will be used (they may also remain unused). The same POM (built model) may be used in different places in the graph, and in one place the D1 depMgt may affect something downstream of it, while in another place it may not, as for example D1's parent (or parent's parent, etc.) may redefine the depMgt ("closer to the root wins"). Hence, to me this seems not doable at the POM level (the ArtifactDescriptor level, as again, the same descriptors are reused in different nodes of the graph).

My PR is probably doing "way less" (removing way less) than yours, but again, I find your diff wrong: you would remove a depMgt entry that is not added as a dep, if I am reading your diff correctly.

cstamas commented 3 weeks ago

I am not fully synced how resolver is used in Quarkus, but from reading some stuff, few ideas:

But each GoodDescriptor that is about an artifact we build in the Quarkus tree (at least, that seemed to be the common denominator for the huge ones) contains the whole set of managed dependencies of our BOM, and it's in memory for as long as the Aether model is kept in memory.

Again, this is kept in the session. Once you have the graph, or tree, or classpath calculated by the Resolver, you can toss the session away.

I don't think we keep the Aether model in memory forever...

Unsure what is referred to as the "Aether model", but if it is the DependencyNode (the root and the tree below it), you don't need the session for that. In fact, it is usual to keep the Resolver "object graph" around but create a session "per resolver invocation". This way you may get rid of excess heap use (again, as many things are cached/stored in the session itself).

In Maven, a MavenSession (== repository system session) in the case of the vanilla CLI lives for the whole JVM runtime. But, for example, in mvnd, where the process remains resident, the Resolver (and Maven) object graph is retained, but the sessions (Maven and Resolver) are re-created per build.

Will dig deeper into Resolver use in Quarkus...

Also, could I have access granted to heap dumps?

cstamas commented 3 weeks ago

Thanks, looked at the dumps. The memory hog is in the session (RepositoryCache). For a start, I'd recommend testing aether.dependencyCollector.pool.descriptor=weak and seeing whether it changes memory consumption (and neglect any possible slowdown during the experiment).

If it does, we know we are at right track.

gsmet commented 3 weeks ago

@cstamas Thanks for having a look. aether.dependencyCollector.pool.descriptor=weak seems to help (at least on that front, we have a few other culprits on our side).

I can share the dump with you. Either send me your Google account at gsmet at redhat dot com or a place where I can upload it for you. Keep in mind that this dump was obtained from an OOM error when building, so it's really about trying to limit the peak memory usage.

My initial idea with the patch above was to limit the managed dependencies copied into a given descriptor to only the ones that were important for it. Meaning that in our case, all the managed dependencies are copied into quite a few descriptors that don't really need them (or at least not all of them). But from what I could see, the descriptor dependencies don't contain the transitive ones, so that won't work well.

cstamas commented 3 weeks ago

Yes, and as I mentioned, doing this on the POM (removing "unused" depMgt) is not feasible, as the POM lacks the context of where it will be used, hence there is no way to know which entries are "unused". Maybe we need to deconstruct POMs and cache depMgt separately, as in that case Q1, Q2, ... would all literally share the same single instance.

gsmet commented 3 weeks ago

While it's not a complete fix, I created a PR to improve the situation a bit: https://github.com/apache/maven-resolver/pull/534

cstamas commented 3 weeks ago

So, two interesting PRs (for Resolver 1.9.x):

It would be good if someone could locally merge both and test such a built Resolver 1.9.x with Quarkus.

Re timeline: I am out for the next 2 weeks, and Resolver 1.9.21 was just released last week... But when am back, I could do Resolver 1.9.22 w/ memory improvements, and Maven 3.9.9 is already scheduled next (so early Aug or alike).

cstamas commented 3 weeks ago

Good news: the Maven IT suite is OK with change above: https://github.com/apache/maven/pull/1617

Bad news: interning all the List<Dependency> instances comes at a price: a Quarkus build (mvn clean install -Dquickly) on my laptop went from 9:20 (vanilla Maven 3.9.8) to 11:10 (w/ the patch above). We could maybe introduce some threshold, and intern dependency lists only if they are above some size limit? Or any other ideas?

I did not measure the most interesting question, the heap usage change, but I'd bet a six-pack of beer it is HUGE :smiley:
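The threshold idea could be sketched roughly like this (hypothetical code, not the actual resolver patch): small lists are cheap to duplicate, so only lists above a size limit pay the hashing/equality cost of interning.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThresholdInternPool<T> {
    private final Map<List<T>, List<T>> pool = new ConcurrentHashMap<>();
    private final int threshold;

    public ThresholdInternPool(int threshold) {
        this.threshold = threshold;
    }

    public List<T> intern(List<T> list) {
        if (list.size() < threshold) {
            return list; // too small to be worth pooling
        }
        // List.copyOf produces an immutable snapshot that is safe to share
        // between callers; equal lists collapse to the first pooled instance.
        List<T> copy = List.copyOf(list);
        return pool.computeIfAbsent(copy, c -> c);
    }
}
```

Interning via equality means one hashCode/equals pass over each candidate list, which is presumably where the extra build time comes from; a threshold limits that cost to the lists where sharing actually pays off.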

gsmet commented 3 weeks ago

I will have a look next week. Thanks!

gsmet commented 3 weeks ago

I can see how it could be a good thing memory wise. I’m a bit worried about the additional work though.

Especially since in most of the cases, we are doing the exact same thing over and over (typically the convert operation in ArtifactDescriptorReaderDelegate).

Multiplied by 2300, multiplied by the number of artifacts affected, that's a lot of useless work.

cstamas commented 3 weeks ago

Agreed. But the ArtifactDescriptorReader is not used only during collecting, but in many other places; plus, it is exposed via the RepositorySystem public API as well. Also, ArtifactDescriptorResult is public API as well, so we could change that only in Resolver 2.1.0 or so...

Biggest problem is that ArtifactDescriptorResult is NOT immutable. Hence, we cannot assume that something does not mutate it (like adding a new dependency). Hence, the current "contract" is that for each new request it hands over a new result (that is an instance of its own and freely mutable). The use of the reader in the collector is somewhat "specific", and the collector does handle artifact descriptors as immutable (it does not mutate them), so I still see some room for improvement...
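The contract problem can be illustrated with a plain-Java sketch (hypothetical types standing in for ArtifactDescriptorResult, not the resolver's actual code): because callers are allowed to mutate the result, the reader must hand each request its own copy, so cached data cannot simply be shared.

```java
import java.util.ArrayList;
import java.util.List;

public class MutableResultSketch {
    // Stand-in for a mutable descriptor result that callers may add to.
    static class MutableResult {
        final List<String> dependencies;

        MutableResult(List<String> dependencies) {
            this.dependencies = dependencies;
        }
    }

    static final List<String> CACHED = List.of("io.vertx:vertx-core");

    // Because callers may mutate the result, each request gets a fresh,
    // independently mutable copy of the cached data: sharing one instance
    // would let one caller's mutation leak into every other caller.
    static MutableResult read() {
        return new MutableResult(new ArrayList<>(CACHED));
    }

    public static void main(String[] args) {
        MutableResult first = read();
        first.dependencies.add("extra:dep"); // one caller mutates its copy
        MutableResult second = read();
        System.out.println(second.dependencies.size()); // unaffected by the mutation
    }
}
```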

gsmet commented 2 weeks ago

@cstamas I created a small PR (for your 1.9 PR, the same patch should be applied to the master branch): https://github.com/cstamas/maven-resolver/pull/6 .

Locally, when I ran the Maven ITs, I got:

I think the way I reduce allocations by not copying the map might help getting a speed up.

It's not scientific, but I think it would be worth pushing all of this combined through the Apache Maven ITs and seeing if you get similar results.

gsmet commented 2 weeks ago

@cstamas, and I can confirm the memory usage is greatly reduced. So if you end up confirming that all the patches applied actually make things a bit faster, that would be a win on both sides.

cstamas commented 3 days ago

Back from vacation... but DK traveling is still ahead. Am picking up this one, and it will hopefully be present in the next Resolver (1/2) and Maven (3/4) releases that are about to happen "soon-ish".

gsmet commented 3 days ago

@cstamas maybe you could merge the two PRs for MRESOLVER-586 and rebase yours? This patch is an easy win. That way you could easily test the addition of all of them?

https://github.com/apache/maven-resolver/pull/534 https://github.com/apache/maven-resolver/pull/535

gsmet commented 3 days ago

@cstamas Glad to have you back! I'm around if you want to discuss this any further or need some additional help.

cstamas commented 3 days ago

Merged, rebased, Maven ITs running https://github.com/apache/maven/pull/1617

cstamas commented 3 days ago

Ok, so almost all merged, and MRESOLVER-587 looks good as well:

The only question here is the defaults of the two new config keys: for safety, am using false/false (so basically interning does NOT kick in; it behaves as today), but it is user-configurable. As we saw, true/true tremendously lowers memory consumption at the price of increased runtime. Still, as Guillaume measured (and those changes are picked up as well), there is some hope: https://github.com/quarkusio/quarkus/issues/38814#issuecomment-2230625819

Ultimate fix would be in ArtifactDescriptorReader and making descriptor immutable, but sadly that is not doable in scope of 1.x/2.0.x of Resolver, is probably to happen in 2.1.x Resolver....

WDYT? Any opinion?

gsmet commented 3 days ago

@cstamas your proposal of interning only managed dependencies by default makes sense to me.