holly-cummins opened 5 months ago
If you can publish a 3.6 heap dump and a 3.7 one, it probably would help to all work on the same baseline.
> If you can publish a 3.6 heap dump and a 3.7 one, it probably would help to all work on the same baseline.
I've got the dumps, and will attach them, but I don't think I have the 'right' dump. I had two options for collecting the dumps:
Dumps are here: https://drive.google.com/file/d/1gaGHo-qHu2GImdskJMiF-dNCocabGAfG/view?usp=sharing I've done 3.6 and 3.7, for manually triggered post-startup dumps in an unconstrained heap, and dumps triggered on OOM in a constrained heap. The constrained heap dumps are bigger.
Some of what I'm seeing in the dumps also looks similar to what Guillaume investigated in https://github.com/quarkusio/quarkus/issues/35280, so I think this issue may be in a similar domain to that one.
See also https://github.com/quarkusio/quarkus/issues/41156. Memory requirements went up again in Quarkus 3.12.
I had a look at the issue and here are my findings for `main`.
The first thing is that the OOM is triggered by the build process; it has nothing to do with classloader leaks.
The second thing is that there are two main culprits:
- `QuarkusClassLoader`'s `SharedOpenArchivePathTree`s - the `cen` field is huge - we know this issue very well, there's nothing we can do about it if we want to keep these filesystems open.
- `managedDependencies` for each `io.quarkus` dependency. The Aether `DataPool` is 34 MB large for a very, very small project. I wonder if we could improve that. I don't fully understand why we would need to keep a list of the full managed dependencies tree once the artifact has been resolved - but I suppose there are very few projects with that many managed dependencies.
All of this is in the `maven-resolver`, so I think we would need to adjust things upstream.
We would probably need to discuss this with @aloubyansky .
That's for the first part of the work. Now I will have a look at a heap dump from an old version to see what could be the actual regression.
So from what I can see, it's just Quarkus getting generally fatter:
- The `Execution` object has significantly more multis (more generated classes - and the large generated ones (Config, ArC) are a bit larger - and more synthetic beans).
- The `DataPool` object is larger (34 MB vs 28 MB) - but as I get an OOM in the middle of the build, it's not very scientific. What I can say is that we have 200 more managed dependencies in some of the `io.quarkus` `GoodDescriptor` instances (for French readers: so you have good descriptors and bad descriptors; the good descriptor sees something, it describes it; the bad descriptor sees something, it describes it, but it's not the same) and that really doesn't help.

The main contributors to the main BOM (I'm assuming most of the managed dependencies are coming from there - I limited myself to the ones with 10+ entries):
677 <groupId>io.quarkus</groupId>
245 <groupId>io.opentelemetry.javaagent.instrumentation</groupId>
182 <groupId>io.vertx</groupId>
131 <groupId>io.fabric8</groupId>
103 <groupId>io.smallrye.reactive</groupId>
73 <groupId>io.opentelemetry.instrumentation</groupId>
56 <groupId>org.testcontainers</groupId>
56 <groupId>io.netty</groupId>
43 <groupId>io.smallrye</groupId>
42 <groupId>io.grpc</groupId>
35 <groupId>org.wildfly.security</groupId>
35 <groupId>io.opentelemetry</groupId>
33 <groupId>org.jboss.resteasy</groupId>
30 <groupId>org.mvnpm</groupId>
30 <groupId>io.micrometer</groupId>
23 <groupId>org.jetbrains.kotlin</groupId>
19 <groupId>com.fasterxml.jackson.module</groupId>
18 <groupId>org.jetbrains.kotlinx</groupId>
18 <groupId>io.smallrye.stork</groupId>
16 <groupId>org.infinispan</groupId>
16 <groupId>org.hibernate.search</groupId>
16 <groupId>com.fasterxml.jackson.datatype</groupId>
13 <groupId>io.smallrye.common</groupId>
12 <groupId>org.junit.platform</groupId>
12 <groupId>org.apache.maven</groupId>
12 <groupId>io.dekorate</groupId>
11 <groupId>com.google.protobuf</groupId>
10 <groupId>io.quarkus.resteasy.reactive</groupId>
10 <groupId>io.opentelemetry.javaagent</groupId>
10 <groupId>com.google.http-client</groupId>
10 <groupId>com.fasterxml.jackson.dataformat</groupId>
For 3.6.1, which is the other version I tested:
568 <groupId>io.quarkus</groupId>
240 <groupId>io.opentelemetry.javaagent.instrumentation</groupId>
182 <groupId>io.vertx</groupId>
122 <groupId>io.fabric8</groupId>
95 <groupId>io.smallrye.reactive</groupId>
72 <groupId>io.opentelemetry.instrumentation</groupId>
54 <groupId>io.netty</groupId>
47 <groupId>org.testcontainers</groupId>
41 <groupId>io.smallrye</groupId>
40 <groupId>io.grpc</groupId>
36 <groupId>io.opentelemetry</groupId>
35 <groupId>org.wildfly.security</groupId>
33 <groupId>org.jboss.resteasy</groupId>
30 <groupId>io.micrometer</groupId>
23 <groupId>org.jetbrains.kotlin</groupId>
18 <groupId>org.jetbrains.kotlinx</groupId>
17 <groupId>io.smallrye.stork</groupId>
17 <groupId>com.fasterxml.jackson.module</groupId>
16 <groupId>org.infinispan</groupId>
16 <groupId>com.fasterxml.jackson.datatype</groupId>
12 <groupId>org.junit.platform</groupId>
12 <groupId>org.apache.maven</groupId>
12 <groupId>io.smallrye.common</groupId>
11 <groupId>io.dekorate</groupId>
11 <groupId>com.google.protobuf</groupId>
10 <groupId>io.quarkus.resteasy.reactive</groupId>
10 <groupId>io.opentelemetry.javaagent</groupId>
10 <groupId>com.google.http-client</groupId>
10 <groupId>com.fasterxml.jackson.dataformat</groupId>
From a quick analysis:

- `org.mvnpm` with 30 new entries

I think one thing we are learning with this is that keeping relocations around for ages has a cost, so we need to drop the relocations at some point.
> I think one thing we are learning with this is that keeping relocations around for ages has a cost so we need to drop the relocations at some point.
How big an impact do they have? I would imagine that if we fix/improve how we deal with the full resolved set (which we shouldn't have to keep around forever), then relocations should be quite minimal / not a problem?
> fixing/improving how we deal with the full resolved set (which we shouldn't have to keep around forever)
I don't think we keep the Aether model in memory forever, but we need it at some point and we use a huge amount of memory for it (at least compared to our footprint - it's 1/3 of the memory footprint when we build/start dev mode).
Now, I will discuss with Alexey if things can be improved in this area but I wouldn't be surprised if we were pushing things to the limits with our huge BOM and that it wasn't really designed/optimized for such a use case. Maybe we could find ways to improve the Maven resolver, we will see. I wouldn't hold my breath though.
My wild guess is that it's going to get even worse for a large project with a lot of Quarkus extensions but that's just a theory, from the heap dumps I checked.
Now I'm not advocating for us to drop relocations at a fast pace. I'm just saying that we didn't really think they had a cost, and they do.
So I would rather have the following policy:
For instance, we could remove the Quarkus REST/Quarkus Messaging relocations introduced 4 months ago after the 3.15 release.
I'm definitely curious how/why we would have to retain it all forever - it wasn't designed/intended that a normal build should need to retain everything. Let's see what @aloubyansky sees.
Again, we don't keep it all around forever.
But each `GoodDescriptor` that is about an artifact we build in the Quarkus tree (at least, that seemed like the common denominator for the huge ones) contains the whole set of managed dependencies for our BOM - and it's in memory for as long as the Aether model is kept in memory.
Yes I get that - but even that shouldn't be necessary.
We should probably distinguish between a "high tide" memory usage and a "steady state" memory usage. The symptom this work item was raised for was that our high tide memory usage was going up. Unloading things like the Aether memory model would/does help with steady state memory usage, but not with the high tide usage.
Big high tide memory usage could prevent people from being able to build their Quarkus apps in constrained environments, but steady state memory usage is probably the metric that gets more attention/slides/blogs, etc.
Not sure if this is relevant here, but concurrency (particularly concurrent startup) can affect the "high tide" in ways that can be hard to predict.
> but concurrency (particularly concurrent startup) can affect the "high tide" in ways that can be hard to predict
From what I have seen in the heap dumps, the problem we have here has little to do with concurrency.
So what makes the issue so problematic for those of us with a huge dependency management section is that each descriptor has its own copy of the whole dependency management tree.
See:
And the `convert()` method:
Given our whole dependency management tree is 800 KB, it grows very fast.
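To put a rough number on that growth (a back-of-envelope sketch; only the ~800 KB per-copy figure comes from this thread, the descriptor count is an assumed, illustrative value):

```java
// Back-of-envelope sketch of why per-descriptor copies grow fast.
// Only the ~800 KB figure is from the thread; the descriptor count
// below is an assumption chosen purely for illustration.
public class DepMgmtFootprint {
    public static void main(String[] args) {
        double copyKb = 800;   // one copy of the whole managed-dependency list
        int descriptors = 40;  // assumed: descriptors each holding a full copy
        double totalMb = copyKb * descriptors / 1024;
        // With these numbers the duplicated copies alone account for roughly
        // 31 MB - the same order of magnitude as the 34 MB DataPool above.
        System.out.printf("~%.1f MB%n", totalMb);
    }
}
```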
@cstamas I was wondering if you had an opinion on ^
This other comment is also useful for context: https://github.com/quarkusio/quarkus/issues/38814#issuecomment-2219911522 (the second point about the managed dependencies).
Does this mean that the repository system session is kept alive even after it is not needed? As the data pool is stored within the session...
So our problem here is about what @holly-cummins called "high tide" memory usage, meaning that during our build we actually need quite a lot of memory. We don't have that problem at runtime, we don't keep the session alive. We are trying to reduce the memory footprint of the Quarkus build itself.
In the case of Quarkus, what I'm mentioning above takes 34 MB of memory - which is not ideal, especially since AFAICS most of it is due to the management dependencies being duplicated for each descriptor.
Got it. There was a change in resolver in this respect (more speed at the cost of memory); see https://issues.apache.org/jira/browse/MRESOLVER-320
In short: on this page: https://maven.apache.org/resolver-archives/resolver-1.9.21/configuration.html
Try out these switches:

`-Daether.dependencyCollector.pool.descriptor=weak`

Also, try out the BF collector; it also reduces memory usage, as it does less work for the same result:

`-Daether.dependencyCollector.impl=bf`
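For experimenting, both switches can be combined in a single invocation (a sketch; the flag names are the ones documented for resolver 1.9.x):

```shell
# Experiment: weak descriptor pooling + BF collector (resolver 1.9.x flags).
# Memory usage should drop; build time may increase during the experiment.
mvn clean install \
  -Daether.dependencyCollector.pool.descriptor=weak \
  -Daether.dependencyCollector.impl=bf
```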
Basically, I had two ideas (that might both be really dumb as I'm not familiar with this area at all):

- In `populateResult()`, would it make sense to only add the managed dependencies that are meaningful for this particular set of dependencies? Except if `getDependencies()` doesn't return the transitive dependencies? What I'm not sure I understand is why copying the whole managed dependencies would make sense in the context of a single artifact?

Just to be clear, these should be somehow passed to (and put into config properties of Resolver) around here, I guess: https://github.com/quarkusio/quarkus/blob/main/independent-projects/bootstrap/maven-resolver/src/main/java/io/quarkus/bootstrap/resolver/maven/BootstrapMavenContext.java#L452
Yeah, we might have to give it a try, I'll discuss with @aloubyansky when he's back from PTO, but as you probably suspect... we also want speed, especially in dev mode :).
Interested in your feedback on my comment above ^^.
Re feedback on your comment above, I see a specific "niche" here (I am guessing here, correct me if I am wrong):
So what happens here is that assuming QRoot (shared parent POM) defines, let's say, 100 depMgt entries, and then quarkus modules Q1, Q2, etc. all inherit the QRoot parent, so Q1, Q2, Q3 will all have the same 100 depMgt entries copied over and over again?
This may be a matter for the Maven model builder, as Resolver itself is "Maven-agnostic". Basically the Q1, Q2, ... `ArtifactDescriptorResult`s will carry the same copies of depMgt, as they all originate from the QRoot POM.
The problem here is that Resolver deals with "built models", and as we know, a Maven Model, once built (by the model builder), is "flat" (the Model Builder also caches quite extensively, but the notion of "parent POM" and "import POM" is gone once the Model leaves the model builder).
This is an interesting problem, but TBH we never tinkered with it... will ask around other folks for opinions...
@cstamas yes, exactly. Except we have 2300 dependencies in the Quarkus BOM so you can imagine that it grows fast.
What puzzles me a bit is why you would need to copy the whole dependency management tree into each descriptor. I don't see the value of it (but again, I'm really not familiar with this part; it's Alexey who has all the knowledge about this). I could see how having the elements that apply to the dependencies of this particular artifact would make sense - especially since you were saying the descriptors are flat - but the whole thing?
I tried to play with this idea very naively - this approach is obviously extremely slow when you have 2k entries (it's actually quite noticeable...) - but I wanted to play with it to see if it would improve our situation. It does.
Now I'm not sure it's correct and things would need to be tweaked for it to not make things unacceptably slow.
```diff
diff --git a/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/ArtifactDescriptorReaderDelegate.java b/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/ArtifactDescriptorReaderDelegate.java
index edb417858..5b91f09b2 100644
--- a/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/ArtifactDescriptorReaderDelegate.java
+++ b/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/ArtifactDescriptorReaderDelegate.java
@@ -56,14 +56,17 @@ public void populateResult(RepositorySystemSession session, ArtifactDescriptorRe
             result.addRepository(ArtifactDescriptorUtils.toRemoteRepository(r));
         }
 
+        DependencyManagement mgmt = model.getDependencyManagement();
+
         for (org.apache.maven.model.Dependency dependency : model.getDependencies()) {
             result.addDependency(convert(dependency, stereotypes));
-        }
 
-        DependencyManagement mgmt = model.getDependencyManagement();
-        if (mgmt != null) {
-            for (org.apache.maven.model.Dependency dependency : mgmt.getDependencies()) {
-                result.addManagedDependency(convert(dependency, stereotypes));
+            if (mgmt != null) {
+                mgmt.getDependencies().stream()
+                        .filter(d -> d.getGroupId().equals(dependency.getGroupId())
+                                && d.getArtifactId().equals(dependency.getArtifactId()))
+                        .findAny()
+                        .ifPresent(d -> result.addManagedDependency(convert(d, stereotypes)));
             }
         }
```
The delegate is, by the way, "pluggable": https://github.com/apache/maven/blob/e7e0cbb4053de3d9e83c82048875969d63918d68/maven-resolver-provider/src/main/java/org/apache/maven/repository/internal/DefaultArtifactDescriptorReader.java#L185
Just stick one instance into session.configProperties and resolver will use that.
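A minimal sketch of what that registration could look like (an assumption-laden illustration: it presumes Maven 3.9.x / `maven-resolver-provider` on the classpath, and that `DefaultArtifactDescriptorReader` looks the delegate up under the delegate's fully qualified class name, per the linked code):

```java
// Sketch only: registering a custom ArtifactDescriptorReaderDelegate via
// session config properties, as described above. Class/key names assume
// Maven 3.9.x; a real customization would subclass the delegate and
// override populateResult(...) with the filtering variant from the diff.
import org.apache.maven.repository.internal.ArtifactDescriptorReaderDelegate;
import org.eclipse.aether.DefaultRepositorySystemSession;

public class DelegateRegistration {

    public static void register(DefaultRepositorySystemSession session) {
        ArtifactDescriptorReaderDelegate delegate = new ArtifactDescriptorReaderDelegate();

        // DefaultArtifactDescriptorReader looks the delegate up under the
        // fully qualified class name key in session.getConfigProperties().
        session.setConfigProperty(ArtifactDescriptorReaderDelegate.class.getName(), delegate);
    }
}
```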
So, the logic is: on node N1, if D1 was added as a dependency, remove D1 from depMgt? This sounds interesting; let me push this change through the Maven ITs. At first glance this looks promising.
But I am afraid this is not so simple; let's see the IT results...
Wait, did I misread the diff...?
So, the "do not add D1 to depMgt if already added as dep" is here, let's see ITs: Maven4: https://github.com/apache/maven/pull/1613 Maven3: https://github.com/apache/maven/pull/1614
Removing "unused" depMgt at the POM level is not possible; they are like a "future pledge", and you don't know if and when they will be used (or they may remain unused as well). The same POM (built model) may be used in different places in the graph, and in one place the D1 depMgt may affect something downstream of it, and in another place it may not, as for example the D1 parent (or parent's parent, etc.) may redefine depMgt ("and closer to the root wins"). Hence, to me this seems not doable at the POM level (ArtifactDescriptor level - as again, the same descriptors are reused in different nodes of the graph).
My PR is probably doing "way less" (removing way less) than yours, but again, I find your diff wrong: you would remove depMgt entries that are not added as deps, if I am reading your diff correctly.
I am not fully up to speed on how Resolver is used in Quarkus, but from reading some stuff, a few ideas:
> But each GoodDescriptor that is about an artifact we build in the Quarkus tree (at least, that seemed like the common denominator for the huge ones) contains the whole set of managed dependencies for our BOM - and it's in memory for the time the Aether model is kept in memory.
Again, this is kept in session. Once you have graph, or tree, or classpath calculated by Resolver, you can toss session away.
> I don't think we keep the Aether model in memory forever...
Unsure what is referred to as the "Aether model", but if it is the DependencyNode (root and the tree below it), you don't need the session for that. In fact, it is usual to keep the Resolver "object graph" but create a session "per resolver invocation". This way you may get rid of excess heap use (again, as many things are cached/stored in the session itself).
In Maven, the MavenSession (== repository system session), in the case of the vanilla CLI, lives for the whole JVM runtime. But for example in mvnd, where the process remains resident, the Resolver (and Maven) object graph is retained, while the sessions (Maven and Resolver) are re-created per build.
Will dig deeper into Resolver use in Quarkus...
Also, could I have access granted to heap dumps?
Thanks, looked at the dumps. The memory hog is in the session (RepositoryCache). For a start, I'd recommend testing `aether.dependencyCollector.pool.descriptor=weak` and seeing whether it changes memory consumption (and ignore any possible slowdown during the experiment).
If it does, we know we are on the right track.
@cstamas Thanks for having a look. `aether.dependencyCollector.pool.descriptor=weak` seems to help (at least on that front; we have a few other culprits on our side).
I can share the dump with you. Either send me your Google account at gsmet at redhat dot com or a place to upload it for you. Keep in mind that this dump is obtained from an OOM error when building, so it's really about trying to limit the peak memory usage.
My initial idea with the patch above was to limit the number of managed dependencies to copy to only the ones that were important for a given descriptor. Meaning that in our case, all the managed dependencies are copied to quite some descriptors that don't really need it (or at least not all of them). But from what I could see, the descriptor dependencies don't contain the transitive ones so that won't work well.
Yes, and as I mentioned, doing this on the POM (removing "unused" depMgt) is not feasible, as the POM lacks the context where it will be used, hence there is no way to know which entries are "unused". Maybe we need to deconstruct POMs and cache depMgt separately, as in that case Q1, Q2, ... would all literally share the same single instance.
While it's not a complete fix, I created a PR to improve the situation a bit: https://github.com/apache/maven-resolver/pull/534
So, two interesting PRs (for Resolver 1.9.x):
It would be good if someone could locally merge both, and test the resulting Resolver 1.9.x build with Quarkus.
Re timeline: I am out for the next 2 weeks, and Resolver 1.9.21 was just released last week... But when I am back, I could do a Resolver 1.9.22 w/ memory improvements, and Maven 3.9.9 is already scheduled next (so early Aug or so).
Good news: the Maven IT suite is OK with change above: https://github.com/apache/maven/pull/1617
Bad news: interning all the `List<Dependency>` instances comes at a price: the Quarkus build (`mvn clean install -Dquickly`) on my laptop went from 9:20 (vanilla Maven 3.9.8) to 11:10 (w/ patch above). We could maybe introduce some threshold, and intern dependency lists only if they are above some size limit? Or any other ideas?
I did not measure the most interesting metric, the heap usage change, but I'd bet a six-pack of beer it is HUGE :smiley:
I will have a look next week. Thanks!
I can see how it could be a good thing memory-wise. I'm a bit worried about the additional work though.
Especially since in most of the cases, we are doing the exact same thing over and over (typically the convert operation in ArtifactDescriptorReaderDelegate).
Multiplied by 2300 multiplied by the number of artifacts affected, that’s a lot of useless work.
Agreed. But `ArtifactDescriptorReader` is not used only during collecting, but in many other places; plus, it is exposed via the RepositorySystem public API as well. Also, `ArtifactDescriptorResult` is public API as well, so we could change that only in Resolver 2.1.0 or so...
The biggest problem is that `ArtifactDescriptorResult` is NOT immutable. Hence, we cannot assume that something does not mutate it (like adding a new dependency). Hence, the current "contract" is that for each new request it hands over a new result (that is an instance of its own and freely mutable). Use of the reader in the collector is somewhat "specific", and the collector does handle artifact descriptors as immutable (does not mutate them), so I still see some room for improvement...
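The interning idea discussed above can be sketched with plain JDK collections (an illustration of the general technique, not the actual resolver patch):

```java
// Illustrative sketch of interning (not the actual MRESOLVER-587 change):
// keep one canonical instance per distinct list, so equal
// managed-dependency lists are shared instead of copied per descriptor.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class ListInterner<T> {

    private final Map<List<T>, List<T>> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance for any list equal to the argument,
    // storing the argument itself on first encounter.
    public List<T> intern(List<T> list) {
        List<T> existing = pool.putIfAbsent(list, list);
        return existing != null ? existing : list;
    }
}
```

This trades CPU for memory: every lookup hashes the whole list (which is where a build-time regression like the one measured above can come from), in exchange for keeping only one copy of each distinct list on the heap.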
@cstamas I created a small PR (for your 1.9 PR; the same patch should be applied to the `master` branch): https://github.com/cstamas/maven-resolver/pull/6
Locally, when I ran the Maven ITs, I got:

- 2m49s
- 2m52s
- 2m30s
I think the way I reduce allocations by not copying the map might help get a speedup.
It's not scientific but I think it would be worth pushing all this combined to the Apache Maven ITs and see if you get similar results.
@cstamas and I can confirm the memory usage is greatly reduced. So if you end up confirming that all patches applied actually make things a bit faster, that would be a win on both sides.
Back from vacation... but DK traveling is still ahead. I am picking this one up and it will hopefully be present in the next Resolver (1/2) and Maven (3/4) releases that are about to happen "soon-ish".
@cstamas maybe you could merge the two PRs for MRESOLVER-586 and rebase yours? This patch is an easy win. That way you could easily test the addition of all of them?
https://github.com/apache/maven-resolver/pull/534 https://github.com/apache/maven-resolver/pull/535
@cstamas Glad to have you back! I'm around if you want to discuss this any further or need some additional help.
Merged, rebased, Maven ITs running https://github.com/apache/maven/pull/1617
Ok, so almost all merged, and MRESOLVER-587 looks good as well:
The only question here is the defaults of the two new config keys: for safety, I am using false/false (so basically interning does NOT kick in and it behaves as today), but it is user configurable. As we saw, true/true tremendously lowers memory consumption at the price of increased runtime. Still, as Guillaume measured (and those changes are picked up as well), there is some hope: https://github.com/quarkusio/quarkus/issues/38814#issuecomment-2230625819
The ultimate fix would be in ArtifactDescriptorReader, making the descriptor immutable, but sadly that is not doable within the scope of Resolver 1.x/2.0.x; it will probably happen in Resolver 2.1.x...
WDYT? Any opinion?
@cstamas your proposal of interning only managed dependencies by default makes sense to me.
Describe the bug
I have an extension test in the quarkus-pact project that started OOMing in the ecosystem CI recently. I've worked around the issue by increasing the memory available to the test, but we may want to investigate why the memory requirements have increased.
I can reproduce just by starting the Quarkus application with constrained memory (it fails to start). The application has two pact extensions in it, so it does more classloading than a simpler application. With Quarkus 3.6, the app starts, and with 3.7, it fails to start. I can force a failure on Quarkus 3.6 by dropping the available memory down to 110mb (from 128mb), so even with Quarkus 3.6, the memory usage was fairly high - 3.7 just happens to be the release that tipped it over the edge into failure.
The ecosystem CI caught the change: https://github.com/quarkiverse/quarkiverse/issues/94#issuecomment-1837901693. This is the CI run where the problem first appeared, which might help us identify a specific change: https://github.com/quarkiverse/quarkus-pact/actions/runs/7082294368
My initial guess is that this is related to classloading, but I don't have firm evidence. It could also be test execution. If I give the applications more memory, so that it doesn't OOM, and then trigger a dump once startup is finished, the 3.6 and 3.7 dumps both use about 81mb of memory, and look similar.
Expected behavior
No response
Actual behavior
No response
How to Reproduce?
I've attached a reproducer app. Unzip it, and then run `mvn install`.

For more control, another way to reproduce is:

1. Run `mvn install`. It should pass.
2. Drop the memory allocation from `140m` to `128m`, the default. (I had to override the default memory allocation from the parent test class to get the test to pass.)
3. `cd cross-extension-integration-tests && mvn clean install` should now show the failure.

Output of `uname -a` or `ver`
No response
Output of `java -version`
No response
Quarkus version or git rev
No response
Build tool (ie. output of `mvnw --version` or `gradlew --version`)

No response
Additional information
No response