quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io

Increased build-time RSS usage in `micrometer-prometheus` integration test after moving Kubernetes to @ConfigMapping #44216

Closed: zakkak closed this issue 3 hours ago

zakkak commented 2 weeks ago

Description

Since https://github.com/quarkusio/quarkus/pull/43862 the build-time RSS usage of native-image builds has increased in the micrometer-prometheus integration test (note that I haven't observed similar changes in other tests).

Image

The change didn't have any negative effects on the native executable size or the reachable types/methods; on the contrary, it decreased the total image size by ~400KB.

The purpose of this issue is to raise awareness of this increase and keep track of it.

[!NOTE]
https://github.com/quarkusio/quarkus-images/pull/282 was also merged on the 24th of October, but I believe the issue is related to a Quarkus change, since it popped up in the upstream oracle/graal nightly builds, which do not use the quarkus builder images.


Note from @gsmet: here is another report, posted by @mocenas, that comes with a reproducer: https://github.com/quarkusio/quarkus/issues/44436

Compiling a Quarkus app into a native executable using a Docker image requires a lot more memory since Quarkus started moving to @ConfigMapping.

We have an example where our testing app used to work with xmx=5g for native compilation a month ago, but since https://github.com/quarkusio/quarkus/pull/43862 and https://github.com/quarkusio/quarkus/pull/44140 it requires at least 9g to compile successfully (otherwise compilation fails with OutOfMemory).

Almost doubling the memory requirement just by moving to @ConfigMapping is alarming.

At this moment it is somewhat manageable, but with more moves planned in https://github.com/quarkusio/quarkus/pull/44079 it is hard to say how much memory it will require in the end.

quarkus-bot[bot] commented 2 weeks ago

/cc @brunobat (micrometer), @ebullient (micrometer), @geoand (kubernetes), @iocanel (kubernetes), @radcortez (config)

michalvavrik commented 2 weeks ago

https://github.com/quarkusio/quarkus/issues/44220 was closed as a duplicate, but our application cannot be built into a native executable at all, while this issue only says: "The purpose of this issue is to raise awareness of this increase and keep track of it."

Are you sure this is an acceptable solution? If I can no longer build the application even on my very powerful workstation, is that an appropriate action?

gsmet commented 2 weeks ago

From what I can see you have an Xmx build limit:

  • 4.44GB of memory (14.3% of 31.10GB system memory, set via '-Xmx5g')

Try to raise that and see how it goes. I'm not saying it's not an issue that needs fixing, or at least diagnosing.

@zakkak do you have any idea how we could understand what's going on?

michalvavrik commented 2 weeks ago

From what I can see you have an Xmx build limit:

  • 4.44GB of memory (14.3% of 31.10GB system memory, set via '-Xmx5g')

Try to raise that and see how it goes.

Setting it to 7g did the trick. I'll experiment to see if I can set it even lower. Thank you.

michalvavrik commented 2 weeks ago

Setting quarkus.native.native-image-xmx to 6g fails with OOM.

zakkak commented 2 weeks ago

@zakkak do you have any idea how we could understand what's going on?

One would need to attach a Java profiler to native-image (it's a Java application) and see what's causing the additional allocations. Note that this might not be trivial, though, as the increased number of allocations could be the result of some new call paths being explored by the analysis.

radcortez commented 1 week ago

I wonder if we were close to the limit and the change tipped it over. Mappings do add an additional class per root and group.

michalvavrik commented 1 week ago

I could have misunderstood this, but in the case of the QE TS many-extensions module it used to work with 5g and now 6g is not enough, so "close" is a fuzzy term; you may still be right. BTW, several other modules were affected: I just had to raise the limit from 4g to 5g for a quite simple Funqy serverless test in native.

radcortez commented 1 week ago

Are all the modules using the kubernetes extension?

michalvavrik commented 1 week ago

Yes, all of them use kubernetes or openshift extensions.

rsvoboda commented 3 days ago

Experiment on my MBP M3 with xmx set to 8g: Quarkus 3.16, 3.15, and 3.14 can handle the "many extensions" module from https://github.com/quarkus-qe/beefy-scenarios with a peak RSS of ~5GB. Quarkus main (https://github.com/quarkusio/quarkus/commit/7a6b0eea9b8) uses all the available memory and spends a massive amount of time on GC activity.

sdk use java 23.1.5.r21-mandrel
mvn clean package -DskipTests -Dnative -Dquarkus.native.native-image-xmx=8g -f '003-quarkus-many-extensions' -Dquarkus.platform.version=999-SNAPSHOT -Dquarkus-plugin.version=999-SNAPSHOT | grep -e "Peak RSS" -e "Finished generating"
mvn clean package -DskipTests -Dnative -Dquarkus.native.native-image-xmx=8g -f '003-quarkus-many-extensions' -Dquarkus.platform.version=3.16.2 -Dquarkus-plugin.version=3.16.2 | grep -e "Peak RSS" -e "Finished generating"
mvn clean package -DskipTests -Dnative -Dquarkus.native.native-image-xmx=8g -f '003-quarkus-many-extensions' -Dquarkus.platform.version=3.15.1 -Dquarkus-plugin.version=3.15.1 | grep -e "Peak RSS" -e "Finished generating"
mvn clean package -DskipTests -Dnative -Dquarkus.native.native-image-xmx=8g -f '003-quarkus-many-extensions' -Dquarkus.platform.version=3.14.4 -Dquarkus-plugin.version=3.14.4 | grep -e "Peak RSS" -e "Finished generating"

Using java version 23.1.5.r21-mandrel in this shell.
                       48.2s (45.7% of total time) in 279 GCs | Peak RSS: 7.47GB | CPU load: 8.25
Finished generating '003-quarkus-many-extensions-1.0.0-SNAPSHOT-runner' in 1m 45s.
                        5.2s (8.9% of total time) in 499 GCs | Peak RSS: 5.30GB | CPU load: 7.84
Finished generating '003-quarkus-many-extensions-1.0.0-SNAPSHOT-runner' in 58.7s.
                       6.1s (10.1% of total time) in 936 GCs | Peak RSS: 4.93GB | CPU load: 7.81
Finished generating '003-quarkus-many-extensions-1.0.0-SNAPSHOT-runner' in 1m 0s.
                        5.7s (9.8% of total time) in 742 GCs | Peak RSS: 4.88GB | CPU load: 7.89
Finished generating '003-quarkus-many-extensions-1.0.0-SNAPSHOT-runner' in 58.2s.

@maxandersen @cescoffier, could you please assign somebody to investigate more in detail what causes such memory demand when building a native image?

Set of defined extensions: https://github.com/quarkus-qe/beefy-scenarios/blob/main/003-quarkus-many-extensions/pom.xml

Side note: I also experimented with different versions of Mandrel (23.1.4.r21-mandrel, 23.1.5.r21-mandrel, 24.0.2.r22-mandrel, 24.1.1.r23-mandrel). I noticed a difference between 23.1.4.r21-mandrel and 23.1.5.r21-mandrel, where the .4 version needed more memory and more GC cycles.

rsvoboda commented 3 days ago

The thing to investigate more deeply is the kubernetes extension, as hinted in the comments above.

Peak RSS goes back to the ~5 GB level with 999-SNAPSHOT when the kubernetes extension is removed from 003-quarkus-many-extensions/pom.xml.

sdk use java 23.1.5.r21-mandrel
mvn clean package -DskipTests -Dnative -Dquarkus.native.native-image-xmx=8g -f '003-quarkus-many-extensions' -Dquarkus.platform.version=999-SNAPSHOT -Dquarkus-plugin.version=999-SNAPSHOT | grep -e "Peak RSS" -e "Finished generating"
mvn clean package -DskipTests -Dnative -Dquarkus.native.native-image-xmx=8g -f '003-quarkus-many-extensions' -Dquarkus.platform.version=3.16.2 -Dquarkus-plugin.version=3.16.2 | grep -e "Peak RSS" -e "Finished generating"

Using java version 23.1.5.r21-mandrel in this shell.
                       7.8s (10.9% of total time) in 1372 GCs | Peak RSS: 4.95GB | CPU load: 7.88
Finished generating '003-quarkus-many-extensions-1.0.0-SNAPSHOT-runner' in 1m 11s.
                       7.0s (10.3% of total time) in 1264 GCs | Peak RSS: 4.48GB | CPU load: 7.80
Finished generating '003-quarkus-many-extensions-1.0.0-SNAPSHOT-runner' in 1m 7s.

gsmet commented 3 days ago

I think we will need some help from the Mandrel team here (@zakkak @galderz @Karm), as something in the new code seems to trigger unwanted behavior in Mandrel/GraalVM, and we will need some Mandrel/GraalVM knowledge to figure out what's going on.

radcortez commented 3 days ago

Something to consider is that we never noticed such behavior with other extensions using mappings, and we have been using them for some time already. So it may be something related only to the Kubernetes configuration.

I'm going to try to investigate, but I agree with @gsmet that it would be helpful if someone from Mandrel also helped.

zakkak commented 3 days ago

@radcortez I am happy to work with you on this. I have it on my TODO list anyway, but I need to find some free time for it. Let's sync on Zulip. Note: anyone else willing to jump in is very much welcome on my side :)

radcortez commented 3 days ago

Sounds good :)

franz1981 commented 3 days ago

@rsvoboda what is https://github.com/quarkus-qe/beefy-scenarios and what is it used for? I wasn't aware of it.

rsvoboda commented 3 days ago

https://github.com/quarkus-qe/beefy-scenarios is test coverage that we built during RHBQ 1.x times (together with https://github.com/quarkus-qe/quarkus-openshift-test-suite). With RHBQ 2+ we moved to https://github.com/quarkus-qe/quarkus-test-suite (with https://github.com/quarkus-qe/quarkus-test-framework/), which helped us simplify test development for both baremetal and OpenShift target environments.

https://github.com/quarkus-qe/beefy-scenarios is kept around for continuous community testing with Quarkus main; it's also part of the ecosystem CI / https://status.quarkus.io.

radcortez commented 1 day ago

Ok, I think I've got something:

First, one thing to consider is that changing the config classes is not directly the issue. In fact, in this particular case, the config classes are build-time only and are not even present during the native image build. So, what is the issue?

In our generated startup Config, we match the configuration to determine whether it is build time or runtime, so we can warn about possible issues. As you know, build-time configuration may be present in application.properties, but we have to ignore it. Since we don't have the config classes (they live in the deployment module), we need to do the matching another way.

The old config implementation would generate a method similar to this, which either ignores keys or populates the config object:

    private static void siParseKey_quarkus_test_container/* $FF was: siParseKey:quarkus:test:container*/(SmallRyeConfig var0, NameIterator var1) {
        if (!var1.hasNext()) {
            String var2 = var1.getName();
            ((HashSet)unknown).add(var2);
        } else if (!var1.nextSegmentEquals("additional-exposed-ports")) {
            if (!var1.nextSegmentEquals("labels")) {
                if (!var1.nextSegmentEquals("network")) {
                    if (!var1.nextSegmentEquals("volume-mounts")) {
                        String var3 = var1.getName();
                        ((HashSet)unknown).add(var3);
                    } else {
                        var1.next();
                        siParseKey:quarkus:test:container:volume-mounts(var0, var1);
                    }
                } else {
                    var1.next();
                    emptyParseKey(var0, var1);
                }
            } else {
                var1.next();
                siParseKey:quarkus:test:container:labels(var0, var1);
            }
        } else {
            var1.next();
            siParseKey:quarkus:test:container:additional-exposed-ports(var0, var1);
        }
    }

The new implementation instead adds the names to a KeyMap to do the matching, because it is SmallRyeConfig that populates the object; we only need to know whether a build-time property is matched or not, so it can be ignored:

        KeyMap var0 = new KeyMap();
        var0.findOrAdd("quarkus.knative.rbac.roles.*.policy-rules.*.resource-names").putRootValue(true);
        var0.findOrAdd("quarkus.knative.init-containers.*.env-vars.*.value").putRootValue(true);
        var0.findOrAdd("quarkus.rest-client.proxy-address").putRootValue(true);
        var0.findOrAdd("quarkus.kubernetes.init-containers.*.mounts.*.name").putRootValue(true);
        var0.findOrAdd("quarkus.kubernetes.app-secret").putRootValue(true);
        var0.findOrAdd("quarkus.kubernetes.prometheus.port").putRootValue(true);
        ...
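
For context, here is a minimal sketch of how such a KeyMap might then be consulted at startup. The class and method below are hypothetical (not the actual generated code), and the findRootValue lookup is an assumption based on io.smallrye.config.KeyMap:

    import io.smallrye.config.KeyMap;

    // Hypothetical helper, not the generated Config class: the mapped build-time names
    // are kept in a KeyMap, and a property found in application.properties is looked up
    // directly instead of being routed through a generated parse-key method.
    final class BuildTimeKeyCheck {

        static final KeyMap<Boolean> BUILD_TIME_NAMES = new KeyMap<>();

        static {
            BUILD_TIME_NAMES.findOrAdd("quarkus.kubernetes.app-secret").putRootValue(true);
            BUILD_TIME_NAMES.findOrAdd("quarkus.kubernetes.prometheus.port").putRootValue(true);
        }

        // Assumption: findRootValue(String) walks the nested maps segment by segment
        // and returns the stored leaf value, or null when the name is unknown.
        static boolean isBuildTimeProperty(String name) {
            return Boolean.TRUE.equals(BUILD_TIME_NAMES.findRootValue(name));
        }
    }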

So, when config is moved from the old implementation to the new one, the population method is deleted, and new entries are added to the KeyMap.

I was already aware that KeyMap had some runtime performance issues. Still, those were not very noticeable because the number of entries was low. With the continuous move to the new implementation, the entries started to grow. Apparently, this also greatly impacts the memory of the image build (which I was not aware of, but due to the large number of Kubernetes config entries, it became very noticeable).
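
To make the memory cost concrete, here is a small self-contained sketch of the idea, using a hand-rolled stand-in for the nested structure (this is not the io.smallrye.config.KeyMap code itself): every dotted key contributes roughly one nested map node per segment, so thousands of Kubernetes entries with five or six segments each translate into tens of thousands of map nodes held during the image build, while a flat set would hold one entry per key:

    import java.util.HashMap;
    import java.util.Map;

    // Stand-in for a KeyMap-style trie: one nested map node per dotted segment.
    public class KeyTrieSizeDemo {

        static int nodeCount = 0;

        static Map<String, Object> newNode() {
            nodeCount++;
            return new HashMap<>();
        }

        @SuppressWarnings("unchecked")
        static void add(Map<String, Object> root, String dottedKey) {
            Map<String, Object> current = root;
            for (String segment : dottedKey.split("\\.")) {
                current = (Map<String, Object>) current.computeIfAbsent(segment, s -> newNode());
            }
        }

        public static void main(String[] args) {
            Map<String, Object> root = newNode();
            String[] keys = {
                "quarkus.knative.rbac.roles.*.policy-rules.*.resource-names",
                "quarkus.kubernetes.init-containers.*.mounts.*.name",
                "quarkus.kubernetes.app-secret",
                "quarkus.kubernetes.prometheus.port"
            };
            for (String key : keys) {
                add(root, key);
            }
            // These 4 keys already produce 18 nested map nodes (including the root),
            // where a flat Set would hold 4 entries.
            System.out.println("keys: " + keys.length + ", nested map nodes: " + nodeCount);
        }
    }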

I already had a more lightweight replacement implemented in https://github.com/smallrye/smallrye-config/blob/3edebd4b38f85aaaf3c1872845a2add89b6374ad/implementation/src/main/java/io/smallrye/config/PropertyName.java, and replaced the KeyMap match in Quarkus with https://github.com/quarkusio/quarkus/pull/44448.

I was able to observe the memory increase by building the native image with #44216, and I do see the memory decrease with #44448:

Current:

17.1s (13.0% of total time) in 178 GCs | Peak RSS: 8.05GB | CPU load: 10.12

With #44448:

9.3s (8.6% of total time) in 156 GCs | Peak RSS: 6.62GB | CPU load: 10.20

@zakkak @michalvavrik, can you try that and see if you notice any improvement? Thanks!

geoand commented 1 day ago

Very nice @radcortez!

zakkak commented 1 day ago

Great work @radcortez!

I confirm that the memory usage has decreased significantly

Before

Image

After (with https://github.com/quarkusio/quarkus/pull/44448)

Image

radcortez commented 1 day ago

Excellent.

I am now trying to compare with the state before https://github.com/quarkusio/quarkus/issues/44216 to see if we are back at the same level or if there is still some degradation. I do get some minor differences, but maybe that is only noise.

BTW, for additional reference, the implementation of https://github.com/quarkusio/quarkus/pull/44448 is something like:

   private static Set generateMappedProperties() {
      HashSet var0 = new HashSet();
      PropertyName var1 = new PropertyName("quarkus.config-tracking.file");
      var0.add(var1);
      PropertyName var2 = new PropertyName("quarkus.tls.*.trust-store.pem.certs");
      var0.add(var2);
      PropertyName var3 = new PropertyName("quarkus.log.socket.filter");
      var0.add(var3);
      PropertyName var4 = new PropertyName("quarkus.vertx.resolver.rotate-servers");
      var0.add(var4);
      PropertyName var5 = new PropertyName("quarkus.vertx.cluster.public-port");
      var0.add(var5);
      PropertyName var6 = new PropertyName("quarkus.log.file.async");
      var0.add(var6);
      PropertyName var7 = new PropertyName("quarkus.log.handler.syslog.*.syslog-type");
      ...

The check just calls a contains on the Set to match the name.
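
As a rough illustration of why a flat contains can still handle wildcard segments such as "quarkus.tls.*.trust-store.pem.certs", here is a simplified, hypothetical sketch of the idea (the WildcardName class is a stand-in, not the actual io.smallrye.config.PropertyName implementation): equals treats a "*" segment as matching any single segment, and hashCode only uses information that is stable across wildcards, so one HashSet.contains call does the match.

    import java.util.HashSet;
    import java.util.Set;

    // Simplified stand-in for the idea behind PropertyName; not the SmallRye Config code.
    final class WildcardName {
        private final String[] segments;

        WildcardName(String name) {
            this.segments = name.split("\\.");
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof WildcardName)) {
                return false;
            }
            WildcardName other = (WildcardName) o;
            if (segments.length != other.segments.length) {
                return false;
            }
            for (int i = 0; i < segments.length; i++) {
                // a "*" segment on either side matches any concrete segment
                if (!segments[i].equals(other.segments[i])
                        && !segments[i].equals("*") && !other.segments[i].equals("*")) {
                    return false;
                }
            }
            return true;
        }

        @Override
        public int hashCode() {
            // Only the segment count goes into the hash: a "*" segment must land in the
            // same bucket as any concrete segment, so segment text cannot be hashed.
            return segments.length;
        }

        public static void main(String[] args) {
            Set<WildcardName> mapped = new HashSet<>();
            mapped.add(new WildcardName("quarkus.tls.*.trust-store.pem.certs"));
            mapped.add(new WildcardName("quarkus.kubernetes.app-secret"));

            // one flat contains() per property name, no per-segment tree walk
            System.out.println(mapped.contains(new WildcardName("quarkus.tls.client.trust-store.pem.certs"))); // true
            System.out.println(mapped.contains(new WildcardName("quarkus.kubernetes.replicas")));              // false
        }
    }

The real PropertyName differs in its details, but the design point is the one described above: matching a property name becomes a single flat set lookup instead of a per-segment walk through nested maps.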

gsmet commented 1 day ago

And is the Set stored in the image heap for the native image, or do you generate it even for the native image?

radcortez commented 1 day ago

It is stored, because we need it for both phases.

I'm still unsure if this is the best approach (I'm still playing with it), because after we start and validate, we don't need this anymore.

rsvoboda commented 1 day ago

Just for the record, Peak RSS is at the same level as 3.16.2 with the https://github.com/quarkusio/quarkus/pull/44448 PR for the case mentioned in https://github.com/quarkusio/quarkus/issues/44216#issuecomment-2469879390. @michalvavrik, no need to do additional verification.