/cc @Karm (mandrel), @galderz (mandrel), @zakkak (mandrel,native-image)
I did some investigation on top of Quarkus main. My initial tips (OpenTelemetry bump, SmallRye Config bump) didn't prove to be the cause. After that I tried several custom builds, but I wasn't able to pinpoint a concrete commit causing the issues. It's more like an accumulation of multiple changes.
Here is the dump of my notes (the goal is to get from 65MB to 59MB). I intentionally didn't add it to the description, as it's already quite heavy:
mvn clean verify -pl testsuite -Dtest=StartStopTest#fullMicroProfileNative -Dquarkus.version=999-SNAPSHOT -Dstart-stop.iterations=25 | grep "AVG rssKb"
Quarkus main
2024-08-12 17:57:25.841 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 65532
Quarkus main without otel bump - https://github.com/quarkusio/quarkus/commit/b6f54993c43f955f2b20855b9c4d694af7c210f2
2024-08-12 18:02:20.956 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 65246
before MP Config update (HEAD detached at d7b84f6d01c) ~15.7.
https://github.com/quarkusio/quarkus/commit/78045ea67413518f7c77271d08a70d03d1fffbbd
2024-08-12 18:17:15.291 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 64582
https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+609 ~14.7.
git checkout a8dfd70cc8c916d0f1e7e5c0eea21832e90c8785
HEAD is now at a8dfd70cc8c resources, not resource for path
2024-08-12 19:49:55.790 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 64613
https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+609 ~10.7.
git checkout aaf2960b388115bfd3e6fd903c71df8f6c1a8ba1
HEAD is now at aaf2960b388 Merge pull request #41793 from turing85/feature/parmeters-in-compile-plugin
2024-08-12 19:53:24.538 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 63120
https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+679 ~9.7.
git checkout 37b1fb98322230a9165105b146c978e36afa93bd
HEAD is now at 37b1fb98322 WebSockets Next: Dev UI fixes
2024-08-12 19:59:05.640 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 62891
https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+714 ~8.7.
git checkout eecc1ae55a45042e1fcb1bca36d70f8ad845a085
Previous HEAD position was 37b1fb98322 WebSockets Next: Dev UI fixes
2024-08-12 20:03:20.918 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 61257
https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+709 ~6.7.
git checkout 72d8bf6840b5aaf30392a69c03e460c2670b9c96
Update telemetry-micrometer.adoc
2024-08-12 18:25:57.942 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 61243
2024-08-12 20:06:16.126 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 61376
https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+779 ~3.7.
git checkout 0ae63d59354cfca3a32b4006be3ca4dcabeb9870
HEAD is now at 0ae63d59354 Support multiple recipe artifacts in quarkus update
2024-08-12 20:11:35.490 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 61022
https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+884 ~30.6.
git checkout b310b2699f2113276cf2f6a0ded72f7587f13693
HEAD is now at b310b2699f2 Bump ubi image to 1.19 for jib
2024-08-12 20:18:43.963 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 61079
https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+954 ~26.6.
git checkout 872ffdcc60ac75a86c7e22258c9d3d1eb1ba2dba
HEAD is now at 872ffdcc60a Bump io.quarkus:quarkus-platform-bom-maven-plugin
2024-08-12 20:23:01.314 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 60974
RSS is decreasing slowly as I go back in history, so this leads me to the conclusion that the increased RSS I see between 3.11 and 3.13 is an accumulation of multiple changes.
/cc @EricWittmann (openapi), @Karm (mandrel), @Ladicek (fault-tolerance,smallrye), @MikeEdgar (openapi), @Sanne (core), @aloubyansky (core), @brunobat (tracing), @cescoffier (mutiny,reactive-messaging), @ebullient (metrics), @galderz (mandrel), @gsmet (core), @jmartisk (graphql,health,metrics,smallrye), @jponge (mutiny), @ozangunalp (reactive-messaging), @phillip-kruger (graphql,openapi,smallrye), @radcortez (core,smallrye,tracing), @sberyozkin (jwt), @stuartwdouglas (core), @xstefank (health), @zakkak (mandrel,native-image)
cc @franz1981
Is this same trend confirmed on the CI with stable Linux machines as well? IIRC the start-stop test should run wild, without any max heap settings: what happens if you cap it to some value (e.g. twice the max heap used while running loose)?
In case it is, and the native-image cmd doesn't reveal anything relevant (@zakkak could help there I think), if the additional weight comes from a progressively increasing number of features, it's OK. But if we believe we could make things better, or it is unacceptable, we can:
> Is this same trend confirmed on the CI with stable Linux machines as well?
That's perf lab only, we do not have bare metal machines in our lab.
What about JVM mode?
As mentioned in the description, JVM mode runs didn't reveal a noticeable difference. It's native mode where the diff manifests. It may be that, since JVM mode uses more memory, the diff is not detected ...
@rsvoboda I mean: the description here made me think it was a macOS run, but the difference is still the same when moving to Linux, right?
Yes, @mjurc tried the reproducer commands on his Fedora machine and he saw a similar trend; the absolute numbers were of course a bit different. Michal confirmed that RSS was higher for 3.13 compared to 3.11.
@franz1981 I don't know if JFR alloc events are in place (@roberttoyonaga?), I worked on the old object event but that's more for tracking leaks, and this doesn't smell like a leak right now.
@rsvoboda @mjurc if you run the test manually, can you see if the RSS increase is due to startup and/or the first request? Again, if you run the test manually, you should get heap dumps after startup and after the first request with the different versions, and see if we can spot some differences there.
One more thing to maybe try here is @roberttoyonaga's NMT work, in case the increases are not spotted in the heap contents.
@franz1981 Ah yes, we now have ObjectAllocationInNewTLAB, definitely want to jfr that @rsvoboda @mjurc
Finally, @rsvoboda, run through the steps in https://quarkus.io/blog/native-startup-rss-troubleshooting and see if you can get a better understanding of the cause. As we highlighted there, the way native handles memory can lead to bigger increases caused by small changes compared to JVM mode.
Yes, Native Image JFR has both the ObjectAllocationInNewTLAB event and the ObjectAllocationSample event. These events sample object allocations that result in requiring a new TLAB. They record stacktraces, so you can generate flamegraphs in JMC too. ObjectAllocationSample is basically the same as ObjectAllocationInNewTLAB but it's throttled to reduce overhead. It should be safe to enable that event in production, but ObjectAllocationInNewTLAB has very high overhead.
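For reference, a minimal sketch of enabling both events programmatically through the standard jdk.jfr API (the same settings can also go in a .jfc file); the 100/s throttle is just an example value, and availability on native images depends on the GraalVM/Mandrel version:

```java
import jdk.jfr.Recording;
import java.nio.file.Path;

public class AllocRecording {
    public static void main(String[] args) throws Exception {
        Recording recording = new Recording();
        // Throttled allocation sampling: low enough overhead for production use
        recording.enable("jdk.ObjectAllocationSample").with("throttle", "100/s");
        // One event per new TLAB: much more detail, but very high overhead
        recording.enable("jdk.ObjectAllocationInNewTLAB");
        recording.start();
        // ... run the workload to profile ...
        recording.stop();
        recording.dump(Path.of("alloc.jfr"));
        recording.close();
    }
}
```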
yes, like Galder mentioned, you could also try "native memory tracking" to see where native allocations are happening off the "Java" heap. There's a blog post here: https://developers.redhat.com/articles/2024/05/21/native-memory-tracking-graalvm-native-image#
Thanks for the pointers. I will try to look more into it, but my week is quite packed before I go on PTO next week.
@roberttoyonaga @rsvoboda We should mention that not all upstream features in master or 24.1 are in a released version of Mandrel yet. So feature availability may vary. AFAIK, the main Mandrel version in Quarkus is 23.1 (Mandrel for JDK 21).
Oh yes, that's true. ObjectAllocationSample and native memory tracking are in the GraalVM EA build. But heap dumping and the ObjectAllocationInNewTLAB JFR event will be available in Mandrel for JDK 21.
> Oh yes, that's true. ObjectAllocationSample and native memory tracking are in the GraalVM EA build.
FWIW, CE (or graalvm community builds) early access builds are here which we should be testing: https://github.com/graalvm/graalvm-ce-dev-builds/releases. They're based on OpenJDK and open source code.
I have tried reproducing this on my Fedora setup with both GraalVM CE 21.0.2 (from SDKMAN!) and the default builder image (Mandrel 23.1.4.0), and the results are:
3.11.3.graal.txt:2024-08-20 16:26:28.538 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 72581
3.11.3.txt:2024-08-20 16:13:38.929 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 72692
3.13.2.graal.txt:2024-08-20 16:28:15.357 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 73427
3.13.2.txt:2024-08-20 16:11:12.395 INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 73282
Given that the image size increased by ~1MB I don't find the above increase in RSS usage surprising, but the RSS increase reported by @rsvoboda is quite a bit larger than that :/
@mjurc when reproducing this, did you notice a similar increase to what @rsvoboda reports or a smaller one like the one in my measurements?
@rsvoboda what are the thresholds set to in the CI? Can you also share the actual numbers from these runs? The default in the repository seems to be quite high at 90000 kB.
Thanks @zakkak for checking; IIRC that benchmark does NOT set any limit on memory (Xms/Xmx) nor on the number of cores; both things could affect RSS through the way allocations happen vs GC cycles (and how malloc arenas are used). Maybe it would be beneficial to have both pieces of information while trying to reproduce it.
@franz1981, how is the config-quickstart RSS?
@radcortez is still on holiday and has no VPN access, @johnaohara can answer you... but last time I checked there was a super small increase.
> Thanks @zakkak for checking; IIRC that benchmark does NOT set any limit on memory (Xms/Xmx) nor on the number of cores; both things could affect RSS through the way allocations happen vs GC cycles (and how malloc arenas are used). Maybe it would be beneficial to have both pieces of information while trying to reproduce it.
@franz1981 If I get it right, it sets Xmx to 96m, see https://github.com/quarkus-qe/quarkus-startstop/blob/643dadc30f810333f8a0c8ef846e7aaac53f9a7e/testsuite/src/it/java/io/quarkus/ts/startstop/utils/MvnCmds.java#L24; regarding the rest, you are right, it doesn't control them.
Uh, I had missed it (my memory is failing me, sorry :"/ I should have checked) - thanks for verifying it.
I did some experiments with JFR and NMT recording with the Quarkus MP app. As https://developers.redhat.com/articles/2024/05/21/native-memory-tracking-graalvm-native-image suggests, I used an EA build of GraalVM.
Here is recording-nmt-3.11-3.13.zip with recording-nmt-3.11.3.jfr and recording-nmt-3.13.2.jfr. GitHub allows only "known file extensions", thus I had to zip the files.
Native binaries were produced from https://github.com/quarkus-qe/quarkus-startstop repo using this procedure:
```bash
export PATH=/Users/rsvoboda/Downloads/graalvm-jdk-24+10.1/Contents/Home/bin:$PATH
mvn clean package -f app-full-microprofile -Dnative -Dquarkus.native.additional-build-args='--enable-monitoring=heapdump\,jfr\,nmt' -Dquarkus.version=3.13.2
mv app-full-microprofile/target/quarkus-runner ~/Downloads/app-full-microprofile-nmt-3.13.2
mvn clean package -f app-full-microprofile -Dnative -Dquarkus.native.additional-build-args='--enable-monitoring=heapdump\,jfr\,nmt' -Dquarkus.version=3.11.3
mv app-full-microprofile/target/quarkus-runner ~/Downloads/app-full-microprofile-nmt-3.11.3
```
JFR was produced using:
```bash
get_jfr () {
    $1 -XX:+FlightRecorder -XX:StartFlightRecording=filename=$2 &
    APPLICATION_PID=$!
    counter=0
    until curl -s -f -o /dev/null "http://127.0.0.1:8080/q/health/ready"
    do
        sleep 0.001
        counter=$((counter+1))
    done
    echo "Cycles: $counter"
    echo "RSS: `ps -p $APPLICATION_PID -o rss=`"
    sleep 0.5
    echo "RSS after 0.5s: `ps -p $APPLICATION_PID -o rss=`"
    kill -SIGTERM "$APPLICATION_PID"
    wait "$APPLICATION_PID"
}

get_jfr ${HOME}/Downloads/app-full-microprofile-nmt-3.13.2 ${HOME}/Downloads/recording-nmt-3.13.2.jfr
get_jfr ${HOME}/Downloads/app-full-microprofile-nmt-3.11.3 ${HOME}/Downloads/recording-nmt-3.11.3.jfr
```
I noticed that the flame graph under Event Browser - Java Application has some additional bars for 3.13 compared to 3.11.
At this stage I think I'm at the limits of my knowledge and investigation; any insight would be welcome. I shared my reproducer steps so anybody can play with this.
Hi @rsvoboda! I was able to reproduce JFR files similar to yours and saw a similar RSS increase, ~4MB.
I changed your script slightly by adding -XX:+PrintNMTStatistics to dump a native memory usage report. And I also supplied a custom JFR configuration (.jfc file) where I bumped up the ObjectAllocationSample event throttling rate to 100/s and turned on the ObjectAllocationInNewTLAB event.
So the command now became:
$1 -XX:+FlightRecorder -XX:StartFlightRecording=filename=$2,settings=quarkus-demo.jfc -XX:+PrintNMTStatistics
NMT results: 3.13, 3.11. Only 64KB more memory is used in 3.13, mainly due to JFR. So native memory usage is probably not the culprit.
Next, I looked at the ThreadAllocationStatistics JFR event. I noticed that the "main" thread is the biggest allocator by several orders of magnitude. It also looks like the "main" thread allocates 3-4 MB more in 3.13 than in 3.11. This approximately matches the overall RSS increase that we're seeing.
Next, I looked at the ObjectAllocationSample JFR event. If I limit the flamegraphs to only show ObjectAllocationSample events emitted from the "main" thread (to reduce noise a bit), I notice that SmallRyeConfigBuilder.build() has significantly more samples in 3.13 than in 3.11. Each of those samples corresponds to a new TLAB of size 512KB (this is due to where the allocation sampling instrumentation is placed in SubstrateVM).
(flamegraph screenshots: 3.13 vs 3.11)
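For anyone who wants to reproduce this kind of filtering outside JMC, a minimal sketch using the standard jdk.jfr.consumer API (the recording file name is just an example):

```java
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedThread;
import jdk.jfr.consumer.RecordingFile;
import java.nio.file.Path;

public class MainThreadAllocations {
    public static void main(String[] args) throws Exception {
        // Print the stack trace of every allocation sample emitted from the "main" thread
        try (RecordingFile recording = new RecordingFile(Path.of("recording-nmt-3.13.2.jfr"))) {
            while (recording.hasMoreEvents()) {
                RecordedEvent event = recording.readEvent();
                if (!"jdk.ObjectAllocationSample".equals(event.getEventType().getName())) {
                    continue;
                }
                RecordedThread thread = event.getThread("eventThread");
                if (thread != null && "main".equals(thread.getJavaName())) {
                    System.out.println(event.getStackTrace());
                }
            }
        }
    }
}
```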
When I turn on the ObjectAllocationInNewTLAB event, I see that ~7 more TLABs are allocated in 3.13 than in 3.11.
Additionally, in the flamegraphs you show in your comment above, I think you are displaying all events, not just allocations. This means all events with stacktraces are represented in those graphs (ThreadStart, MonitorEnter, etc.), so they don't give a good picture of what's happening with allocations specifically.
So maybe the RSS increase has something to do with SmallRyeConfigBuilder?
Thanks @roberttoyonaga for the analysis
So, a few action points:
I will come back from vacation next week, but thanks for pushing ahead on this ❤️
I started having a quick look at that and I can add some more data.
From a CPU standpoint, the main contributors are:

- the ObjectMapper
- the UUID generation in RuntimeConfigBuilder (it is quite slow as we have to initialize the secure random; I'm still wondering if we should find a way to generate it on demand)
- SmallRyeConfig.getPropertyNames() is very costly on the first call - maybe it's a stupid idea but I wonder if we could generate some metadata at build time for our extensions that could be used instead of generating all the property names each time. I'm pretty sure that there are a lot of subtleties but I wonder if we could at least have some of the state generated once and for all for the config of our extensions and have a mechanism in SmallRyeConfig that would load this data if available. For extensions, we know that the SmallRye Config version will be the same at build and runtime (or people are looking for issues and it's definitely not supported). I know it's a crazy idea but maybe it will resonate in Roberto's mind :) /cc @radcortez
- we do the SmallRyeConfig.getPropertyNames() work twice: once in RuntimeConfigSetup and the other time when Config is class initialized.

From an allocation perspective, the story is relatively similar (except for the UUID part).
So these are the two flamegraphs of where SmallRyeConfig.getPropertyNames() is involved.
Here you have it very visible:
Here it's at the very left:
Then I had also a few other comments about SmallRye Config init:
I wonder if we should have a build time phase for SmallRye Config that would initialize information about:

- the property names around
- the converters around - with their priorities
- the config sources

Basically everything that we could get from the class path scanning given we are in a closed world. And the SmallRye Config bootstrap would take it into account and avoid some work when in the context of Quarkus.
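Purely to illustrate that idea, a hypothetical sketch with invented names (this is not an existing Quarkus or SmallRye Config API) of a build-time-generated snapshot that the runtime config bootstrap could load instead of recomputing everything:

```java
import java.util.Set;

// Hypothetical, invented names: a snapshot that a Quarkus build step could emit
// from the closed-world class path scan of all extensions.
final class GeneratedConfigMetadata {
    static final Set<String> PROPERTY_NAMES = Set.of(
            "quarkus.http.port",
            "quarkus.log.level" /* ... the rest of the scanned names ... */);
}

// Hypothetical runtime hook: the config bootstrap consults the generated
// snapshot instead of rebuilding all property names through the interceptor
// chain on first access.
final class ConfigBootstrapSketch {
    static Set<String> propertyNames() {
        return GeneratedConfigMetadata.PROPERTY_NAMES;
    }
}
```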
@radcortez you would have to confirm if it makes some sense or if it's totally crazy.
From my perspective, the config is a critical part of infra and we have seen it being an important contributor to the bootstrap so it would be worth investing in optimizing this if we think it makes sense.
I would be willing to help.
This is very useful information, thanks!
I've long thought that we need to put some effort into optimizing the Observability stack. A low footprint framework like Quarkus should try hard to have production-ready features like observability be as low footprint as possible. @brunobat do you think we are in a good enough place in terms of features that it makes sense to start looking in the optimization direction?
Nice analysis @gsmet, did you use the "new" async-profiler support I added to the start-stop test? Or was it captured manually?
They look like JVM mode ones.
@radcortez @gsmet I'm sending a small patch (I got some cycles this weekend to have fun)
The simplest thing I see that is not right there is the huge number of HyphenateEnumConverter converters allocated for the same enum class, i.e. io.smallrye.openapi.api.OpenApiConfig$OperationIdStrategy. Just dropping them will significantly reduce the allocation pressure, see https://github.com/quarkusio/quarkus/pull/42745
@rsvoboda https://github.com/quarkusio/quarkus/pull/42745 should have improved things enough - let me know if you can check whether it's enough to close this. In my tests that expensive converter was allocated more than 200 times - using some wrongly sized HashMap and StringBuilder, which was making things even worse.
@franz1981 I will check Quarkus main after (or during ;) ) the calls I have today.
@rsvoboda @gsmet now things seem a bit better, but we still have a few converters allocated for the same class; I hope to send another one-liner PR today to fix that, but it's just a minor thing.
Georgios Andrianakis wrote (Saturday, 24/08/2024 at 16:36):
> This is very useful information, thanks!
> I've long thought that we need to put some effort into optimizing the Observability stack. A low footprint framework like Quarkus should try hard to have production-ready features like observability be as low footprint as possible. @brunobat do you think we are in a good enough place in terms of features that it makes sense to start looking in the optimization direction?
I think we should burn some cycles doing performance work even if we are not feature complete. Just for the sake of measuring the impact of upgrades and new features...
We already have performance work done on the OTel tracing stack; however, the results were not conclusive and no performance red flags were found in OTel... But Luis Barreiro did fix a bug in an ActiveMQ client because of it. That said, I've started a new repository to perform more targeted tests. See: https://github.com/brunobat/quarkus-observability-perf
I already have a meeting scheduled with @franz1981 to go more in-depth into it.
Cool, feel free to include me as well, as I am very interested in this.
FYI @gsmet @brunobat https://github.com/quarkusio/quarkus/pull/42758
Not sure it's "right" - but looking at the impl it seems correct
I ran my local experiments with Quarkus main, Quarkus 3.13 and Quarkus 3.13 + the https://github.com/quarkusio/quarkus/pull/42745 PR.
tl;dr: we got back to good (3.11 like) numbers on my machine
Quarkus 3.11.3: 61122 kB
Quarkus main: 61623 kB
Quarkus 3.13: 65646 kB
Quarkus 3.13 + https://github.com/quarkusio/quarkus/pull/42745 PR: 61465 kB
https://github.com/quarkusio/quarkus/pull/42745 removes the line with the unused converter allocation. That line was already introduced in 3.9 with https://github.com/quarkusio/quarkus/pull/39196.
So we have that line in 3.11 (https://github.com/quarkusio/quarkus/blame/3.11/extensions/smallrye-openapi/runtime/src/main/java/io/quarkus/smallrye/openapi/runtime/OpenApiConfigMapping.java#L31) which is the base version used to identify the RSS difference reported in this issue.
So I went ahead and tried the 3.11 branch without that one line in OpenApiConfigMapping and I got better results:
Quarkus 3.13 + https://github.com/quarkusio/quarkus/pull/42745 PR: 61465 kB
Quarkus 3.11 + https://github.com/quarkusio/quarkus/pull/42745 PR: 58714 kB
It's like 4-5% better than the original 3.11.3 baseline. So there is definitely something else to look into ...
> So I went ahead and tried the 3.11 branch without that one line in OpenApiConfigMapping
Nice and well done @rsvoboda! So it seems I've improved things enough to be on par, although not really fixing the actual cause (between 3.11 and 3.13) :+1:
> So these are the two flamegraphs of where SmallRyeConfig.getPropertyNames() is involved.
Yes, this is expected. One is for the static init config, and the other is for the runtime config.
Unfortunately, I was already aware that it was an expensive operation. It was worse in the past; the names are currently being cached. The issue is with https://github.com/smallrye/smallrye-config/blob/df6ac6251e757f90eaf814c9c4f0044c9435849b/implementation/src/main/java/io/smallrye/config/ConfigSourceInterceptor.java#L54.
This allows each interceptor to rewrite the list of names, which is required for relocations, fallbacks, profiles, etc. Not only are there a lot of String operations happening here, but also List and Iterator allocations.
I have some ideas on how to improve this, but I have been avoiding this until it becomes a real problem because it will require breaking the API :)
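To make the allocation pattern concrete, a minimal sketch of a name-rewriting interceptor in the spirit of that API; the "old." → "new." relocation rule is invented, and the real relocation interceptor is more involved:

```java
import io.smallrye.config.ConfigSourceInterceptor;
import io.smallrye.config.ConfigSourceInterceptorContext;
import io.smallrye.config.ConfigValue;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch only: relocates "old.*" properties to "new.*".
public class RelocateInterceptor implements ConfigSourceInterceptor {
    @Override
    public ConfigValue getValue(ConfigSourceInterceptorContext context, String name) {
        return context.proceed(rewrite(name));
    }

    @Override
    public Iterator<String> iterateNames(ConfigSourceInterceptorContext context) {
        // This is where the cost shows up: every interceptor in the chain that
        // rewrites names walks the full iterator and allocates a new collection
        // plus the transformed Strings.
        List<String> names = new ArrayList<>();
        Iterator<String> iterator = context.iterateNames();
        while (iterator.hasNext()) {
            names.add(rewrite(iterator.next()));
        }
        return names.iterator();
    }

    private static String rewrite(String name) {
        return name.startsWith("old.") ? "new." + name.substring(4) : name;
    }
}
```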
> I wonder if we should have a build time phase for SmallRye Config that would initialize information about:
We have been slowly doing that work since version 3.3 when we did some of the significant Config improvement changes:
And many other minor changes. The strategy has mostly been going for the low-hanging fruit / the bigger bottlenecks.
> - the property names around
Not sure if we can do something here. The list of property names is always going to depend on the sources available at each phase. The real bottleneck is not reading the property (it may be, if we are reading properties from a slow source, a database for instance), but the name transformation and the collection allocation in the interceptors.
> - the converters around - with their priorities
> - the config sources
This is mostly done already: https://github.com/quarkusio/quarkus/blob/3877f161524d23217a328cd5f2cc777daefa275e/core/deployment/src/main/java/io/quarkus/deployment/steps/ConfigGenerationBuildStep.java#L200-L285
There are still a few leftovers, like the Converter @Priority annotation. Also, there are cases where the user / library creates their own SmallRyeConfig instance (for numerous reasons). I'm not sure if this is the case, but when that happens there is not much we can do about it.
> @radcortez you would have to confirm if it makes some sense or if it's totally crazy.
Yes, it makes sense. We have been doing that work. Let me run some flamegraphs and I'll try to come up with a list of things to improve by priority / effort.
> @rsvoboda #42745 should have improved things enough - let me know if you can check whether it's enough to close this. In my tests that expensive converter was allocated more than 200 times - using some wrongly sized HashMap and StringBuilder, which was making things even worse.
Can you elaborate on the HashMap and StringBuilder problem?
> The list of property names is always going to depend on the sources available at each phase. The real bottleneck is not reading the property (it may be, if we are reading properties from a slow source, a database for instance), but the name transformation and the collection allocation in the interceptors.
So I agree with all this but my question is: don't you know which sources are available for each phase at build time?
Note that I know nothing about SmallRye Config so I'm asking dumb questions :).
> There are still a few leftovers, like the Converter @Priority annotation.
That's cool we made progress on this and I know you did a lot of work on this already. Would still be cool to take a step back and see if we could tackle this once and for all. The best way to optimize code is to not execute it :).
> Can you elaborate on the HashMap and StringBuilder problem?
Dunno about the HashMap, but I saw a wrongly sized StringBuilder in StringUtil#skewer(). I actually tried to improve it and did some JMH benchmarks, and it was slower with the right size... And consistently slower. I wanted to ping @franz1981 about this as it went against everything I would have expected. We cannot always guess the size of it, but we could at least have a better approximation than the default 16.
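For context, a simplified sketch of the presizing idea on a skewer-style camelCase-to-kebab-case conversion (this is not the actual StringUtil#skewer implementation); twice the input length is a cheap upper bound on the output size:

```java
final class Skewer {
    // Simplified stand-in for StringUtil#skewer, not the real implementation:
    // "operationIdStrategy" -> "operation-id-strategy".
    static String skewer(String camel) {
        // Presized: each char can expand to at most '-' plus a lowercase char,
        // so 2x the input length is an upper bound, vs. the default capacity of 16.
        StringBuilder sb = new StringBuilder(camel.length() * 2);
        for (int i = 0; i < camel.length(); i++) {
            char c = camel.charAt(i);
            if (Character.isUpperCase(c)) {
                sb.append('-').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```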
> Yes, it makes sense. We have been doing that work. Let me run some flamegraphs and I'll try to come up with a list of things to improve by priority / effort.
The point of my message was that maybe we shouldn't spend too much time doing micro-optimizations but take a step back and see if we could completely skip some work. Note that I have no idea if it's actually doable, just throwing crazy ideas at you. I'm very willing to discuss it with you if you come up with some ideas.
@rsvoboda can you easily drop some of the extensions in your app? Maybe that would be a good method to try to pinpoint what's going on. Because the OpenTelemetry stuff for instance has changed a lot and could also be the culprit.
If it's easy, I would try to drop some extension, see how it goes, then reintroduce it and drop the next.
I'm all for taking a step back and looking at more grand solutions; let's just keep in mind however that SR Config is super sensitive to regressions. Changing something in one area often leads to issues no one had foreseen, simply because SR Config is so central to both the build time and runtime phases of Quarkus.
The app is here: https://github.com/quarkus-qe/quarkus-startstop/tree/main/app-full-microprofile
I can try to trim the extensions to see if things can be more visible.
Btw, I already looked at the otel bump, see https://github.com/quarkusio/quarkus/issues/42506#issuecomment-2286015021. There is the "Quarkus main without otel bump - https://github.com/quarkusio/quarkus/commit/b6f54993c43f955f2b20855b9c4d694af7c210f2" line; it didn't change RSS much.
@radcortez
> Can you elaborate on the HashMap and StringBuilder problem?
Yep, the enum converter often requires just a single-element map, which could use Map::of, avoiding the default entry size of HashMaps. But https://github.com/quarkusio/quarkus/pull/42745 already improved things enough that not that many enum converters are created (just 2 likely, or something similar)...
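A small illustration of that point, with a made-up key/value pair: for a known single mapping, Map.of avoids the default-sized HashMap table entirely:

```java
import java.util.HashMap;
import java.util.Map;

class SingleEntryMaps {
    static void example() {
        // HashMap: allocates a 16-bucket table on first insert, even though
        // only one mapping is ever stored.
        Map<String, String> oversized = new HashMap<>();
        oversized.put("operation-id-strategy", "METHOD"); // made-up key/value

        // Map.of: compact immutable single-entry map, no oversized backing table.
        Map<String, String> compact = Map.of("operation-id-strategy", "METHOD");
    }
}
```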
I did some experiments and the biggest diff I noticed is with resteasy-reactive-jackson. I have prepared a branch for you: https://github.com/rsvoboda/quarkus-startstop/tree/reactive-jackson-and-health. The Quarkus main I use is based on 60994bfa8d3.
Command:
mvn clean verify -pl testsuite -Dtest=StartStopTest#fullMicroProfileNative -Dquarkus.version=999-SNAPSHOT -Dstart-stop.iterations=25 | grep "AVG RSS"
mvn clean verify -pl testsuite -Dtest=StartStopTest#fullMicroProfileNative -Dquarkus.version=3.11.3 -Dstart-stop.iterations=25 | grep "AVG RSS"
Results (first row is main, second row is 3.11.3)
quarkus-resteasy-reactive-jackson + quarkus-smallrye-health
INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 39653
INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 37689
quarkus-resteasy-reactive-jackson
INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 38055
INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 36221
We have a simple REST app without the Jackson dependency - https://github.com/rsvoboda/quarkus-startstop/blob/reactive-jackson-and-health/app-jakarta-rest-minimal/src/main/java/com/example/quarkus/HelloController.java. There is also a noticeable diff:
mvn clean verify -pl testsuite -Dtest=StartStopTest#jakartaRESTMinimalNative -Dquarkus.version=999-SNAPSHOT -Dstart-stop.iterations=25 | grep "AVG RSS"
mvn clean verify -pl testsuite -Dtest=StartStopTest#jakartaRESTMinimalNative -Dquarkus.version=3.11.3 -Dstart-stop.iterations=25 | grep "AVG RSS"
INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 37493
INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 36109
I did more experiments with the MP application and various extension combinations, but unfortunately I didn't eliminate quarkus-smallrye-openapi from the mix: Quarkus main contains https://github.com/quarkusio/quarkus/pull/42745 but 3.11 still does those unnecessary initializations. I will do new checks without the openapi dependency and post them tomorrow.
RSS is measured after invocation of the first array entry from https://github.com/rsvoboda/quarkus-startstop/blob/reactive-jackson-and-health/testsuite/src/it/java/io/quarkus/ts/startstop/utils/URLContent.java#L14
Adding new String[]{"http://localhost:8080/data/serialization/json/complex-dto", "Vive la résistance!"}, as the first line of the FULL_MICROPROFILE enum triggers measurement with Jackson; the diff is again ~2MB.
quarkus-resteasy-reactive-jackson + quarkus-smallrye-health + /data/serialization/json/complex-dto invocation
INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 41492
INFO [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 39454
OT: now in main we could use quarkus.rest.jackson.optimization.enable-reflection-free-serializers=true and see what happens (in theory it should help).
@rsvoboda I see in the pom that the dep still uses the old name: https://github.com/rsvoboda/quarkus-startstop/blob/reactive-jackson-and-health/app-full-microprofile%2Fpom.xml
Why not (as per https://quarkus.io/guides/rest-migration) rename it (as with any other dep which needs it) to quarkus-rest-jackson?
I remember that there was something in the Quarkus startup which takes care of correctly detecting the previous naming convention (see https://github.com/quarkusio/quarkus/pull/39444 for an improvement I sent in that part).
I did that "switch back" intentionally a few days ago - https://github.com/rsvoboda/quarkus-startstop/commit/414a99e454fa8248ac824c041c489552d9084780
Describe the bug
After the recent update from Quarkus 3.11 to 3.13 in the QE StartStopTS (https://github.com/quarkus-qe/quarkus-startstop) we noticed an RSS usage increase / failures due to threshold hits in native mode for an application using MicroProfile / SmallRye extensions.
I was running my experiments on a local Apple MBP M3 Pro, limiting running applications to the minimum, using 21.0.2-graalce. JVM mode runs didn't reveal a noticeable difference.
I did checks on more versions, but these are the key ones:
These numbers are specific to my system; you may see slightly different values when running the reproducer steps. But the key thing is that there is a trend of increasing RSS usage with the same app after the first successful request. The issue manifests in native mode, but IMHO it's not related to native-image tooling. Just FYI, I noticed that RSS was slightly smaller when using Mandrel (23.1.4.r21-mandrel) to build the binary.
There is a slight increase between 3.11.3 and 3.12.3; the difference between 3.11.3 / 3.12.3 and 3.13.2 is more noticeable, ~5-7%.
Some details from native-image command: 3.13.2
3.12.3
3.11.3
Expected behavior
RSS usage stays similar between Quarkus 3.11 and 3.13 in MP application
Actual behavior
RSS usage increase between Quarkus 3.11 and 3.13 in MP application
How to Reproduce?
Output of uname -a or ver
macOS
Output of java -version
Java 21
Quarkus version or git rev
3.13.2
Build tool (ie. output of mvnw --version or gradlew --version)
No response
Additional information
No response