Closed Karthikrao121 closed 6 months ago
Need migration guide for performance enhancements on 3.2.4.Final.
I think we would first need a lot more data :).
The list of extensions would be a good start. For instance, we know that Hibernate ORM 6 comes with more classes that need to be initialized and causes a slow down.
Then I would continue by checking what's happening in JVM mode and if it is significantly slower as well, you could have a look at https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md to profile it.
Then we might want to actually profile the native executable but I'm not sure how to do that. @zakkak would know if it's possible.
Then we might want to actually profile the native executable but I'm not sure how to do that. @zakkak would know if it's possible.
That's definitely possible, but as you noted it should be the last step :)
You can see some examples in https://quarkus.io/guides/native-reference#inspecting-and-debugging
Furthermore, @galderz and @roberttoyonaga recently wrote a couple of great guides related to profiling native executables:
These are from 2.15.1.Final Installed features: [amazon-dynamodb, amazon-lambda, amazon-s3, amazon-secretsmanager, amazon-ssm, cdi, logging-json, rest-client-reactive, rest-client-reactive-jackson, resteasy-reactive, resteasy-reactive-jackson, security, smallrye-context-propagation, smallrye-openapi, vertx]" These are from 3.2.4.Final "Installed features: [amazon-dynamodb, amazon-lambda, amazon-s3, amazon-secretsmanager, amazon-ssm, cdi, logging-json, rest-client-reactive, rest-client-reactive-jackson, resteasy-reactive, resteasy-reactive-jackson, security, smallrye-context-propagation, vertx]"
@Karthikrao121 have you tried some of suggestions above to trouble shoot where changes in performance occur? are you only seeing startup changes or also changes in memory and throughput runtime performance?
@maxandersen memory consumption also increased on 3.2.4.Final. on 2.15.Final Memory Size: 128 MB Max Memory Used: 116 MB startup times are around 400 ms to 500 ms on 3.2.4.Final Memory Size: 256 MB Max Memory Used: 154 MB even with memory increase startup times are consistently over 600ms
@Karthikrao121
For the startup time, have a look at https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md , you should be able to get a CPU profile that would be a good start. Get one for both versions so that we can compare.
For the memory, it would be helpful if you could get a heap dump.
There's a good chance you won't want to make these public but if you're comfortable sharing them with me (and they don't contain sensitive information), you can send me an email with a download link at gsmet at redhat dot com.
The other option is that you have a look yourself and share your insights.
Thanks.
@gsmet please provide your insights on the shared CPU profile details.
@franz1981 do we have any resources to help @Karthikrao121 answer the question above? I can obviously point to the AsyncProfiler repo, but I am wondering if we have any of our own resources about using it :)
Sadly we don't have any resources re quarkus for async profiler, because it's a kind of "moving target".
Before using async profiler we need to understand first if they can observe the same behaviour in JVM mode. Regardless, If they are running in a container or not, and how many resources they dedicate to it. The latter part is very important, because an increased allocation pressure can pass nearly unnoticed with enough available memory (GCs works like that); in short: if they have lessen constrains (cpu/mem) things get better? If yes, which resource impact the most? That would narrow If is just a memory related problem (as it seems).
As suggest by @gsmet If they could collect profiling data and/or use the method from @galderz to check if more (native) allocations are happening and where, on native mode, it would be great/perfect. If they can provide collapsed/folded profiling data of the 2 versions, flamegraphs allow to easily create diff flamegraphs for comparison, which are pretty handy (see https://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html).
If they cannot troubleshoot in native we have 2 other ways:
I did provide the async cpu profiles from both versions to @gsmet. what we observed is if we increase the memory and cpu the start up times for 3.2.4.Final are as same as 2.15.1.Final. But 2.15.1.Final is doing faster start up times at 128 MB where as 3.2.4.Final is doing the same start up times at around 700 MB to 1024 MB.
Which seems to suggest:
I will stop my wild guesses and waiting the @gsmet analysis
Thanks for sharing the data with him already
But 2.15.1.Final is doing faster start up times at 128 MB where as 3.2.4.Final is doing the same start up times at around 700 MB to 1024 MB.
What are these memory sizes you mention? -Xmx
values? Container memory size limits?
Also, can you post here what you sent to @gsmet?
These are memory allocations on AWS lambda.
@Karthikrao121 sorry, I just found your email in my spam folder after my PTO. I think what you sent is dev mode profiling which makes it hard to pinpoint potential slow downs in your app itself. Also given you found the regression with 3.2.4.Final, could you stick to this version? We have a known regression in 3.3.0 so that will make it harder to track what you noticed in 3.2.4 already.
What we would need is this one: https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md#profiling-application-startup-with-async-profiler , profiling startup by running the application jar.
Send me a flamegraph with 2.x and the same app with 3.2.4.Final.
Thanks!
@gsmet still on PTO but to save troubles due to different architecture they can use https://github.com/jvm-profiling-tools/ap-loader which pack different binaries in. In addition I suggest to produce jfr output which allow to extract much more info via jfr2flame conversion eg --classify, --lines and others... @gsmet we can have a quick call in 2 days when I am back too
Let's start simple first. And if we need additional instructions in TROUBLESHOOTING.md, we should work on it and make them as easy to follow as possible.
And if we need additional instructions in TROUBLESHOOTING.md, we should work on it and make them as easy to follow as possible.
+1
Yeah, my point is to save users to capture data over and over: If they capture the data in jfr format we can extract different types of flamegraphs ourselves, without requiring the users to collect more...
@Karthikrao121 any update?
@radcortez worked hard on improving few aspects re performance config which had a tremendous impact on both native and JVM mode startup time; I'm not aware which specific versions you can try and check if deliver the expected level of performance, maybe Roberto or @gsmet knows, and If you will give it a shot this can give us an additional (indipendent) feedback.
Additionally, in our https://quarkus.io/guides/performance-measure we explained that startup time is for us what makes the application available to users, which imply awaiting the first request completion. This, compared to what quarkus print as startup time, include few more class loading and initializations which can affect the final numbers: did you perform the same measurements in your startup time tests? I am saying that, because this is likely the measure you really are interested in.
@radcortez worked hard on improving few aspects re performance config which had a tremendous impact on both native and JVM mode startup time; I'm not aware which specific versions you can try and check if deliver the expected level of performance, maybe Roberto or @gsmet knows, and If you will give it a shot this can give us an additional (indipendent) feedback.
Yes, we have made a bunch of improvements to both start time and RSS in both JVM and Native mode. A few of these improvements are already available in 3.4.x, but the big chunk should only be available in 3.5.x.
Gsmet,
The startup time decreased with 3.4.1 version around 100 ms . Sorry for the late reply. We have seen some improvements with the newer version.
On Mon, Sep 25, 2023, 04:26 Roberto Cortez @.***> wrote:
@radcortez https://github.com/radcortez worked hard on improving few aspects re performance config which had a tremendous impact on both native and JVM mode startup time; I'm not aware which specific versions you can try and check if deliver the expected level of performance, maybe Roberto or @gsmet https://github.com/gsmet knows, and If you will give it a shot this can give us an additional (indipendent) feedback.
Yes, we have made a bunch of improvements to both start time and RSS in both JVM and Native mode. A few of these improvements are already available in 3.4.x, but the big chunk should only be available in 3.5.x.
— Reply to this email directly, view it on GitHub https://github.com/quarkusio/quarkus/issues/35392#issuecomment-1733274739, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACXWFT54OLZALHKER4Z3GPTX4FE5ZANCNFSM6AAAAAA3TT4OFI . You are receiving this because you were mentioned.Message ID: @.***>
Gsmet, The startup time decreased with 3.4.1 version around 100 ms . Sorry for the late reply. We have seen some improvements with the newer version. …
Good to hear; thanks for lettings us know.
Some additional improvements are coming in https://github.com/quarkusio/quarkus/pull/35322
I believe this can be closed due to the improvements done in:
Description
We recently migrated to the latest version of quarkus. After the upgrade to the latest version we are seeing slower startup times compared to the 2.15.1.Final version.
native (powered by Quarkus 2.15.1.Final) started in 0.308s. native (powered by Quarkus 3.2.4.Final) started in 0.423s.
Need migration guide for performance enhancements on 3.2.4.Final.
Implementation ideas
No response