Closed: zakkak closed this issue 9 months ago
/cc @Karm (mandrel), @ebullient (metrics), @galderz (mandrel), @jmartisk (metrics)
This doesn't look pretty:
- "peak_rss_bytes": 6114426880
+ "peak_rss_bytes": 8380002304
but are we sure they are solely due to Clément's PR?
> but are we sure they are solely due to Clément's PR?
That was the only difference between the two runs (I tested with 31fcfc1088363cabb618a5af7f90038a160961bf and fd46a61cc74f10fbadd01737c9e05422f672a727), so I would say yes. Note, however, that the peak RSS also goes up and down between different builds in ways that are not related to the PR. I am running on a machine with 62GB of RAM with no constraints on the max heap size, so the GC is free to use as much RAM as it wants according to its heuristics, which are not deterministic.
For instance running with 31fcfc1088363cabb618a5af7f90038a160961bf 3 times I get:
(Notice that the last run has almost double the peak RSS of the first run without any changes in the code.)
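If we wanted the build-time peak RSS to be less sensitive to the machine it runs on, one option (just a sketch, not something the CI does today) would be to cap the build heap so the GC heuristics have less room to vary, e.g.:

```shell
# Cap the native-image build heap (8g is an arbitrary example value) so peak
# RSS depends less on the GC heuristics of a 62GB machine:
./mvnw -Dnative -pl integration-tests/jpa-postgresql-withxml clean package \
    -Dquarkus.native.native-image-xmx=8g
```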
@zakkak could we either raise the threshold or disable the test for now? It's very annoying to have all our builds failing because of this.
I guess the lesser evil would be to adjust the thresholds and keep this issue open until we verify that the new thresholds are OK. This way we won't risk missing another increase due to other changes.
Note that the best would have been to not merge https://github.com/quarkusio/quarkus/pull/38820 in the first place without handling this, since the test was already failing in the PR.
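For context on what "adjusting the thresholds" amounts to: the check is essentially a recorded baseline plus a percentage tolerance, roughly like the sketch below (illustrative only, not the actual ImageMetricsITCase code; class and method names are made up):

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

// Illustrative sketch of a baseline-plus-tolerance check, mirroring the shape
// of the failure message ("within range [63 +- 1%]"); not the real test code.
class ImageMetricsThresholdSketch {

    static void assertWithinRange(String key, long actual, long baseline, double tolerancePct) {
        double delta = baseline * tolerancePct / 100.0;
        assertTrue(actual >= baseline - delta && actual <= baseline + delta,
                "Expected " + key + " to be within range [" + baseline + " +- " + tolerancePct + "%] but was " + actual);
    }
}
```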
The PR in question does not seem related. I will have another look, but basically it is just tests.
My guess is that the true cause is the hot reload, which must keep a few objects in memory, including the content of the certificates. Unfortunately, there is no way around that.
That being said, this only happens if the certificate hot reload is enabled, which is not the case in this test. But there are a few more static methods and fields that could explain it (even if they should be very small when hot reload is not enabled).
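To make that concrete, the kind of static state being discussed looks roughly like this (names are illustrative, not the actual vertx-http classes):

```java
// Illustrative only: a static field that keeps certificate content reachable,
// which also pulls its holder type into the native image's analysis.
final class CertificateReloadHolder {

    private static volatile byte[] cachedCertificate;

    static void cache(byte[] pem) {
        cachedCertificate = pem;
    }

    static byte[] current() {
        return cachedCertificate;
    }
}
```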
Ok, so it's a build-time RSS increase, not runtime. That makes more sense.
How can I produce the reports from above? The PR introduces an early usage of the Mutiny API and Vert.x scheduled tasks (which then use Netty underneath). That may explain it.
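For reference, the shape of that usage is roughly the following (a hypothetical sketch, not the actual PR code; class and method names are made up):

```java
import io.smallrye.mutiny.Uni;
import io.vertx.core.Vertx;

class CertificateWatcherSketch {

    void start(Vertx vertx) {
        // Schedule a periodic task; Vert.x timers run on a Netty-backed event loop.
        vertx.setPeriodic(30_000, id ->
                Uni.createFrom().item(this::reloadIfChanged)
                   .subscribe().with(changed -> { /* nothing to do in this sketch */ }));
    }

    boolean reloadIfChanged() {
        return false; // placeholder
    }
}
```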
I'm confused - but I may not understand what that test is actually doing.
I get:
ImageMetricsITCase.verifyImageMetrics:15 Expected analysis_results.types.jni to be within range [63 +- 1%] but was 61 ==> expected:
Seems related to JNI? It seems I'm under the previous value (isn't that a good thing?)
So... with a more recent version of GraalVM, I got something closer to what I expected.
So, I'm unable to reproduce it locally with:
```
native-image 21 2023-09-19
GraalVM Runtime Environment GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15)
Substrate VM GraalVM CE 21+35.1 (build 21+35, serial gc)
```
I can try to remove the usage of Mutiny, but it's somewhat weird, as I would need to use CompletionStage/CompletableFuture (CS/CF), which are WAY more error-prone.
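To illustrate the trade-off (an illustrative sketch only, not code from the PR): the same "do something and log failures" step written with Mutiny versus CS/CF:

```java
import java.util.concurrent.CompletableFuture;

import io.smallrye.mutiny.Uni;

// Illustrative comparison only; names and behavior are made up for the sketch.
class ReloadSketch {

    Uni<Void> withMutiny() {
        return Uni.createFrom().voidItem()
                .onFailure().invoke(failure -> System.err.println("reload failed: " + failure));
    }

    CompletableFuture<Void> withCsCf() {
        return CompletableFuture.<Void> completedFuture(null)
                .whenComplete((ignored, failure) -> {
                    if (failure != null) {
                        System.err.println("reload failed: " + failure);
                    }
                });
    }
}
```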
> How can I produce the reports from above?
They are generated when building the native image in a file called quarkus-integration-test-jpa-postgresql-withxml-999-SNAPSHOT-runner-build-output-stats.json
I used:
```shell
./mvnw -Dnative -pl integration-tests/jpa-postgresql-withxml clean package -Dnative.surefire.skip -Dquarkus.native.container-build=true -Dquarkus.native.container-runtime=podman
```
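If you just want to eyeball the values the test compares, the stats file ends up under the module's target directory (the exact path below is an assumption based on the module name):

```shell
# Inspect the metric from the failure (analysis_results.types.jni) and the peak
# RSS value in the build output stats JSON:
jq '.analysis_results.types' \
    integration-tests/jpa-postgresql-withxml/target/*-build-output-stats.json
grep peak_rss_bytes \
    integration-tests/jpa-postgresql-withxml/target/*-build-output-stats.json
```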
> So... with a more recent version of GraalVM, I got something closer to what I expected.
Please use the jdk-21 Mandrel builder image, as the thresholds are version specific.
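Concretely, that means pinning the builder image on the command line above, something like this (the exact image tag is an assumption; use whatever the CI uses):

```shell
./mvnw -Dnative -pl integration-tests/jpa-postgresql-withxml clean package \
    -Dnative.surefire.skip \
    -Dquarkus.native.container-build=true \
    -Dquarkus.native.container-runtime=podman \
    -Dquarkus.native.builder-image=quay.io/quarkus/ubi-quarkus-mandrel-builder-image:jdk-21
```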
Ok, I now get the same exception as you do. So, it's not a problem with GraalVM CE, but only with Mandrel (or the in-container build using Mandrel). That will not be easy to understand.
The removal of Mutiny does not help. So, it's something else.
@zakkak Ok, got the test to pass:
I removed: Mutiny usage (sad), a record (useless anyway), and Java Stream usage.
@zakkak See https://github.com/quarkusio/quarkus/pull/39026
> So, it's not a problem with GraalVM CE, but only with Mandrel (or the in-container build using Mandrel). That will not be easy to understand.
That's because Mandrel uses a different JDK as a base (OpenJDK vs. LabsJDK), so there are some more differences in the JSON. The thresholds, however, are meant to test Quarkus' default builder image (i.e. Mandrel at the moment).
> @zakkak See https://github.com/quarkusio/quarkus/pull/39026
✅
Describe the bug
jpa-postgresql-withxml fails in the ImageMetricsITCase.verifyImageMetrics test. See https://github.com/quarkusio/quarkus/actions/runs/7985178432/job/21803563937#step:16:776
The regression was introduced in https://github.com/quarkusio/quarkus/pull/38820
Note: As the changes are in vertx-http this probably affects other tests as well, but the impact doesn't happen to be big enough to trigger a test failure for them.
Expected behavior
The test should pass.
Actual behavior
The test fails with:
How to Reproduce?
Output of uname -a or ver
No response
Output of java -version
No response
Mandrel or GraalVM version (if different from Java)
No response
Quarkus version or git rev
No response
Build tool (ie. output of mvnw --version or gradlew --version)
No response
Additional information
After https://github.com/quarkusio/quarkus/pull/38820 I see the following differences:
@cescoffier do the above diffs look reasonable according to your changes?