Open viniciusxyz opened 10 months ago
Hi @viniciusxyz
as already mentioned in the previous ticket,
there are some open issues around the management interfaces where we still need some work.
We are aware of that, but this is currently no priority for us to fix.
Maybe this is something @roberttoyonaga likes to look into?
Maybe this is something @roberttoyonaga likes to look into?
Hi @fniephaus. I'm happy to look into this eventually, if nobody else picks this up. However, it probably wont be on my to-do list for some time.
I'm happy to look into this eventually, if nobody else picks this up
Cool, thanks!
However, it probably wont be on my to-do list for some time.
Typo? 😆 We're not going to complain if it's done asap
Typo? 😆 We're not going to complain if it's done asap
oops 😆 I mean it will probably remain on my to-do list for a while
Hi @viniciusxyz
as already mentioned in the previous ticket,
there are some open issues around the management interfaces where we still need some work.
We are aware of that, but this is currently no priority for us to fix.
I understand this point perfectly, I just found it strange to mark it as complete, even though it is a backlog item I believe it should be maintained because problems with GC happen quite frequently and losing traceability on this is a current negative point of native compilation, but I completely understand not be an immediate priority
🙋 we're also interested in having that added.
In the meantime, did anybody already reach out to Micrometer?
It looks like this can be detected: the current GraalVM NotificationEmitter
does not support any notification types, especially not the one Micrometer filters for. They could detect and log that as they already do if ManagementFactory.getMemoryPoolMXBeans()
returns an empty list.
Update: I've now created a PR to detect and log that: https://github.com/micrometer-metrics/micrometer/pull/5149
🙋 we're also interested in having that added.
In the meantime, did anybody already reach out to Micrometer? It looks like this can be detected: the current GraalVM
NotificationEmitter
does not support any notification types, especially not the one Micrometer filters for. They could detect and log that as they already do ifManagementFactory.getMemoryPoolMXBeans()
returns an empty list.
Very interesting ! I hadn't paid attention to this detail in Micrometer, I'm going to download the code and do some tests, but for my part I didn't know there was this other way so I didn't open an issue there
Very interesting ! I hadn't paid attention to this detail in Micrometer, I'm going to download the code and do some tests, but for my part I didn't know there was this other way so I didn't open an issue there
Well, it will not really help with the issue that there will be no metrics. But you'd at least get a log message why. I was specifically referring to this code in JvmGcMetrics.
Very interesting ! I hadn't paid attention to this detail in Micrometer, I'm going to download the code and do some tests, but for my part I didn't know there was this other way so I didn't open an issue there
Well, it will not really help with the issue that there will be no metrics. But you'd at least get a log message why. I was specifically referring to this code in JvmGcMetrics.
I understood. I thought there was some fallback, but if there really isn't, then calling the micrometer team probably won't help much since the problem is the lack of notifications in the native compilation.
It's quite frustrating to have to make monitoring worse to take advantage of the benefits of native images, but this is a necessary choice at the moment as far as I understand.
Hi @viniciusxyz and @codesimplicity ! I've submitted a PR adding support for GC notifications (only for serial GC for now). If you have time, please let me know if this solution works for your use case. Thanks! https://github.com/oracle/graal/pull/9799
@roberttoyonaga As soon as possible I will try to understand how the process of compiling graalvm works locally to test its implementation and I will report back. I really appreciate the work.
@roberttoyonaga As soon as possible I will try to understand how the process of compiling graalvm works locally to test its implementation and I will report back. I really appreciate the work.
No problem! Here are some brief steps to get you started with building GraalVM:
You'll need mx. And the latest labsjdk release.
Put mx on the path and set java home to labsjdk:
export PATH=/path/to/mx:$PATH
export JAVA_HOME=/path/to/labsjdk
Navigate to the vm/README.md directory for instructions on how to use mx build
command to build GraalVM with the substratevm component.
Once GraalVM is built, you can use mx native-image
to create native image executables or just find the native-image binary in the build directory. You'll also be able to find a link to the latest build in graal/vm/latest_graalvm_home
The first few minutes of this video does a good job of explaining this as well: https://youtu.be/3Gh0cz3vjG8?feature=shared&t=202
@roberttoyonaga Unfortunately I had some problems getting this to work on Windows, I'll try again tomorrow using Linux
@roberttoyonaga I'm having some problems getting the test done, I can compile the graal using mx --env ce build, but from what I understand from the documentation it doesn't have the native-image tool and the mx native-image apparently does not exist, I appealed and tried to run a mx build --all, but I still have the same problem, can you please guide me on how to create the native-image binary?
@viniciusxyz maybe try mx --dynamicimports /substratevm build
from the vm directory. And then try mx build
from the substratevm directory.
@roberttoyonaga By following the steps I managed to compile it, but the result was the same as opening the issue, I'm using the right branch as you can see below:
But the result of the compilation is still using scavenger, probably I must still be doing something wrong, but I don't know what...
The steps I did were:
export PATH=/home/vvsantos/mx:$PATH
export JAVA_HOME=/home/vvsantos/labsjdk-ce-24-jvmci-b01-debug
mx --dynamicimports /substratevm build
cd ../substratevm
mx build
javac MainNotificationEmitter.java
mx native-image MainNotificationEmitter
./mainnotificationemitter
The class used for testing was this
import com.sun.management.GarbageCollectionNotificationInfo;
import javax.management.NotificationEmitter;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
public class MainNotificationEmitter {
public static void main(String[] args) throws InterruptedException {
for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
if (!(gcBean instanceof NotificationEmitter notificationEmitter)) {
continue;
}
System.out.println("NotificationEmitter " + gcBean.getName());
notificationEmitter.addNotificationListener((notification, handback) -> {
var type = notification.getType();
System.out.printf("Notification issued -> %s\n", type);
}, notification -> notification.getType()
.equals(GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION), null);
}
System.out.println("Notifier added");
System.gc();
Thread.sleep(3_000);
System.out.println("End");
}
}
Can you identify if I'm doing something wrong in this flow?
Hi @viniciusxyz you need to use the build options shown in the PR description when you execute the Native Image build.
In short, you need to use mx native-image --enable-monitoring=gcnotif MainNotificationEmitter
. Everything else you are doing looks correct.
As a side note: To use any of the monitoring/serviceability features in Native Image, you need to pass --enable-monitoring
at build time. For example, --enable-monitoring=jfr,nmt,heapdump,gcnotif
or --enable-monitoring==all
.
If you have any other questions, please don't hesitate to ask!
@roberttoyonaga I read this yesterday and I had already forgotten about it today :) I managed to test it, I just found it strange that there was an extra notification after the main code had finished executing
When we build with mx, is the native-image binary available in the labsjdk bin? I ask because I would like to do a complete test by compiling an application using maven and reachability-metadata (I need it to compile some things) and uploading it to Kubernetes to validate if the micrometer metrics are working correctly with this implementation
I just found it strange that there was an extra notification after the main code had finished executing
Oh yes, that's a small bug. When the system gets teared-down in emits another notification. I'll fix that and push an update.
When we build with mx, is the native-image binary available in the labsjdk bin?
No its not in labsjdk. You can find the latest build in the graal repository here: graal/vm/latest_graalvm_home The latest native-image tool is here: graal/vm/latest_graalvm_home/bin/native-image
@roberttoyonaga I started the integrated tests, basically I have an application that uses micronaut in conjunction with micrometer that uses jvm metrics to launch metrics for prometheus, initially notificationEmmiter seems to be working well since now I can see the pauses associated with the GC, strangely the metrics related to HEAP memory broke, but perhaps it has nothing to do with the change you made, another point that I found strange was the GC pauses that only increased more and more, I will start the application with the GC monitoring flags and compare the data to validate whether they are correct, I will give you a more complete feedback as soon as possible. Below is the image with the GC time metric that is super important for us and that now exists with its change :)
Hi @viniciusxyz
strangely the metrics related to HEAP memory broke
What do you mean by this? Did the notifications stop containing the memory usage info? Or did the memory usages reported look strange?
I made a very simple test app here: https://gist.github.com/roberttoyonaga/44bfe8cbaa809c102fa5d9cc959f2997 Using it, it looks like the memory usages reported are normal. Also I don't see that the GC durations are increasing.
@roberttoyonaga I don't know if it was your change, but there seems to have been a change in behavior in the memory metrics related to MemoryPoolMXBean, the screenshot below shows the behavior in the jdk 21 lts that I generally use:
Jdk 24 with your PR:
Code for test:
https://gist.github.com/viniciusxyz/c0883ec5a499351225bebf9a8ccbc5ea
What I'm doing for now is using memory metrics that are provided by micrometer so I'm testing the functioning of the resources necessary for these 3 classes to work:
JvmHeapPressureMetrics JvmGcMetrics JvmMemoryMetrics
As for measuring GC times, everything really seems to be ok, I made a comparison using -XX:+PrintGC and there doesn't seem to be anything abnormal
@roberttoyonaga I did a new test and it seems that this metric has disappeared and has no relation to your commit, checking out the previous commit and redoing the build I noticed the same problem, now why does this work in Java 21, but in Java 24 it is like this I don't know. As for the GC metrics themselves, everything seems to be correct.
Test with commit a8efbf7c2981
Hi @viniciusxyz this is actually expected. Please see the comment here.
The max sizes are always -1 in JDK 24 becuase of this PR https://github.com/oracle/graal/pull/6930 Essentially, the GC only begins collections after the max size has been exceeded. So it's not safe to rely on the max sizes never being exceeded. That's why the max sizes are now always -1.
@roberttoyonaga I understand, there are some graphs we use that use this maximum to be able to demonstrate in percentage terms the amount of heap used at the moment, but this can be removed without any problems, thank you for the clarification
A favor I ask of the team
This is actually the second time I have opened this issue, the first was #7803 opened on November 11, 2023, but it was closed maybe because I couldn't explain it clearly, I ask that you please be aware that this issue does not refer to compilation problems and yes, the lack of data in the NotificationEmitter at runtime to collect GC metrics, this occurs in both G1GC and SerialGC in both GraalVM CE and Oracle GraalVM
Describe the problem
I'm developing applications and I'm missing some statistics when using Micrometer, being more specific the details and duration of GC pauses, I started validating the lib code and noticed that the problem is actually that it seems that NotificationEmitter is not working launching the notification events and this only happens in the native image, these metrics are very important for us to be able to put the applications into production, so I would like help to solve this problem.
Steps to reproduce the issue
mvn clean package -Pnative
./target/main-notification-emitter
When running the native image, only the notifier addition log will be displayed, and when running with hotspot, when System.gc() is called, a log is displayed from the emission of the NotificationEmitter event
Describe GraalVM and its environment:
More details
Print execution with hotspot:
Print from execution with native compilation:
All compilation configuration is in the pom file of the project passed in the example
To prove that the problem does not depend on whether it is graalvm ce or oracle graalvm, the print follows with the same behavior:
Currently, as far as I know, there are two main ways to expose information about the garbage collector runtimes while the application is running so that we can view it continuously when we are in Kubernetes, the first is through a javaagent that exports this information to any provider such as Prometheus and another is by adding some lib that sends these metrics to one of these providers, but as far as I've seen both forms depend on notificationEmmiter for updates related to these metrics, without this improvement several applications that monitor the GC via Prometheus + micrometer for example will be left without the data for monitoring related to GC times.
Demo of information visualization in grafana
Without native compilation
With native compilation
@wirthi @kassifar I reopened the issue, if any details were not clear please let me know