oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org

Exceptions and NPEs when taking JFR Snapshots for CE 17 native images #9717

Open MattAlp opened 3 weeks ago

MattAlp commented 3 weeks ago

Describe the issue: When running a sample app instrumented with the Datadog APM / Profiling agent, NPEs and other exceptions are thrown from JFR snapshot-related code.

Steps to reproduce the issue:

  1. Clone https://github.com/MattAlp/datadog_poc_479 and run ./gradlew clean nativeCompile, using the Java installation selected by sdk use java 23.0.5.r17-nik.
  2. Run the binary at ./build/native/nativeCompile/demo. After roughly 1-2 minutes, the warnings and stack traces shown below should appear.

Describe GraalVM and your environment:

More details:

Warning seen:

[warn][jfr,system] Exception occurred during execution of period hook for jdk.ContainerConfiguration(54857)

Stack trace seen:

[dd.trace 2024-09-10 10:44:21:803 -0300] [dd-profiler-recording-scheduler] ERROR com.datadog.profiling.controller.ProfilingSystem - Exception in profiling thread, continuing
java.lang.NullPointerException
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.jfr.JfrTypeRepository.getClassLoaderId(JfrTypeRepository.java:319)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.jfr.JfrTypeRepository.writeClass(JfrTypeRepository.java:148)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.jfr.JfrTypeRepository.writeClasses(JfrTypeRepository.java:141)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.jfr.JfrTypeRepository.write(JfrTypeRepository.java:69)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.jfr.JfrChunkWriter.writeConstantPools(JfrChunkWriter.java:354)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.jfr.JfrChunkWriter.writeCheckpointEvent(JfrChunkWriter.java:324)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.jfr.JfrChunkWriter.writeFlushCheckpoint(JfrChunkWriter.java:296)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.jfr.JfrChunkWriter.closeFile(JfrChunkWriter.java:222)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.jfr.SubstrateJVM.setOutput(SubstrateJVM.java:362)
        at jdk.jfr@17.0.12/jdk.jfr.internal.JVM.setOutput(JVM.java:252)
        at jdk.jfr@17.0.12/jdk.jfr.internal.MetadataRepository.setOutput(MetadataRepository.java:272)
        at jdk.jfr@17.0.12/jdk.jfr.internal.PlatformRecorder.rotateDisk(PlatformRecorder.java:407)
        at jdk.jfr@17.0.12/jdk.jfr.internal.PlatformRecorder.fillWithRecordedData(PlatformRecorder.java:607)
        at jdk.jfr@17.0.12/jdk.jfr.FlightRecorder.takeSnapshot(FlightRecorder.java:113)
        at com.datadog.profiling.controller.openjdk.OpenJdkOngoingRecording.snapshot(OpenJdkOngoingRecording.java:155)
        at com.datadog.profiling.controller.ProfilingSystem$SnapshotRecording.snapshot(ProfilingSystem.java:254)
        at com.datadog.profiling.controller.ProfilingSystem$SnapshotRecording.snapshot(ProfilingSystem.java:246)
        at datadog.trace.util.AgentTaskScheduler$PeriodicTask.run(AgentTaskScheduler.java:311)
        at datadog.trace.util.AgentTaskScheduler$Worker.run(AgentTaskScheduler.java:266)
        at java.base@17.0.12/java.lang.Thread.run(Thread.java:840)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:915)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:890)
[warn ][jfr,system] Exception occurred during execution of period hook for jdk.ContainerConfiguration(130264)

Unlike the warning, I have not been able to reproduce the NPE consistently. Because it is thrown from JfrTypeRepository.getClassLoaderId, it may take a specific class or class loader to trigger it.
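
For anyone who wants to exercise the same code path without the agent, here is a minimal sketch that starts a recording and repeatedly calls FlightRecorder.takeSnapshot(), the public JFR entry point at the top of the stack trace above. The class name SnapshotLoop, the method takeSnapshots, and the iteration count and pause are hypothetical and not taken from the Datadog agent. I have not confirmed that this alone triggers the NPE or the warning in a native image; it only mirrors the snapshot call, and the native image would need to be built with JFR support included.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.FlightRecorder;
import jdk.jfr.Recording;

// Hypothetical stand-alone reproducer sketch; not code from the Datadog agent.
public class SnapshotLoop {

    // Starts one recording, then takes `count` snapshots with a pause before each.
    // Returns the number of takeSnapshot() calls that completed without throwing.
    public static int takeSnapshots(int count, long pauseMillis) throws Exception {
        int completed = 0;
        Path dir = Files.createTempDirectory("jfr-snapshots");
        try (Recording recording = new Recording()) {
            // Enabled only because the warning names this event (assumption:
            // whether this is enough to provoke the warning is untested).
            recording.enable("jdk.ContainerConfiguration");
            recording.start();
            for (int i = 0; i < count; i++) {
                Thread.sleep(pauseMillis);
                // With a running disk recording, takeSnapshot() rotates the
                // current chunk, which is where the stack trace above fails.
                try (Recording snapshot = FlightRecorder.getFlightRecorder().takeSnapshot()) {
                    if (snapshot.getSize() > 0) {
                        Path out = dir.resolve("snapshot-" + i + ".jfr");
                        snapshot.dump(out);
                        System.out.println("wrote " + out + " (" + Files.size(out) + " bytes)");
                    }
                    completed++;
                }
            }
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        // Roughly two minutes of snapshots, one per second (arbitrary values).
        takeSnapshots(120, 1000L);
    }
}
```

If the failure depends on a particular class or class loader, as suspected above, this loop may run cleanly; in that case it would at least narrow the problem down to something the instrumented app loads.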

selhagani commented 2 weeks ago

Hi @MattAlp,

Thank you for reaching out to us.

It looks like you're using VM Liberica-NIK-23.0.5-1 (build 17.0.12+10-LTS, mixed mode, sharing), which is based on GraalVM CE 17, an outdated version. There's a new release of GraalVM available here. Could you please test with the latest version and let us know how it goes?

MattAlp commented 2 weeks ago

Hey @selhagani, I'll have the client experiment with this. They may be constrained to JDK 17 for legacy reasons; I will update shortly.