oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org

Native Image and JDK 21 Virtual Threads need a command like "jcmd Thread.dump_to_file" to investigate deadlocks #9152

Open · Moscagus opened 1 month ago

Moscagus commented 1 month ago

Describe the issue

"--enable-monitoring=threaddump" is not enough to investigate deadlocks involving JDK 21 virtual threads.

It is necessary to have a thread dump like the one generated by the "jcmd Thread.dump_to_file" command: https://docs.oracle.com/en/java/javase/21/core/virtual-threads.html#GUID-265B55B7-F330-46E2-BCF0-9FFD8A5A15B3
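To have something concrete to point a thread dump at, here is a minimal sketch of the kind of deadlock this issue is about, assuming JDK 21. The class name VirtualThreadDeadlock and its optional keep-alive argument are hypothetical, not taken from this thread. It uses ReentrantLock rather than synchronized so that, on JDK 21, the blocked virtual threads unmount from their carrier threads instead of pinning them, which keeps the sketch independent of how many carriers are available.

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reproducer: two virtual threads take two locks in opposite order.
public class VirtualThreadDeadlock {

    /** Returns true if both virtual threads are still blocked after waiting up to `timeout` for each. */
    public static boolean reproduce(Duration timeout) {
        ReentrantLock lockA = new ReentrantLock();
        ReentrantLock lockB = new ReentrantLock();
        // Each thread counts down after taking its first lock and waits for the other,
        // so the lock-ordering cycle always forms instead of depending on timing.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread vt1 = Thread.ofVirtual().name("vt-1")
                .start(() -> lockInOrder(lockA, lockB, bothHoldFirstLock));
        Thread vt2 = Thread.ofVirtual().name("vt-2")
                .start(() -> lockInOrder(lockB, lockA, bothHoldFirstLock));
        try {
            vt1.join(timeout);
            vt2.join(timeout);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the virtual threads", e);
        }
        return vt1.isAlive() && vt2.isAlive();
    }

    private static void lockInOrder(ReentrantLock first, ReentrantLock second, CountDownLatch latch) {
        first.lock();
        try {
            latch.countDown();
            latch.await();   // wait until the other thread holds its first lock
            second.lock();   // blocks forever: the other thread holds `second` and waits for `first`
            second.unlock();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(reproduce(Duration.ofSeconds(1)) ? "vt-1 and vt-2 are deadlocked" : "no deadlock");
        // Virtual threads are daemon threads, so the process exits once main returns.
        // Pass a number of seconds to keep it alive long enough to take a thread dump.
        long keepAliveSeconds = args.length > 0 ? Long.parseLong(args[0]) : 0;
        Thread.sleep(Duration.ofSeconds(keepAliveSeconds));
    }
}
```

Run with a keep-alive argument (for example 60), the process stays up long enough to send it SIGQUIT or, on HotSpot, to run jcmd <pid> Thread.dump_to_file -format=json <file> against it and compare the two dumps.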

fniephaus commented 1 month ago

Thanks for the feature request! @roberttoyonaga is working on support for jcmd as part of https://github.com/oracle/graal/issues/8915. I assume this will include making jcmd Thread.dump_to_file work with Native Image. Am I assuming correctly, Robert? 🙂

roberttoyonaga commented 1 month ago

My plan is to implement the Attach API as well as a few useful DCMDs (diagnostic commands) that can be issued through jcmd. I'll add Thread.dump_to_file to the list of DCMDs I plan on initially supporting.

In the meantime, I think you can accomplish your goal with 2 other approaches:

  1. Use the JFR event jdk.JavaMonitorEnter ("Monitor Blocked"). Native Image JFR supports virtual threads, and this event shows which threads blocked while entering a monitor and which thread previously owned that monitor. This should help expose deadlocks.

  2. You can also use the "thread dump" feature to dump all thread stacks on a signal. I think this will deliver behavior similar to what you want from Thread.dump_to_file. Build with --enable-monitoring=threaddump, then at runtime send SIGQUIT to your app (kill -SIGQUIT <pid>); on Windows, use SIGBREAK. This dumps all thread stacks to standard output. Unfortunately, I think this feature is undocumented.
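Approach 1 can also be tried from inside a program. Below is a sketch assuming JDK 21 and the standard jdk.jfr API; the class MonitorEnterRecorder and the lock type DemoLock are hypothetical. It records jdk.JavaMonitorEnter with a zero threshold while one thread holds a monitor that another thread wants, then reads back which threads blocked. Since this is a duration event, it is presumably written only once the blocked thread actually obtains the monitor, so the sketch uses contention that resolves. It uses platform threads because, on JDK 21, a virtual thread sleeping inside synchronized pins its carrier, which can prevent the contention from happening at all when only one carrier is available. In a native executable this presumably also needs --enable-monitoring=jfr at build time; the sketch has not been checked on Native Image.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical demo: record jdk.JavaMonitorEnter while "contender" blocks on a monitor held by "holder".
public class MonitorEnterRecorder {

    /** Hypothetical lock type, so its events can be picked out by the event's monitorClass field. */
    static final class DemoLock {}

    /** Returns the names of threads that JFR reports as blocked entering a DemoLock monitor. */
    public static List<String> blockedThreadNames() {
        DemoLock lock = new DemoLock();
        CountDownLatch held = new CountDownLatch(1);
        try (Recording recording = new Recording()) {
            // Threshold zero: record every contended monitor enter, however short.
            recording.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ZERO);
            recording.start();

            Thread holder = Thread.ofPlatform().name("holder").start(() -> {
                synchronized (lock) {
                    held.countDown();
                    pause(Duration.ofMillis(300)); // keep the monitor while "contender" arrives
                }
            });
            held.await();
            Thread contender = Thread.ofPlatform().name("contender").start(() -> {
                synchronized (lock) {
                    // Entered only after "holder" releases the monitor.
                }
            });
            holder.join();
            contender.join();
            recording.stop();

            Path file = Files.createTempFile("monitor-enter", ".jfr");
            try {
                recording.dump(file);
                return RecordingFile.readAllEvents(file).stream()
                        .filter(e -> e.getEventType().getName().equals("jdk.JavaMonitorEnter"))
                        .filter(e -> e.getClass("monitorClass").getName().endsWith("DemoLock"))
                        .map(e -> e.getThread().getJavaName())
                        .toList();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void pause(Duration d) {
        try {
            Thread.sleep(d);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("Blocked on DemoLock: " + blockedThreadNames());
    }
}
```

The more common route is to start the recording from the command line instead of from code; the programmatic form is used here only so the whole check fits in one self-contained file.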

With approach 2 you should get something like this:

"Signal Dispatcher" #59 daemon thread=0x00007f0928000b80 state=RUNNABLE

  i  SP 0x00007f0935ffddb0 IP 0x00000000004e460d size=128   com.oracle.svm.core.posix.headers.CSunMiscSignal.await(CSunMiscSignal.java)
  i  SP 0x00007f0935ffddb0 IP 0x00000000004e460d size=128   com.oracle.svm.core.posix.Util_jdk_internal_misc_Signal$SignalState.await(SunMiscSubstitutions.java:379)
  A  SP 0x00007f0935ffddb0 IP 0x00000000004e460d size=128   com.oracle.svm.core.posix.Util_jdk_internal_misc_Signal$DispatchThread.run(SunMiscSubstitutions.java:315)
  i  SP 0x00007f0935ffde30 IP 0x00000000005a4210 size=32    java.lang.Thread.runWith(Thread.java:1588)
  A  SP 0x00007f0935ffde30 IP 0x00000000005a4210 size=32    java.lang.Thread.run(Thread.java:1575)
  A  SP 0x00007f0935ffde50 IP 0x000000000050b083 size=48    com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:834)
  A  SP 0x00007f0935ffde80 IP 0x000000000050af0d size=32    com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:810)
  A  SP 0x00007f0935ffdea0 IP 0x0000000000463620 size=96    com.oracle.svm.core.code.IsolateEnterStub.PlatformThreads_threadStartRoutine_Z5jZ9wXZGDAvr0CL8KrTOA(IsolateEnterStub.java:0)

"Reference Handler" #58 daemon thread=0x00007f0930000b80 state=WAITING

  A  SP 0x00007f09367fed60 IP 0x000000000048bed6 size=64    com.oracle.svm.core.genscavenge.HeapImpl.transitionToNativeThenAwaitPendingRefs(HeapImpl.java:588)
  A  SP 0x00007f09367feda0 IP 0x000000000048bfc5 size=48    com.oracle.svm.core.genscavenge.HeapImpl.waitForPendingReferenceList(HeapImpl.java:577)
  A  SP 0x00007f09367fedd0 IP 0x000000000048c0f8 size=48    com.oracle.svm.core.genscavenge.HeapImpl.waitForReferencePendingList(HeapImpl.java:567)
  A  SP 0x00007f09367fee00 IP 0x00000000004aba55 size=16    com.oracle.svm.core.heap.ReferenceInternals.waitForPendingReferences(ReferenceInternals.java:176)
  A  SP 0x00007f09367fee10 IP 0x00000000004ab567 size=32    com.oracle.svm.core.heap.ReferenceHandlerThread.run(ReferenceHandlerThread.java:84)
  i  SP 0x00007f09367fee30 IP 0x00000000005a4210 size=32    java.lang.Thread.runWith(Thread.java:1588)
  A  SP 0x00007f09367fee30 IP 0x00000000005a4210 size=32    java.lang.Thread.run(Thread.java:1575)
  A  SP 0x00007f09367fee50 IP 0x000000000050b083 size=48    com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:834)
  A  SP 0x00007f09367fee80 IP 0x000000000050af0d size=32    com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:810)
  A  SP 0x00007f09367feea0 IP 0x0000000000463620 size=96    com.oracle.svm.core.code.IsolateEnterStub.PlatformThreads_threadStartRoutine_Z5jZ9wXZGDAvr0CL8KrTOA(IsolateEnterStub.java:0)

...

Moscagus commented 1 month ago

@roberttoyonaga, thanks for adding "jcmd Thread.dump_to_file" to the list of DCMDs.

Regarding workarounds, it is clear that the way to detect deadlocks is through a thread dump. I already tried approach 2 on native images and got something similar to what you are showing me.

The problem is that this approach produces output similar to what "jstack" generates: it only reports platform threads and leaves out the stacks of virtual threads. For this reason, Oracle added virtual thread stacks to the output of the "jcmd Thread.dump_to_file" command, which makes this command essential for detecting deadlocks at the Java virtual thread level.

Oracle documentation: "The jcmd thread dump lists virtual threads that are blocked in network I/O operations and virtual threads that are created by the ExecutorService interface. "
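The same gap is visible in the standard Java API on JDK 21: Thread.getAllStackTraces() is documented to return stack traces for live platform threads only, so a virtual thread never appears in it even while it is alive. A small check of that behavior, with hypothetical class and thread names:

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical check: is a live virtual thread included in Thread.getAllStackTraces()?
public class PlatformOnlyStackTraces {

    /** Returns true if any thread in Thread.getAllStackTraces() is virtual while one is known to be alive. */
    public static boolean anyVirtualThreadListed() {
        CountDownLatch release = new CountDownLatch(1);
        Thread.ofVirtual().name("parked-vt").start(() -> {
            try {
                release.await(); // stay alive (parked) until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            return Thread.getAllStackTraces().keySet().stream().anyMatch(Thread::isVirtual);
        } finally {
            release.countDown(); // let the virtual thread finish
        }
    }

    public static void main(String[] args) {
        System.out.println("virtual thread listed: " + anyVirtualThreadListed());
        Thread.getAllStackTraces().keySet()
                .forEach(t -> System.out.println("listed: " + t.getName() + " virtual=" + t.isVirtual()));
    }
}
```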

roberttoyonaga commented 1 month ago

@Moscagus I've opened a PR that adds JCMD support, including the Thread.dump_to_file command: https://github.com/oracle/graal/pull/9232 Feel free to try it out and let me know whether this solution works for you.
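As a HotSpot baseline to compare against when trying a native build, JDK 21 can also produce the Thread.dump_to_file kind of dump from inside the process through com.sun.management.HotSpotDiagnosticMXBean.dumpThreads, which takes an absolute path to a file that must not exist yet and a ThreadDumpFormat (TEXT_PLAIN or JSON). The sketch below assumes HotSpot JDK 21; the class InProcessThreadDump is hypothetical, and whether Native Image implements this MXBean method is not covered in this thread and has not been checked.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper: write a JDK 21 thread dump (the Thread.dump_to_file kind) from inside the process.
public class InProcessThreadDump {

    /** Dumps all threads in JSON format and returns the file's contents. */
    public static String dumpJson() {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // dumpThreads needs an absolute path to a file that does not exist yet.
            Path dir = Files.createTempDirectory("thread-dump");
            Path file = dir.resolve("threads.json").toAbsolutePath();
            try {
                diagnostics.dumpThreads(file.toString(), HotSpotDiagnosticMXBean.ThreadDumpFormat.JSON);
                return Files.readString(file);
            } finally {
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        // A parked virtual thread to look for in the dump.
        Thread.ofVirtual().name("parked-vt").start(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println(dumpJson());
        release.countDown();
    }
}
```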

Moscagus commented 4 weeks ago

@roberttoyonaga Could you please tell me how to obtain the native-image binary so I can test it?

roberttoyonaga commented 4 weeks ago

Could you please tell me how to obtain the native-image binary so I can test it?

@Moscagus you first need to build GraalVM using mx. You also need the latest labsjdk release.

Put mx on the PATH and set JAVA_HOME to labsjdk:

    export PATH=/path/to/mx:$PATH
    export JAVA_HOME=/path/to/labsjdk

Navigate to the GraalVM graal/substratevm directory and run:

    mx build

Then you can use mx native-image, or just find the native-image binary in the build directory.

The first few minutes of this video does a good job of explaining this as well: https://youtu.be/3Gh0cz3vjG8?feature=shared&t=202