oracle / graal


[GR-58225] Internal Error guarantee failed: "can not load classes with compiler thread" when using NFI Panama #9692

Open nirvdrum opened 4 days ago

nirvdrum commented 4 days ago

Describe GraalVM and your environment:

Have you verified this issue still happens when using the latest snapshot? Yes. I only see the issue with snapshots since Puma support isn't yet available in a release.

Describe the issue

While running a Rails application with a GFTC EA snapshot of TruffleRuby with the Panama backend enabled, I sometimes see the JVM crash. Unfortunately, since this happens during application boot, I don't have the hs_err log. I'm trying to work with our infrastructure team to preserve it. Currently, when it crashes, the deployment is halted and the container is immediately discarded. I do, however, have a copy of the core dump, but that is too large to attach to the issue.
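In the meantime, a rough sketch of pulling thread stacks out of that core dump locally; the gdb commands are standard, but the java binary path is illustrative and it's assumed the dump is opened against the exact JDK build that produced it:

    gdb /path/to/graalvm/bin/java /app/core.91
    (gdb) info threads
    (gdb) thread apply all bt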

Code snippet or code repository that reproduces the issue

Puma starting in single mode...
* Puma version: 6.4.2 (truffleruby 24.2.0-dev-37fdafbf - ruby 3.2.4) ("The Eagle of Durango")
*  Min threads: 120
*  Max threads: 120
*  Environment: staging
*          PID: 91
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (systemDictionary.cpp:631), pid=91, tid=111
#  guarantee(THREAD->can_call_java()) failed: can not load classes with compiler thread: class=com/oracle/truffle/nfi/backend/panama/NativePointer, classloader=jdk/internal/loader/ClassLoaders$AppClassLoader
#
# JRE version: Java(TM) SE Runtime Environment Oracle GraalVM 24-dev+13.1 (24.0+13) (build 24+13-jvmci-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 24-dev+13.1 (24+13-jvmci-b01, mixed mode, sharing, tiered, jvmci, jvmci compiler, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xe90c86][thread 110 also had an error]
SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, JavaThread*)+0x776
#
# Core dump will be written. Default location: Core dumps may be processed with "/var/lib/toolbox/crash-reporter/crash-reporter-binary %p %P %s %E" (or dumping to /app/core.91)
#
# An error report file with more information is saved as:
# /app/hs_err_pid91.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp

Steps to reproduce the issue
Please include both build steps as well as run steps:

  1. Install the latest TruffleRuby GFTC EA build (e.g., with rbenv + ruby-build it would be rbenv install truffleruby+graalvm-dev)
  2. Enable the Panama backend: export TRUFFLERUBYOPT="--experimental-options --ruby.cexts-panama"
  3. Boot an application that uses native extensions (see the consolidated sketch below)
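Put together, a consolidated repro might look roughly like this; the rbenv usage and the boot command are illustrative, and any app that loads C extensions should do:

    # assumes rbenv + ruby-build are installed
    rbenv install truffleruby+graalvm-dev
    rbenv shell truffleruby+graalvm-dev
    export TRUFFLERUBYOPT="--experimental-options --ruby.cexts-panama"
    bundle exec puma   # boot the app; the crash is intermittent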

Unfortunately, the crash doesn't occur reliably. Sometimes I get an exception instead, which I think is related to a known issue with the propagation of errno.

Expected behavior
I'd expect the application to behave functionally the same both with and without Panama enabled.

eregon commented 3 days ago

Filed internally as GR-58225.

eregon commented 3 days ago

@nirvdrum After a quick look, it would be really helpful, or even necessary, to get the hs_err log. Could you try to get it? @dougxc told me -XX:LogFile (--vm.XX:LogFile=path as a truffleruby argument) can be used to put the hs_err log anywhere.
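In TRUFFLERUBYOPT terms that would presumably be something like the following (untested; the path is illustrative):

    export TRUFFLERUBYOPT="--experimental-options --ruby.cexts-panama --vm.XX:LogFile=/some/persistent/path/jvm.log"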

nirvdrum commented 3 days ago

I'm trying to get the hs_err log, but I'm still running into the limitation I mentioned in the issue description. Unfortunately, it doesn't really matter where I write the file: I don't have the means to mount a volume, so the file goes away when the container is discarded. We have a crash reporting service based on core_pattern that scans for hs_err files and uploads them to a bucket, but it hasn't picked up these files. I'm trying to debug that, but it's a very slow process. I haven't yet been able to reproduce locally.
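For reference, the pipe handler shown in the crash banner above comes from the kernel's core_pattern setting, which can be checked from inside the container (sketch):

    cat /proc/sys/kernel/core_pattern
    # a leading '|' means the kernel pipes cores to that handler instead of writing a file,
    # e.g. |/var/lib/toolbox/crash-reporter/crash-reporter-binary %p %P %s %E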

dougxc commented 3 days ago

I'm not sure if it works, but maybe you could try -XX:LogFile=/dev/stdout.
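i.e., roughly (untested sketch, per the comment above):

    export TRUFFLERUBYOPT="--experimental-options --ruby.cexts-panama --vm.XX:LogFile=/dev/stdout"

The idea being that the log would land on the container's stdout, which a log collector can keep even after the container is discarded.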