oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org

[native-image] StackOverflowError occurs when running an app using a shared library built using native-image #2711

Open rinx opened 4 years ago

rinx commented 4 years ago

Describe the issue

An error like the following occurs when running a Go app that calls a function in a shared library (built using native-image) from a hundred goroutines through cgo.

Fatal error: StackOverflowError: Enabling the yellow zone of the stack did not make any stack space available. Possible reasons for that: 1) A call from native code to Java code provided the wrong JNI environment or the wrong IsolateThread; 2) Frames of native code filled the stack, and now there is not even enough stack space left to throw a regular StackOverflowError; 3) An internal VM error occurred.

I couldn't find the cause of this problem. Is this a bug in shared libraries built using native-image? Or are there any constraints on using shared libraries built using native-image? If you have any ideas for solving this problem, please help me. Thanks.

Steps to reproduce the issue

I pushed sample code (https://github.com/rinx/graalvm-java-cgo-test) that reproduces this problem.

  1. git clone --depth 1 https://github.com/rinx/graalvm-java-cgo-test.git
  2. cd graalvm-java-cgo-test
  3. make target/call_from_go
  4. export LD_LIBRARY_PATH=target/native
  5. ./target/call_from_go

Describe GraalVM and your environment:

More details

native-image command (with --native-image-info and --verbose flags) and its output:

native-image \
-cp src \
-H:Name=libjavacgo \
--shared \
-H:+ReportExceptionStackTraces \
-H:Log=registerResource: \
-H:+RemoveSaturatedTypeFlows \
-H:+PrintClassInitialization \
-H:+TraceClassInitialization \
--verbose \
--no-fallback \
--no-server \
--initialize-at-build-time \
--native-image-info --verbose \
-J-Xms2g \
-J-Xmx7g
Executing [
/usr/lib/graalvm/bin/java \
-XX:+UseParallelGC \
-XX:+UnlockExperimentalVMOptions \
-XX:+EnableJVMCI \
-Dtruffle.TrustAllTruffleRuntimeProviders=true \
-Dtruffle.TruffleRuntime=com.oracle.truffle.api.impl.DefaultTruffleRuntime \
-Dgraalvm.ForcePolyglotInvalid=true \
-Dgraalvm.locatorDisabled=true \
-Dsubstratevm.IgnoreGraalVersionCheck=true \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.aarch64=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.amd64=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.code.site=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.code.stack=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.code=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.common=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.hotspot.aarch64=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.hotspot.amd64=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.hotspot.sparc=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.hotspot=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.meta=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.runtime=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.services=ALL-UNNAMED \
--add-exports=jdk.internal.vm.ci/jdk.vm.ci.sparc=ALL-UNNAMED \
--add-exports=org.graalvm.truffle/com.oracle.truffle.api=ALL-UNNAMED \
--add-opens=jdk.internal.vm.compiler/org.graalvm.compiler.debug=ALL-UNNAMED \
--add-opens=jdk.internal.vm.compiler/org.graalvm.compiler.nodes=ALL-UNNAMED \
--add-opens=jdk.unsupported/sun.reflect=ALL-UNNAMED \
--add-opens=java.base/jdk.internal.module=ALL-UNNAMED \
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED \
--add-opens=java.base/jdk.internal.reflect=ALL-UNNAMED \
--add-opens=java.base/java.io=ALL-UNNAMED \
--add-opens=java.base/java.lang=ALL-UNNAMED \
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED \
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED \
--add-opens=java.base/java.lang.ref=ALL-UNNAMED \
--add-opens=java.base/java.net=ALL-UNNAMED \
--add-opens=java.base/java.nio=ALL-UNNAMED \
--add-opens=java.base/java.nio.file=ALL-UNNAMED \
--add-opens=java.base/java.security=ALL-UNNAMED \
--add-opens=java.base/javax.crypto=ALL-UNNAMED \
--add-opens=java.base/java.util=ALL-UNNAMED \
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED \
--add-opens=java.base/sun.security.x509=ALL-UNNAMED \
--add-opens=java.base/jdk.internal.logger=ALL-UNNAMED \
--add-opens=org.graalvm.sdk/org.graalvm.nativeimage.impl=ALL-UNNAMED \
--add-opens=org.graalvm.sdk/org.graalvm.polyglot=ALL-UNNAMED \
--add-opens=org.graalvm.truffle/com.oracle.truffle.polyglot=ALL-UNNAMED \
--add-opens=org.graalvm.truffle/com.oracle.truffle.api.impl=ALL-UNNAMED \
-XX:+UseJVMCINativeLibrary \
-Xss10m \
-Xms1g \
-Xmx10055152432 \
-Duser.country=US \
-Duser.language=en \
-Djava.awt.headless=true \
-Dorg.graalvm.version=20.3.0-dev \
-Dorg.graalvm.config= \
-Dcom.oracle.graalvm.isaot=true \
-Djava.system.class.loader=com.oracle.svm.hosted.NativeImageSystemClassLoader \
-Xshare:off \
--module-path \
/usr/lib/graalvm/lib/truffle/truffle-api.jar \
-javaagent:/usr/lib/graalvm/lib/svm/builder/svm.jar=traceInitialization \
-Djdk.internal.lambda.disableEagerInitialization=true \
-Djdk.internal.lambda.eagerlyInitialize=false \
-Djava.lang.invoke.InnerClassLambdaMetafactory.initializeLambdas=false \
-Xms2g \
-Xmx7g \
-cp \
/usr/lib/graalvm/lib/svm/builder/pointsto.jar:/usr/lib/graalvm/lib/svm/builder/llvm-wrapper-shadowed.jar:/usr/lib/graalvm/lib/svm/builder/svm-llvm.jar:/usr/lib/graalvm/lib/svm/builder/javacpp-shadowed.jar:/usr/lib/graalvm/lib/svm/builder/objectfile.jar:/usr/lib/graalvm/lib/svm/builder/svm.jar:/usr/lib/graalvm/lib/svm/builder/llvm-platform-specific-shadowed.jar \
'com.oracle.svm.hosted.NativeImageGeneratorRunner$JDK9Plus' \
-watchpid \
84170 \
-imagecp \
/usr/lib/graalvm/lib/svm/library-support.jar:/root/local/src/github.com/rinx/graalvm-java-cgo-test/src \
-H:Path=/root/local/src/github.com/rinx/graalvm-java-cgo-test \
-H:Name=libjavacgo \
-H:+SharedLibrary \
-H:+ReportExceptionStackTraces \
-H:Log=registerResource: \
-H:+RemoveSaturatedTypeFlows \
-H:+PrintClassInitialization \
-H:+TraceClassInitialization \
-H:FallbackThreshold=0 \
-H:ClassInitialization=:build_time \
-H:+DumpTargetInfo \
-H:CLibraryPath=/usr/lib/graalvm/lib/svm/clibraries/linux-amd64 \

]
[libjavacgo:84253]    classlist:   4,026.39 ms,  1.92 GB
[libjavacgo:84253]        (cap):     972.51 ms,  1.92 GB
[libjavacgo:84253]        setup:   3,701.48 ms,  1.92 GB
# Building image for target platform: org.graalvm.nativeimage.Platform$LINUX_AMD64
# Using native toolchain:
#   Name: GNU project C and C++ compiler (gcc)
#   Vendor: linux
#   Version: 9.3.0
#   Target architecture: x86_64
#   Path: /usr/bin/gcc
# Using CLibrary: com.oracle.svm.core.posix.linux.libc.GLibC
Printing initializer configuration to /root/local/src/github.com/rinx/graalvm-java-cgo-test/reports/initializer_configuration_20200726_202500.txt
Printing initializer dependencies to /root/local/src/github.com/rinx/graalvm-java-cgo-test/reports/initializer_dependencies_20200726_202520.dot
Printing 0 classes that are considered as safe for build-time initialization to /root/local/src/github.com/rinx/graalvm-java-cgo-test/reports/safe_classes_20200726_202520.txt
Printing 2540 classes of type BUILD_TIME to /root/local/src/github.com/rinx/graalvm-java-cgo-test/reports/build_time_classes_20200726_202520.txt
Printing 58 classes of type RERUN to /root/local/src/github.com/rinx/graalvm-java-cgo-test/reports/rerun_classes_20200726_202520.txt
Printing 0 classes of type RUN_TIME to /root/local/src/github.com/rinx/graalvm-java-cgo-test/reports/run_time_classes_20200726_202520.txt
[libjavacgo:84253]     (clinit):     256.37 ms,  3.60 GB
# Static libraries:
#   ../../../../../../usr/lib/graalvm/lib/svm/clibraries/linux-amd64/liblibchelper.a
#   ../../../../../../usr/lib/graalvm/lib/static/linux-amd64/glibc/libnet.a
#   ../../../../../../usr/lib/graalvm/lib/svm/clibraries/linux-amd64/libffi.a
#   ../../../../../../usr/lib/graalvm/lib/static/linux-amd64/glibc/libnio.a
#   ../../../../../../usr/lib/graalvm/lib/static/linux-amd64/glibc/libjava.a
#   ../../../../../../usr/lib/graalvm/lib/static/linux-amd64/glibc/libfdlibm.a
#   ../../../../../../usr/lib/graalvm/lib/static/linux-amd64/glibc/libzip.a
#   ../../../../../../usr/lib/graalvm/lib/svm/clibraries/linux-amd64/libjvm.a
# Other libraries: pthread,dl,z,rt
[libjavacgo:84253]   (typeflow):   9,319.47 ms,  3.60 GB
[libjavacgo:84253]    (objects):  10,472.42 ms,  3.60 GB
[libjavacgo:84253]   (features):     263.32 ms,  3.60 GB
[libjavacgo:84253]     analysis:  20,628.56 ms,  3.60 GB
[libjavacgo:84253]     universe:     456.88 ms,  3.60 GB
[libjavacgo:84253]      (parse):   3,243.48 ms,  3.60 GB
[libjavacgo:84253]     (inline):   2,279.76 ms,  3.60 GB
[libjavacgo:84253]    (compile):  16,813.69 ms,  3.56 GB
[libjavacgo:84253]      compile:  22,873.66 ms,  3.56 GB
[libjavacgo:84253]        image:   1,171.86 ms,  3.56 GB
[libjavacgo:84253]        write:     477.39 ms,  3.56 GB
[libjavacgo:84253]      [total]:  53,913.22 ms,  3.56 GB

Error message when the binary ./target/call_from_go crashes:

Fatal error: StackOverflowError: Enabling the yellow zone of the stack did not make any stack space available. Possible reasons for that: 1) A call from native code to Java code provided the wrong JNI environment or the wrong IsolateThread; 2) Frames of native code filled the stack, and now there is not even enough stack space left to throw a regular StackOverflowError; 3) An internal VM error occurred.

JavaFrameAnchor dump:

  No anchors

TopFrame info:

  TotalFrameSize in CodeInfoTable 32

VMThreads info:

  VMThread 00007f086c000b60  STATUS_IN_NATIVE  java.lang.Thread@0x7f0874501028
  VMThread 00007f0868000b60  STATUS_IN_NATIVE  java.lang.Thread@0x7f0875301028
  VMThread 00007f0870000b60  STATUS_IN_NATIVE  java.lang.Thread@0x7f0876e01028
  VMThread 00007f0880000c90  STATUS_IN_NATIVE  java.lang.Thread@0x7f0877801028
  VMThread 00007f0878000b60  STATUS_IN_JAVA (safepoints disabled)  java.lang.Thread@0x7f0877a01028
  VMThread 00007f0884000b60  STATUS_IN_NATIVE  java.lang.Thread@0x7f088c001028
  VMThread 0000000001c768b0  STATUS_IN_NATIVE  java.lang.Thread@0x7f0877dfb500

VM Thread State for current thread 00007f0878000b60:

  0 (8 bytes): com.oracle.svm.jni.JNIThreadLocalEnvironment.jniFunctions = (bytes) 
    00007f0878000b60: 00007f0877d40010

  8 (32 bytes): com.oracle.svm.core.genscavenge.ThreadLocalAllocation.regularTLAB = (bytes) 
    00007f0878000b68: 00007f0874f00000 00007f0875000000
    00007f0878000b78: 00007f0874fa4148 0000000000000000

  40 (8 bytes): com.oracle.svm.core.heap.NoAllocationVerifier.openVerifiers = (Object) null
  48 (8 bytes): com.oracle.svm.core.jdk.IdentityHashCodeSupport.hashCodeGeneratorTL = (Object) null
  56 (8 bytes): com.oracle.svm.core.snippets.ExceptionUnwind.currentException = (Object) null
  64 (8 bytes): com.oracle.svm.core.thread.JavaThreads.currentThread = (Object) java.lang.Thread  00007f0877a01028
  72 (8 bytes): com.oracle.svm.core.thread.ThreadingSupportImpl.activeTimer = (Object) null
Fatal error: Must either be at a safepoint or in native mode

  Error: printDiagnostics already in progress.

  Fatal error: Must either be at a safepoint or in native mode

galderz commented 4 years ago

@rinx I've just hit this same fatal error. I fixed it by changing from:

@CFunction(value = "foo", transition = CFunction.Transition.NO_TRANSITION)

to:

@CFunction(value = "foo", transition = CFunction.Transition.TO_NATIVE)

Basically, any native call that can re-enter Java needs to be signalled with a transition; otherwise the code thinks you're calling it from the wrong place (Java instead of native).

TigranOhanyan commented 1 year ago

I've encountered the same issue described here. Has there been any progress or known workarounds? It's causing some challenges on my end, and any guidance would be greatly appreciated.

ennerf commented 9 months ago

I encountered the same issue, and it was due to an IsolateThread being used from a different native thread, i.e., not the one that initialized it.