oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org

Graal VM crashes while executing embedded Python script (SEGV) #3160

Open fra-orolo opened 3 years ago

fra-orolo commented 3 years ago

Describe the issue:

The polyglot JVM crashes after an embedded Python script has run to completion inside a JUnit test for a computation. The Python code works against a Java API whose functions are added to the Python language bindings. After the script completes, our code removes the shared objects from the bindings with Value.removeMember(functionName). That is where the crash occurs.

Steps to reproduce the issue:

Difficult to reproduce; even adding print statements to the script sometimes makes it work.

Describe GraalVM and your environment:

* Start calcNormalizedValues
* calculatePlateLoop
* End calcNormalizedValues
* End script
2021-01-27 16:02:03 CET [DEBUG] Cleaning getExperimentAnnotationValue
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f2fa439857b, pid=13964, tid=13972
#
# JRE version: OpenJDK Runtime Environment GraalVM CE 21.0.0 (11.0.10+8) (build 11.0.10+8-jvmci-21.0-b06)
# Java VM: OpenJDK 64-Bit Server VM GraalVM CE 21.0.0 (11.0.10+8-jvmci-21.0-b06, mixed mode, sharing, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 15811 jvmci org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.profiledPERoot([Ljava/lang/Object;)Ljava/lang/Object; jdk.internal.vm.compiler (48 bytes) @ 0x00007f2fa439857b [0x00007f2fa43984e0+0x000000000000009b] (org.graalvm.polyglot.Value<MergedScopes>.removeMember#2)
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /local0/mginkel/source/master/scr_source/scr_core/hs_err_pid13964.log
Compiled method (c1)   64609 15463       3       org.graalvm.compiler.truffle.runtime.OptimizedCallTarget::isValidArgumentProfile (53 bytes)
 total in heap  [0x00007f2f9d703490,0x00007f2f9d703ac0] = 1584
 relocation     [0x00007f2f9d7035f8,0x00007f2f9d703658] = 96
 main code      [0x00007f2f9d703660,0x00007f2f9d703960] = 768
 stub code      [0x00007f2f9d703960,0x00007f2f9d7039b0] = 80
 oops           [0x00007f2f9d7039b0,0x00007f2f9d7039b8] = 8
 metadata       [0x00007f2f9d7039b8,0x00007f2f9d7039c0] = 8
 scopes data    [0x00007f2f9d7039c0,0x00007f2f9d7039f8] = 56
 scopes pcs     [0x00007f2f9d7039f8,0x00007f2f9d703a98] = 160
 dependencies   [0x00007f2f9d703a98,0x00007f2f9d703aa0] = 8
 nul chk table  [0x00007f2f9d703aa0,0x00007f2f9d703ac0] = 32
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled

hs_err_pid13964.log

munishchouhan commented 3 years ago

@fra-orolo Thanks for reproducing the issue. We will take a look and update you.

fra-orolo commented 3 years ago

Sadly, I can't reproduce it with a simple, shareable unit test. I have a unit test running exactly the same Python code on a small mock dataset, but that works flawlessly. The reproducible crash was observed with a bigger integration test that pulls a large dataset from an internal database.

munishchouhan commented 3 years ago

Sorry, I meant reporting the issue. We will try to find the problem using the hs_err log and let you know the outcome.

timfel commented 3 years ago

There is no interesting information in the log. Is there any chance you can share the script (even if it's not a deterministic reproducer)? Are you loading any Python packages that are C extensions (like struct or numpy)?

In any case @mcraj017, this looks more like a compiler bug to me, so probably something for the compiler team to look at.

fra-orolo commented 3 years ago

> There is no interesting information in the log. Is there any chance you can share the script (even if it's not a deterministic reproducer)? Are you loading any Python packages that are C extensions (like struct or numpy)?

Not at all. The executed script is just pure Python 2.7 that works nicely on Jython and also works with Graal-Python as long as it is not processing a lot of data. But if I benchmark with 10^5 numbers to normalize, it starts crashing.

The point where it crashes is the removal of our Python-to-Java API bindings from the shared namespace. Interestingly, this works without crashing when I fill a Python dict with the namespace bindings and pass it to builtins.exec(code, globals, locals). (This way I avoid the Graal API for namespace sharing.)
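
For readers following along, the dict-based workaround described above might look roughly like the sketch below. It is only an illustration: `get_experiment_annotation_value` and `run_script` are hypothetical names, the API function is a plain Python stand-in for what would be a Java object in the reported setup, and how the Java objects reach the dict is not shown in this thread.

```python
import builtins


def get_experiment_annotation_value(key):
    """Hypothetical stand-in for a Java API function exposed to the script."""
    return {"plate": "P-001"}.get(key)


def run_script(source, api):
    """Run `source` with the API functions supplied through a plain dict.

    The API is placed in the globals dict handed to exec(), so nothing is
    added to (or later removed from) the shared polyglot bindings.
    """
    script_globals = {"__builtins__": builtins}
    script_globals.update(api)
    script_locals = {}

    code = compile(source, "<script>", "exec")
    builtins.exec(code, script_globals, script_locals)

    # Cleanup is just dropping the dict entries; no removeMember call.
    script_globals.clear()
    return script_locals


if __name__ == "__main__":
    result = run_script(
        'value = getExperimentAnnotationValue("plate")',
        {"getExperimentAnnotationValue": get_experiment_annotation_value},
    )
    print(result["value"])
```

One thing to keep in mind with separate globals and locals dicts: top-level assignments in the script land in the locals dict, so functions defined inside the script will not see them as globals. Passing a single dict for both avoids that.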

timfel commented 3 years ago

@chumer From the log and the description, it seems like the crash happens while trying to call a compiled removeMember on the polyglot bindings object. Any ideas?