oracle / graal

memory corruption on AArch64-android #1700

Open johanvos opened 5 years ago

johanvos commented 5 years ago

We have a HelloWorld Java app, compiled with native-image, linked into a shared library, and loaded into an Android activity. That works fine: HelloWorld is printed. More complex Java apps work fine as well.

However, we run into issues with JNI calls that seem to indicate memory corruption when invoking native code from Java or returning from it. I have a standalone, reproducible setup at https://github.com/johanvos/grandroid. The Java source code that is native-imaged into a .o file (on an AArch64 machine) is at https://github.com/johanvos/client-samples/tree/egl/Gradle/HelloWorld

This Java code runs a test that calls 4 native functions. Running it with the hellohwgl.sh script in the grandroid repo on an Android phone causes a crash, due to a SIGSEGV in libGLESv2_adreno.so. The native implementations of the 4 functions are in the grandroid project, in glass_android.c.

That file has a switch: #define ALL_NATIVE 1. If that is set, the call to the second native function also invokes the third and the fourth, without going back to Java. In this case, there is no crash.

There is no Java functionality between the different native calls. Hence, if the 4 native functions are combined into 2 bigger native functions, it all works fine. If the native functions are called one by one, there is a crash.
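To make the two modes concrete, here is a minimal sketch of the switch. Only the ALL_NATIVE macro comes from glass_android.c. The function names, the step counter, and run_test (a stand-in for the Java test driver) are hypothetical. The sketch also assumes that in chained mode the driver makes only two calls. The real functions take a JNIEnv* and make EGL calls, which are left out here.

```c
#include <assert.h>

/* 1: step 2 chains into steps 3 and 4 natively (no crash observed).
 * 0: every step returns to the caller first (crash observed on device). */
#define ALL_NATIVE 1

/* Hypothetical stand-ins for the four native functions.
 * Each one only records that it ran. */
static int steps_run = 0;

static void native_step1(void) { steps_run++; }
static void native_step3(void) { steps_run++; }
static void native_step4(void) { steps_run++; }

static void native_step2(void) {
    steps_run++;
#if ALL_NATIVE
    /* Chained mode: no return to the caller between steps 2, 3 and 4. */
    native_step3();
    native_step4();
#endif
}

/* Stand-in for the Java test driver (an assumption, not the real code).
 * Returns how many caller-to-native transitions were made. */
int run_test(void) {
    int transitions = 0;
    steps_run = 0;
    native_step1(); transitions++;
    native_step2(); transitions++;
#if !ALL_NATIVE
    native_step3(); transitions++;
    native_step4(); transitions++;
#endif
    assert(steps_run == 4);  /* the same four steps run in both modes */
    return transitions;
}
```

In both modes the same four steps execute; only the number of transitions between the caller and native code differs (2 with ALL_NATIVE set, 4 without), which is what points at the Java/native boundary rather than at the EGL calls themselves.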

Note that the x0 register points to an address inside the JNIEnv structure, which is not what you would expect in the code being executed (EGL calls, which have nothing to do with JNIEnv).

In the native calls, JNIEnv* is passed as the first parameter, and it is at 0x79d0e54000.
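As a quick check of how close x0 is to the JNIEnv, the distance between the two addresses can be computed directly. The two constants are copied from this report. The helper name offset_from_env is only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the report: the JNIEnv* passed to the native
 * functions, and the x0 register at the time of the crash. */
#define JNIENV_BASE UINT64_C(0x79d0e54000)
#define CRASH_X0    UINT64_C(0x79d0e540c0)

/* Byte distance from the JNIEnv base address to a register value. */
uint64_t offset_from_env(uint64_t reg, uint64_t env_base) {
    assert(reg >= env_base);  /* only meaningful at or above the base */
    return reg - env_base;
}
```

offset_from_env(CRASH_X0, JNIENV_BASE) gives 0xc0, so x0 points 192 bytes past the start of the JNIEnv. The same value also appears in x20, x22, x23 and x26 in the register dump.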

Here is the crash info:

x0   00000079d0e540c0  x1   00000079c1a37880  x2   0000000000000000  x3   00000079d0e42780
09-25 11:29:08.242  4750  4750 F DEBUG   :     x4   0000000000000000  x5   0000007900000000  x6   0000000000000001  x7   0000000000000001
09-25 11:29:08.242  4750  4750 F DEBUG   :     x8   0000000000000001  x9   0000000000000000  x10  0000000000000000  x11  0000000000000000
09-25 11:29:08.242  4750  4750 F DEBUG   :     x12  0000000000000002  x13  00000079d0e0d500  x14  00000079d0e2f580  x15  00000079ce9ea000
09-25 11:29:08.242  4750  4750 F DEBUG   :     x16  00000079ce7b4ef0  x17  0000007a5c12dadc  x18  00000079ce7c5dc8  x19  00000079d0e42780
09-25 11:29:08.242  4750  4750 F DEBUG   :     x20  00000079d0e540c0  x21  00000079d92f8b80  x22  00000079d0e540c0  x23  00000079d0e540c0
09-25 11:29:08.242  4750  4750 F DEBUG   :     x24  0000000000000000  x25  00000079d0e47230  x26  00000079d0e540c0  x27  0000000000000001
09-25 11:29:08.242  4750  4750 F DEBUG   :     x28  00000079d0e47230  x29  00000079c1f97130  x30  00000079ce5e5ef8
09-25 11:29:08.242  4750  4750 F DEBUG   :     sp   00000079c1f97120  pc   00000079ce5e5f64  pstate 0000000060000000
09-25 11:29:08.252  4750  4750 F DEBUG   : 
09-25 11:29:08.252  4750  4750 F DEBUG   : backtrace:
09-25 11:29:08.253  4750  4750 F DEBUG   :     #00 pc 000000000017bf64  /vendor/lib64/egl/libGLESv2_adreno.so (EglApi::MakeCurrent(void*, void*, void*, void*)+356)
09-25 11:29:08.253  4750  4750 F DEBUG   :     #01 pc 0000000000009a38  /vendor/lib64/egl/libEGL_adreno.so (eglMakeCurrent+56)
09-25 11:29:08.253  4750  4750 F DEBUG   :     #02 pc 000000000000f7f8  /system/lib64/libEGL.so (android::egl_display_t::makeCurrent(android::egl_context_t*, android::egl_context_t*, void*, void*, void*, void*, void*, void*)+244)
09-25 11:29:08.253  4750  4750 F DEBUG   :     #03 pc 0000000000014f60  /system/lib64/libEGL.so (eglMakeCurrent+412)
09-25 11:29:08.253  4750  4750 F DEBUG   :     #04 pc 000000000048dfcc  /data/app/com.gluonhq.helloandroid-LFwhaX9kUcZcq6uBA5seqA==/lib/arm64/libmygraal.so (Java_hello_EGL_eglMakeCurrent+448)

vjovanov commented 5 years ago

@loicottet can we try this with the LLVM backend to see if the problem goes away?

johanvos commented 5 years ago

FYI, here is the native-image command I execute on an AArch64 server to generate hello.helloworld.o:

#!/bin/bash
export CP=/home/ubuntu/fork/client-samples/Gradle/HelloWorld/build/classes/java/main
export GRAAL=/home/ubuntu/graal/vm/mxbuild/linux-aarch64/GRAALVM_UNKNOWN/graalvm-unknown-19.3.0-dev
$GRAAL/bin/native-image \
        --no-fallback -H:TempDirectory=/tmp/tmphw \
        -H:+SpawnIsolates \
        -H:+ExitAfterRelocatableImageWrite \
        -H:+UseOnlyWritableBootImageHeap \
        -H:+ReportExceptionStackTraces \
        -H:+AllowIncompleteClasspath \
        -cp $CP \
        hello.HelloWorld

pfustc commented 5 years ago

There is a possibility that this issue is related to my #1704 (also due to SIGSEGV).

johanvos commented 5 years ago

We just tried the same code with the LLVM backend (aarch64isolates branch from @loicottet) and so far, no crashes.