yglukhov / jnim

Nim - Java bridge
MIT License

Random SIGSEGV in minimal example #23

Closed: bluenote10 closed this issue 7 years ago

bluenote10 commented 7 years ago

I'm experimenting with extending some Scala code with Nim, and thanks to this project it's looking very promising! (There's a small demo project, which also has a debug branch to reproduce this issue.)

The problem is that I'm getting random segfaults. The Nim code for a minimal example looks like this:

import jnim

proc Java_scalanim_NativeWrapperNim_00024_addOne*(env: JNIEnvPtr, obj: jobject, value: jint): jint {. cdecl, exportc, dynlib .} =
  echo "Printing from JNI..."
  result = value + 1

which I'm compiling with nim --app:lib -o:libnativenim.so c native.nim. The main function of the Scala code consists of only one line (the System.loadLibrary call is done via a macro):

val resNim = NativeWrapperNim.addOne(1)

This generally works, but roughly 10% of runs crash with a segfault.

The first question: do you see an obvious mistake on my part, or have you experienced something similar?

If not, I'll keep debugging. A quick gdb session only gave me this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7fc2700 (LWP 14740)]
0x00007fffe10002b4 in ?? ()
(gdb) where
#0  0x00007fffe10002b4 in ?? ()
#1  0x0000000000000246 in ?? ()
#2  0x00007fffe1000160 in ?? ()
#3  0x00007ffff7392bf0 in VM_Operation::_names () from /home/fabian/bin/jdk1.8.0_74/jre/lib/amd64/server/libjvm.so
#4  0x00007ffff7fc1980 in ?? ()
#5  0x00007ffff6ec5d2d in VM_Version::get_processor_features() () from /home/fabian/bin/jdk1.8.0_74/jre/lib/amd64/server/libjvm.so
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) list
1   events.c: No such file or directory.

It's difficult to say whether this is a JVM bug (I haven't had any issues with my JVM before) or whether the stack corruption happens in the Nim runtime or on the jnim side.

Update: So far I haven't been able to reproduce the problem with equivalent C library code, so it might be related to Nim.

Nim version: 0.15.2
Java version: Java(TM) SE Runtime Environment (build 1.8.0_74-b02)

yglukhov commented 7 years ago

@bluenote10, actually I don't see you using jnim in your samples: jnim is mostly a high-level wrapper for Java objects, and your sample only uses low-level JNI. Now, to the problem. It's hard to tell from the stack trace what is going on, but I would bet that you're not initializing Nim's thread-local storage (GC, stack pointers, frame info, etc.), which can cause pretty nasty behavior. To check that, try making your exportc proc independent of Nim's TLS and see if that works:

{.push stackTrace: off.}
proc Java_scalanim_NativeWrapperNim_00024_addOne*(env: JNIEnvPtr, obj: jobject, value: jint): jint {. cdecl, exportc, dynlib .} =
  result = value + 1
{.pop.}
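The failure mode described above (runtime state kept in thread-local storage that is never initialized on a thread the host created) is not specific to Nim. The following C sketch illustrates the general pattern of lazily setting up per-thread state at the top of every exported entry point. All names here (`runtime_thread_state`, `ensure_thread_setup`, `exported_add_one`, `run_foreign_threads`) are hypothetical; this is an analogy to what Nim's `setupForeignThreadGc` is for, not its implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 16

/* Hypothetical per-thread runtime state, standing in for what a language
   runtime keeps in TLS (e.g. the GC's notion of the stack bottom). */
typedef struct {
    int initialized;
    uintptr_t stack_bottom;
} runtime_thread_state;

/* Each thread gets its own zero-initialized copy. */
static _Thread_local runtime_thread_state tls_state;

/* Number of threads that took the setup path (demonstration only). */
static int setup_count = 0;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: a thread created by the host (e.g. the JVM) never
   ran the runtime's own thread-start code, so initialize its state on
   its first call into the library. */
static void ensure_thread_setup(uintptr_t approx_stack_bottom) {
    if (!tls_state.initialized) {
        tls_state.stack_bottom = approx_stack_bottom;
        tls_state.initialized = 1;
        pthread_mutex_lock(&count_lock);
        setup_count++;
        pthread_mutex_unlock(&count_lock);
    }
}

/* Hypothetical exported entry point, playing the role of the JNI function. */
int exported_add_one(int value) {
    int marker; /* a local's address approximates the caller's stack position */
    ensure_thread_setup((uintptr_t)&marker);
    return value + 1;
}

/* Simulates a host-created thread calling into the library three times. */
static void *foreign_thread(void *arg) {
    int *v = arg;
    for (int i = 0; i < 3; i++)
        *v = exported_add_one(*v);
    return NULL;
}

/* Starts n foreign threads and returns how many of them performed setup. */
int run_foreign_threads(int n, int *results) {
    pthread_t threads[MAX_THREADS];
    assert(n > 0 && n <= MAX_THREADS);
    setup_count = 0;
    for (int i = 0; i < n; i++) {
        results[i] = 0;
        if (pthread_create(&threads[i], NULL, foreign_thread, &results[i]) != 0) {
            n = i; /* only join the threads that were actually started */
            break;
        }
    }
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
    return setup_count;
}
```

Calling `run_foreign_threads(4, results)` should report four setups (one per thread, not one per call) and leave each `results[i]` at 3. The setup is idempotent per thread, so placing it at the top of every exported function is cheap after the first call.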

bluenote10 commented 7 years ago

@yglukhov You're right, so far I mainly need the low-level stuff, but it's still great to have nice FFI bindings.

I tried again using the stackTrace: off pragma: it still crashes, and I get similar stack traces.

And yes, this is most likely not a jnim issue, so I think I'd better ask in the Nim forum for general help.

yglukhov commented 7 years ago

@bluenote10, can you post the generated C function for Java_scalanim_NativeWrapperNim_00024_addOne here? Also, --passC:-g sometimes gives a better stack trace, so you could try that.

bluenote10 commented 7 years ago

The C code looks like this:

N_LIB_EXPORT N_CDECL(void, Java_NativeTest_run)(struct JNINativeInterface_** env0, void* obj0) {
    nimfr("Java_NativeTest_run", "native.nim")
    nimln(4, "native.nim");
    setupforeignthreadgc_81402_1689653243();
    nimln(5, "native.nim");
    printf("%s\012", ((NimStringDesc*) &T3875853531_2)? (((NimStringDesc*) &T3875853531_2))->data:"nil");
    fflush(stdout);
    popFrame();
}

Note that I simplified the example a bit: it's now just a void NativeTest.run() called from plain Java, so it can be reproduced with just build.sh && run.sh.

Following your suggestion, I've now been experimenting with system.setupForeignThreadGc(), but it still segfaults.

yglukhov commented 7 years ago

Well, your sample works perfectly fine for me on macOS: 1000 iterations went OK.

bluenote10 commented 7 years ago

After some more experimentation, I think we can close this. It looks like two things are required to avoid the segfaults:

The HotSpot JVM uses SIGSEGV as an internal communication mechanism, which conflicts with Nim's signal handlers.