kud1ing / rucaja

Calling the JVM from Rust via JNI
https://docs.rs/rucaja
Apache License 2.0
31 stars 7 forks source link

NullPointerException in tests/java_array #24

Closed fpoli closed 6 years ago

fpoli commented 6 years ago

I open this issue to track the NullPointerException in tests/java_array.rs (introduced in #22).

Starting the JVM with -Xcheck:jni gives a (slightly) better error message:

FATAL ERROR in native method: Bad global or local ref passed to JNI

Disabling -Xcheck:jni and running the test with a debug JVM the message becomes more interesting:

# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/jniHandles.hpp:178
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/fpoli/hg/jdk8u/hotspot/src/share/vm/runtime/jniHandles.hpp:178), pid=15159, tid=0x00007fd6ef1ed840
#  assert(result != badJNIHandle) failed: Pointing to zapped jni handle area
#
# JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0-internal-fastdebug-fpoli_2017_11_21_14_33-b00)
# Java VM: OpenJDK 64-Bit Server VM (25.71-b00-fastdebug mixed mode linux-amd64 compressed oops)
# Core dump written. Default location: /home/fpoli/src/rust-calls-jvm/core or core.15159
#
# An error report file with more information is saved as:
# /home/fpoli/src/rust-calls-jvm/hs_err_pid15159.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

Looking at hotspot/src/share/vm/runtime/jniHandles.hpp:178 the content is this:

inline oop JNIHandles::resolve(jobject handle) {
  oop result = (handle == NULL ? (oop)NULL : *(oop*)handle);
  assert(result != NULL || (handle == NULL || !CheckJNICalls || is_weak_global_handle(handle)), "Invalid value read from jni handle");
  assert(result != badJNIHandle, "Pointing to zapped jni handle area"); // <-- line 178
  return result;
};
kud1ing commented 6 years ago

I think the bug is in https://github.com/kud1ing/rucaja/blob/master/src/jvm_method.rs If we hold a reference to a JVM object, we need to call NewGlobalRef() as in https://github.com/kud1ing/rucaja/blob/master/src/macros_jvm_wrappers.rs This seems to be missing here.

kud1ing commented 6 years ago

If only i could remember why JvmMethod is not defined using jvm_wrapper!().

fpoli commented 6 years ago

This SO question has a lot of useful information. They say jmethodID doesn't need a global reference, so I think this is the reason we don't need the jvm_wrapper!().

Similarly, from Android JNI Tips:

The class references, field IDs, and method IDs are guaranteed valid until the class is unloaded. Classes are only unloaded if all classes associated with a ClassLoader can be garbage collected, which is rare but will not be impossible in Android. Note however that the jclass is a class reference and must be protected with a call to NewGlobalRef (see the next section).

I suspect that a jclass can't survive a DetachCurrentThread (when the jvm_attachment goes out of scope) before being turned into a global reference, but I have found no reference for this. A hint is that, compared to other C++ JNI examples, in Rucaja we do a lot of unnecessary attach/detach.

fpoli commented 6 years ago

Stack of the fatal error mentioned above:

--------------- T H R E A D ---------------

Current thread (0x00007f50a1e1d800): JavaThread "Thread-2" [_thread_in_vm, id=15784, stack(0x00007f50a21ff000,0x00007f50a2400000)]

Stack: [0x00007f50a21ff000,0x00007f50a2400000],  sp=0x00007f50a23fe110,  free space=2044k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x10e592b]  VMError::report_and_die()+0x15b
V  [libjvm.so+0x773b70]  report_vm_error(char const*, int, char const*, char const*)+0x90
V  [libjvm.so+0xac90d6]  JNIHandles::resolve(_jobject*)+0x106
V  [libjvm.so+0xa91cba]  jni_NewGlobalRef+0x15a
C  [java_array-e9efbd105599ce18+0x18f6c]  rucaja::jvm_object::JvmObject::from_jvm_ptr::h45e6f993a0a13313+0xfc
C  [java_array-e9efbd105599ce18+0x115a8]  java_array::test_java_arrays::h7dba00310c03d017+0x218
C  [java_array-e9efbd105599ce18+0x28d42]  _$LT$F$u20$as$u20$test..FnBox$LT$T$GT$$GT$::call_box::h3cf6e61f5182f464+0x12
C  [java_array-e9efbd105599ce18+0x64bbd]  __rust_maybe_catch_panic+0x1d
C  [java_array-e9efbd105599ce18+0x1a21b]  std::sys_common::backtrace::__rust_begin_short_backtrace::h36d0e6f1b34daaf1+0x1bb
C  [java_array-e9efbd105599ce18+0x1af53]  std::panicking::try::do_call::h0a36fa793f806dec+0x53
C  [java_array-e9efbd105599ce18+0x64bbd]  __rust_maybe_catch_panic+0x1d
C  [java_array-e9efbd105599ce18+0x22a93]  _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::hdc66c7bb936b9796+0x103
C  [java_array-e9efbd105599ce18+0x5cb8c]  std::sys::imp::thread::Thread::new::thread_start::hf0e7be833820328e+0x8c
C  [libpthread.so.0+0x76ba]  start_thread+0xca

So, in JvmObject::from_jvm_ptr the call jni_NewGlobalRef(env, jvm_ptr) checks the jvm_ptr (of type jobjectArray) and raises the error.

kud1ing commented 6 years ago

They say jmethodID doesn't need a global reference, so I think this is the reason we don't need the jvm_wrapper!().

Yes, you are correct. A method is not a jobject. I was confused. I should write this down.

kud1ing commented 6 years ago

compared to other C++ JNI examples, in Rucaja we do a lot of unnecessary attach/detach.

I don't know. Rucaja's functions should in theory be Rust-thread-safe. Do C++-thread-safe functions look any different?

I could imagine we could do more work in fewer functions. Like calling a method: we could do look up of the class and the method and calling the method at the same time. But this would waste some cycles when looking up the same classes/methods multiple times.

kud1ing commented 6 years ago

I suspect that a jclass can't survive a DetachCurrentThread

Since we create a global reference to them, i would expect that this is not the case. But i have not seen any statement one way or another.

kud1ing commented 6 years ago

Maybe we just call jni_NewGlobalRef() too late?

fpoli commented 6 years ago

compared to other C++ JNI examples, in Rucaja we do a lot of unnecessary attach/detach.

I don't know. Rucaja's functions should in theory be Rust-thread-safe. Do C++-thread-safe functions look any different?

I phrased it in the wrong way, I wanted to say that in the C++ examples that I looked at they don't detach before calling the NewGlobalRef.

Maybe we just call jni_NewGlobalRef() too late?

Yes, I also suspect that anticipating jni_NewGlobalRef() may be a solution. It seems to be consistent with this explanation.

If my understanding is correct, it seems that local references must have a lifetime shorter than the JvmAttachment that created them, while global references can have a longer lifetime, up to the lifetime of the Jvm.

However, there are still a lot of things that are not clear to me. For example, what happens if we attach twice and detach once? Can we still use a local reference created before the detach? If we want to ignore this kind of questions I think that It's much easier to convert everything to global reference (before the detach) or to use a single JvmAttachment for the whole thread (I'm not sure how).

kud1ing commented 6 years ago

For example, what happens if we attach twice and detach once?

I think this is not a problem:

// If the thread has been attached this operation is a no-op

https://github.com/dmlloyd/openjdk/blob/28cf61ba0955833f75cf7ba2b80cf83c557300d5/src/hotspot/share/prims/jni.cpp#L4128

fpoli commented 6 years ago

So, imagine that we have a function foo that uses an outer:JvmAttachment. Inside it, we call a function bar that internally creates and uses another inner:JvmAttachment. When the inner gets dropped it detaches the current thread. outer is not aware of this, and using it may no longer be safe. Am I right?

fn bar() {
    let inner :JvmAttachment = ... // attach to current thread (no-op)
    // inner is dropped here, and detaches the current thread
}

fn foo() {
    let outer :JvmAttachment = ... // attach to current thread
    bar();
    // here we use outer, or some local reference created with outer. However the thread has been detached
    // outer is dropped (no-op?)
}

It seems to me that is easier to have just one jni_env in each thread, with one "attach" at the beginning of the thread.

kud1ing commented 6 years ago

Yes, that makes sense. We should modify every function to accept a reference to a JvmAttachment instead of instantiating it themselves.

kud1ing commented 6 years ago

Hm, for Drop() in https://github.com/kud1ing/rucaja/blob/master/src/macros_jvm_wrappers.rs we need an JvmAttachment and we can not pass it as an argument. I think struct member references to Jvm should be replaced with references to JvmAttachment instead.

fpoli commented 6 years ago

I just found this library https://github.com/prevoty/jni-rs/blob/980cf3e23fdcef7302e864011101c3d65a6d3843/src/wrapper/objects/global_ref.rs

They keep a *mut JNIEnv inside the structure of the global reference, so that they can later call DeleteGlobalRef.

Actually, jni-rs overlaps a lot with rucaja

kud1ing commented 6 years ago

Some time ago i've tried to cache the JNIEnv and it crashed a lot, especially in CI. I think the documentation creation steps uses threads. This is why i've introduced JvmAttachment.

See also https://stackoverflow.com/a/12421377

Also my understanding was that the purpose of jni-rs is to provide Rust functions to Java, not vice versa.

kud1ing commented 6 years ago

Fixed with https://github.com/kud1ing/rucaja/commit/4a93eefa5c9b8dd5b09be1f21bf764f446d52c3a

kud1ing commented 6 years ago

Thanks for the test and your help in identifiying and solving this problem.