Closed Rochet2 closed 6 months ago
Have you enabled logging with android_logger
and hook on panic? That will show the error in details.
Here are the related codes in my app:
Couldnt get your panic to work quite the same due to various compilation errors. Edit: got it working, but it had no difference in the log.
However with the following changes that I made to logging
branch of the repro repo: https://github.com/Rochet2/wasmer_android_jnicall_repro/commit/6a8de4eda2f5fdc41a6ed832de805655e696780e
I get the following log:
In the log we can see that the two function calls before entering WASM work and print Test was called from WASM!
However the third print done from the host function fails. The function is the same and uses same variables.
The error messages about unwrap occur due to java exception that occurs during the function call - even if the call should be identical to the ones that succeeded.
src/lib.rs:85 is https://github.com/Rochet2/wasmer_android_jnicall_repro/blob/6a8de4eda2f5fdc41a6ed832de805655e696780e/android_runtime/src/lib.rs#L85-L86
and src/lib.rs:51 is https://github.com/Rochet2/wasmer_android_jnicall_repro/blob/6a8de4eda2f5fdc41a6ed832de805655e696780e/android_runtime/src/lib.rs#L51
From the above log we can see that the succeeding and failing calls have identical apart from failing call getting some exception in the middle of the call and never printing.
Here is a diff of a successful call (success.log) and the failing call (fail.log) showing how they look the same in the log apart from an exception that seems to occur during the call from host function:
--- success.log 2020-04-15 13:52:39.813318990 +0300
+++ fail.log 2020-04-15 13:52:49.757414699 +0300
@@ -15,12 +15,12 @@
calling checked jni method: CallVoidMethodA
looking up jni method CallVoidMethodA
found jni method
-I/System.out: Test was called from WASM!
checking for exception
calling unchecked jni method: ExceptionCheck
looking up jni method ExceptionCheck
found jni method
- no exception found
+ exception found, returning error
calling unchecked jni method: DeleteLocalRef
looking up jni method DeleteLocalRef
found jni method
+E/wasmer_android: ERR: panicked at 'called `Result::unwrap()` on an `Err` value: Error(JavaException, State { next_error: None, backtrace: InternalBacktrace })', src/lib.rs:85:5
You can throw the exception from Rust code by JNIEnv::throw
.
And I made this patch to use it to see the exception because I'm also very interested in this issue!
Now, the log message reports an error exactly same with https://github.com/iij-ii/wasmonandroid/issues/1: StackOverflowError
occurred.
Extracted from the log:
...
2020-04-16 16:17:28.950 23545-23545/com.wasmer.android V/jni::wrapper::jnienv: found jni method
2020-04-16 16:17:28.950 23545-23545/com.wasmer.android V/jni::wrapper::jnienv: exception found, returning error
2020-04-16 16:17:28.950 23545-23545/com.wasmer.android V/jni::wrapper::jnienv: calling unchecked jni method: DeleteLocalRef
2020-04-16 16:17:28.950 23545-23545/com.wasmer.android V/jni::wrapper::jnienv: looking up jni method DeleteLocalRef
2020-04-16 16:17:28.950 23545-23545/com.wasmer.android V/jni::wrapper::jnienv: found jni method
2020-04-16 16:17:28.950 23545-23545/com.wasmer.android V/jni::wrapper::jnienv: calling unchecked jni method: ExceptionOccurred
2020-04-16 16:17:28.950 23545-23545/com.wasmer.android V/jni::wrapper::jnienv: looking up jni method ExceptionOccurred
2020-04-16 16:17:28.950 23545-23545/com.wasmer.android V/jni::wrapper::jnienv: found jni method
2020-04-16 16:17:28.950 23545-23545/com.wasmer.android V/jni::wrapper::jnienv: calling unchecked jni method: Throw
2020-04-16 16:17:28.950 23545-23545/com.wasmer.android V/jni::wrapper::jnienv: looking up jni method Throw
2020-04-16 16:17:28.950 23545-23545/com.wasmer.android V/jni::wrapper::jnienv: found jni method
2020-04-16 16:17:29.114 23545-23545/com.wasmer.android A/.wasmer.androi: java_vm_ext.cc:561] JNI DETECTED ERROR IN APPLICATION: JNI Throw called with pending exception java.lang.StackOverflowError: stack size 8MB
2020-04-16 16:17:29.114 23545-23545/com.wasmer.android A/.wasmer.androi: java_vm_ext.cc:561] at void com.wasmer.android.MainActivity.JNIExecuteWasm(com.wasmer.android.MainActivity, byte[]) (MainActivity.java:-2)
...
As written in https://github.com/iij-ii/wasmonandroid/issues/1, I thought this was because my code consumes stack too much until getting this issue. Now I suspect this is a problem of wasmer.
@MarkMcCaskey Any ideas of why wasmer consumes the stack too much?
Sorry, this can be a known issue of Android, according to https://stackoverflow.com/questions/27223005/unable-to-make-jni-call-from-c-to-java-in-android-lollipop-using-jni and https://issuetracker.google.com/issues/37035211.
Progress report of debugging: To show log while running wasmer, I switched the logger to write on the application data: https://github.com/igrep/wasmer_android_jnicall_repro/commit/6387f88f57a3e1b61516a302931ace71d8fef5ad
Now, we can see the log written by libwasmer_android.so with this command after running the app:
adb shell run-as com.wasmer.android cat files/wasmer_android.log
I'm going to add debug codes to wasmer from now on.
Progress report:
I tried two ways to fix after adding loggings:
ERR
is logged after calling wasm. See the commit message for details.A bit out of my expertise, but based on the comments in the threads you linked earlier and based on
I would say that the native function call should be moved to some other context which is outside of this signal handling if possible - if that indeed is the reason for the issue. Creating new threads is not a proper solution and some of the posts hint that it would still be wrong to do so even if it worked as it is fundamentally wrong place to call a function anyways. Though maybe doing so can help debugging.
There must be something that wasmer does to create this situation where it is not possible to call java as the java calls normally work outside of wasmer. Relocating the native call to outside of that or removing the error creating code or correcting it if done wrong would be the solution in my mind.
ps. really nice stuff here. definitely taking some of this logging and error handling to use.
Thanks for figuring this out! It looks like our signal handling code is not built to be used outside of a wasm context.
I'm not familiar with the details here at all, but it seems highly plausible that the signal handling of Wasmer and the Android runtime could be interacting poorly.
This seems like a problem that would show up in quite a few places actually, I'm not sure what the standard solutions are for conflicting signal handling code... I'll look into this later if I have the bandwidth and post any ideas I have!
After browsing the source code of Android and wasmer, I suspect this is just because wasmer alters sp
to point at mmap
ed area in AArch64, while doesn't do that in x86_64, if I read the assembly correctly.
Some very plausibly named method validates the stack frame to throw a StackOverflowError
if invalid. And there are several other lines like that.
I guess the error is caused by the guard.
Now, I'm interested in why the singlepass backend for AArch64 is implemented in such a way, and how to fix it?
We are planning a new version of the Singlepass compiler to support aarch 32 and 64bits. See #2463.
Also, I would like to point this wasmer-java
embedding, https://github.com/wasmerio/wasmer-java/.
This should be already fixed on master, it needs a test case to prove that
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Feel free to reopen the issue if it has been closed by mistake.
Describe the bug
I am trying to run wasmer embedded in an android app and call android app functions from WASM. This succeeds on x86_64 emulator, but not on an ARM phone. When an android class instance's method is being called from a host function exposed to WASM the program aborts.
I have attached a reproduction case where there is a WASM module, an android wasmer runtime and an android project that uses the wasm module and wasmer runtime. The application loads the wasmer runtime as a shared library. It then loads the wasm module from assets folder as byte array and calls a function of the wasmer runtime through JNI. In this call the runtime gets the module bytes, the instance of the java class and java environment. The function will load the wasm module so it is ready to execute, shows how a test JNI call works outside of wasmer runtime, places java class and vm references to global variables and runs the wasm module. The wasm module simply executes the Test host function (named java_test in rust) in it's main function. The host function gets the java environment and class instance from the global variables and uses them to do a function call into Java again. This call should be the exact same as the JNI function call performed earlier.
The running of the wasm module works on x86_64, but not on a real phone using aarch64 build. The JNI function call succeeds outside of wasmer code, but not inside the wasmer host function.
Steps to reproduce
Expected behavior
The program runs without aborting on both x86_64 and aarch64.
Log of x86_64 succeeding (program finishes cleanly and all prints appear):
Actual behavior
The program works on x86_64, but aborts during a JNI call in a host function on aarch64.
Log of aarch64 failing (last 3 lines show how the call aborted):
Additional context
Pretty new to android, JNI and wasm so it is entirely possible I have a simple logical error in my code. However the code working on x86_64 suggests that might not be the case. Unsure if this is an issue in my code, JNI code or wasmer code, but it only appears in wasmer host function.