google / marl

A hybrid thread / fiber task scheduler written in C++ 11
Apache License 2.0

JNI from Fibers #240

Closed: guusw closed this issue 1 year ago

guusw commented 1 year ago

Hey, I recently ran into an issue trying to call JNI functions from within fibers. I recall being able to use them in the past, but on recent Android versions (12) the calls fail. I have a very simple test case where some SDL functions interact with their corresponding Java code from a fiber: https://gist.github.com/guusw/d32e473838c852c1a1cdc233be873ed0

The program runs fine on Windows, for example, but running it on my Android device results in the following: https://gist.github.com/guusw/875eee24650de245b35dac1fd6eefce9

Ignore the fact that it complains about a stack overflow (that happens while it is trying to dump the crash stack); it fails before that, in CheckJNI. I've tried something similar with Boost.Context, and it gives the actual error message (below), but it fails to run as well.

03-09 13:51:31.776 10912 10939 F fragcolor.acor: java_vm_ext.cc:594] JNI DETECTED ERROR IN APPLICATION: JNI ERROR (app bug): jstring is an invalid JNI transition frame reference or invalid reference: 0xb400007630e49ec0 (use of invalid jobject)
03-09 13:51:31.776 10912 10939 F fragcolor.acor: java_vm_ext.cc:594]     in call to GetStringUTFChars
03-09 13:51:31.776 10912 10939 F fragcolor.acor: java_vm_ext.cc:594]     from int android.util.Log.println_native(int, int, java.lang.String, java.lang.String)
03-09 13:51:31.793 10912 10933 D hw-ProcessState: Binder ioctl to enable oneway spam detection failed: Invalid argument

I'm trying to find a solution for this without having to change my code that relies on fibers, but I haven't been able to figure it out so far. It seems like the new Android runtime tries to trace the stack when it looks up Java object references.

AWoloszyn commented 1 year ago

So I have run into this exact issue before. JNI now has a check that tries to make sure the stack has not been corrupted. If I remember the details correctly, whenever a thread is created some data about its base pointer and stack pointer is stored in TLS. When we create a fiber, these values are obviously wrong for the new stack location. This does not really have a clean solution, as these internal details are subject to change.

My solution to this problem was to have a thread whose job is to call JNI methods, and to dispatch work to that thread. However, a simpler (if less efficient) method would be to wrap each JNI call in a short-lived thread. Something like:

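std::thread([&]() {
   // Do your JNI stuff in here. A newly created native thread must first
   // attach to the JVM (JavaVM::AttachCurrentThread) to obtain a JNIEnv.
   // Capturing by reference is safe because join() waits for completion.
}).join();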

It entirely depends on how many of these calls you have to make.
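A minimal sketch of the dedicated-thread approach, using only the C++ standard library. `DedicatedThread` and `post` are hypothetical names, not marl or JNI APIs. In a real app the worker would call `JavaVM::AttachCurrentThread` once when it starts; that step is assumed and omitted so the sketch stays self-contained.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: owns one long-lived thread and runs every submitted
// task on it, in submission order. (Assumed: a real version would attach
// this thread to the JVM once at startup.)
class DedicatedThread {
 public:
  DedicatedThread() : worker_([this] { run(); }) {}

  ~DedicatedThread() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // the worker drains any remaining tasks first
  }

  // Queues f to run on the dedicated thread; returns a future for its result.
  template <typename F>
  auto post(F f) -> std::future<decltype(f())> {
    using R = decltype(f());
    auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
    std::future<R> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping, and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // executed without holding the lock
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};
```

Callers wait on the returned `std::future`. Note that `future.get()` blocks the whole OS thread, so when called from inside a marl task it also stalls that scheduler worker; marl's own synchronization primitives (such as `marl::Event`) yield the fiber instead.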

macOS has a similar issue: all window-management calls have to be made from the main thread, so anything in a fiber has to be shunted to the main thread (not just any thread) in order to run.
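Because the work must run on the main thread specifically, the main thread itself can drain a queue from its event loop instead of owning a separate worker. A minimal sketch, assuming the application controls its main loop; `MainThreadQueue`, `post`, and `drain` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical helper: work posted from any thread or fiber is queued here,
// and only the thread that calls drain() (the main thread) executes it.
class MainThreadQueue {
 public:
  void post(std::function<void()> f) {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_.push_back(std::move(f));
  }

  // Call from the main thread once per event-loop iteration.
  // Returns the number of tasks executed.
  std::size_t drain() {
    std::vector<std::function<void()>> batch;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      batch.swap(pending_);  // take everything queued so far
    }
    for (auto& f : batch) f();  // run without holding the lock
    return batch.size();
  }

 private:
  std::mutex mutex_;
  std::vector<std::function<void()>> pending_;
};
```

A fiber that needs a result back would still have to wait for it, for example on a future fulfilled inside the posted task.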

ben-clayton commented 1 year ago

marl::blocking_call() does what @AWoloszyn suggested, while freeing up the current thread for other tasks: https://github.com/google/marl/blob/2e82e6999f4947cb2d682dd1b8c636928397c578/include/marl/blockingcall.h#L98

Unfortunately, spawning and killing threads is going to be costly in terms of performance.

ben-clayton commented 1 year ago

Thread is inactive. Closing.