Closed sagebind closed 1 year ago
For me its happening when Android Service
is running.
Ah, right, oops!
Your fix generally looked good to me but one notable thing was that it looked like it could theoretically enable a different deadlock if the the spawned thread for android_main
returned immediately then the wait for running
to get set could block the Java main thread indefinitely.
I replaced running
with a tristate enum to avoid that possibility.
Although it's redundant I've currently kept the destroyed
field which is now set immediately when onDestroyed
is called from the java main thread.
It would be great if you're able to double check that this updated PR works for you @sagebind and @ardocrat
@rib Good catch, thanks. Your patch does look better, and I just tested it and it works just as well in my app. I wasn't totally confident in my implementation since I'm not familiar with the codebase (and new to Android in general), which is why I started with a draft PR.
Thanks for testing @sagebind
It'd be good to make a patch release with this fix, and to do that before landing the GameActivity 2.0.2 update
Sadly not fixed at my case.
At logs I can see deadlock happened at pthread_cond_wait(&android_app->cond, &android_app->mutex);
of android_app_free
function from android_native_app_glue.c
file and no logs from Rust glue.rs
file.
So as temporary dirty fix I am just killing my process after some time to relaunch application later without problems :)
@ardocrat ah okey, if you're seeing a reference to android_native_app_glue.c
that implies you're using the game-activity
backend which wouldn't be affected by the fix here currently which only touches the native-activity
backend.
I guess there must be a similar bug in the game-activity
backend.
@ardocrat I can't currently see an equivalent bug in the game-activity backend.
In the game-activity backend it blocks in android_app_free
(where you see the deadlock) waiting for the native (Rust) thread to respond to onDestroy with:
pthread_mutex_lock(&android_app->mutex);
android_app_write_cmd(android_app, APP_CMD_DESTROY);
while (!android_app->destroyed) {
pthread_cond_wait(&android_app->cond, &android_app->mutex);
}
pthread_mutex_unlock(&android_app->mutex);
and the thing that sets android_app->destroyed
is the android_app_destroy
function which should get called by android_app_entry
immediately after _rust_glue_entry
(i.e. after the app's android_main
returns):
static void* android_app_entry(void* param) {
// <snip>
_rust_glue_entry(android_app); // (calls android_main)
android_app_destroy(android_app); // <- should set app->destroyed = 1 to unblock android_app_free
return NULL;
}
static void android_app_destroy(struct android_app* android_app) {
LOGV("android_app_destroy!");
free_saved_state(android_app);
pthread_mutex_lock(&android_app->mutex);
AConfiguration_delete(android_app->config);
android_app->destroyed = 1;
pthread_cond_broadcast(&android_app->cond);
pthread_mutex_unlock(&android_app->mutex);
// Can't touch android_app object after this.
}
Are you maybe able to add a bit of logging to confirm that your applications android_main
is returning?
Thank you @rib, I am using eframe
, I forgot to call frame.close()
to unblock android_main
, but another problem happened here, eframe.close()
method is leading to call of Activity's onPause()
method and its impossible to call finish()
after this, cause Activity is at background, moreover its blocking app thread for 5 seconds and if I am opening app between moment of exit and this timeout (5 seconds) I am getting ANR popup to force stop the app. At logs I am seeing:
18:07:29.004 D ************** mainWorkCallback *********
18:07:29.004 D mainWorkCallback: cmd=1
18:07:29.004 V android_app_destroy!
18:07:29.031 D onPause_native
18:07:29.031 V Pause: 0x722f923370
18:07:34.222 I Thread[6,tid=8884,WaitingInMainSignalCatcherLoop,Thread*=0x72af9196f0,peer=0x137c0110,"Signal Catcher"]: reacting to signal 3
18:07:34.222 I
18:07:34.742 I Wrote stack traces to tombstoned
---------------------------- PROCESS ENDED (8874) for package com.example.android ----------------------------
I guess this problem is related to eframe's close()
method, it is pausing Activity by calling this method at same time on app destroy.
ah, curious.
The android lifecycle model (https://developer.android.com/guide/components/activities/activity-lifecycle) will go through onPause and onStop before reaching onDestroy.
From your log I see that it looks like android_app_destroy
is called before onPause_native
The android_app
isn't really in a usable state after android_app_destroy
.
The comment at the end of android_app_destroy
actually says // Can't touch android_app object after this
though since it doesn't ever technically actually free()
the android_app
we should at least still be able to check app->destroyed
In the NativeActivity backend we have try_with_waitable_activity_ref
utility that generalizes upgrading a weak reference for the state that may get dropped when the app exits so any Activity
notifications afterwards become safe noops.
It looks like the GameActivity glue is missing these kinds of checks to consider that the app may have already destroyed when an Activity native method (like onPause_native) gets called.
We can hopefully address this with a check for app->destroyed in GameActivity.cpp:onPause_native
, and should probably have similar checks for other native callbacks too.
@rib At this log I tried to manually finish it before calling frame.close()
. if I am not calling Activity::finish()
manually, this call is coming after android_main
, but leads to same problem, cause Activity
can not be finished at background. eframe
puts Activity
at pause state after calling close()
.
looking at the eframe implementation for close() it seems to just set a flag that results in Winit setting control_flow::Exit
which should cause the winit event loop to exit.
Changing exactly where you finish()
won't necessarily make much difference since GameActivity_finish
just sends an async message to the java main thread.
My guess at the moment is that calling eframe.close()
is working as expected and resulting in your android_main
then returning normally which will result in calling GameActivity_finish
(when sends an async message to main thread) and then android_app_destroy
gets calls which leaves the android_app
in an inconsistent state.
The async finish message can then trigger onPause -> onStop -> onDestroy
lifecycle events but by the time onPause
runs then android_app
is already in an inconsistent state, and (more importantly) the native thread for android_main has exit.
because onPause_native
doesn't recognize that the app has already been destroyed it is then going to try and send a message from the java main thread to your rust/native thread (which won't work because the Rust thread has already exit).
I think actually looking at this a little closer we can't address this directly in onPause_native
because the GameActivity/NativeCode
class doesn't actually know anything about the implementation details of android_app
in android_native_app_glue.c
(it just calls the callbacks provided).
I think the right fix here is to probably make android_app->destroyed
be a value we read with atomic ops (or consistently read with the android_app->mutex held) and then each android_app Java callback (like onPause
) should start with a check of android_app->destroyed and immediately return if its set.
It does give me some renewed motivation to think about starting to port GameActivity to Rust though. This same issue almost certainly used to exist in the old NativeActivity backend too when it was written in C.
@ardocrat I wonder if you could try out this local patch to android_app_set_activity_state
:
static void android_app_set_activity_state(struct android_app* android_app,
int8_t cmd) {
pthread_mutex_lock(&android_app->mutex);
- android_app_write_cmd(android_app, cmd);
- while (android_app->activityState != cmd) {
- pthread_cond_wait(&android_app->cond, &android_app->mutex);
+ if (!android_app->destroyed) {
+ android_app_write_cmd(android_app, cmd);
+ while (android_app->activityState != cmd) {
+ pthread_cond_wait(&android_app->cond, &android_app->mutex);
+ }
}
pthread_mutex_unlock(&android_app->mutex);
}
that doesn't account for all callbacks but should make onPause
and onStop
gracefully return if the app has already exit.
You might also need this change for android_app_free
to make onDestroy
also act gracefully:
static void android_app_free(struct android_app* android_app) {
pthread_mutex_lock(&android_app->mutex);
+
+ if (!android_app->destroyed) {
+ pthread_mutex_unlock(&android_app->mutex);
+ return;
+ }
+
android_app_write_cmd(android_app, APP_CMD_DESTROY);
while (!android_app->destroyed) {
pthread_cond_wait(&android_app->cond, &android_app->mutex);
I can also aim to push a branch that you can try out with those fixes and more for the other callbacks, but just in the middle of some other android-activity changes atm and don't want to fully switch context just yet.
It's probably also worth noting that there's this PR that's looking to update the GameActivity backend and I'll also look at porting these changes to that branch if they help: https://github.com/rust-mobile/android-activity/pull/88
It moved further, now I can see onSurfaceDestroyed_native
and NativeWindowDestroyed
events at logs, but sadly only onPause
was called from Activity
lifecycle and it was not finished, so when I am opening back the app, I can see only black screen, without ANR (Application Not Responding) popup now.
ah, okey, yeah I though you might get away with testing by just handling android_app_set_activity_state but yeah, android_app_set_window is another api that's called by one of the java callbacks that needs a guard. I made the changes locally but will try and share a branch soon.
For android_app_set window
I added:
@@ -308,6 +308,10 @@ static void android_app_set_window(struct android_app* android_app,
ANativeWindow* window) {
LOGV("android_app_set_window called");
pthread_mutex_lock(&android_app->mutex);
+ if (!android_app->destroyed) {
+ pthread_mutex_unlock(&android_app->mutex);
+ return;
+ }
if (android_app->pendingWindow != NULL) {
android_app_write_cmd(android_app, APP_CMD_TERM_WINDOW);
}
@ardocrat I've just created this issue to track the game-activity deadlock: https://github.com/rust-mobile/android-activity/issues/97
I'll treat it as a separate issue for now, so we can merge this fix as is first.
Fix a deadlock that occurs when an activity is destroyed without process termination, such as when an activity is destroyed and recreated due to a configuration change.
The deadlock occurs because
notify_destroyed
blocks untildestroyed
is set totrue
. This only occurs whenWaitableNativeActivityState
is dropped, but theWaitableNativeActivityState
instance is the very thing being used to await for destruction, resulting in a deadlock.This doesn't really make much sense, so instead, wait for the Rust-side main thread to terminate to ensure cleanup can proceed.