tmo1 / sms-ie

SMS Import / Export is a simple Android app that imports and exports SMS and MMS messages, call logs, and contacts from and to JSON / NDJSON files.
GNU General Public License v3.0
319 stars 38 forks source link

ExportWorker gets killed by Android 14 due to new excessive binder call tracker feature #129

Closed chenxiaolong closed 8 months ago

chenxiaolong commented 9 months ago

On Android 14 (Google Pixel stock firmware), the scheduled exports seem to be crashing, even though exports done manually via the UI work fine. When ExportWorker tries it run, it fails during the message export phase with a binder error:

--------- beginning of main
2023-10-21 00:03:56.047852 -0400 I/hub.tmo1.sms_ie(12068): Using CollectorTypeCC GC.
2023-10-21 00:03:56.054832 -0400 E/hub.tmo1.sms_ie(12068): Not starting debugger since process cannot load the jdwp agent.
2023-10-21 00:03:56.071151 -0400 D/CompatibilityChangeReporter(12068): Compat change id reported: 171979766; UID 10302; state: ENABLED
2023-10-21 00:03:56.071324 -0400 D/CompatibilityChangeReporter(12068): Compat change id reported: 242716250; UID 10302; state: ENABLED
2023-10-21 00:03:56.079462 -0400 D/ApplicationLoaders(12068): Returning zygote-cached class loader: /system_ext/framework/androidx.window.extensions.jar
2023-10-21 00:03:56.079519 -0400 D/ApplicationLoaders(12068): Returning zygote-cached class loader: /system_ext/framework/androidx.window.sidecar.jar
2023-10-21 00:03:56.091642 -0400 D/nativeloader(12068): Configuring clns-6 for other apk /data/app/~~-0i8XvqMvE7-tBm7q61ghw==/com.github.tmo1.sms_ie-YdWZ4xiIg2BCTfZVOeQIKQ==/base.apk. target_sdk_version=34, uses_libraries=, library_path=/data/app/~~-0i8XvqMvE7-tBm7q61ghw==/com.github.tmo1.sms_ie-YdWZ4xiIg2BCTfZVOeQIKQ==/lib/arm64, permitted_path=/data:/mnt/expand:/data/user/0/com.github.tmo1.sms_ie
2023-10-21 00:03:56.104880 -0400 V/GraphicsEnvironment(12068): Currently set values for:
2023-10-21 00:03:56.105009 -0400 V/GraphicsEnvironment(12068):   angle_gl_driver_selection_pkgs=[]
2023-10-21 00:03:56.105040 -0400 V/GraphicsEnvironment(12068):   angle_gl_driver_selection_values=[]
2023-10-21 00:03:56.105567 -0400 V/GraphicsEnvironment(12068): ANGLE GameManagerService for com.github.tmo1.sms_ie: false
2023-10-21 00:03:56.105675 -0400 V/GraphicsEnvironment(12068): com.github.tmo1.sms_ie is not listed in per-application setting
2023-10-21 00:03:56.105793 -0400 V/GraphicsEnvironment(12068): Neither updatable production driver nor prerelease driver is supported.
2023-10-21 00:03:56.161398 -0400 D/CompatibilityChangeReporter(12068): Compat change id reported: 160794467; UID 10302; state: ENABLED
2023-10-21 00:03:56.175046 -0400 D/CompatibilityChangeReporter(12068): Compat change id reported: 263076149; UID 10302; state: ENABLED
2023-10-21 00:03:56.226278 -0400 I/WM-WorkerWrapper(12068): Work [ id=48895fe2-ca9c-4a5a-956b-9ab885fb34ea, tags={ com.github.tmo1.sms_ie.ExportWorker, export } ] was cancelled
2023-10-21 00:03:56.226278 -0400 I/WM-WorkerWrapper(12068): java.util.concurrent.CancellationException: Task was cancelled.
2023-10-21 00:03:56.226278 -0400 I/WM-WorkerWrapper(12068):     at y1.h.e(SourceFile:33)
2023-10-21 00:03:56.226278 -0400 I/WM-WorkerWrapper(12068):     at y1.h.get(Unknown Source:21)
2023-10-21 00:03:56.226278 -0400 I/WM-WorkerWrapper(12068):     at androidx.appcompat.widget.j.run(SourceFile:125)
2023-10-21 00:03:56.226278 -0400 I/WM-WorkerWrapper(12068):     at androidx.appcompat.widget.j.a(SourceFile:1)
2023-10-21 00:03:56.226278 -0400 I/WM-WorkerWrapper(12068):     at androidx.appcompat.widget.j.run(SourceFile:10)
2023-10-21 00:03:56.226278 -0400 I/WM-WorkerWrapper(12068):     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2023-10-21 00:03:56.226278 -0400 I/WM-WorkerWrapper(12068):     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
2023-10-21 00:03:56.226278 -0400 I/WM-WorkerWrapper(12068):     at java.lang.Thread.run(Thread.java:1012)
2023-10-21 00:03:56.248936 -0400 D/CompatibilityChangeReporter(12068): Compat change id reported: 194532703; UID 10302; state: ENABLED
2023-10-21 00:03:56.249372 -0400 D/CompatibilityChangeReporter(12068): Compat change id reported: 253665015; UID 10302; state: ENABLED
2023-10-21 00:03:56.295989 -0400 I/SMSIE   (12068): Beginning messages export ...
2023-10-21 00:03:56.380188 -0400 E/JavaBinder(12068): !!! FAILED BINDER TRANSACTION !!!  (parcel size = 352)
--------- beginning of system
2023-10-21 00:03:56.380548 -0400 I/ActivityThread(12068): Removing dead content provider:android.content.ContentProviderProxy@4cf8418
2023-10-21 00:03:56.380592 -0400 I/ActivityThread(12068): Removing dead content provider:android.content.ContentProviderProxy@4cf8418

Manually triggering with scheduled job with the following command also results in the same crash:

adb shell cmd jobscheduler run -f com.github.tmo1.sms_ie 8

When the crash occurs, it leaves behind a partially written zip file with just the local zip entry header for messages.ndjson:

❯ zipdetails -vv --walk messages-2023-10-21.zip

00000 00003 00004 50 4B 03 04 LOCAL HEADER #1       04034B50 (67324752)
00004 00004 00001 14          Extract Zip Spec      14 (20) '2.0'
00005 00005 00001 00          Extract OS            00 (0) 'MS-DOS'
00006 00007 00002 08 08       General Purpose Flag  0808 (2056)
                              [Bits 1-2]            0 'Normal Compression'
                              [Bit  3]              1 'Streamed'
                              [Bit 11]              1 'Language Encoding'
00008 00009 00002 08 00       Compression Method    0008 (8) 'Deflated'
0000A 0000D 00004 7C 00 55 57 Last Mod Time         5755007C (1465188476) 'Fri Oct 20 20:03:56 2023'
0000E 00011 00004 00 00 00 00 CRC                   00000000 (0)
00012 00015 00004 00 00 00 00 Compressed Length     00000000 (0)
00016 00019 00004 00 00 00 00 Uncompressed Length   00000000 (0)
0001A 0001B 00002 0F 00       Filename Length       000F (15)
0001C 0001D 00002 00 00       Extra Length          0000 (0)
0001E 0002C 0000F 6D 65 73 73 Filename              'messages.ndjson'
                  61 67 65 73
                  2E 6E 64 6A
                  73 6F 6E
#
# ERROR: Local Directory exists, but Central Directory not found
#
# Error Count: 1
#
# Done

I'm not sure why this is happening yet. I'll see if I can reproduce the problem this weekend with a debug build to get a better stack trace. (The logs above are from the precompiled 2.2.0 APK from the Github release page.)

chenxiaolong commented 9 months ago

Hmm, it looks like the app is getting killed for doing too many binder calls in the background:

10-21 16:09:13.031  1452  2008 D ActivityManager: freezing 30385 com.github.tmo1.sms_ie.debug
10-21 16:09:13.031  1452  2008 D ActivityManager: Kill app due to repeated failure to freeze binder: 30385 com.github.tmo1.sms_ie.debug
10-21 16:09:13.032  1452  1749 I ActivityManager: Killing 30385:com.github.tmo1.sms_ie.debug/u0a312 (adj 900): excessive binder traffic during cached
10-21 16:09:13.041  1452  1767 I libprocessgroup: Successfully killed process cgroup uid 10312 pid 30385 in 5ms

This behavior was introduced in Android 14 in commit https://android.googlesource.com/platform/frameworks/base.git/+/71d75c09b9a06732a6edb4d1488d2aa3eb779e14%5E%21/. I'll dig around and see if there's a way this can be worked around.

tmo1 commented 9 months ago

Hi,

Thank you for reporting this, and thank you very much for taking the time and making the effort to figure out the problem and create a (well-documented!) workaround. You're clearly a more advanced Android developer than I am :)

I would like to understand the problem better, though, before merging a workaround like your PR: what, exactly, is the app doing wrong? I thought that doing high volume and long running database queries and file I/O is exactly the sort of thing that background work is designed for. What is the "excessive binder traffic" that Android is complaining about? Perhaps if I understood the problem better, I could redesign the scheduled export code to be more compliant with what Android expects.

chenxiaolong commented 9 months ago

Sure thing!

I don't think sms-ie is doing anything incorrectly. The problem is mainly in these two calls:

https://github.com/tmo1/sms-ie/blob/c56fa0e7d65cf44b69e3698bd3fe9e38a81850a9/app/src/main/java/com/github/tmo1/sms_ie/ImportExportMessages.kt#L179-L181

https://github.com/tmo1/sms-ie/blob/c56fa0e7d65cf44b69e3698bd3fe9e38a81850a9/app/src/main/java/com/github/tmo1/sms_ie/ImportExportMessages.kt#L229-L233

Each of those queries involves inter-process communication between sms-ie and Android's MmsProvider component, which performs a binder transaction under the hood. While the second query could potentially optimized so that it's only done once (and mid is filtered client side), I don't think the same optimization can be done for content://mms/$msgId/addr because it's querying a different URI each time. There doesn't seem to be a content://mms/addr for querying everything at once: https://cs.android.com/android/platform/superproject/main/+/main:packages/providers/TelephonyProvider/src/com/android/providers/telephony/MmsProvider.java;l=1163;drc=c89bafa3f22a9dce56adb9b7fab2f9149d18c456.

In my case, I have 5000 or so MMS messages (well, RCS), so it ends up performing 10000+ queries. It takes around 4 minutes to export and Android usually kills the process about 1 minute 15 seconds in (~3k queries).

I thought that doing high volume and long running database queries and file I/O is exactly the sort of thing that background work is designed for.

That's unfortunately no longer true in recent Android versions. There are special exceptions to this though, like downloading files, but in the general case, without using a foreground service, it is very likely that long-running (multiple minute) operations will be killed.

tmo1 commented 9 months ago

Ah, I think I understand - thanks for the detailed explanation.

Rant:

I find Android development really aggravating. Everything is constantly changing, and unlike in Linux, where (with apologies to Perl) easy things are generally easy and hard things are usually at least possible, in Android it feels like easy things are generally hard and lots of things turn out to be impossible.

The sheer amount of rules and restrictions imposed by the system just feels developer hostile, and while I understand that the purpose is to protect users and provide them with good user experiences, the result seems to be an unfortunate lack of availability of quality apps for those same users. E.g.: on Linux, there are innumerable backup applications, ranging from simple and easy to use to complex and powerful, from CLI to GUI, while (due to Android design decisions) there's literally not one decent generally usable backup app for normal non-rooted Android devices (which is in part why SMS I/E is needed in the first place).

To be fair, I'm just an amateur, hobbyist developer, and perhaps this is not an entirely fair and accurate assessment of the state of things on Android, but this is what the situation feels like to me

End rant.

Thank you again very much for your work on this, though. I'll do some testing, and if everything looks good, I'll merge the PR. (You mention not having a telephony-capable Android <= 13 device for testing - did you try testing on a pre-14 AVD?)

chenxiaolong commented 9 months ago

I 100% agree with your rant and feel the same way. Usable Linux phones can't come soon enough.

(You mention not having a telephony-capable Android <= 13 device for testing - did you try testing on a pre-14 AVD?)

I wasn't aware they had the telephony components. I'll give that a try and report back.

chenxiaolong commented 9 months ago

I tested on a handful of versions via AVD:

tmo1 commented 8 months ago

I found a crash when opening the settings screen and pushed a fix for it in my branch: https://github.com/tmo1/sms-ie/commit/c4aa39c7c1bab07ba93eef4519640598b2d7c303.

Thanks! I'm not sure why I didn't hit that crash when I tested the API 19 code, but I did reproduce it now, and confirmed that your fix fixes it.

App is likely to be killed when exporting many MMS messages via the background service. Runs reliably via the foreground service.

Thank you again very much for all the work, testing, and documenting you've done on this!

I've reviewed the code in your patch, and I think I understand what it's doing, but before I merge it, I'm trying to reproduce the crash you're seeing with the old code (and the patched code with battery optimizations disabled) and confirm that the fix works, but so far I haven't been able to reproduce the crash. I just did a scheduled export of 15400 MMS messages (on a Pixel 7 API 34 AVD) and it worked fine. Do you have any ideas as to how I can trigger the crash?

chenxiaolong commented 8 months ago

No problem at all!

Do you have any ideas as to how I can trigger the crash?

I forgot to mention this in my last post. I can reproduce the crash reliably, even after a factory reset, on a Pixel 8 Pro nearly every time, but for whatever reason, never in the AVD. In both cases, I restored from the same backup (~5000 MMS messages) and triggered a scheduled export.

I'm not sure if Android's CachedAppOptimizer is disabled in the emulator or if there's something that prevents it from running and killing apps. I know certain actions will disable it temporarily, like adb shell pm dump <package>. There's a CachedAppOptimizer.dump() function that would print out a lot of helpful statistics about what's going on, but I can't find a good way to call it without doing something hacky with root access.

tmo1 commented 8 months ago

Do you have any ideas as to how I can trigger the crash?

I realized that when I tested, I left the app in the foreground, so that's likely why the killing / freezing behavior did not occur :|

I did a bunch more tests with the app removed from the foreground, on API 33 and 34 (all on AVDs), and I observed the following:

So it looks like your patch is definitely necessary for the reliable running of large scheduled exports, whether to avoid the binder issue or the "excessive cpu" / long-running service problem, but I wish I had a better understanding of all this.

Another, likely unrelated question: when doing all this testing, I noticed that the notifications that the app is supposed to show upon completion of a scheduled export run are not being displayed (in 33 or 34). This used to work, and I'm not sure when / why it no longer does. Did you notice the completion notifications in your testing?

chenxiaolong commented 8 months ago

Thanks for the additional testing!

Another, likely unrelated question: when doing all this testing, I noticed that the notifications that the app is supposed to show upon completion of a scheduled export run are not being displayed (in 33 or 34). This used to work, and I'm not sure when / why it no longer does. Did you notice the completion notifications in your testing?

This is because sms-ie currently doesn't use the POST_NOTIFICATIONS permission, which must be declared in the manifest and also requested at runtime as of API 33. Maybe sms-ie could prompt for that permission when enabling scheduled exports?

tmo1 commented 8 months ago

Maybe sms-ie could prompt for that permission when enabling scheduled exports?

Sounds right - I'll try it.

tmo1 commented 8 months ago

Maybe sms-ie could prompt for that permission when enabling scheduled exports?

Done - thanks!