wordpress-mobile / WordPress-Android

WordPress for Android
http://android.wordpress.org
GNU General Public License v2.0
2.92k stars 1.3k forks source link

Crash when drawing the editor on devices running Android 8 #8828

Closed jkmassel closed 4 years ago

jkmassel commented 5 years ago

It appears we're getting a crash specific to Android devices with Android 8.x, prevalently Huawei devices.

The relevant bits of the stack trace:

Fatal Exception: java.lang.ArrayIndexOutOfBoundsException: length=79; index=-19
       at android.text.DynamicLayout.getBlockIndex(DynamicLayout.java:646)
       at android.widget.Editor.drawHardwareAccelerated(Editor.java:1744)
       at android.widget.Editor.onDraw(Editor.java:1713)
       at android.widget.TextView.onDraw(TextView.java:7034)
       at android.view.View.draw(View.java:19314)
       at android.view.View.updateDisplayListIfDirty(View.java:18250)
       at android.view.View.draw(View.java:19042)
       at android.view.ViewGroup.drawChild(ViewGroup.java:4271)
       at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4054)
       at android.view.View.updateDisplayListIfDirty(View.java:18241)
       at android.view.View.draw(View.java:19042)
       at android.view.ViewGroup.drawChild(ViewGroup.java:4271)
       at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4054)
       at android.view.View.updateDisplayListIfDirty(View.java:18241)
       at android.view.View.draw(View.java:19042)
       at android.view.ViewGroup.drawChild(ViewGroup.java:4271)
       at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4054)
       at android.view.View.draw(View.java:19317)
       at android.widget.ScrollView.draw(ScrollView.java:1777)
       at android.view.View.updateDisplayListIfDirty(View.java:18250)
       at android.view.View.draw(View.java:19042)
       at android.view.ViewGroup.drawChild(ViewGroup.java:4271)
       at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4054)
       at android.view.View.draw(View.java:19317)
       at android.view.View.updateDisplayListIfDirty(View.java:18250)
       at android.view.View.draw(View.java:19042)
       at android.view.ViewGroup.drawChild(ViewGroup.java:4271)
       at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4054)
       at android.view.View.draw(View.java:19317)
       at android.support.v4.view.ViewPager.draw(ViewPager.java:2420)
       at android.view.View.updateDisplayListIfDirty(View.java:18250)
       at android.view.View.draw(View.java:19042)
       at android.view.ViewGroup.drawChild(ViewGroup.java:4271)
       at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4054)
       at android.view.View.updateDisplayListIfDirty(View.java:18241)
       at android.view.View.draw(View.java:19042)
       at android.view.ViewGroup.drawChild(ViewGroup.java:4271)
       at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4054)
       at android.view.View.updateDisplayListIfDirty(View.java:18241)
       at android.view.View.draw(View.java:19042)
       at android.view.ViewGroup.drawChild(ViewGroup.java:4271)
       at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4054)
       at android.view.View.updateDisplayListIfDirty(View.java:18241)
       at android.view.View.draw(View.java:19042)
       at android.view.ViewGroup.drawChild(ViewGroup.java:4271)
       at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4054)
       at android.view.View.updateDisplayListIfDirty(View.java:18241)
       at android.view.View.draw(View.java:19042)
       at android.view.ViewGroup.drawChild(ViewGroup.java:4271)
       at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4054)
       at android.view.View.updateDisplayListIfDirty(View.java:18241)
       at android.view.View.draw(View.java:19042)
       at android.view.ViewGroup.drawChild(ViewGroup.java:4271)
       at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4054)
       at android.view.View.draw(View.java:19317)
       at com.android.internal.policy.DecorView.draw(DecorView.java:915)
       at android.view.View.updateDisplayListIfDirty(View.java:18250)
       at android.view.ThreadedRenderer.updateViewTreeDisplayList(ThreadedRenderer.java:684)
       at android.view.ThreadedRenderer.updateRootDisplayList(ThreadedRenderer.java:690)
       at android.view.ThreadedRenderer.draw(ThreadedRenderer.java:804)
       at android.view.ViewRootImpl.draw(ViewRootImpl.java:3199)
       at android.view.ViewRootImpl.performDraw(ViewRootImpl.java:2997)
       at android.view.ViewRootImpl.performTraversals(ViewRootImpl.java:2526)
       at android.view.ViewRootImpl.doTraversal(ViewRootImpl.java:1515)
       at android.view.ViewRootImpl$TraversalRunnable.run(ViewRootImpl.java:7266)
       at android.view.Choreographer$CallbackRecord.run(Choreographer.java:981)
       at android.view.Choreographer.doCallbacks(Choreographer.java:790)
       at android.view.Choreographer.doFrame(Choreographer.java:721)
       at android.view.Choreographer$FrameDisplayEventReceiver.run(Choreographer.java:967)
       at android.os.Handler.handleCallback(Handler.java:808)
       at android.os.Handler.dispatchMessage(Handler.java:101)
       at android.os.Looper.loop(Looper.java:166)
       at android.app.ActivityThread.main(ActivityThread.java:7529)
       at java.lang.reflect.Method.invoke(Method.java)
       at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:245)
       at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:921)

It seems likely that this is related to #8827 – both are platform-specific Android 8 issues.

reference: 5a76a4fa8cb3c2fa634286c6-fabric reference: 5af4233111e9fa0aa513dfa0-fabric marked as high priority as it's happening ~35 times/day and affecting ~800 users

jkmassel commented 5 years ago

Possibly related to https://github.com/wordpress-mobile/AztecEditor-Android/issues/729 and https://github.com/wordpress-mobile/AztecEditor-Android/issues/516

SiobhyB commented 5 years ago

There's a report in 1812801-zen that seems to match up with this crash report in Fabric.

The user's device is a Samsung SM-J400M and it's running on Android 8.0.

The app crashes when they attempt to use voice dictation after uploading more than three photos at a time.

@jkmassel: Is there any extra info from the user that'd be useful for troubleshooting this further?

charliescheer commented 5 years ago

Another potential report in 1861204-zen. We had the user update to the most recent version and the problem persists.

Android device name: HUAWEI GRACE "Almost always when I want to open my blog posts, the app crashes."

designsimply commented 5 years ago
reference: 5a76a4fa8cb3c2fa634286c6-fabric reference: 5af4233111e9fa0aa513dfa0-fabric marked as high priority as it's happening ~35 times/day and affecting ~800 users

Combining stats for the last 30 days for 5a76a4fa8cb3c2fa634286c6-fabric and 5af4233111e9fa0aa513dfa0-fabric:

Crashed ~90 times a day or 2844 times for 809 users in the past 30 days Feb 14 - Mar 14, 2019.

Crashes are trending up for 11.9 in 5af4233111e9fa0aa513dfa0-fabric.

malinajirka commented 5 years ago

I can confirm it's related to wordpress-mobile/AztecEditor-Android#729 and wordpress-mobile/AztecEditor-Android#516

Unfortunately, it's a bug in the OS which was introduced in Android 8.0 and was fixed in Android 9.0 (https://android-review.googlesource.com/c/platform/frameworks/base/+/634929).

https://github.com/wordpress-mobile/AztecEditor-Android/issues/516 contains steps how to reproduce the issue.

All these crashes seems related - it also seems the crash is not specific to Huawei phones. It's specific to Android 8.0 and 8.1. 5a86f7f48cb3c2fa632d5149-fabric 5af4233111e9fa0aa513dfa0-fabric 5a76a4fa8cb3c2fa634286c6-fabric 5a04a91461b02d480d160132-fabric 5ac6aba236c7b23527a48419-fabric 5b9ef56df8b88c2963244ca6-fabric 5ae8d77a638393737aeae584-fabric 5ae6655f638393737ab63fe7-fabric

If we sum up all these crashes, it's definitely our #1. I can think of 3 options we have, but none of them is great.

Wdyt @daniloercoli ?

rfaile313 commented 5 years ago

from 1861204-zen

"It's not when trying to open a specific post, but instead when trying to view the list of posts"

daniloercoli commented 5 years ago

It seems the issue is not only happening when opening the editor @malinajirka. Also when accessing the posts lists, accordingly to the report ^ above.

jkmassel commented 5 years ago

Are we able to detect the current Android version in code, and use a different path for that?

I think either of those two solutions could be applied judiciously, perhaps as a setting? I'd imagine we could detect if a user crashed in the editor on Android 8, and prompt them on next launch to enable "compatibility mode" or whatever we want to call it that fixes the issue in v8. Then, if they later upgrade to v9, we silently disable it?

Typically for a simple bug we might not go to all this trouble, but it seems quite widespread.

WDYT @malinajirka / @daniloercoli ?

malinajirka commented 5 years ago

"It's not when trying to open a specific post, but instead when trying to view the list of posts" It seems the issue is not only happening when opening the editor @malinajirka. Also when accessing the posts lists, accordingly to the report ^ above.

I've checked the ticket and I haven't found any indication it's the same crash. I might be missing something. cc @daniloercoli

Are we able to detect the current Android version in code, and use a different path for that?

Yes.

I think either of those two solutions could be applied judiciously, perhaps as a setting? I'd imagine we could detect if a user crashed in the editor on Android 8, and prompt them on next launch to enable "compatibility mode" or whatever we want to call it that fixes the issue in v8. Then, if they later upgrade to v9, we silently disable it?

I'll try to measure the performance impact if we turn off the hardware acceleration - it's definitely the easiest and the only not-so-hacky solution.

The option 2 should be considered as a last resort solution which might not be even worth it. Especially since we don't have feature flags and we won't be able to disable the "fix" if something goes wrong - which can easily happen on custom roms. We'll definitely need to consult it with other Android devs to consider the risks.

I'm not sure we can somehow detect the app crashed.

However, we know it'll crash under certain circumstances on Android 8. So if we come up with a solution, we can just enable it by default on Android 8.

daniloercoli commented 5 years ago

We could register our own UncaughtExceptionHandler, but I'm a bit worried we'd override Fabric's handler. I'd need to look into this - we might be able to call the Fabric's handler within our handler.

Yes I think this is a good idea. Take a look at AztecExceptionHandler since we're already doing "things" on crashes on Aztec 😬

I'll try to measure the performance impact if we turn off the hardware acceleration - it's definitely the easiest and the only not-so-hacky solution.

I guess we can turn off the hardware acceleration when we detect the crash happened on Android 8. Just store a property, and disable the HW acceleration at next startup when the prop is true and Android == 8. I'm not sure about prompt the user about it, since it will be hard to explain where to go to enable the Compatibility mode, and why it's required. Maybe we can show a prompt "Compatibility mode enabled", explain why in the message, and give some info about how to turn it off in App settings. Where we should add a toggle for it.

jkmassel commented 5 years ago

Extension on the above idea – given that we know how the bug is triggered, is it possible to examine the contents of a post to determine whether or not the crash will happen when the post is placed in the editor? That'd allow us to avoid that initial crash by preemptively putting Aztec in compatibility mode.

WDYT?

malinajirka commented 5 years ago

@jkmassel I've looked into it and I wasn't able to come up with a solution. It's definitely a good idea and I'm not saying it's not possible to somehow do it. The code which contains the bug is just really complicated + it's buried in the Android OS so we don't have access to the data.

I spent a bit more time exploring the options I mentioned earlier

Disable HW acceleration

IMHO disabling hw acceleration is not worth it. The bug occurs just on certain posts under specific circumstances and even if we disable it just for users who are experiencing the issue, the performance hit will affect all their posts.

Copy-paste method Unfortunately, this method won't work either :(. The classes we'd need to copy are dependant on many internal classes which are not accessible to us from the code (we'd need to copy them as well).

Even though I don't like it, I give up for now. Maybe I'll come up with some more ideas how to work around it. I still hope this bug is not present in Gutenberg and it'll become less critical in the near future.

designsimply commented 5 years ago

30-day impact: ~52 times a day Devices: Samsung Android 8, HUAWEI Android 8 Last seen in: 12.1.1 (5a76a4fa8cb3c2fa634286c6-fabric-android and 5af4233111e9fa0aa513dfa0-fabric-android)

mzorz commented 5 years ago

Investigated this one a bit more to gain context, unfortunately it seems https://github.com/wordpress-mobile/AztecEditor-Android/pull/801 only covered one of the ways in which the crash can happen so, the trend is still the same (crashes continue to exist). I'm trying some other alternative approaches (recovering Aztec when the crash happens, basically) just to let users continue. Will update if I find a way to workaround this, but given my approach are workarounds rather than tackling the root issue (which was already extensively investigated and was fixed in later versions of Android), I'm unassigning myself from this one here.

mkevins commented 5 years ago

I have found a way to reliably reproduce this in the AztecDemo app, running on an Android emulator with API 27 (Android 8.1) emulating a Pixel 2.

Exact steps to reproduce:

  1. Load the demo app and long-press the top image.
  2. Copy the image.
  3. Tap to the left of the image to place the caret there.
  4. Tap the caret handle, and select paste from the context menu (there should now be two images).
  5. Tap to the left of the top image to place the caret there.
  6. Press [Enter] on the screen keyboard.
  7. Tap the empty space above the image to place the caret there.
  8. Type the word "test", followed by [Enter].
  9. Long-press the word "test" to select it.
  10. Touch and drag the right selection handle down and to the left of the top image (the selection should now include both lines above the image, but not the image itself).
  11. Press [Backspace] on the screen keyboard. 💥

Screencast:

I think it is possible that cases with multiple images are still encountering this issue because the workaround here https://github.com/wordpress-mobile/AztecEditor-Android/pull/801 is careful to apply the InputFilter only when specific criteria are met. I believe that in the case where multiple sequential images are present, and / or content is deleted (i.e. dstart != dend), reflow can still result in a code path that encounters the underlying Android OS bug.

mzorz commented 5 years ago

I think it is possible that cases with multiple images are still encountering this issue because the workaround here wordpress-mobile/AztecEditor-Android#801 is careful to apply the InputFilter only when specific criteria are met. I believe that in the case where multiple sequential images are present, and / or content is deleted (i.e. dstart != dend), reflow can still result in a code path that encounters the underlying Android OS bug.

I tend to think the same! We can try narrowing this case down and adding it as a covered condition here as well 👍 - nice work @mkevins !

mkevins commented 5 years ago

Thanks @mzorz! I've created a draft PR for this, which I think is in decent shape. It could certainly use some extra scrutiny, given the nature of this particular bug. Thanks for all your helpful ideas!

sentry-io[bot] commented 5 years ago

Sentry issue: WORDPRESS-ANDROID-10

3-day impact: ~22 per day (since moving to Sentry) Users affected: 15 Last seen in: 12.3.1

designsimply commented 4 years ago

@mkevins it looks like the changes in https://github.com/wordpress-mobile/WordPress-Android/pull/9773 got added when we updated to Aztec v1.3.26, is that correct? May I asked an update of what needs to happen next for this issue and if you are still working on it? (I noticed you are still an assignee.)

mkevins commented 4 years ago

@mkevins it looks like the changes in #9773 got added when we updated to Aztec v1.3.26, is that correct? May I asked an update of what needs to happen next for this issue and if you are still working on it (you are still an assignee)?

Hi @designsimply :wave: That is correct, it was brought in with the update to v1.3.26, but not in PR #9773, but rather, it seems, PR #9788, which I believe corresponds with version 12.4. I believe we are still seeing reports of a similar crash in Sentry in 12.4, which suggests the most recent workaround may not be catching some cases.

I'm wondering if there is a clean way to see the impact the latest workaround has had on the numbers (though I realize this may be difficult with the switch from Fabric to Sentry). Also, it would help to be able to reproduce the crash after the latest workaround, but I haven't found steps to do that. I wonder if something like UI/Application Exerciser Monkey could help in some way to automate the search for steps. I suspect there may be many paths to this underlying Android OS bug, but maybe a stochastic tool like that can illuminate some common paths. cc @mzorz any thoughts?

mzorz commented 4 years ago

something like UI/Application Exerciser Monkey could help in some way to automate the search for steps.

I think at this point trying this is totally worth it! I recall using this same technique back when programming for PalmOS, we can certainly do that here and see what we can find.

designsimply commented 4 years ago

90-day impact: ~84 per day Users affected in the last 90 days: 2.3k Last seen in: 12.6 Limited to: Android 8

https://sentry.io/share/issue/9f844f6ec25f410481e73c996cc6410f/

mkevins commented 4 years ago

Unassigning myself from this for now, as I'll be focused on other work in the near future. For anyone who may pick this up, my rough idea of next steps is to see if UI/Application Exerciser Monkey could reproduce the crash "route(s)". I've left this unexplored, and had only considered this approach as a last resort, conditioned on a relatively low time-cost in configuring that tool.

designsimply commented 4 years ago

Found some similar issues in Sentry and merged them and here is a new count after the merge:

90-day impact: ~133 per day Users affected in the last 90 days: 3.3k Last seen in: 12.9 Limited to: Android 8

https://sentry.io/share/issue/9f844f6ec25f410481e73c996cc6410f/

designsimply commented 4 years ago

@mzorz @mkevins would it be possible to aggressively detect folks using Aztec on the affected versions and push them to Gutenberg? Because, for this case, it feels like switching to the new editor would be a better experience than not being able to post because of a platform bug.

mzorz commented 4 years ago

I think that's a definitely a possibility @designsimply - would defer the call to @hypest @daniloercoli at this point, I think if my memory doesn't fail much, @daniloercoli came up with that same approach to workaround this or other issues before, but it was kind of difficult to make the switch back then, maybe it's easier now that Gutenberg is more developed on mobile

designsimply commented 4 years ago

Thanks! I will review this again with @daniloercoli at the Oct 14-18 Groundskeeping rotation.

mkevins commented 4 years ago

for this case, it feels like switching to the new editor would be a better experience than not being able to post

I totally agree, and this sounds like a very sensible way to mitigate the issue.

daniloercoli commented 4 years ago

Hey @designsimply - looking at https://sentry.io/share/issue/9f844f6ec25f410481e73c996cc6410f/ it seems that the problem is not limited to Huawei devices, but on devices running Android 8.X. Would be good to update the description of the issue with these new details.

Unfortunately i don't see a reliable way to fix this issue once and for all, since the problem is deep inside the Android platform in version 8.X. It got fixed in v9. We already tried in a PR mentioned above in comments to go case by case, but it's hard to reproduce this issue and we may not catch all of those anyway.

would it be possible to aggressively detect folks using Aztec on the affected versions and push them to Gutenberg? Because, for this case, it feels like switching to the new editor would be a better experience than not being able to post because of a platform bug.

We're already pushing users to the new Gutenberg mobile editor (since late August), by enabling it when a post with blocks is opened in the app. So GB mobile should be used more than before and this crash should go down shortly. Is there a way in Sentry to check if the occurrences of this crash are going down since then? I can only check 30 days back, and it seems we're doing better since September.

Possible solutions:

I propose to just keep an eye on this crash without doing anything special at the moment:

If you think that is better to propose a fix we can go down by disabling HW acceleration on a per-post basis. I'm open to other suggestions though.

designsimply commented 4 years ago

Would be good to update the description of the issue with these new details.

I updated the first sentence to reflect the issue is not limited to only Huawei devices but is limited to devices running Android 8.x (and we might have started seeing the other devices when I merged Sentry issues on 2019-08-15).

Is there a way in Sentry to check if the occurrences of this crash are going down since then? I can only check 30 days back, and it seems we're doing better since September.

Sentry should show 90 days back overall—the graph at right only shows 30, the numbers at the top and overall crashes recorded are 90.

Looking at the numbers again, it looks like there are more crashes recorded for less people compared to the last time I checked.

90-day impact: ~200 per day Users affected in the last 90 days: 2400 Limited to: Android 8.x

https://sentry.io/share/issue/9f844f6ec25f410481e73c996cc6410f/

I agree the first solution you proposed is the best route if we don't see the numbers go down

Detect the crash and disable HW acceleration only on that post.

I also checked the Statistics section of the Google Play Console for WPAndroid and found that on 2019-10-13 Android 8.0 are down 8.7% compared to the same day one month prior 2019-09-13 which is a check to show Android 8.0 usage for our app is indeed trending down (as expected and as you noted), and we should review the overall percentage of Android 8 users on our app (is that number ok to post publicly?) and use that to help determine whether to implement the solution to disable HW acceleration only on posts affected by this crash.

designsimply commented 4 years ago

90-day impact: ~211 per day Users affected in the last 90 days: 2200 Limited to: Android 8.0.0 and 8.1.0

https://sentry.io/share/issue/9f844f6ec25f410481e73c996cc6410f/

designsimply commented 4 years ago

Consider closing with https://github.com/wordpress-mobile/WordPress-Android/pull/10620 as the accepted solution.

malinajirka commented 4 years ago

Fixed in https://github.com/wordpress-mobile/WordPress-Android/pull/10620. See the PR for details -> Unfortunately this PR does act after the fact, and probably we won't see crash report numbers disappear completely..

sentry-io[bot] commented 2 months ago

Sentry Issue: WORDPRESS-ANDROID-2D14