Closed sentry-io[bot] closed 3 years ago
30-day impact: ~31 per day Users affected: 449 in the last 30d Last seen in: 12.4 (latest official release at the time of this comment)
https://sentry.io/share/issue/c467f624ddc244c593cf6442ae7d3cc2/
~Approx. number of crashes went from ~10/day to ~1 or 2/day on 2019-07-31 when 12.9 was released. Recommend closing after a check from a Groundskeeper to confirm.~ Ignore this note, there were a lot more reports but they hadn't yet been combined into one Sentry issue.
I think all these are related: https://sentry.io/organizations/a8c/issues/1016907463/ https://sentry.io/organizations/a8c/issues/1135428453/ https://sentry.io/organizations/a8c/issues/1015884491/ https://sentry.io/organizations/a8c/issues/1127609759/ https://sentry.io/organizations/a8c/issues/1134058068/ https://sentry.io/organizations/a8c/issues/1020752033/ https://sentry.io/organizations/a8c/issues/1136577321/ https://sentry.io/organizations/a8c/issues/1136758717/ https://sentry.io/organizations/a8c/issues/1136981428/ https://sentry.io/organizations/a8c/issues/1135419520/ https://sentry.io/organizations/a8c/issues/1136831605/ https://sentry.io/organizations/a8c/issues/1015751008/ https://sentry.io/organizations/a8c/issues/1019067932/ https://sentry.io/organizations/a8c/issues/1145277525/
It seems all the crashes are happening in Gutenberg. I haven't counted number of crashes per day but it's definitely not an infrequent crash.
@mkevins Would you mind taking a look at this crash? I'd need to setup up gutenberg environment and I'd probably spent a couple of days trying to understand how to project works:P. Thank you 🙇
@malinajirka certainly! :+1: I'll look into it.
There are a few IndexOutOfBoundsException
-related crashes with various root causes, however, this one looks familiar. What these all seem to have in common is that they occur during a call from ReactAztecManager
to the underlying EditText
methods (notably, setSpan
on android.text.SpannableStringBuilder
).
There is a comment here that may be relevant: https://github.com/WordPress/gutenberg/blob/master/packages/rich-text/src/component/index.native.js#L800 as it describes a workaround used in Gutenberg code to avoid setting the selection out of bounds. It works by anticipating AztecText
's mutation of the text buffer, in it's "clean-up" of html. When it detects that spaces will be trimmed, it safely avoids setting the selection.
Another potential clue on this crash is that many of the exceptions are "off-by-one". I have been using the following jq
script to parse the sentry json data, to try to find a pattern:
#!/bin/bash
jq '
(
.exception.values[]
| select( .type | contains("IndexOutOfBoundsException") )
| {
value,
offBy: ( [.value | capture("(?<n>\\d+)"; "g")]
| map(.n|tonumber)
| .[2]-.[1]
),
modules: (.stacktrace.frames[-7:]
#| map(.function)
#| map(.abs_path)
| map(.module)
#| map(.in_app)
#| map(.lineno)
#| map(.filename)
#|join(",")
)
}
) +
( { breadcrumbs:
[
.breadcrumbs.values[]
#| select( .message | contains("WordPress-EDITOR") )
| select( .message | contains("will be trimmed for spaces") )
#| select( .message | contains("Processed HTML") )
| .message
][-1]
} )
'
Via cat sentry-file-*|./parse-9832
.
8 of the 14 cases listed have an IndexOutOfBoundsException caused by the index being off by 1:
I noticed that in some cases, the modules listed in the stacktrace leading up to android.text.SpannableStringBuilder
were not always the same. In some cases, android.widget.TextView
showed up, and in other cases, not. This may be a clue to what code paths are being touched, however, it did not appear to correlate to the differences "off-by-one" cases.
Also, it is a bit puzzling to see that this crash sometimes occurs, even when the breadcrumbs show that willTrimSpaces
returned true:
I also noticed that some, but not all of the cases had a breadcrumb with a message containing Processed HTML
, which is logged from within the pasteHandler
. Unfortunately, there was no obvious pattern or correlation to the "off-by-one" cases. As a next step, I will use the underlying content to try to reproduce the crash on the demo app. Perhaps there is something particular about the content itself, or the way it is parsed by Aztec that is leading to this exception.
I merged all the issues to one in Sentry. Here are the updated numbers for reference.
90-day impact: ~111 per day Users affected in the last 90d: 3.2k Last seen in: 12.9
https://sentry.io/share/issue/91a4511e98264a9b97a16db7b5486a36/
90-day impact: ~200 per day Users affected in the last 90 days: 5200
https://sentry.io/share/issue/91a4511e98264a9b97a16db7b5486a36/
👋 @mkevins , let me know if you're still working on this ticket, thanks!
Hi @hypest :wave:
I tried this next step, but was unable to reproduce using a few samples of content (from the crash reports):
As a next step, I will use the underlying content to try to reproduce the crash on the demo app.
Since then, I have unassigned myself.
90-day impact: ~333 per day Users affected in the last 90 days: 8900
https://sentry.io/share/issue/91a4511e98264a9b97a16db7b5486a36/
Looks like a widespread crash happening for a lot of users. Let's raise the priority on this one.
My first attempt on fixing this issue was based on the following idea:
updateSelectionIfNeeded
methodI've made this branch here: https://github.com/wordpress-mobile/WordPress-Android/tree/issue/9832-Aztec-crash-Span-Out-of-bounds and the GB-mobile counter part here: https://github.com/wordpress-mobile/gutenberg-mobile/tree/issue/wp-android-9832-crash-Span-Out-of-bounds
The key point is calling the setNativeModuleCallExceptionHandler
method of ReactInstanceManagerBuilder
, and introduce a special version of the DefaultNativeModuleCallExceptionHandler
used in non-debug mode.
Unfortunately when the error does happen, the ReactInstanceManager does properly call the custom handler and log the error in Sentry, but the ReactInstance does remain in a broken state, and the current block is froze. The UI is not responding anymore to the taps...
I'm not going to update this proposed solution above, since it's already a bit too convoluted, and not working fine. I think we should find another way to give to the Aztec Wrapper the ability to log exception in the main app, without crashing it. This feature is already available in Aztec via the "external logger." So that Aztec doesn't know the actual implementation of the logger.
Reporting here a suggestion by @mzorz
I would it be possible to make a Sentry ReactNative wrapper component (like we have our Aztec ReactNative wrapper) and so the exception caught from Aztec RN can be sent to this other component? Just a thought, do you think this could work?
I've finally found a way to replicate the problem, and it seems it's a synch problem between Aztec internals and the content
+ selection
sent by the JS side.
As we can see from the Sentry logs the problem happens when Aztec tries to set the selection over the new HTML sent by the JS side. See the log below:
at org.wordpress.mobile.ReactNativeAztec.ReactAztecManager.updateSelectionIfNeeded(ReactAztecManager.java:197)
at org.wordpress.mobile.ReactNativeAztec.ReactAztecManager.setTextfromJS(ReactAztecManager.java:190)
at org.wordpress.mobile.ReactNativeAztec.ReactAztecManager.setText(ReactAztecManager.java:
Most probably this is what happens:
Note: Since there are performance optimizations in AztecWrapper the issue is not happening when the user types in the editor, but it does happen on merging or splitting of blocks.
Steps to replicate the issue:
Plus
symbol to add a new block below the paraThe crash does happen because Aztec removed the last br
tag from the passed input HTML, and the selection indexes are out of bounds.
I've a PR already prepared that should fix the problem. Will open it shortly.
cc @hypest
We landed a fix and it will be included in v1.17 of the editor, destined for the v13.7 of WPAndroid (Beta release on November 18th, general release on December 2nd 2019). I'll close this ticket now but we can open again if we still encounter the issue on v13.7 and beyond.
I was checking most frequent Sentry issues and found that the number of events has not decreased after 13.7 rollout. Re-opening.
I haven't come up with new steps to recreate the crash since Danilo solved one way to yield this crash, detailed here: https://github.com/wordpress-mobile/WordPress-Android/issues/9832#issuecomment-545564526.
However as part of the effort of looking for steps, I picked up this ticket https://github.com/wordpress-mobile/gutenberg-mobile/issues/1543 for adding some unit tests around the previous fix. Seems that unit tests on the new code from that fix won't help me re-create this issue alone, but it seems like a step in the right direction for figuring this out. I'm hoping it will lead to insight into how else we can recreate this crash.
I created this PR for adding some tests around the previous fix, and am confident that we solved some instances of this crash, (the steps identified by Danilo).
I think two good ways to keep working on this issue are 1) to keep looking through crash breadcrumbs to try and identify a pattern, and 2) to follow code that is adjusting RichText
and keep trying to find a way to reproduce this crash.
Unfortunately I still have not found a way to reproduce, though I believe it should be possible since it is our most common crash. Here's a list of things I tried (unsuccessfully):
Events in the last 90 days: 93,000 Users affected in the last 90 days: 15,000 https://sentry.io/share/issue/91a4511e98264a9b97a16db7b5486a36/
Events in the last 90 days: 100,000 Users affected in the last 90 days: 16,000 Devices affected: looks to be mainly Samsung and Redmi (Xiaomi) devices https://sentry.io/share/issue/91a4511e98264a9b97a16db7b5486a36/
I'll take a look at this since I have a Redmi (Xiaomi) device, I'll start by trying to reproduce the crash.
So I had some time to test this issue reviewing the reports from Sentry.
Found three cases that make the editor crash, I replaced the content with placeholder texts.
[cp]Text wihtin a tag[/cp]
<em>Text within a tag</em>.
<mark class="hl_yellow">"My text”, </mark>other text.
Block snippet example
<!-- wp:paragraph -->
<p><mark>"My text”, </mark>other text.</p>
<!-- /wp:paragraph -->
Also, some of the crashes I saw were with languages other than English so maybe there's an issue with some characters.
The steps to reproduce with the cases from above:
Paragraph
block from the inserter menu.My knowledge of Aztec is pretty limited still but maybe someone else has more experience with it to know what could be happening?
1,656 events have been tracked for this crash in 14.6.1 since it was released 9d ago on Apr 22.
Events in the last 90 days: 127,000 Users affected in the last 90 days: 19,000 https://sentry.io/share/issue/91a4511e98264a9b97a16db7b5486a36/
@mchowning, can you put this one at the top of your attention list? Thanks!
3,236 events have been tracked for this crash in 14.7 since it was released 15d ago on May 4.
Events in the last 90d: 137,000 Users affected in the last 90d: 20,000 https://sentry.io/share/issue/91a4511e98264a9b97a16db7b5486a36/
Update to add a graph and note that events per day looks to be steadily rising for this one:
@geriux can you help with a code review on this one? Nice find on the steps to reproduce btw!! Thank you for the PR @mchowning. 🙇
So far I only opened a related issue in the gb-mobile repository, but I'm hoping to open a PR today! So there's nothing to review (quite) yet @geriux . 🙂
With the 14.9 release, this issue should no longer be causing a crash for our users. We have not resolved the underlying issue though, we've just stopped it from causing a crash. Any time the crash would have occurred, we are now sending a non-crashing IllegalSelectionIndexException
like this to Sentry, so I would expect a lot of those exceptions to start showing up. This exception does not crash the app, and it includes more information about the state of the app when the issue arises. I'm hoping we can use that information to help us address the underlying issue.
Thank you @mchowning!
Noting that 14.9 was released on 2020-06-01 and this 90d graph will be a good one to watch over the next few weeks. Currently, in terms of crashes in that issue in Sentry, it has gone from over 500 events per day in May to only ~100 per day in the last few days.
I see that https://github.com/wordpress-mobile/gutenberg-mobile/issues/2348 was opened to track the non-crashing error, @mchowning should we close this issue in favor of the new one with a link back to here for reference or do we need to watch both for the next few weeks to see what falls out?
I see that wordpress-mobile/gutenberg-mobile#2348 was opened to track the non-crashing error, @mchowning should we close this issue in favor of the new one with a link back to here for reference or do we need to watch both for the next few weeks to see what falls out?
I don't think we need to keep this one open since there are no crashes in 14.9. I went ahead and added a comment linking to this issue in https://github.com/wordpress-mobile/gutenberg-mobile/issues/2348, so I'll go ahead and close this.
Sentry Issue:
https://sentry.io/share/issue/c467f624ddc244c593cf6442ae7d3cc2/https://sentry.io/share/issue/91a4511e98264a9b97a16db7b5486a36/ (broken link fixed 1/29/20)One way to reproduce the issue in comments below.