wordpress-mobile / AztecEditor-Android

A reusable native Android rich text editor component.
Mozilla Public License 2.0
675 stars 112 forks source link

Caused by java.lang.NumberFormatException in CleaningUtils #805

Open jtreanor opened 5 years ago

jtreanor commented 5 years ago

This crash first appeared in 12.1-rc-3 and appears to be increasing in frequency:

Caused by java.lang.NumberFormatException: For input string: "000C6
AMP=00026
Aacute=000C1
Acirc=000C2
Agrave=000C0
Aring=000C5
Atilde=000C3
Auml=000C4
COPY=000A9
Ccedil=000C7
ETH=000D0
Eacute=000C9
Ecirc=000CA
Egrave=000C8
Euml=000CB
GT=0003E
Iacute=000CD
Icirc=000CE
Igrave=000CC
Iuml=000CF
LT=0003C
Ntilde=000D1
Oacute=000D3
Ocirc=000D4
Ograve=000D2
Oslash=000D8
Otilde=000D5
Ouml=000D6
QUOT=00022
REG=000AE
THORN=000DE
Uacute=000DA
Ucirc=000DB
Ugrave=000D9
Uuml=000DC
Yacute=000DD
aacute=000E1
acirc=000E2
acute=000B4
aelig=000E6
agrave=000E0
amp=00026
aring=000E5
atilde=000E3
auml=000E4
brvbar=000A6
ccedil=000E7
cedil=000B8
cent=000A2
copy=000A9
curren=000A4
deg=000B0
divide=000F7
eacute=000E9
ecirc=000EA
egrave=000E8
eth=000F0
euml=000EB
frac12=000BD
frac14=000BC
frac34=000BE
gt=0003E
iacute=000ED
icirc=000EE
iexcl=000A1
igrave=000EC
iquest=000BF
iuml=000EF
laquo=000AB
lt=0003C
macr=000AF
micro=000B5
middot=000B7
nbsp=000A0
not=000AC
ntilde=000F1
oacute=000F3
ocirc=000F4
ograve=000F2
ordf=000AA
ordm=000BA
oslash=000F8
otilde=000F5
ouml=000F6
para=000B6
plusmn=000B1
pound=000A3
quot=00022
raquo=000BB
reg=000AE
sect=000A7
shy=000AD
sup1=000B9
sup2=000B2
sup3=000B3
szlig=000DF
thorn=000FE
times=000D7
uacute=000FA
ucirc=000FB
ugrave=000F9
uml=000A8
uuml=000FC
yacute=000FD
yen=000A5
yuml=000FF
"
       at java.lang.Integer.parseInt(Integer.java:521)
       at org.jsoup.nodes.Entities.load(Entities.java:314)
       at org.jsoup.nodes.Entities.access$000(Entities.java:25)
       at org.jsoup.nodes.Entities$EscapeMode.(Entities.java:53)
       at org.jsoup.nodes.Entities$EscapeMode.(Entities.java:38)
       at org.jsoup.nodes.Document$OutputSettings.(Document.java:372)
       at org.jsoup.nodes.Document.(Document.java:19)
       at org.jsoup.parser.TreeBuilder.initialiseParse(TreeBuilder.java:32)
       at org.jsoup.parser.XmlTreeBuilder.initialiseParse(XmlTreeBuilder.java:27)
       at org.jsoup.parser.TreeBuilder.parse(TreeBuilder.java:42)
       at org.jsoup.parser.Parser.parseInput(Parser.java:32)
       at org.jsoup.Jsoup.parse(Jsoup.java:45)
       at org.wordpress.aztec.util.CleaningUtils.cleanNestedBoldTags(CleaningUtils.java:24)
       at org.wordpress.aztec.Html.fromHtml(Html.java:193)
       at org.wordpress.aztec.AztecParser.parseHtmlForInspection(AztecParser.java:69)
       at org.wordpress.android.editor.AztecEditorFragment.parseContent(AztecEditorFragment.java:2179)
       at org.wordpress.android.editor.AztecEditorFragment.getMediaMarkedAsClassInPostContent(AztecEditorFragment.java:2158)
       at org.wordpress.android.editor.AztecEditorFragment.getMediaMarkedUploadingInPostContent(AztecEditorFragment.java:2146)
       at org.wordpress.android.ui.posts.EditPostActivity.initializePostObject(EditPostActivity.java:600)
       at org.wordpress.android.ui.posts.EditPostActivity.onCreate(EditPostActivity.java:456)
       at android.app.Activity.performCreate(Activity.java:6682)
       at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1118)
       at androidx.test.runner.MonitoringInstrumentation.callActivityOnCreate(MonitoringInstrumentation.java:174)
       at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2619)
       at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2727)
       at android.app.ActivityThread.-wrap12(ActivityThread.java)
       at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1478)
       at android.os.Handler.dispatchMessage(Handler.java:102)
       at androidx.test.espresso.base.Interrogator.a(Interrogator.java:19)
       at androidx.test.espresso.base.UiControllerImpl.a(UiControllerImpl.java:166)
       at androidx.test.espresso.base.UiControllerImpl.a(UiControllerImpl.java:158)
       at androidx.test.espresso.base.UiControllerImpl.a(UiControllerImpl.java:34)
       at androidx.test.espresso.action.MotionEvents.a(MotionEvents.java:77)
       at androidx.test.espresso.action.MotionEvents.a(MotionEvents.java:52)
       at androidx.test.espresso.action.Tap.c(Tap.java:8)
       at androidx.test.espresso.action.Tap.b(Tap.java:18)
       at androidx.test.espresso.action.Tap$1.a(Tap.java:3)
       at androidx.test.espresso.action.GeneralClickAction.perform(GeneralClickAction.java:22)
       at androidx.test.espresso.ViewInteraction$SingleExecutionViewAction.perform(ViewInteraction.java:9)
       at androidx.test.espresso.ViewInteraction.a(ViewInteraction.java:79)
       at androidx.test.espresso.ViewInteraction.a(ViewInteraction.java:96)
       at androidx.test.espresso.ViewInteraction$1.call(ViewInteraction.java:3)
       at java.util.concurrent.FutureTask.run(FutureTask.java:237)
       at android.os.Handler.handleCallback(Handler.java:751)
       at android.os.Handler.dispatchMessage(Handler.java:95)
       at android.os.Looper.loop(Looper.java:154)
       at android.app.ActivityThread.main(ActivityThread.java:6121)
       at java.lang.reflect.Method.invoke(Method.java)
       at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:889)
       at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:779)

5ca33cb5f8b88c296305cf94-fabric

mzorz commented 5 years ago

I've been investigating this one trying to reproduce with the logs we have on current develop

The logs say

Tracked: post_list_item_selected, Properties: {"post_id":34,"has_gutenberg_blocks":false,"blog_id":xxxxx,"action":"edit","is_jetpack":false}

So, when opening the post and checking for id 34 I can see this content:

text
<p style="text-align:right;">

<!--more-->

<hr>

I pasted it on my own site and I cannot make it crash as apparently there's nothing weird in there - one thing we may be missing is for example, there may be a possibility that they're opening a locally modified copy of Post with ID==34, but the data we're seeing online is not the same as what they have locally on their device.

But looking further, even totally empty posts (with no revision history) showed the failure.

Further investigation make me think it's an issue internal to Jsoup, given at the line the crash happens it is only being initialized (without being passed any content yet). Apparently the implementation had some changes in https://github.com/jhy/jsoup/commit/f3035cafbbd8830de75f641920f8e5dbf0070f7c#diff-2ae1d9d6a9f9ee450c8a12116dc76658 which is a later version of Jsoup, and it has coincidentally been updated in Aztec as per this commit https://github.com/wordpress-mobile/AztecEditor-Android/commit/c0ed37e3d3a560a1aedc276efa558ff5ebda1acb#diff-c197962302397baf3a4cc36463dce5ea right after the crash started to appear in our app so, I suggest to wait and see if the problem continues to appear in the next WPAndroid beta and alpha.

daniloercoli commented 5 years ago

I spent some time over the weekend trying to reproduce the problem with no luck. I tried with an old version of Aztec, and the newest one, with text that contains lot of HTML ents, but was not able to reproduce the crash. Would be good to have an input that makes Jsoup.parse(html, "", Parser.xmlParser()).outputSettings(Document.OutputSettings().prettyPrint(false)) crashing.

malinajirka commented 5 years ago

Thanks for looking into it @mzorz and @daniloercoli ;).

It seems jsoup version 1.11.3 has been released in April last year and since it's a widely used library I believe they'd have known about an internal issue by now. Wdyt?

Can't it be related to the recent fix of the #1 crash. There are two things which seems suspicious

Wdyt?

daniloercoli commented 5 years ago

It seems jsoup version 1.11.3 has been released in April last year and since it's a widely used library I believe they'd have known about an internal issue by now. Wdyt?

Sure, I suspect the problem is in the following part of the code Parser.xmlParser(), since we're parsing an HTML document by using an XML parser instead. Supported ents in XML should be a subset of those available in HTML.

jtreanor commented 5 years ago

Thanks, @malinajirka!

Yes, that's right, this first appeared in 12.1-rc-3 which included these changes (the crash fix was one of them):

mzorz commented 5 years ago

Definitely I didn't mean the problem would have to also be widely spread and thus noticed elsewhere other than WPAndroid, but rather that it was "contained" within the jsoup parser and also that it is specific to the nested b tags cleaner. If we don't find a solution quickly (for which we could first try to log IMO the local contents when the crash happens and see further there if it's related to the nbr 1 crash fix), we can try with Danilo's suggestion to use an htmlparser and furthermore we can just catch the exception and move on, as what that method really does is just try to make things easier for Aztec when it's being fed a post with lots of b nested tag. Thoughts @daniloercoli @malinajirka @jtreanor

jtreanor commented 5 years ago

Thankfully, the crash rate has not grown over the weekend. As it stands I would be happy to cautiously push forward with the 12.1 release, while watching very carefully that this doesn't get out of control.

for which we could first try to log IMO the local contents when the crash happens and see further there if it's related to the nbr 1 crash fix

Would it be possible to do this prior to the release today? Hopefully, that would allow us to fix it more quickly.

mzorz commented 5 years ago

Sure thing let's try adding the logger to gather info :+1:

daniloercoli commented 5 years ago

Logging the content of the editor is not possible for privacy reasons without additional checks that happen when the editor is fully initialized (We're logging it only for public sites and for public posts). The changes required to make the logging correctly happen in the utils routines (that are called before the editor is up and running) are not that simple to implement them right before the cut.

We can add a try/catch statement and let the app working, and maybe add a proper logging later. What do you think about this? If we add the try/catch statement we will eat the crash though, and not able to check the trend later.
What I propose is:

jtreanor commented 5 years ago

Thanks, @daniloercoli and @mzorz.

I would prefer not to catch and hide the crash. Without I fix, I think it is best to let the crash happen.

Let's watch the trend and implement the logging in an upcoming release?

mzorz commented 5 years ago

Ok sounds good @jtreanor! Thank you for looking into this @daniloercoli

malinajirka commented 5 years ago

Thank you all for looking into it! ;)

Just a quick note: What about logging the exception in the catch block with CrashlyticsUtils.logException? We could still see the trend and the app wouldn't crash. Wdyt?

mzorz commented 5 years ago

Just a quick note: What about logging the exception in the catch block with CrashlyticsUtils.logException? We could still see the trend and the app wouldn't crash. Wdyt?

Sounds good! Would you like to submit a PR with that, given the other PR has already been merged?

malinajirka commented 5 years ago

I have been quite busy lately. I will be happy to do it, but it might take a while before I get to it. I want to fully understand the context and make sure we actually end up in a valid state if we just consume the exception (and log it). (if anyone else has time and wants to do it, please feel free).

mzorz commented 5 years ago

Don't worry then! I feel at this point we should only gather more info (logging) rather than modifying behavior, for the same reasons you outlined. :+1:

jtreanor commented 5 years ago

Thanks everyone for the input on this.

12.1 has been rolling out gradually since Monday and the good news is that this has not been a widespread issue as of yet.

The additional logging will help us learn more as this happens again but I don't think we need to do anything else urgently here.

MoustafaElsaghier commented 3 years ago

any updates about that?

Anilkumar18 commented 2 years ago

Any updates about this issue? We have faced this issue in the 1.13.1 version too.