wordpress-mobile / WordPress-Android

WordPress for Android
http://android.wordpress.org
GNU General Public License v2.0
2.92k stars 1.3k forks source link

Crash: android.os.TransactionTooLargeException #9685

Closed loremattei closed 1 month ago

loremattei commented 5 years ago

This one started in 11.8 and has been becoming more prominent in 12.0 and 12.1. It's currently the most common crash in Fabric.

Unluckily, the crash logs doesn't say much about where it occurs, but Fabric suggests that it is often seen when transferring bitmaps between Activities, or when saving a large amount of state between Activity configuration changes.

We had other occurrences of it in the past: https://github.com/wordpress-mobile/WordPress-Android/issues/5456.

Also, this thread https://github.com/facebook/react-native/issues/19458 suggests that it may be related to React-Native. The trend we see in Fabric (started in 11.8, growth a bit in 11.9, growth a lot in 12.0 and 12.1) seems to be compatible with mobile Gutenberg deployment in the app.

android.os.BinderProxy.transactNative (BinderProxy.java)
android.os.Looper.loop (Looper.java:176)
android.app.ActivityThread.main (ActivityThread.java:6635)
java.lang.reflect.Method.invoke (Method.java)
com.android.internal.os.ZygoteInit.main (ZygoteInit.java:823)

5c93a36df8b88c2963f28894-fabric

designsimply commented 5 years ago

30-day impact: ~40 times a day Latest version affected: 12.2 5c93a36df8b88c2963f28894-fabric-android

The trend we see in Fabric (started in 11.8, growth a bit in 11.9, growth a lot in 12.0 and 12.1) seems to be compatible with mobile Gutenberg deployment in the app.

cc @koke to get input on priority since it was mentioned as possibly related to Gutenberg Mobile and also because it is currently the most common crash.

koke commented 5 years ago

I'm not super familiar with Android, but I looked at the logs for a bunch of crashes and the pattern I'm seeing is that it happens when the application is closing and it's trying to save its state. For instance:

11:18:30:394 (UTC) | D/CrashlyticsCore w/WordPress-UTILS: No valid URLs passed to URLFilteredWebViewClient! HTTP Links in the page are NOT disabled, and ALL URLs could be loaded by the user!! 11:18:30:689 (UTC) | D/CrashlyticsCore d/WordPress-NOTIFS: notifications pager > adapter saveState 11:27:17:484 (UTC) | D/CrashlyticsCore i/WordPress-UTILS: App goes to background 11:27:17:501 (UTC) | D/CrashlyticsCore i/WordPress-STATS: 🔵 Tracked: application_closed, Properties: {"last_visible_screen":"Notifications","time_in_app":569}

designsimply commented 5 years ago

30-day impact: ~41 per day Users affected: 961 Last seen in: 12.3 (Sentry issue: WORDPRESS-ANDROID-23)

(5c93a36df8b88c2963f28894-fabric-android)

Note: in 12.3 we switched to using Sentry and this issue is showing up there but is far more common in older versions of Android so far (or we do not have enough data for 12.3 yet to compare).

planarvoid commented 5 years ago

This crash is indeed related to https://github.com/wordpress-mobile/WordPress-Android/issues/5456 and it's happening because we're saving too much stuff into the savedInstanceState all over the place. I've found out the following:

TransactionTooLargeException occurs because the Parcel objects stored in Binder transaction buffer exceeds the limited size of 1Mb. The Binder transaction buffer is shared by all transactions in progress for the process. Consequently this exception can be thrown when there are many transactions in progress even when most of the individual transactions are of moderate size.

I think this means that any place in the app could cause this issue (or any place where we store a lot of stuff into the savedInstanceState). The proper solution would be to cache the data and only save IDs into savedInstanceState (and I see that's been done in the NotificationsDetailActivity) and use ViewModels to keep the visible state.

designsimply commented 5 years ago

A mobile request for help which came in over Apr and May may be a candidate for this crash and the user reported the following example steps to reproduce the issue they are facing:

  1. Create a new post using the Aztec editor in the app on an Android Tablet.
  2. Add 3000+ words.
  3. Add at least 6 YouTube video embeds.
  4. Add 45 to 85 images throughout the post.
  5. Try landscape and portrait modes.

I think the large number of images are the culprit in their case.

(internal references: p4a5px-2om-p2/#comment-9755 and 1953510-zen)

designsimply commented 4 years ago

Copying over steps to reproduce from a similar crash report at https://github.com/wordpress-mobile/WordPress-Android/issues/5456#issuecomment-333051335.

Steps to reproduce:

  1. On the web write a huge post. I used plain text only (460kb and 1mb )
  2. On your device enable "Don't keep activities".
  3. Open the post in Visual Editor.
  4. Press Home button to leave application.
  5. Notice crash.
byencho commented 4 years ago

As mentioned in https://github.com/wordpress-mobile/WordPress-Android/issues/5456#issuecomment-339688329 , if you can identify specific places where you put too much stuff in a savedInstanceState Bundle, you can use https://github.com/livefront/bridge as a (potentially temporary) workaround to avoid the crash. This works for Activities, Fragments, Views, ViewModels, etc. This library takes your Bundle data and manually saves it so that the OS doesn't need to push that data across processes to save it itself (which is when the TransactionTooLargeException will get triggered).

designsimply commented 4 years ago

30-day impact: ~37 per day Users affected: 885 in the last 30d First seen in: 11.8 Last seen in: 12.7.1 (latest major release at the time of this report)

https://sentry.io/share/issue/ed6aca84b5d74e40b016c9472defe828/

antonis commented 6 months ago

Reopening the issue since there are still occurrences in the latest release 23.4 (ref https://a8c.sentry.io/share/issue/ed6aca84b5d74e40b016c9472defe828/)

Pasting the stack trace:

android.os.TransactionTooLargeException: data parcel size 535144 bytes
    at android.os.BinderProxy.transactNative(BinderProxy.java)
    at android.os.BinderProxy.transact(BinderProxy.java:662)
    at android.app.IActivityClientController$Stub$Proxy.activityStopped(IActivityClientController.java:1309)
    at android.app.ActivityClient.activityStopped(ActivityClient.java:85)
    at android.app.servertransaction.PendingTransactionActions$StopInfo.run(PendingTransactionActions.java:143)
    at android.os.Handler.handleCallback(Handler.java:942)
    at android.os.Handler.dispatchMessage(Handler.java:99)
    at android.os.Looper.loopOnce(Looper.java:226)
    at android.os.Looper.loop(Looper.java:313)
    at android.app.ActivityThread.main(ActivityThread.java:8762)
    at java.lang.reflect.Method.invoke(Method.java)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:604)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1067)
java.lang.RuntimeException: android.os.TransactionTooLargeException: data parcel size 535144 bytes
    at android.app.ActivityClient.activityStopped(ActivityClient.java:88)
    at android.app.servertransaction.PendingTransactionActions$StopInfo.run(PendingTransactionActions.java:143)
    at android.os.Handler.handleCallback(Handler.java:942)
    at android.os.Handler.dispatchMessage(Handler.java:99)
    at android.os.Looper.loopOnce(Looper.java:226)
    at android.os.Looper.loop(Looper.java:313)
    at android.app.ActivityThread.main(ActivityThread.java:8762)
    at java.lang.reflect.Method.invoke(Method.java)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:604)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1067)
fluiddot commented 6 months ago

Sentry events analysis

I couldn't find any pattern after reviewing different Sentry events and sessions. However, I did notice that all of them are Atomic sites. This might be a clue although I couldn't reproduce the crash when testing an Atomic site.

As shared internally by @antonis (pcdRpT-4n3-p2#comment-7328), this crash has experienced a dramatic increase in version 23.5. But none of the changes either introduced in the app (reference) and the editor (reference) seem to be related to this.

Debugging stack trace

The only area of the app that can be associated with the crash is the editor, as the stack trace points to the function EditPostActivity.saveInstanceState. I've analyzed the data saved in the state using toolargetool in order to identify the culprit of this crash. Here are the results:

Steps:

  1. Create a post.
  2. Add different blocks.
  3. Add an Image block.
  4. In the "Choose image" bottom sheet, select "Choose from device".
  5. Dismiss the media picker and repeat step 4.

For the above scenario, I found that the saved state can easily reach between 200 - 300 KB. I identified two main actors for the unexpected size:

Theme data

Theme data is included as arguments for the two Gutenberg fragments** (GutenbergEditorFragment and GutenbergContainerFragment). As part of this data, we have two properties (rawFeatures and rawStyles) that hold a long JSON string. Per the tests I performed using different themes, this adds around 64 KB per fragment, i.e. 128 KB in total.

I tried to increase the data size by customizing a theme but this didn't end up incurring an extra size. I thought this might be a theme-specific issue. However, after testing the same themes used by users who experienced the issue, none of them have led to the crash.

NOTE: The key used to hold the data in the bundle is param_gutenberg_props_builder.editorTheme.

State of views used within the editor

Depending on the content of the post, I noticed that the instance state can be incremented.

This data is in the state of GutenbergEditorFragment with the key android:view_state in the form of an array. In the case of having 15 paragraphs, I saw the array can be really long with over 800 items. Some of them are generic and I couldn't identify the origin, but others referenced Aztec.

I'm unaware if we really need to keep the state of Aztec views, so might be interesting to explore how to remove this in the spirit to reduce the state size.

TransactionTooLargeException

Oddly, after testing different scenarios, I couldn't manage to make the instance state big enough to raise this exception. I'm wondering if the previous state upon entering the editor might be implicated. Based on the error logs, several of the crashes happen when reaching 500 KB, so taking into account that the editor easily takes 250 KB, the other half might be coming from other views/fragments. In any case, I haven't confirmed this hypothesis and we'd need to investigate further.

Version 24.5 vs version 24.4

I performed the same analysis on both versions 24.4 and 24.5 and got the same results.

Next Steps

antonis commented 6 months ago

Thank you for the detailed investigation and report @fluiddot 🙇

after testing different scenarios, I couldn't manage to make the instance state big enough to raise this exception.

Same. I wasn't able to reproduce a crash.

I'm unaware if we really need to keep the state of Aztec views, so might be interesting to explore how to remove this in the spirit to reduce the state size.

Good idea 👍

Since we cannot predict the size of the data on each site I think our best bet would be to replace the current mechanism that passes the data in the Bundle with an implementation that saves everything (or at least the large data) in the database.

fluiddot commented 6 months ago

Since we cannot predict the size of the data on each site I think our best bet would be to replace the current mechanism that passes the data in the Bundle with an implementation that saves everything (or at least the large data) in the database.

I totally agree. A comment in this issue (https://github.com/wordpress-mobile/WordPress-Android/issues/9685#issuecomment-499508435) also pointed out a similar solution by using the library https://github.com/livefront/bridge.

dcalhoun commented 6 months ago

I was able to reproduce this error a few times. Unfortunately, I am not entirely confident of the reproduction steps. Also, I did not observe a visible crash, the app is in the background and when I resume the app it is in the expected state.

I was able to reproduce it with an incredibly large post (entirety of Moby-Dick, literally) with the following steps:

  1. Open the post.
  2. (Optional) Take some action in the post, e.g. modify content, switch to HTML mode.
  3. Close the post, triggering an auto save.
  4. Background the app.
  5. At some point (a few seconds/minutes later), restart with step 1.

Notably, I was only able to reproduce the error with an Atomic site.

mkevins commented 6 months ago

I've been debugging this, continuing from the work described above. Thanks to the steps described by David, I was able to reproduce the issue as well, and also confirmed Carlos's findings that at least part of the culprit seems to be related with the bundle size for the param_gutenberg_props_builder.editorTheme. Notably, when I examined the contents of the RAW_FEATURES column (present in the EditorTheme table, I found that there is a large difference between simple and atomic sites. It seems that the atomic site includes 48 default font families, which comprise the bulk of the payload.

This also suggests that back-end changes can impact the incidence rate for this issue (as increasing the raw feature payload size will put pressure on the state-saving mechanism.

Another puzzling aspect to this is that it seems to be triggerable more easily with a larger post. I was first able to reproduce this via David's steps (also using Moby Dick text as the body of the post). It isn't yet clear to me, however, whether this directly impacts the transaction size, whether it merely contributes to setting up the conditions for a race condition, or possibly both.

After reproducing this with a large post, I was also able to reproduce this with a small post by opening, making a small change, and then very quickly closing the post.

A few things struck me as odd while examining the code paths. One is that we set content arguments on the fragment via this key, but we don't seem to ever read it back via this key. Also, we set this as an empty string. I wonder whether this is necessary 🤔 ? Due to the complexity of the code, I could not confidently verify, but at a glance, this looks like dead code. 🤷‍♂️

Another strange smell (and quite possibly related with the transaction size) is that there are duplicate declarations for ARG_GUTENBERG_PROPS_BUILDER: here and here. I wonder if this somehow doubles the impact of the rawFeatures payload size increase?

mkevins commented 6 months ago

Circling back to reply about this point:

Depending on the content of the post, I noticed that the instance state can be incremented.

  • +41 KB for an empty image and one paragraph.
  • +81 KB for an empty image and 15 paragraphs.
  • +117 KB for an empty image and 30 paragraphs.

This suggests that the state sites grows / scales at roughly ~2.6 KB per paragraph (if my napkin math is correct 😅).

I'm unaware if we really need to keep the state of Aztec views, so might be interesting to explore how to remove this in the spirit to reduce the state size.

This is a good point, but I don't think it is trivial, since afaiu, we do utilize at least part of the retained state for selection / caret position.

It is also interesting to note that this problem cropped up many years ago in Aztec, and a solution was implemented for the most costly state objects, which were persisted to temporary files. Maybe it is worth checking (as a stop-gap solution) whether this existing solution can be consolidated for the entire AztecText bundle 🤔 ?

Otoh, with such a solution, the temp filename would still be saved "per-paragraph", which runs into scaling issues - so, it merely buys us a bit more headroom. A more "proper" solution would be some kind state hoisting, especially since only one paragraph can be focused at a time, so we are storing caret / selection positions for everything else wastefully. I'm also not sure whether we are using any of the historyList, or other objects in Gutenberg 🤷‍♂️ .

There have historically been other tensions arising out of a shared consumption of the AztecText dependency (other bugs related with caret position, performance issues, etc.), so I think we should consider that some refactoring here has additional value beyond this crash-related issue.

derekblank commented 6 months ago

Since we cannot predict the size of the data on each site I think our best bet would be to replace the current mechanism that passes the data in the Bundle with an implementation that saves everything (or at least the large data) in the database.

A comment in this issue (https://github.com/wordpress-mobile/WordPress-Android/issues/9685#issuecomment-499508435) also pointed out a similar solution by using the library https://github.com/livefront/bridge.

@fluiddot / @antonis / @mkevins Given that this is the top crash that places our Android apps quality metrics in the Critical zone, are we able to reason about how much work it might be to implement a stopgap solution like Bridge to address this TransactionTooLarge issue (if only to provide more time to find a more permanent solution)? Just asking as I don't have any experience with this library, so I'm curious if there would be any downsides or blockers into doing so, if only in order to stablize the crash metrics.

derekblank commented 6 months ago

@mkevins Another thought -- this issue began spiking heavily on November 7th, concurrent with the release of WordPress 6.4. In an attempt to revert (or identify) the issue that caused this spike, do you note if there is any Gutenberg fragment data added to param_gutenberg_props_builder.editorTheme from a change in the 6.4 release?

mkevins commented 6 months ago

do you note if there is any Gutenberg fragment data added to param_gutenberg_props_builder.editorTheme from a change in the 6.4 release?

I suspect that it could be related with the 48 additional font families (mentioned here), though I haven't yet identified the changeset which added those. Even if we can fix this on the back-end, though, it is valuable to consider addressing how we handle these settings on mobile to minimize our vulnerability to similar changes in the future.

Using the bridge library may be an option, though we want to be sure we're not trading a crash issue for another issue that is less visible to us. Also, we have our own stop-gap solution to consider as well (persistence via cache file). Finally, we could consider that we are persisting these editor settings in the database already, so we may have a degree of redundancy. It is worth exploring whether we can remove some of the duplication of the state objects. E.g. this data is already stored in a FluxC database table, as well as in the instance state for one or more Fragements (is it also stored in the RN context?).

fluiddot commented 5 months ago

@mkevins Since this crash is affecting a meaningful percentage of users/sessions, I'd advocate exploring the feasibility of removing the theme data from the fragment state. If I'm not wrong theme data is cached per site, so I presume that it could be retrieved just by using the site ID. If that's the case, we won't need to keep it in the fragment state and hence, we'll address having these dynamic properties (like theme styles/features) that might produce this issue in the future. WDYT?

Regarding the other cause related to post content and views state, it would be great to explore why we need it. Following https://github.com/wordpress-mobile/WordPress-Android/issues/9685#issuecomment-1811935939 and retaining the selection state in Aztec, I wonder whether we need this as the selection is controlled by the RichText component in Gutenberg 🤔. In case it's needed, I agree it would be helpful to only retain the state of the Aztec instance that is focused, not all of them.

In any case, based on previous comments (https://github.com/wordpress-mobile/WordPress-Android/issues/9685#issuecomment-1806292791), seems triggering the exception due to the views state size would be a bit of an edge case, especially as it needs a very large number of text blocks to trigger it. Therefore, I'd focus primarily on solving the exception for the case of theme data properties.

mkevins commented 5 months ago

Since this crash is affecting a meaningful percentage of users/sessions, I'd advocate exploring the feasibility of removing the theme data from the fragment state. If I'm not wrong theme data is cached per site, so I presume that it could be retrieved just by using the site ID. If that's the case, we won't need to keep it in the fragment state and hence, we'll address having these dynamic properties (like theme styles/features) that might produce this issue in the future. WDYT?

Exactly! 🎯 I think there are opportunities to refactor and clean up the current data flow graph. One place to consider as a starting point to study, debug, and tease out the requirements is in the editor store actions. At a higher level of analysis, it is possible that there are some unique requirements for these properties and how they are consumed, but I think the general idea you propose seems feasible to me (i.e. more directly serving these properties from the native db cache layer). One thing to keep in mind, though, is the asynchronous nature of both the db queries and the bridge traffic.

I wonder whether we need this as the selection is controlled by the RichText component in Gutenberg 🤔

This may be a bit tricky, imo, as afaik, the native TextViews still "think" they are the source of truth for focus state (under the hood, they are plain old Android Views after all). This tension / contention between the abstractions (and their assumptions) within the React Native runtime and the native framework view logic has been the source of some bugs in the past. If we consider refactoring this area of the code, a wholistic approach is warranted, imo, so we don't re-introduce any of those older bugs. This could be valuable in other ways, as well, since I believe this would also help address some of the performance issues with large posts too.

derekblank commented 5 months ago

The major spike in Sentry reports for this issue beginning on November 7th may have been intensified by a Jetpack backend change where a large addition of font data on Atomic sites increased the size of the rawStyles payload in the Android app. This change has now been rolled back, and we may see a reduction in Sentry reports for this issue.

Incoming backend change: https://github.com/Automattic/jetpack/pull/33203 Rollback: https://github.com/Automattic/jetpack/pull/34157

Whether or not this was the actual cause for the crash spike, from the mobile side, it seems the core of the issue is that the app is still vulnerable to TTLE crashes from backend payloads, as @antonis described in https://github.com/wordpress-mobile/WordPress-Android/issues/9685#issuecomment-1806096568:

Since we cannot predict the size of the data on each site I think our best bet would be to replace the current mechanism that passes the data in the Bundle with an implementation that saves everything (or at least the large data) in the database.

@mkevins noted that there is a large amount of font data in EditorTheme even for non-atomic sites. I agree with @fluiddot's https://github.com/wordpress-mobile/WordPress-Android/issues/9685#issuecomment-1817475519 to continue exploring the feasibility of removing this theme data from the fragment state. If we do not see a reduction in Sentry reports over the next day or so from the rollback of the backend Jetpack code, I would suggest we prioritize this to stabilize our Crash Free app metrics.

igotdes commented 5 months ago

Just noting tickets from users who seem to have been affected by the crash:

We can follow up with these users once a fix is available for them.

derekblank commented 5 months ago

I can now replicate this consistently and easily on atomic sites using a standard theme.

Using Jetpack Android 23.6 on a WP.com site:

  1. Select a standard or classic theme, like Bibimbap
  2. Create a post
  3. Add an image and select any of the media upload options (Choose from device, Take a Photo, etc)
  4. Note that app will return to the main screen on the first attempt
  5. Note that the app will crash on the second attempt, and every attempt thereafter
  6. Change to a block-based theme like Blockbase
  7. Repeat steps 2 - 5
  8. Observe app does not crash

Here is my Sentry event confirming that it is a TransactionTooLargeException. I inspected the EditorTheme database table locally and the rawFeatures value still contained the extra font data. This column alone measured 115.41 kb on an empty post, consistent with previous findings that there is a lot of data present in the EditorTheme table.

On a self-hosted site, the crash does not occur with a standard or block-based theme, and the extra font data was not present in the EditorTheme database. This suggests that the Jetpack rollback of the font data may not apply to WP.com sites using certain themes. We are receiving a lot of support tickets describing the image uploads crashing the app with steps consistent to these reproduction steps, and I believe this may be the way most users are currently experiencing the TTLE crash (as opposed to the issue occurring more often on very large posts). It's also worth noting that when invoking the crash via uploading an image, the image will still upload successfully and appear in the site's Media Library in most cases.

Edit: Some block-based themes like Twenty Twenty-Three still produce the crash, so it may be more accurate to state that the crash does not occur on themes that do not include the larger font family payloads in rawFeatures.

Example Video
dcalhoun commented 5 months ago

I can now replicate this consistently and easily on atomic sites using a standard theme.

[...]

On a self-hosted site, the crash does not occur with a standard or block-based theme, and the extra font data was not present in the EditorTheme database. This suggests that the Jetpack rollback of the font data may not apply to WP.com sites using certain themes.

This may be understood, but I wanted to note that the server fix is deployed separately for Jetpack-connected sites (https://github.com/Automattic/jetpack/pull/34157) and WPCOM sites (https://github.com/Automattic/wpcomsh/pull/1575). My understanding is that the latter was deployed ~19 hours ago (p9o2xV-3Dy-p2#comment-8360, p1700564206848809-slack-CBG1CP4EN). Testing before or after that point may have different outcomes.

antonis commented 5 months ago

As a client side workaround to the font families payload issue I've created a FluxC PR https://github.com/wordpress-mobile/WordPress-FluxC-Android/pull/2904 that removes those from the stored payload. I would appreciate any extra eyes on the changes and testing 🙇

SiobhyB commented 5 months ago

For those following along, @antonis' PR was merged into the 23.7 branch. 🎉 The fix will be available to users when the release begins rollout on Monday, 27th onwards.

fluiddot commented 5 months ago

For those following along, @antonis' PR was merged into the 23.7 branch. 🎉 The fix will be available to users when the release begins rollout on Monday, 27th onwards.

Great news 🎊 ! I'll proceed to close the issue then. If we identify new crashes related to this exception, we could reopen it.

antonis commented 5 months ago

Thank you for all the help on this. Unfortunately my fix with https://github.com/wordpress-mobile/WordPress-FluxC-Android/pull/2904 only handles a specific case of this issue related to the large font payload :( I will reopen this for now to keep the conversation active. As part of the maintenance week I'm exploring better fixes for this (ref: pcdRpT-4vD-p2#comment-7472).

carolframen commented 5 months ago

Got another ticket 7337996-zen

justtwago commented 5 months ago

I've managed consistently reproduce the crash by the following steps:

  1. Go to the editor
  2. Put a huge text (thousands of symbols)
  3. Then go to a media picker to pick a photo
  4. Boom - TransactionTooLargeException is thrown

https://github.com/wordpress-mobile/WordPress-Android/assets/16563318/a55c3598-6987-40bf-9c04-b733f0d5e6db

Stacktrace:

FATAL EXCEPTION: main
Process: com.jetpack.android.prealpha, PID: 25390
java.lang.RuntimeException: android.os.TransactionTooLargeException: data parcel size 1293520 bytes
Bundle stats:
  android:viewHierarchyState [size=1844]
    android:views [size=1796]
  stateKeyEditorSessionData [size=3696]
  SITE [size=3040]
  androidx.lifecycle.BundlableSavedStateRegistry.key [size=1283200]
    android:support:activity-result [size=3236]
      KEY_COMPONENT_ACTIVITY_REGISTERED_KEYS [size=2712]
    android:support:fragments [size=1279588]
      fragment_5d3a3a21-bae2-4483-9f9e-6885cf8e9080 [size=1265784]
        arguments [size=31204]
          param_gutenberg_props_builder [size=29944]
        viewState [size=1202116]
          0xf [size=1528]
          0xb1 [size=1191432]
        childFragmentManager [size=31208]
          fragment_873fae23-e456-4e1a-896c-050a1630f466 [size=30788]
            arguments [size=30024]
              param_gutenberg_props_builder [size=29944]
      fragment_5c4703a2-afc6-4db0-af00-2d1359b4e04b [size=5752]
        viewState [size=4756]
      fragment_79012020-1a29-481a-be30-8a0805b110e5 [size=1436]
      fragment_a991c6d3-da8c-4d39-b612-4fec18a0d14f [size=5428]
        arguments [size=3128]
          key_site [size=3040]
        viewState [size=1548]
PersistableBundle stats:
  [null]
    at android.app.servertransaction.PendingTransactionActions$StopInfo.run(PendingTransactionActions.java:146)
    at android.os.Handler.handleCallback(Handler.java:958)
    at android.os.Handler.dispatchMessage(Handler.java:99)
    at android.os.Looper.loopOnce(Looper.java:205)
    at android.os.Looper.loop(Looper.java:294)
    at android.app.ActivityThread.main(ActivityThread.java:8177)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:552)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:971)
Caused by: android.os.TransactionTooLargeException: data parcel size 1293520 bytes
    at android.os.BinderProxy.transactNative(Native Method)
    at android.os.BinderProxy.transact(BinderProxy.java:584)
    at android.app.IActivityClientController$Stub$Proxy.activityStopped(IActivityClientController.java:1439)
    at android.app.ActivityClient.activityStopped(ActivityClient.java:96)
    at android.app.servertransaction.PendingTransactionActions$StopInfo.run(PendingTransactionActions.java:135)
    at android.os.Handler.handleCallback(Handler.java:958) 
    at android.os.Handler.dispatchMessage(Handler.java:99) 
    at android.os.Looper.loopOnce(Looper.java:205) 
    at android.os.Looper.loop(Looper.java:294) 
    at android.app.ActivityThread.main(ActivityThread.java:8177) 
    at java.lang.reflect.Method.invoke(Native Method) 
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:552) 
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:971) 
antonis commented 5 months ago

We are shipping a fix in 23.9 replacing the mechanism that passes large data objects in the Bundle with a mechanism saving the large Parcelable objects in a local database https://github.com/wordpress-mobile/WordPress-Android/pull/19747 🤞 I'm keeping this issue open since the exception may still occur for large view data (pretty large posts) that are in the Bundle by the Android system. We tried some hacky solutions on this (example https://github.com/wordpress-mobile/WordPress-Android/commit/4e385155db52b559f74762e2a8551629d1490eeb) but since the stability of the editor might be affected we will monitor the current fix rollout before proceeding with further changes.

fluiddot commented 5 months ago

I'm keeping this issue open since the exception may still occur for large view data (pretty large posts) that are in the Bundle by the Android system. We tried some hacky solutions on this (example 4e38515) but since the stability of the editor might be affected we will monitor the current fix rollout before proceeding with further changes.

As expected the issue still occurs, however, the spike we saw in version 23.5 (https://github.com/wordpress-mobile/WordPress-Android/issues/9685#issuecomment-1806080086) is no longer happening and occurrences went back to similar numbers we had before 🎊 .

Screenshot 2023-12-15 at 09 53 42
antonis commented 5 months ago

As expected the issue still occurs, however, the spike we saw in version 23.5 (https://github.com/wordpress-mobile/WordPress-Android/issues/9685#issuecomment-1806080086) is no longer happening and occurrences went back to similar numbers we had before 🎊 .

We have no occurrences in 23.9-rc-* after the fix https://github.com/wordpress-mobile/WordPress-Android/pull/19747 but it's still early in the beta. I expect that the issue will occur again (https://github.com/wordpress-mobile/WordPress-Android/pull/19747#issuecomment-1847074075) but hopefully less often 🤞 I marked the issue as solved in Sentry in 23.9-rc-2 to get a regression alert when it reoccurs and examine additional fixes. The current regression alert was from 23.8.1 that was created after 23.9-rc-1.

antonis commented 3 months ago

Revisiting the crash reports on release 23.9 onwards I noticed the following:

  1. The part of the issue fixed with https://github.com/wordpress-mobile/WordPress-Android/pull/19747 did not reoccur 🎉
  2. As expected (https://github.com/wordpress-mobile/WordPress-Android/pull/19747#issuecomment-1845596986) the viewHierarchyState crash persists but it is less frequent than 1.
    RuntimeException
    android.os.TransactionTooLargeException: data parcel size 842728 bytes
    Bundle stats:
    android:viewHierarchyState [size=1536]
    android:views [size=1488]
    stateKeyEditorSessionData [size=5380]
    SITE [size=4728]
    androidx.lifecycle.BundlableSavedStateRegistry.key [size=829500]
    android:support:activity-result [size=3236]
      KEY_COMPONENT_ACTIVITY_REGISTERED_KEYS [size=2712]
    android:support:fragments [size=825888]
      fragment_431b0c7b-a639-4ecc-b449-97c52282a440 [size=811848]
        viewState [size=809228]
          0xf [size=1472]
          0xcb [size=1472]
          0xd9 [size=1472]...
  3. The TransactionTooLargeException also occurs in WPWebViewActivity
    android.os.TransactionTooLargeException: data parcel size 555708 bytes
    at android.os.BinderProxy.transactNative(BinderProxy.java)
    at android.os.BinderProxy.transact(BinderProxy.java:662)
    at android.app.IActivityClientController$Stub$Proxy.activityStopped(IActivityClientController.java:1507)
    at android.app.ActivityClient.activityStopped(ActivityClient.java:100)
    at android.app.servertransaction.PendingTransactionActions$StopInfo.run(PendingTransactionActions.java:135)
    at android.os.Handler.handleCallback(Handler.java:958)
    at android.os.Handler.dispatchMessage(Handler.java:99)
    at android.os.Looper.loopOnce(Looper.java:230)
    at android.os.Looper.loop(Looper.java:319)
    at android.app.ActivityThread.main(ActivityThread.java:8913)
    at java.lang.reflect.Method.invoke(Method.java)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:608)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1103)
    java.lang.RuntimeException: android.os.TransactionTooLargeException: data parcel size 555708 bytes
    Bundle stats:
    android:viewHierarchyState [size=3228]
    android:views [size=3124]
    androidx.lifecycle.BundlableSavedStateRegistry.key [size=1056]
    WEBVIEW_CHROMIUM_STATE [size=550620]
    PersistableBundle stats:
    [null]
    at android.app.servertransaction.PendingTransactionActions$StopInfo.run(PendingTransactionActions.java:146)
    at android.os.Handler.handleCallback(Handler.java:958)
    at android.os.Handler.dispatchMessage(Handler.java:99)
    at android.os.Looper.loopOnce(Looper.java:230)
    at android.os.Looper.loop(Looper.java:319)
    at android.app.ActivityThread.main(ActivityThread.java:8913)
    at java.lang.reflect.Method.invoke(Method.java)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:608)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1103)
antonis commented 3 months ago
  1. As expected (https://github.com/wordpress-mobile/WordPress-Android/pull/19747#issuecomment-1845596986) the viewHierarchyState crash persists but it is less frequent than 1.

Heads up that I've spent some time investigating possible workarounds on this today since it remains one of our top crashes:

  1. Saving the view hierarchy similar to https://github.com/wordpress-mobile/WordPress-Android/pull/19747 (see saveViewHierarchyState.patch). This resulted in a crash on resuming due to BadParcelableException which I wasn't able to work around.
  2. Tried https://github.com/livefront/bridge and https://github.com/frankiesardo/icepick without a success. Another consideration is that those libraries haven't been maintained for many years.
  3. Tried dumping the view data when exceeding a certain threshold (example dumpViewHierarchyState.patch). This resulted in crashes due to destroyed fragments.
antonis commented 3 months ago
  1. As expected (https://github.com/wordpress-mobile/WordPress-Android/pull/19747#issuecomment-1845596986) the viewHierarchyState crash persists but it is less frequent than 1.

Prepared another approach for this with https://github.com/wordpress-mobile/WordPress-Android/pull/20046

antonis commented 3 months ago
  1. The TransactionTooLargeException also occurs in WPWebViewActivity

Reopening since this is not covered yet

antonis commented 3 months ago
  1. The TransactionTooLargeException also occurs in WPWebViewActivity

I haven't managed to reproduce this issue on my device but there are reports (Chromium issue) that the WebView.saveState (used in the parent activity WebViewActivity) might be the root of the crash. Note that the behaviour of this method has changed since this code was introduced 9 years ago and the method no longer stores the display data for the WebView which might make it obsolete. Similar to this SO report I experimented with removing this call (see the patch below) and didn't notice any difference on the WebView behaviour in my (not so extensive) tests.

diff --git a/WordPress/src/main/java/org/wordpress/android/ui/WebViewActivity.java b/WordPress/src/main/java/org/wordpress/android/ui/WebViewActivity.java
index 8770f417b37..2cd5bb5fc55 100644
--- a/WordPress/src/main/java/org/wordpress/android/ui/WebViewActivity.java
+++ b/WordPress/src/main/java/org/wordpress/android/ui/WebViewActivity.java
@@ -80,24 +80,6 @@ public abstract class WebViewActivity extends LocaleAwareActivity {
         }
     }

-    /*
-     * save the webView state with the bundle so it can be restored
-     */
-    @Override
-    protected void onSaveInstanceState(Bundle outState) {
-        mWebView.saveState(outState);
-        super.onSaveInstanceState(outState);
-    }
-
-    /*
-     * restore the webView state saved above
-     */
-    @Override
-    protected void onRestoreInstanceState(Bundle savedInstanceState) {
-        super.onRestoreInstanceState(savedInstanceState);
-        mWebView.restoreState(savedInstanceState);
-    }
-
     @Override
     protected void onPause() {
         super.onPause();

A fix like the above would require extensive testing since it would affect the SSLCertsViewActivity, the WPWebViewActivity which is used across the app and also extended by:DomainRegistrationCheckoutWebViewActivity, DomainManagementDetailsActivity, ThemeWebActivity, SupportWebViewActivity and JetpackConnectionWebViewActivity.

Considering the above and that I haven't managed to reproduce the crash to validate this as a fix I haven't proceeded with a PR with this approach yet.

Another (maybe safer) hacky alternative would be to remove the WEBVIEW_CHROMIUM_STATE from the saved Bundle when it exceeds a certain threshold. Something like this is also used in the Chromium source code.

sentry-io[bot] commented 3 months ago

Sentry issue: JETPACK-ANDROID-7Y4

sentry-io[bot] commented 3 months ago

Sentry issue: JETPACK-ANDROID-8HV

antonis commented 3 months ago

Marking as resolved in 24.2 with https://github.com/wordpress-mobile/WordPress-Android/pull/19747, https://github.com/wordpress-mobile/WordPress-Android/pull/20046 and https://github.com/wordpress-mobile/WordPress-Android/pull/20139. I'll keep an eye for any regressions in Sentry

sentry-io[bot] commented 3 months ago

Sentry issue: WORDPRESS-ANDROID-2V8H

antonis commented 2 months ago

Marking as resolved in 24.2 with https://github.com/wordpress-mobile/WordPress-Android/pull/19747, https://github.com/wordpress-mobile/WordPress-Android/pull/20046 and https://github.com/wordpress-mobile/WordPress-Android/pull/20139. I'll keep an eye for any regressions in Sentry

With 24.2 fully rolled out 6 crashes by 3 users were recorder this week all of them in the WordPress app editor on low end devices. From the logs the following might indicate the usage of Aztec 🤔

I'll keep monitoring the issue and iterate with additional fixes if needed.

sentry-io[bot] commented 2 months ago

Sentry Issue: WORDPRESS-ANDROID-2QCB

sentry-io[bot] commented 2 months ago

Sentry Issue: JETPACK-ANDROID-E9W

sentry-io[bot] commented 2 months ago

Sentry Issue: WORDPRESS-ANDROID-2TD8

antonis commented 2 months ago

I've linked a few more occurrences of TransactionTooLargeException. In the past 30d and in the latest public versions of the app (24.2 and 24.3.1 that include the fixes https://github.com/wordpress-mobile/WordPress-Android/issues/9685#issuecomment-1939128322) 12 users encountered 21 crashes in total.

sentry-io[bot] commented 1 month ago

Sentry Issue: JETPACK-ANDROID-MKP

dangermattic commented 1 month ago

Thanks for reporting! 👍

antonis commented 1 month ago

Given that the occurrences of TransactionTooLargeException related crashes after 24.2 is low I lowered the priority to medium. I'll keep monitoring and fix issues as needed.

Screenshot 2024-03-28 at 10 46 07 AM Screenshot 2024-03-28 at 10 45 52 AM Screenshot 2024-03-28 at 10 46 21 AM