wordpress-mobile / WordPress-Android

WordPress for Android
http://android.wordpress.org
GNU General Public License v2.0
2.99k stars 1.32k forks source link

Avoid job id collisions in WorkManager configuration #18632

Closed sentry-io[bot] closed 1 year ago

sentry-io[bot] commented 1 year ago

This issue was originally opened as a duplicate of this issue: https://github.com/wordpress-mobile/WordPress-Android/issues/16015. The same conclusion was reached: that our JobIntentServices and JobServices should be migrated to WorkManager. This issue has now been converted to a preliminary task to configure the existing WorkManager implementation to use a Job ID range that does not collide with our existing jobs. We after this is addressed, we should follow up to see whether this has any impact on the incidence rate of this crash.

Note: we should still migrate to WorkManager, but that is work has a larger scope, and should not be a blocker to this, which has the potential to address the crash occurrences sooner.

Sentry Issue: JETPACK-ANDROID-3S

RemoteException: Remote stack trace:
    at com.android.server.job.JobServiceContext.assertCallerLocked(JobServiceContext.java:493)
    at com.android.server.job.JobServiceContext.doDequeueWork(JobServiceContext.java:371)
    at com.android.server.job.JobServiceContext$JobCallback.dequeueWork(JobServiceContext.java:160)
    at android.app.job.IJobCallback$Stub.onTransact(IJobCallback.java:169)
    at android.os.Binder.execTransactInternal(Binder.java:1021)

SecurityException: Caller no longer running, last stopped +849ms because: timed out while starting
    at android.os.Parcel.createException(Parcel.java:2072)
    at android.os.Parcel.readException(Parcel.java:2040)
    at android.os.Parcel.readException(Parcel.java:1988)
    at android.app.job.IJobCallback$Stub$Proxy.dequeueWork(IJobCallback.java:292)
    at android.app.job.JobParameters.dequeueWork(JobParameters.java:248)
...
(9 additional frame(s) were not displayed)

RuntimeException: An error occurred while executing doInBackground()
    at android.os.AsyncTask$4.done(AsyncTask.java:399)
    at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:383)
    at java.util.concurrent.FutureTask.setException(FutureTask.java:252)
    at java.util.concurrent.FutureTask.run(FutureTask.java:271)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
...
(2 additional frame(s) were not displayed)

image

sentry-io[bot] commented 1 year ago

Sentry issue: JETPACK-ANDROID-7YC

mkevins commented 1 year ago

This looks like a framework bug, tracked here, (and also here).

This is one of those annoying "won't fix" bugs that seems to have persisted for years. We should migrate to WorkManager. Our only usage is here: https://github.com/wordpress-mobile/WordPress-Android/blob/a78c4feb91dc1a2b76fa7fd6ac38991ffdbf02f7/WordPress/src/main/java/org/wordpress/android/push/GCMRegistrationIntentService.java#L29

mkevins commented 1 year ago

Digging a bit further on this one, I believe the specific source of this Exception is here. It isn't clear why the callback assertion would fail in doDequeueWork, but it's possible that it could be related with colliding jobIds, since we are using WorkManager and JobServices side-by-side in our app.

WorkManager creates jobs with it's own ids, which should not overlap with the ids we use "manually" in the JobInfo builders for our services. The default range of ids used by WorkManager (if not specified) is from zero to Integer.MAX_VALUE, and this is auto-incremented for each new work request, wrapping at the maximum back to the minimum of the range, and persisted to disk as a preference. This means that work requests start at zero, and keep using higher and higher job service ids. Eventually, I believe this can lead to collisions with the ids of our "manually" created Job Services, which start at 1000.

We can specify a range of ids for WorkManager to use when generating job service ids, and avoid this collision, via setJobSchedulerJobIdRange, which might help to avoid this issue (it is difficult to know, since this may be difficult to reproduce). Either way, I think we should do so, since these job ids should not be colliding anyway, so I've drafted a PR for this purpose (https://github.com/wordpress-mobile/WordPress-Android/pull/18769).

irfano commented 1 year ago

I found a duplicate issue for this. @mkevins, since you're working on this issue, consider closing one of the duplicated issues.

mkevins commented 1 year ago

I found a duplicate issue for this. @mkevins, since you're working on this issue, consider closing one of the duplicated issues.

You're right! It is not only the same issue, but we've come to the same conclusion about the need to migrate JIS to WorkManager 😄 . I don't know why this was not linked / grouped in Sentry though 🤔 , especially since the title is identical. 🤷‍♂️ . I will dedup this 👍

mkevins commented 1 year ago

Since this is a dupe, I'm going to repurpose this PR to be the short-term task (and possibly solution to the crash 🤞) of making WorkManager config avoid id collisions. This way, we can leave the other duplicate open, and that can be related with the refactor of WorkManager, which should also be done at some point, regardless of whether this id collision fix resolves the underlying crash (as it is a technical debt).

mkevins commented 1 year ago

It appears that the id collision fix did not address the issue, as we are still seeing this crash occur in 22.9 (Sentry issue). The issue linked in the description appears to be the same as this one, which is also the same as the WordPress issue (not only Jetpack). The WorkManager migration task priority will be raised, since that has potential to reduce or eliminate these crashes.

fluiddot commented 1 year ago

I checked the metrics for the issue and seems that on version 23.1 the impact has severely decreased.

I'll archive the Sentry event until it occurs again 10 times.

dcalhoun commented 1 year ago

Sharing that the related Sentry issue was unarchived and marked as "escalating" as it occurred 10 additional times. Below is an updated snapshot of the event counts.

The 23.1 release still represents a smaller percentage currently, but we may see it reach similar counts as prior releases.

fluiddot commented 1 year ago

The 23.1 release still represents a smaller percentage currently, but we may see it reach similar counts as prior releases.

If we foresee that the issue would have a similar percentage as in past versions, probably we should reopen the issue to explore further solutions. WDYT?

dcalhoun commented 1 year ago

If we foresee that the issue would have a similar percentage as in past versions, probably we should reopen the issue to explore further solutions. WDYT?

Possibly. @mkevins do you feel it is worthwhile to reopen this issue or does the related https://github.com/wordpress-mobile/WordPress-Android/issues/16015 and planned migration work referenced in https://github.com/wordpress-mobile/WordPress-Android/issues/16015#issuecomment-1687306675 serve as the best place to track work to address this Sentry issue?

mkevins commented 1 year ago

I think we should keep this issue closed, and track the sentry issue(s) in #16015. This was opened as a duplicate, inadvertently, but was repurposed to try the solution described in the title (i.e. the changes merged in #18769). The WorkManager migration should still be done, and I believe it will resolve the remaining crashes.