urbanairship / android-library

Urban Airship Android SDK
Other
112 stars 123 forks source link

ANR in AirshipGcmTaskService #23

Closed cnordvik closed 7 years ago

cnordvik commented 7 years ago

We have a lot of ANR in our crash reports in the Play Console. A lot of them (80%) points to AirshipGcmTaskService as the offending code.

https://github.com/urbanairship/android-library/blob/master/urbanairship-sdk/src/main/java/com/urbanairship/job/AirshipGcmTaskService.java

Am I correct that onRunTask runs on the main thread? And you run latch.await(); on the main thread waiting for the job to finish? To me this sounds very much like something that can generate a lot of ANRs?

We managed to get it once ourselves as well, but it's hard to reproduce this but we have thousands of users experiencing this issue so it seems to happen a lot.

marc-scig commented 7 years ago

Hi there. In the GCM/FCM docs, onRunTask invocation is described as follows:

"When the scheduler starts your service, a new thread is created and the system invokes onRunTask()"

Without access to your logs or analytics it's difficult to say what the source of your problem is, but given the above we can probably rule out the latch await.

cnordvik commented 7 years ago

Well, last month we have 57738(!) reports of ANR with the issue below. Only on Android 5.0 and 5.1. This is a crazy amount leading Google to show that we're in the bottom 25% of apps on Google Play regarding ANR which in turns gives us a ranking penalty.

I have a hard time understanding the root cause of this but the countdownLatch seems to indicate some locking issue.

image

image


"main" tid=1 Waiting 
"main" prio=5 tid=1 Waiting
  | group="main" sCount=1 dsCount=0 obj=0x770c8000 self=0xb4427800
  | sysTid=2073 nice=0 cgrp=default sched=0/0 handle=0xb6f2bbec
  | state=S schedstat=( 0 0 0 ) utm=88 stm=41 core=3 HZ=100
  | stack=0xbe002000-0xbe004000 stackSize=8MB
  | held mutexes=
  at java.lang.Object.wait! (Native method)
- waiting on <0x13f32cef> (a java.lang.Object)
  at java.lang.Thread.parkFor (Thread.java:1220)
- locked <0x13f32cef> (a java.lang.Object)
  at sun.misc.Unsafe.park (Unsafe.java:299)
  at java.util.concurrent.locks.LockSupport.park (LockSupport.java:157)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt (AbstractQueuedSynchronizer.java:813)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly (AbstractQueuedSynchronizer.java:973)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly (AbstractQueuedSynchronizer.java:1281)
  at java.util.concurrent.CountDownLatch.await (CountDownLatch.java:202)
  at android.app.SharedPreferencesImpl$EditorImpl$1.run (SharedPreferencesImpl.java:363)
  at android.app.QueuedWork.waitToFinish (QueuedWork.java:88)
  at android.app.ActivityThread.handleServiceArgs (ActivityThread.java:3949)
  at android.app.ActivityThread.access$2300 (ActivityThread.java:218)
  at android.app.ActivityThread$H.handleMessage (ActivityThread.java:1832)
  at android.os.Handler.dispatchMessage (Handler.java:102)
  at android.os.Looper.loop (Looper.java:145)
  at android.app.ActivityThread.main (ActivityThread.java:6934)
  at java.lang.reflect.Method.invoke! (Native method)
  at java.lang.reflect.Method.invoke (Method.java:372)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run (ZygoteInit.java:1404)
  at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:1199)
"pool-7-thread-1" tid=28 Native 
"pool-7-thread-1" prio=5 tid=28 Native
  | group="main" sCount=1 dsCount=0 obj=0x1ac01b70 self=0xb4784000
  | sysTid=2219 nice=0 cgrp=default sched=0/0 handle=0xb451a200
  | state=D schedstat=( 0 0 0 ) utm=0 stm=0 core=0 HZ=100
  | stack=0xb3a34000-0xb3a36000 stackSize=1036KB
  | held mutexes=
  native: pc 000000000003a468  /system/lib/libc.so (fsync+12)
  native: pc 000000000001f043  /system/lib/libjavacore.so (???)
  native: pc 000000000007574b  /system/framework/arm/boot.oat (Java_libcore_io_Posix_fsync__Ljava_io_FileDescriptor_2+102)
  at libcore.io.Posix.fsync (Native method)
  at libcore.io.BlockGuardOs.fsync (BlockGuardOs.java:143)
  at java.io.FileDescriptor.sync (FileDescriptor.java:74)
  at android.os.FileUtils.sync (FileUtils.java:152)
  at android.app.SharedPreferencesImpl.writeToFile (SharedPreferencesImpl.java:598)
  at android.app.SharedPreferencesImpl.access$800 (SharedPreferencesImpl.java:51)
  at android.app.SharedPreferencesImpl$2.run (SharedPreferencesImpl.java:512)
- locked <@addr=0x1acc03c0> (a java.lang.Object)
  at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1112)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:587)
  at java.lang.Thread.run (Thread.java:818)
cnordvik commented 7 years ago

And this SO post seems to indicate doing commit instead of apply might remedy the problem: https://stackoverflow.com/questions/37549578/how-to-get-something-useful-from-this-anr-log

cnordvik commented 7 years ago

And here is another locking with UrbanAirship that clearly seems to indicate there are some threading issues as it says "- waiting to lock <0x2b46a6b3> (a java.lang.Object) held by thread 30". This is on Android 6.x as well. image

We believe a lot of these ANRs are when a user receives a push causing our app to start and call onCreate() so maybe there is an issue with initializing UA and then processing a push straight afterwards?

"main" tid=1 Blocked 
"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x770c8000 self=0xb4427800
  | sysTid=3469 nice=0 cgrp=default sched=0/0 handle=0xb6f2bbec
  | state=S schedstat=( 0 0 0 ) utm=73 stm=41 core=2 HZ=100
  | stack=0xbe002000-0xbe004000 stackSize=8MB
  | held mutexes=
  at com.urbanairship.v.a (SourceFile:140)
- waiting to lock <0x2b46a6b3> (a java.lang.Object) held by thread 30
  at b.f.a (SourceFile:659)
  at b.f.a (SourceFile:647)
  at b.f.a (SourceFile:180)
  at b.f.b (SourceFile:131)
  at com.mobilefootie.wc2010.FotMobApp.onCreate (SourceFile:283)
  at android.app.Instrumentation.callApplicationOnCreate (Instrumentation.java:1035)
  at android.app.ActivityThread.handleBindApplication (ActivityThread.java:6029)
  at android.app.ActivityThread.access$1700 (ActivityThread.java:218)
  at android.app.ActivityThread$H.handleMessage (ActivityThread.java:1795)
  at android.os.Handler.dispatchMessage (Handler.java:102)
  at android.os.Looper.loop (Looper.java:145)
  at android.app.ActivityThread.main (ActivityThread.java:6934)
  at java.lang.reflect.Method.invoke! (Native method)
  at java.lang.reflect.Method.invoke (Method.java:372)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run (ZygoteInit.java:1404)
  at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:1199)
"Thread-5557" prio=5 tid=30 Native
  | group="main" sCount=1 dsCount=0 obj=0x1ac023c0 self=0xb4434c00
  | sysTid=3635 nice=0 cgrp=default sched=0/0 handle=0xb4519f80
  | state=S schedstat=( 0 0 0 ) utm=17 stm=13 core=3 HZ=100
  | stack=0xb053b000-0xb053d000 stackSize=1036KB
  | held mutexes=
  native: pc 00000000000133dc  /system/lib/libc.so (syscall+28)
  native: pc 00000000000acb9b  /system/lib/libart.so (_ZN3art17ConditionVariable4WaitEPNS_6ThreadE+82)
  native: pc 00000000001b9dc1  /system/lib/libart.so (_ZN3art3JNI11SetIntFieldEP7_JNIEnvP8_jobjectP9_jfieldIDi+544)
  native: pc 000000000007c5cd  /system/lib/libandroid_runtime.so (???)
  native: pc 000000000008bdef  /system/lib/libandroid_runtime.so (???)
  native: pc 000000000008ce23  /system/lib/libandroid_runtime.so (???)
  native: pc 0000000000a42b85  /system/framework/arm/boot.oat (Java_android_content_res_AssetManager_loadResourceValue__ISLandroid_util_TypedValue_2Z+128)
  at android.content.res.AssetManager.loadResourceValue (Native method)
  at android.content.res.AssetManager.getResourceText (AssetManager.java:162)
- locked <@addr=0x1ada71f0> (a android.content.res.AssetManager)
  at android.content.res.Resources.getText (Resources.java:1435)
  at android.content.res.Resources.getString (Resources.java:1536)
  at com.google.android.gms.common.zzo.isGooglePlayServicesAvailable (unavailable)
  at com.google.android.gms.common.zze.a (unavailable)
  at com.google.android.gms.common.GoogleApiAvailability.a (unavailable)
  at com.urbanairship.google.a.a (SourceFile:28)
  at com.urbanairship.google.b.b (SourceFile:98)
  at com.urbanairship.job.e$a.a (SourceFile:288)
  at com.urbanairship.job.e.a (SourceFile:163)
  at com.urbanairship.job.e.a (SourceFile:101)
  at com.urbanairship.d.e.b (SourceFile:110)
  at com.urbanairship.d.b.a (SourceFile:191)
  at com.urbanairship.v.D (SourceFile:612)
  at com.urbanairship.v.c (SourceFile:369)
- locked <0x2b46a6b3> (a java.lang.Object)
  at com.urbanairship.v.b (SourceFile:49)
  at com.urbanairship.v$2.run (SourceFile:329)
  at java.lang.Thread.run (Thread.java:818)
marc-scig commented 7 years ago

Thanks for the additional context, we'll see what we can make of it and hopefully find a resolution for you soon.

pdxmele commented 7 years ago

Our support team is tracking this (or a similar issue from you?) as ticket 39760, please continue your correspondence there and we will let you know of any possible resolution there.

sayedhassan commented 6 years ago

@cnordvik Did you manage to fix the issue ? We are facing the same issue with our app. over 90% ANR's caused by urbanairship ?

rlepinski commented 6 years ago

@sayedhassan Urban Airship is probably not the root cause of your ANRs. Usually its a deadlock somewhere else that is locking a resource that is prevent UA from taking off, so when you call shared() on the singleton it will cause the ANR. Once an ANR was happening because UUID generation was blocked for the entire app, it prevented UA from starting but UA did not cause the deadlock on that resource.

Please open a new ticket with your SDK version and a full ANR report so we can dig into other threads. If you are not already, please update to 8.9.6.