Closed raja-baz closed 7 years ago
We also have other ANRs that happen in getInstance()
. Some of them have different traces in native. Will include another one here which we also get a lot of(maybe related, maybe not, I can't tell):
This one we get ~900 of per week. The stack trace on our end of the app varies, but once we hit io.realm.internal it's always the same. Also, the realm being queried here is encrypted
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x7532a578 self=0xb7b576f8
| sysTid=20429 nice=0 cgrp=default sched=0/0 handle=0xb6f9dde4
| state=S schedstat=( 665686472 47291765 1057 ) utm=30 stm=36 core=3 HZ=100
| stack=0xbe219000-0xbe21b000 stackSize=8MB
| held mutexes=
native: pc 00000000000169f8 /system/lib/libc.so (syscall+28)
native: pc 00000000000f5ecd /system/lib/libart.so (_ZN3art17ConditionVariable4WaitEPNS_6ThreadE+80)
native: pc 00000000002867ab /system/lib/libart.so (_ZN3art3JNI16NewWeakGlobalRefEP7_JNIEnvP8_jobject+602)
native: pc 000000000005f9d3 /data/app/com.anghami-2/lib/arm/librealm-jni.so (???)
native: pc 0000000000030609 /data/app/com.anghami-2/lib/arm/librealm-jni.so (Java_io_realm_internal_SharedRealm_nativeGetSharedRealm+288)
native: pc 00000000009d0cb1 /data/app/com.anghami-2/oat/arm/base.odex (Java_io_realm_internal_SharedRealm_nativeGetSharedRealm__JLio_realm_internal_RealmNotifier_2+116)
at io.realm.internal.SharedRealm.nativeGetSharedRealm (SharedRealm.java)
at io.realm.internal.SharedRealm.<init> (SharedRealm.java:180)
at io.realm.internal.SharedRealm.<init> (SharedRealm.java:180)
at io.realm.internal.SharedRealm.getInstance (SharedRealm.java:227)
at io.realm.internal.SharedRealm.getInstance (SharedRealm.java:217)
at io.realm.RealmCache.doCreateRealmOrGetFromCache (RealmCache.java:317)
- locked <0x05a1c407> (a io.realm.ar)
at io.realm.RealmCache.createRealmOrGetFromCache (RealmCache.java:281)
at io.realm.Realm.getInstance (Realm.java:346)
What is
at com.anghami.AnghamiApplication.onCreate (AnghamiApplication.java:64)
and
at com.anghami.data.local.FollowedItems.<init> (FollowedItems.java:222)
at com.anghami.data.local.FollowedItems.initialize (FollowedItems.java:184)
and
at com.anghami.data.local.RealmHelper._getInstance (RealmHelper.java:205)
at com.anghami.data.local.RealmHelper.getInstance (RealmHelper.java:188)
at com.anghami.data.local.RealmHelper.getLongLivedCacheRealm (RealmHelper.java:78)
?
AnghamiApplication:
FollowedItems.initialize();
That's a singleton. initialize
calls the constructor which gets an instance of realm to use for later:
FollowedItems(line 222)
mCacheRealm = RealmHelper.getLongLivedCacheRealm(listener);
The code in RealmHelper looks like this:
74 public static RealmWrapper getLongLivedCacheRealm(@Nonnull RealmWrapper.Listener listener) {
75 if (Looper.myLooper() == null) {
76 throw new IllegalStateException("Long lived realms are only valid on threads with loopers");
77 }
78 return new RealmWrapper(getInstance(RealmConfig.sCacheRealmConfig), listener);
79 }
Which calls through to getInstance
with a specific configuration:
This is the relevant part of the top-level getInstance()
:
186 private synchronized static Realm getInstance(RealmConfiguration configuration) {
187 try {
188 return _getInstance(configuration);
189 } catch (RuntimeException e) {
And _getInstance
201 private static Realm _getInstance(RealmConfiguration configuration) {
202 if (configuration == null) {
203 return Realm.getDefaultInstance();
204 }
205 return Realm.getInstance(configuration);
206 }
Basically, we're initialising a singleton which holds on to a long-lived Realm instance(which it will be adding listeners to). There's some wrapping calls between that and Realm.getInstance()
but really nothing special. The thread hangs in getInstance()
on the Realm side.
I'm actually not sure what's up, but I don't think the getInstance
call needs to be synchronized. Realm is thread-local anyways and handles synchronization internally in the RealmCache.
I do not know if this affects anything, though.
I know it doesn't need to be, we do that for other reasons I've posted about elsewhere in issues. Basically we've had crashes due to corrupt realm files and the way we're handling that until we can figure out the corruption issue is by restoring from backups we make periodically (you can see the try/catch there) for that to work, all instances need to be closed. So we synchronize getInstance and send out a message to consumers to close their instances before proceeding with the restore.
This doesn't come into play here though for 2 reasons:
On Thu, Oct 5, 2017 at 3:15 PM Gabor Varadi notifications@github.com wrote:
I'm actually not sure what's up, but I don't think the getInstance call needs to be synchronized. Realm is thread-local anyways and handles synchronization internally.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/realm/realm-java/issues/5386#issuecomment-334446733, or mute the thread https://github.com/notifications/unsubscribe-auth/ABNP7ke1R5Tsean1qlVqY7TgGP4nG7nIks5spMhfgaJpZM4Pu6ac .
It's hard to tell without seeing the entire code, but judging from the ANR stack it might be a deadlock situation. You can see that the main thread is holding a lock
...
locked <0x07c608e4> (a io.realm.ar)
...
locked <0x0a5f714d>
Another thread may be waiting to obtain this lock (via an intrinsic/synchronize lock) you might see it in the trace of another thread by
waiting to lock ...
is this the entire ANR trace? have you tried to profile the code to see if the ANR is due to a long init or a deadlock?
@nhachicha :
Long init: I seriously doubt it, This is just a call to Realm.getInstance()
which should be really fast(and definitely not the several seconds it would take to trigger an ANR). Also, the trace inside the library(on the native end) is the exact same one for all reports. If it were long init you would expect it to land differently depending on the specific timing of when it got killed after taking too long. But just to be thorough, what would cause the init to take a long time? Lots of models(we don't have that many...)?
Deadlock from the non-native side of things:
locked <0x07c608e4> (a io.realm.ar)
This is this(RealmCache:284
):
private synchronized <E extends BaseRealm> E doCreateRealmOrGetFromCache(RealmConfiguration configuration,
Class<E> realmClass) {
And the other one:
locked <0x0a5f714d>
Is this in our own code(RealmHelper:186
):
private synchronized static Realm getInstance(RealmConfiguration configuration) {
The only way those particular locks would deadlock is if the following scenario occurred:
RealmHelper.getInstance()
or RealmCache.doCreateOrGetFromCache
on the java end.I find that highly unlikely. Also, as I mentioned above, we can discard the RealmHelper.getInstance()
lock as a candidate because we have this same ANR from before we introduced that one.
In any case, I've attached the 2 full ANR reports which includes the states of all the threads. You can have a look at that if that helps.
Another thing to notice here. The final call in the native stack trace is to /system/lib/libc.so (msync+12)
Which looks like something to do mmap()
? Maybe deadlocking with the garbage collector or something related?
Would it be possible to symbolise the native calls in librealmjni.so
so we know that that code's doing?
call stack with symbols:
realm::MakeLinkVector::consume(unsigned int)
realm::util::msync(void*, unsigned int)
realm::SlabAlloc::attach_file(std::string const&, realm::SlabAlloc::Config&)
realm::SharedGroup::do_open(std::string const&, bool, bool, realm::SharedGroupOptions)
realm::SharedGroup::compact()
realm::Realm::Realm(realm::Realm::Config, std::shared_ptr<realm::_impl::RealmCoordinator>)
std::__shared_ptr<realm::Realm::make_shared_realm(realm::Realm::Config, std::shared_ptr<realm::_impl::RealmCoordinator>)::make_shared_enabler, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<realm::Realm::make_shared_realm(realm::Realm::Config, std::shared_ptr<realm::_impl::RealmCoordinator>)::make_shared_enabler>, realm::Realm::Config, std::shared_ptr<realm::_impl::RealmCoordinator> >(std::_Sp_make_shared_tag, std::allocator<realm::Realm::make_shared_realm(realm::Realm::Config, std::shared_ptr<realm::_impl::RealmCoordinator>)::make_shared_enabler> const&, realm::Realm::Config&&, std::shared_ptr<realm::_impl::RealmCoordinator>&&)
realm::_impl::RealmCoordinator::get_realm(realm::Realm::Config)
realm::Realm::get_shared_realm(realm::Realm::Config)
Java_io_realm_internal_SharedRealm_nativeGetSharedRealm
are you calling compact
?
Maybe compactOnLaunch()
parameter in RealmConfiguration?
We have 3 realms we use. Only 1 of them has compactOnLaunch(), I'll have a look at the stacktrace from a larger sample of reports and see if all of them are hitting that Realm
@Zhuinden @nhachicha Though one thing right off the bat, the second stack trace(in the first comment) is from a different group of ANR reports and that one is on a realm that doesn't have compactOnLaunch()
though it does have encryption turned on
Can we get symbols for that as well?
So a lot of the ANRs do seem to be the specific realm with compactOnLaunch()
. I'm going to remove that for now to hopefully reduce the number of ANRs. As for the others, that same group of ANRs seems to include cases that happen when committing a transaction:
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x740a82a0 self=0xb4c76500
| sysTid=4318 nice=0 cgrp=bg_non_interactive sched=0/0 handle=0xb6f32b34
| state=D schedstat=( 594170122 262049630 907 ) utm=47 stm=12 core=3 HZ=100
| stack=0xbe6b5000-0xbe6b7000 stackSize=8MB
| held mutexes=
native: pc 00000000000417ec /system/lib/libc.so (msync+12)
native: pc 00000000000ef1bd /data/app/com.anghami-1/lib/arm/librealm-jni.so (???)
native: pc 00000000000f4ef1 /data/app/com.anghami-1/lib/arm/librealm-jni.so (???)
native: pc 00000000000f5d75 /data/app/com.anghami-1/lib/arm/librealm-jni.so (???)
native: pc 00000000000f396f /data/app/com.anghami-1/lib/arm/librealm-jni.so (???)
native: pc 00000000000f3b35 /data/app/com.anghami-1/lib/arm/librealm-jni.so (???)
native: pc 00000000000f4c65 /data/app/com.anghami-1/lib/arm/librealm-jni.so (???)
native: pc 00000000000827cb /data/app/com.anghami-1/lib/arm/librealm-jni.so (???)
native: pc 00000000000762b1 /data/app/com.anghami-1/lib/arm/librealm-jni.so (???)
native: pc 000000000002f22f /data/app/com.anghami-1/lib/arm/librealm-jni.so (Java_io_realm_internal_SharedRealm_nativeCommitTransaction+22)
native: pc 000000000061b87d /data/app/com.anghami-1/oat/arm/base.odex (Java_io_realm_internal_SharedRealm_nativeCommitTransaction__J+80)
at io.realm.internal.SharedRealm.nativeCommitTransaction (SharedRealm.java)
at io.realm.internal.SharedRealm.commitTransaction (SharedRealm.java:237)
at io.realm.BaseRealm.commitTransaction (BaseRealm.java:400)
at io.realm.Realm.commitTransaction (Realm.java:132)
at io.realm.Realm.executeTransaction (Realm.java:1350)
at com.anghami.data.local.RealmHelper.executeTransaction (RealmHelper.java:102)
at com.anghami.data.local.RealmHelper.executeTransaction (RealmHelper.java:87)
at com.anghami.player.cache.CacheItem.init (CacheItem.java:54)
at com.anghami.player.cache.CacheItem.<init> (CacheItem.java:49)
at com.anghami.player.cache.OdinCache.get (OdinCache.java:50)
at com.anghami.player.core.SimpleAnghamiPlayer.buildCacheDataSourceFactory (SimpleAnghamiPlayer.java:530)
at com.anghami.player.core.SimpleAnghamiPlayer.buildMediaSource (SimpleAnghamiPlayer.java:500)
at com.anghami.player.core.SimpleAnghamiPlayer.prepare (SimpleAnghamiPlayer.java:331)
at com.anghami.player.core.OdinPlayer.refreshPlayers (OdinPlayer.java:319)
at com.anghami.player.core.OdinPlayer.<init> (OdinPlayer.java:184)
at com.anghami.player.core.OdinPlayer.getSharedInstance (OdinPlayer.java:200)
at com.anghami.AnghamiApplication.onCreate (AnghamiApplication.java:97)
at android.app.Instrumentation.callApplicationOnCreate (Instrumentation.java:1014)
at android.app.ActivityThread.handleBindApplication (ActivityThread.java:4806)
at android.app.ActivityThread.access$1600 (ActivityThread.java:154)
at android.app.ActivityThread$H.handleMessage (ActivityThread.java:1452)
at android.os.Handler.dispatchMessage (Handler.java:102)
at android.os.Looper.loop (Looper.java:234)
at android.app.ActivityThread.main (ActivityThread.java:5526)
at java.lang.reflect.Method.invoke! (Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run (ZygoteInit.java:726)
at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:616)
That trace seems to end up at the same place in the end, but is coming from a very different stack. I've attached the fully report as well. Note that the realm in question here does not have either compacting or encryption
here are the symbols for the stack trace without compact or encryption (anr_3
)
realm::Compare<realm::Equal, realm::StringData, realm::Subexpr, realm::Subexpr>::find_first(unsigned int, unsigned int) const
realm::util::msync(void*, unsigned int)
realm::GroupWriter::get_free_space()
realm::GroupWriter::commit(unsigned int)
realm::SharedGroup::low_level_commit(unsigned long long)
realm::SharedGroup::do_commit()
realm::SharedGroup::commit_and_continue_as_read()
realm::_impl::RealmCoordinator::commit_write(realm::Realm&)
realm::Realm::commit_transaction()
Java_io_realm_internal_SharedRealm_nativeCommitTransaction
Looks like the ANR happens when trying to memory map (msync
)...
According to the stack, you're committing a transaction on the main thread? is this commit still related to getInstance
? you can work around this by doing the expensive initialization on a worker thread using getInstanceAsync
Let us know how it did behave without the compactOnLaunch
This may also be related to https://github.com/realm/realm-java/issues/5391 since the OP is reporting delays when using getDefaultInstance
on his model
Closing due to no response.
Maybe migration?
Hello Guys, I'm still faceing this issue,
at io.realm.internal.OsSharedRealm.nativeGetSharedRealm (OsSharedRealm.java) at io.realm.internal.OsSharedRealm.<init> (OsSharedRealm.java:172) at io.realm.internal.OsSharedRealm.<init> (OsSharedRealm.java:172) at io.realm.internal.OsSharedRealm.getInstance (OsSharedRealm.java:219) at io.realm.BaseRealm.<init> (BaseRealm.java:124) at io.realm.BaseRealm.<init> (BaseRealm.java:124) at io.realm.BaseRealm.<init> (BaseRealm.java:93) at io.realm.BaseRealm.<init> (BaseRealm.java:93) at io.realm.BaseRealm.<init> (BaseRealm.java:93) at io.realm.Realm.<init> (Realm.java:153) at io.realm.Realm.createInstance (Realm.java:424) at io.realm.RealmCache.doCreateRealmOrGetFromCache (RealmCache.java:342) at io.realm.RealmCache.createRealmOrGetFromCache (RealmCache.java:282) at io.realm.Realm.getInstance (Realm.java:353) at com.opensooq.OpenSooq.util.RealmUtil.getRealm (RealmUtil.java:28) at com.opensooq.OpenSooq.config.dataSource.CountryLocalDataSource.getRealm (CountryLocalDataSource.java:60) at com.opensooq.OpenSooq.config.dataSource.CountryLocalDataSource.getMyCountry (CountryLocalDataSource.java:164) at com.opensooq.OpenSooq.util.CrashlyticsUtil.setupUserInfo (CrashlyticsUtil.java:38) at com.opensooq.OpenSooq.App.setupCrashlytics (App.java:173) at com.opensooq.OpenSooq.App.onCreate (App.java:95) at android.app.Instrumentation.callApplicationOnCreate (Instrumentation.java:1017) at android.app.ActivityThread.handleBindApplication (ActivityThread.java:4556) at android.app.ActivityThread.access$1400 (ActivityThread.java:150) at android.app.ActivityThread$H.handleMessage (ActivityThread.java:1340) at android.os.Handler.dispatchMessage (Handler.java:99) at android.os.Looper.loop (Looper.java:137) at android.app.ActivityThread.main (ActivityThread.java:5283) at java.lang.reflect.Method.invokeNative (Method.java) at java.lang.reflect.Method.invoke (Method.java:511) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run (ZygoteInit.java:1102) at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:869) at dalvik.system.NativeStart.main (NativeStart.java))
Realm version(s): 4.2.0 Realm sync feature enabled: no Android Studio version: 3.0.1
@cmelchior @beeender @nhachicha Tell me guys if you need any more info
regarding https://user-images.githubusercontent.com/4534097/39128803-67490d96-4711-11e8-9860-fb41bbcac665.png
I don't think this is a good pattern, since you return country
which is a managed object then close it's Realm ....
this might work since the Realm is not really closed until the refcount drops to 0, but you shouldn't rely on this side effect ...
@nhachicha he retrieves RealmCountry and then creates Country.
This should work for non-Sync Realms.
Why ANR is happening?
Probably you have a migration that has transform()
on many items
Expected Results
Get a realm instance :)
Actual Results
ANR. This comes from play store ANRs(on ~20k daily actives, we're getting this one about 1000 times a week, which is quite high):
Steps & Code to Reproduce
Can't reproduce easily as this comes from play store reports.
Version of Realm and tooling
Realm version(s): 3.7.2
Realm sync feature enabled: no
Which Android version and device: All android versions we support(5 through 8) have reports for this issue and many devices