objectbox / objectbox-dart

Flutter database for super-fast Dart object persistence
https://docs.objectbox.io/getting-started
Apache License 2.0
927 stars 115 forks source link

Many crashes on Android 10 & android 11 #237

Closed Ezaldeen99 closed 2 years ago

Ezaldeen99 commented 3 years ago

I am facing many crashes only after I upload the app as a bundle to google play,

Basic info

we have 187 users affected by this crash, I am not really sure what is the reason for this crash, would love if anybody will help us with this one. Thanks a lot.

Logs, stack trace

backtrace:
  #00  pc 000000000004f064  /apex/com.android.runtime/lib64/bionic/libc.so (abort+164)
  #00  pc 0000000000114674  /data/app/~~rxWDdiuDBYYe7ogrr-dcJQ==/-d_k74-HMCMsJVgZA0SRyPA==/split_config.arm64_v8a.apk!libobjectbox-jni.so (offset 0x1397000)
  #00  pc 0000000000111104  /data/app/~~rxWDdiuDBYYe7ogrr-dcJQ==/-d_k74-HMCMsJVgZA0SRyPA==/split_config.arm64_v8a.apk!libobjectbox-jni.so (offset 0x1397000)
  #00  pc 00000000000b7cb0  /apex/com.android.runtime/lib64/bionic/libc.so (pthread_once+140)
  #00  pc 0000000000111024  /data/app/~~rxWDdiuDBYYe7ogrr-dcJQ==/-d_k74-HMCMsJVgZA0SRyPA==/split_config.arm64_v8a.apk!libobjectbox-jni.so (offset 0x1397000) (__cxa_get_globals+28)
  #00  pc 00000000001109b0  /data/app/~~rxWDdiuDBYYe7ogrr-dcJQ==/-d_k74-HMCMsJVgZA0SRyPA==/split_config.arm64_v8a.apk!libobjectbox-jni.so (offset 0x1397000) (__cxa_throw+28)
  #00  pc 000000000007423c  /data/app/~~rxWDdiuDBYYe7ogrr-dcJQ==/-d_k74-HMCMsJVgZA0SRyPA==/split_config.arm64_v8a.apk!libobjectbox-jni.so (offset 0x1397000)
  #00  pc 0000000000073934  /data/app/~~rxWDdiuDBYYe7ogrr-dcJQ==/-d_k74-HMCMsJVgZA0SRyPA==/split_config.arm64_v8a.apk!libobjectbox-jni.so (offset 0x1397000)
  #00  pc 0000000000072504  /data/app/~~rxWDdiuDBYYe7ogrr-dcJQ==/-d_k74-HMCMsJVgZA0SRyPA==/split_config.arm64_v8a.apk!libobjectbox-jni.so (offset 0x1397000)
  #00  pc 000000000006d2f8  /data/app/~~rxWDdiuDBYYe7ogrr-dcJQ==/-d_k74-HMCMsJVgZA0SRyPA==/split_config.arm64_v8a.apk!libobjectbox-jni.so (offset 0x1397000) (obx_store_open+56)
  #00  pc 0000000000532a20  /data/app/~~rxWDdiuDBYYe7ogrr-dcJQ==/-d_k74-HMCMsJVgZA0SRyPA==/split_config.arm64_v8a.apk (offset 0x10000) 

DATABASE TASK

typedef FutureFunction<E, R> = Future<R> Function(Box<E>);

///the main goal of using this tasker is not to keep boxes open with
/// huge data which consumes more memory
class DBTask<E, R> {
  DBTask({FutureFunction<E, R> onLoaded, bool shouldClose = true})
      : _onLoaded = onLoaded,
        _shouldClose = shouldClose {
    _currentTasks++;
    _initTaskCheckers();
  }
  final AppLogger logger = AppLogger(tag: 'DBTask');

  Store _store;
  static const int _maxRetries = 3;
  int _retryCount = 0;
  static int _currentTasks = 0;
  final FutureFunction<E, R> _onLoaded;
  final bool _shouldClose;
  final Completer<R> _completer = Completer<R>();

  ///opens hive box
  ///runs your onLoad function
  ///waits until all work is done
  ///closes the box
  Future<R> run() {
    _preLoadChecker();
    return _completer.future;
  }

  void _initTaskCheckers() async {}

  void _preLoadChecker() async {
    final bool storeReady = await _openStore();
    if (storeReady == true) {
      _run();
    } else if (_maxRetries != _retryCount) {
      await Future.delayed(const Duration(milliseconds: 200));
      _preLoadChecker();
      _retryCount++;
    } else {
      _completer.completeError('HiveTask<$runtimeType> : error timeout');
    }
  }

  void _close() {
    _currentTasks = max(_currentTasks--, 0);
    if (_currentTasks == 0 && _shouldClose) {
      _store?.close(); // don't forget to close the store
    }
    logger.debug('box closed');
  }

  void _run() async {
    try {
      final box = _store.box<E>();
      final R val = await _onLoaded(box);
      _completer.complete(val);
    } catch (e) {
      _completer.completeError('Task _run error $e');
    } finally {
      _close();
    }
  }

  Future<bool> _openStore() async {
    if (_store != null) {
      return null;
    }
    final Completer<bool> storeCompleter = Completer();
    await getApplicationDocumentsDirectory().then((Directory dir) {
      _store = Store(getObjectBoxModel(), directory: '${dir.path}/objectbox');
      storeCompleter.complete(true);
    }).catchError((x) => storeCompleter.complete(false));
    return storeCompleter.future;
  }
}
vaind commented 3 years ago

Only had a short look and didn't try to reproduce yet (because from what you're saying, it doesn't seem very reproducible...) so for now, just observations:

  1. seems like you're migrating from hive -> the main goal of using this tasker is not to keep boxes open with huge data which consumes more memory - this doesn't apply to ObjectBox, it doesn't cache the database in memory.
  2. opening and closing store for each "DBTask", assuming it's a small one and there are many of those, while generally fast, is quite inefficient - ideally, you only open a store once.
vaind commented 3 years ago

Also, have you tried updating to the latest version (0.14.0) - should still work even if your project isn't migrated to null-safety yet?

Ezaldeen99 commented 3 years ago

@vaind I will update the library and try again, I just felt lost because it's kind of hard to reproduce the same crash on my device.

I will try to recheck the DBTask again and come back to you.

thanks for your support

greenrobot commented 3 years ago

To get back shortly to that native crash stack... It really looks weird. Seems like an exception is thrown (natively) while opening the store. For unknown reasons, this does get not caught by our exception handlers but causes an abort... We haven't ever seen this with our "plain" Android lib ever.

Ezaldeen99 commented 3 years ago

@greenrobot I tried to upload app bundles, split apks, and large size apk because I thought maybe split is making this native crash, however, all my tries to reduce the crash numbers didn't help. Is there any other information I can provide to help to track this issue?.

can I prevent the abort by handle all types of crashes in your native android lib?

vaind commented 3 years ago

A wild guess: could memory pressure be a cause of such an issue? Some allocation issues that couldn't be handled on the native side?

Ezaldeen99 commented 3 years ago

@vaind I don't think so, the crash also happens on note +10 which has 8GB of RAM , so if it's a memory issue I would notice it on my 4 GB emulator or 70A device

greenrobot commented 3 years ago

Any chance that you could reproduce this (from Play Store) and check the logs (logcat)? Often there's interesting info before an exception is thrown.

Ezaldeen99 commented 3 years ago

@greenrobot I am keep testing everyday, I will send an update here soon if I could reproduce the same crash

Ezaldeen99 commented 3 years ago

@greenrobot @vaind I was able to reproduce this error again today on my emulator

this is the current log

E/Box     (17860): Storage error "Try again" (code 11)
F/libc    (17860): /usr/local/google/buildbot/src/android/ndk-release-r21/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:72: abort_message: assertion "cannot create thread specific key for __cxa_get_globals()" failed
F/libc    (17860): Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 17908 (1.ui), pid 17860 (msa.shoofcinema)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'google/sdk_gphone_x86/generic_x86:10/QSR1.190920.001/5891938:user/release-keys'
Revision: '0'
ABI: 'x86'
Timestamp: 2021-05-17 21:18:35+0300
pid: 17860, tid: 17908, name: 1.ui  >>> com.appchief.msa.shoofcinema <<<
uid: 10139
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
Abort message: '/usr/local/google/buildbot/src/android/ndk-release-r21/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:72: abort_message: assertion "cannot create thread specific key for __cxa_get_globals()" failed'
    eax 00000000  ebx 000045c4  ecx 000045f4  edx 00000006
    edi f530833e  esi c7edddb0
    ebp f728dad0  esp c7eddd58  eip f728dad9
backtrace:
      #00 pc 00000ad9  [vdso] (__kernel_vsyscall+9)
      #01 pc 00092328  /apex/com.android.runtime/lib/bionic/libc.so (syscall+40) (BuildId: 76290498408016ad14f4b98c3ab6c65c)
      #02 pc 000ad651  /apex/com.android.runtime/lib/bionic/libc.so (abort+193) (BuildId: 76290498408016ad14f4b98c3ab6c65c)
      #03 pc 000adb88  /apex/com.android.runtime/lib/bionic/libc.so (__assert2+56) (BuildId: 76290498408016ad14f4b98c3ab6c65c)
      #04 pc 00148c44  /data/app/com.appchief.msa.shoofcinema-0bDT6zX6GmS35dc2Di32Aw==/base.apk!libobjectbox-jni.so (offset 0x665000) (BuildId: 219c52eedc0e4bea2dce69db74a4611e1f2a270f)
      #05 pc 00145095  /data/app/com.appchief.msa.shoofcinema-0bDT6zX6GmS35dc2Di32Aw==/base.apk!libobjectbox-jni.so (offset 0x665000) (BuildId: 219c52eedc0e4bea2dce69db74a4611e1f2a270f)
      #06 pc 0011c4b0  /apex/com.android.runtime/lib/bionic/libc.so (pthread_once+96) (BuildId: 76290498408016ad14f4b98c3ab6c65c)
      #07 pc 00144f6e  /data/app/com.appchief.msa.shoofcinema-0bDT6zX6GmS35dc2Di32Aw==/base.apk!libobjectbox-jni.so (offset 0x665000) (__cxa_get_globals+46) (BuildId: 219c52eedc0e4bea2dce69db74a4611e1f2a270f)
      #08 pc 0014480f  /data/app/com.appchief.msa.shoofcinema-0bDT6zX6GmS35dc2Di32Aw==/base.apk!libobjectbox-jni.so (offset 0x665000) (__cxa_throw+31) (BuildId: 219c52eedc0e4bea2dce69db74a4611e1f2a270f)
      #09 pc 00077c4b  /data/app/com.appchief.msa.shoofcinema-0bDT6zX6GmS35dc2Di32Aw==/base.apk!libobjectbox-jni.so (offset 0x665000) (BuildId: 219c52eedc0e4bea2dce69db74a4611e1f2a270f)
      #10 pc 00077084  /data/app/com.appchief.msa.shoofcinema-0bDT6zX6GmS35dc2Di32Aw==/base.apk!libobjectbox-jni.so (offset 0x665000) (BuildId: 219c52eedc0e4bea2dce69db74a4611e1f2a270f)
      #11 pc 00075936  /data/app/com.appchief.msa.shoofcinema-0bDT6zX6GmS35dc2Di32Aw==/base.apk!libobjectbox-jni.so (offset 0x665000) (BuildId: 219c52eedc0e4bea2dce69db74a4611e1f2a270f)
      #12 pc 0006eb9b  /data/app/com.appchief.msa.shoofcinema-0bDT6zX6GmS35dc2Di32Aw==/base.apk!libobjectbox-jni.so (offset 0x665000) (obx_store_open+75) (BuildId: 219c52eedc0e4bea2dce69db74a4611e1f2a270f)
      #13 pc 000033d7  <anonymous:c6c80000>
greenrobot commented 3 years ago

1) Great that you were able to reproduce! Important question here: is it always reproducible or did you have to try many times?

2) Also, second question: before E/Box (17860): Storage error "Try again" (code 11) is nothing? Usually we print some more context information.

3) Do you have any special Android setup / dependencies / configuration?

Error no. 11 is also known as EAGAIN ("Resource temporarily unavailable").

While trying to raise the exception, bionic seems to run into a shortage of thread specific keys... Maybe that's also the initial error.

We'll also dig into this some more...

Ezaldeen99 commented 3 years ago

@greenrobot

1- it can not be reproduced always, but I will explain how this happened:

There is a player that saves its playing time each 1 min to the database. My teammate used the database helper above to open the box and close it each one min and each time the user closes the player.

for now, I changed the database helper and avoid opening and closing the Store each time, and opened it only once. the reason that made me think like this is this stress test results, I am not sure if I am right.

the current db helper:


FutureFunction<E, R> = Future<R> Function(Box<E>);

///the main goal of using this tasker is not to keep boxes open with
/// huge data which consumes more memory
class DBTask<E, R> {
  DBTask({FutureFunction<E, R> onLoaded, bool shouldClose = true})
      : _onLoaded = onLoaded,
        _shouldClose = shouldClose {
    // _currentTasks++;
    _initTaskCheckers();
  }

  final AppLogger logger = AppLogger(tag: 'DBTask');

  Store _store;
  static const int _maxRetries = 3;
  int _retryCount = 0;
  // static int _currentTasks = 0;
  final FutureFunction<E, R> _onLoaded;
  final bool _shouldClose;
  final Completer<R> _completer = Completer<R>();

  ///opens hive box
  ///runs your onLoad function
  ///waits until all work is done
  ///closes the box
  Future<R> run() {
    _preLoadChecker();
    return _completer.future;
  }

  void _initTaskCheckers() async {}

  void _preLoadChecker() async {
    final bool storeReady = await _openStore();
    if (storeReady == true) {
      _run();
    } else if (_maxRetries != _retryCount) {
      await Future.delayed(const Duration(milliseconds: 200));
      _preLoadChecker();
      _retryCount++;
    } else {
      _completer.completeError('HiveTask<$runtimeType> : error timeout');
    }
  }

  // void _close() {
  //   _currentTasks = max(_currentTasks--, 0);
  //   if (_currentTasks == 0 && _shouldClose) {
  //     _store?.close(); // don't forget to close the store
  //   }
  //   // logger.debug('box closed');
  // }

  void dispose() {
    _store?.close();
  }

  void _run() async {
    try {
      final box = _store.box<E>();
      final R val = await _onLoaded(box);
      _completer.complete(val);
    } catch (e) {
      _completer.completeError('Task _run error $e');
    }
  }

  Future<bool> _openStore() async {
    final Completer<bool> storeCompleter = Completer();

    if (_store != null) {
      storeCompleter.complete(true);
      return storeCompleter.future;
    }

    try {
      await getApplicationDocumentsDirectory().then((Directory dir) {
        if (dir?.path != null) {
          _store =
              Store(getObjectBoxModel(), directory: '${dir.path}/objectbox');
          storeCompleter.complete(true);
        } else {
          storeCompleter.complete(false);
        }
      }).catchError((x) => storeCompleter.complete(false));
      return storeCompleter.future;
    } catch (ignore) {
      return storeCompleter.future;
    }
  }
}

2- I don't think there are any logs, but I will retry to make a stress test opening and closing the store many times and I hope to reproduce it and check again.

3- the only custom thing I saw in this project is the gesture sample they copied from here

Thanks for your support and I will try to dig in it more

greenrobot commented 3 years ago

for now, I changed the database helper and avoid opening and closing the Store each time, and opened it only once

Yes, please keep a single store instance open for your app. There's no need to close it. When your app is shut down by the system, ObjectBox store is also closed properly.

vaind commented 3 years ago

Is it possible that you have many of these tasks active at the same time - leading to many open stores, thus causing these out-of-resources errors. By "active", I mean after _run() is called and before dispose().

And I'll repeat myself: please use a single store and then see if this still happens. A rough idea of how you could do it with your architecture, for example you can use a global variable _store:

  late final Future<Store> _store = _openStore()

  Future<Store> _openStore() async => Future.sync(() async {
    final dir = await getApplicationDocumentsDirectory();
    return Future.value(Store(getObjectBoxModel(), directory: '${dir.path}/objectbox'));
  }

Then in your _run function, just wait for the store to be open:

      final box = (await _store).box<E>();
vaind commented 2 years ago

We have a new branch that adds a check so that a single store is only opened once, you can try it with your app by switching the objectbox dependency in pubspec.yaml to the following:

dependencies:
  objectbox:
    git:
      url: git@github.com:objectbox/objectbox-dart.git
      ref: single-store-per-dir

Store constructor now throws if you try to construct multiple store instances for the same database (except for using "fromReference()", which works rather differently)

Ezaldeen99 commented 2 years ago

Thank you guys for your time and your effort, all your solutions and suggestions were very accurate:

I can confirm that the cause of this crash was opening the store many times. I will upload this version to production and if anything happened I will tell you guys.

I have few things to share with you:

1- error logs are ambiguous, maybe you can print more developer-friendly crash logs as I checked again many times, it only shows Storage error and nothing more about it.

2- Maybe you can add warning logs also or prevent developers from doing this, or adding a warning of doing this on your documentation.

3- right now I can't use any null-safe libraries, the project is very old and it will take time to migrate to null-safe libraries so as a temp. fix I just moved store to be a global variable and init it only once then re-use it again, is that ok?

vaind commented 2 years ago

1 error logs

We'll see if we can do sth about that

2- Maybe you can add warning logs also or prevent developers from doing this, or adding a warning of doing this on your documentation.

That's kinda what the branch is about - you won't be able to open multiple store instances of the same dir - throws an UnsupportedError now (with the next release, v1.1 I guess)

3 ... I just moved store to be a global variable and init it only once then re-use it again, is that ok?

Yeah, for example store ??= Store(...) would work just fine as well

vaind commented 2 years ago

Thanks for the follow-up, I'm closing the issue now. Feel free to reopen/comment if this is still an issue.