shorebirdtech / shorebird

Code Push for Flutter and other tools for Flutter businesses.
https://shorebird.dev
Other
1.97k stars 118 forks source link

fix: Application Not Responding at shorebird_init on some devices #1981

Closed canewsin closed 1 day ago

canewsin commented 2 weeks ago

App ID: cc6ea63a-aad2-4b11-98fc-bb03a0f5a794

Stacktrace

main (native):tid=1 systid=7511 
#00 pc 0x78dec libc.so (syscall + 28) (BuildId: cd953571180b7f5f8ae5570dad29595f)
#01 pc 0x85e3e8 split_config.arm64_v8a.apk + 8163328 (BuildId: eba38b52a6fc1ee5de8fc3e6555f2fb60dc3225f)
#02 pc 0x60b03c split_config.arm64_v8a.apk + 8163328 (BuildId: eba38b52a6fc1ee5de8fc3e6555f2fb60dc3225f)
#03 pc 0x601c78 split_config.arm64_v8a.apk + 8163328 (BuildId: eba38b52a6fc1ee5de8fc3e6555f2fb60dc3225f)
#04 pc 0x5e6c94 split_config.arm64_v8a.apk (shorebird_init + 1516) (BuildId: eba38b52a6fc1ee5de8fc3e6555f2fb60dc3225f)
#05 pc 0x892ab4 split_config.arm64_v8a.apk + 8163328 (BuildId: eba38b52a6fc1ee5de8fc3e6555f2fb60dc3225f)
#06 pc 0x351e30 libart.so (art_quick_generic_jni_trampoline + 144) (BuildId: 4801adadf1dae7e020ba05f3204efc9c)
#07 pc 0x5b980c libart.so (nterp_helper + 3852) (BuildId: 4801adadf1dae7e020ba05f3204efc9c)
#08 pc 0x8a7414 base.vdex (io.flutter.embedding.engine.FlutterJNI.init + 300)
#09 pc 0x5b9f58 libart.so (nterp_helper + 5720) (BuildId: 4801adadf1dae7e020ba05f3204efc9c)
#10 pc 0x8348a8 base.vdex (ac.f.h + 952)
#11 pc 0x5b9854 libart.so (nterp_helper + 3924) (BuildId: 4801adadf1dae7e020ba05f3204efc9c)
#12 pc 0x8349e0 base.vdex (ac.f.p + 8)
#13 pc 0x5b9854 libart.so (nterp_helper + 3924) (BuildId: 4801adadf1dae7e020ba05f3204efc9c)
#14 pc 0x834458 base.vdex (ac.f.b)
#15 pc 0x5b8934 libart.so (nterp_helper + 52) (BuildId: 4801adadf1dae7e020ba05f3204efc9c)
       at io.flutter.embedding.engine.FlutterJNI.nativeInit(FlutterJNI.java)
       at io.flutter.embedding.engine.FlutterJNI.init(FlutterJNI.java:241)
       at io.flutter.embedding.engine.loader.FlutterLoader.ensureInitializationComplete(FlutterLoader.java:347)
       at io.flutter.embedding.engine.loader.FlutterLoader.lambda$ensureInitializationCompleteAsync$0(FlutterLoader.java:403)
       at android.os.Handler.handleCallback(Handler.java:942)
       at android.os.Handler.dispatchMessage(Handler.java:99)
       at android.os.Looper.loopOnce(Looper.java:240)
       at android.os.Looper.loop(Looper.java:351)
       at android.app.ActivityThread.main(ActivityThread.java:8377)
       at java.lang.reflect.Method.invoke(Native method)
       at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:584)
       at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1013)

Steps To Reproduce

ANR triggered on some devices

Additional Context I see that some devices had an immediate ANR triggered 1 second apart in same session.

Few Device List

bryanoltman commented 2 weeks ago

This is certainly not what we want to happen. Are you able to share code that reproduces this issue, or to provide more information? There isn't a lot to go off of in this report.

eseidel commented 2 weeks ago

Definitely unexpected. We should be able to symbolicate the rest of the stack with our Android flutter build artifacts, similar to what we've wired up for iOS.

eseidel commented 2 weeks ago

Looks like the app_id is cc6ea63a-aad2-4b11-98fc-bb03a0f5a794.

eseidel commented 2 weeks ago

Looks like all recent versions were built with flutter aee5222c0d52012f089564a8395826d56e7c2fe8

eseidel commented 2 weeks ago

https://docs.shorebird.dev/guides/crash-reporting.

~I believe the symbols in question should be: https://download.shorebird.dev/flutter_infra_release/flutter/aee5222c0d52012f089564a8395826d56e7c2fe8/android-arm64-release/symbols.zip~

Symbols are at https://download.shorebird.dev/flutter_infra_release/flutter/e4434d7a212b29a8ee8ffaf8e2c453c753904933/android-arm64-release/symbols.zip

And ndk-stack or atos should be able to symbolicate.

canewsin commented 3 days ago

any updates? I'm thinking to remove shorebird to prevent these anrs

eseidel commented 2 days ago

Apologies for failing to follow up here earlier. @bryanoltman are you able to symbolicate the crash stack this afternoon? If not, let me know and I'm happy to help.

@canewsin since we don't yet know what's wrong (until we look at the stack in more detail), yes removing Shorebird is the only option at this time. We'll let you know as soon as we learn more (hopefully later today).

eseidel commented 2 days ago

Steps (as I understand them):

  1. Get the symbols (as above)
  2. Verify that the build_id for libflutter.so matches the one in the stack trace (confusingly called "split_config.arm64_v8a.apk").
  3. Use atos -o libflutter.so offset where the offset is the number after the +: #01 pc 0x85e3e8 split_config.arm64_v8a.apk + 8163328 (BuildId: eba38b52a6fc1ee5de8fc3e6555f2fb60dc3225f)

I'm sure ndk-stack or whatever will do that for you automatically, but it's probably easier to symbolicate the 5 respective lines by hand. I supect we'll find that it's hanging down in the rust http stack.

eseidel commented 2 days ago

If the build_id does not match, then we haven't found the right symbols. There must be better docs for it than this, but this is what I found: https://github.com/mattst88/build-id

build_id is essentially the same as the uuid concept used in .dsym files on Mac. Just a different name for the elf vs. macho worlds.

eseidel commented 2 days ago

I was wrong. Turns out the NDK is in: ~/Library/Android/sdk/ndk/25.2.9519653/

And the bin is not atos anymore, but rather llvm-symbolizer. See also ndk-stack in the same folder.

eseidel commented 2 days ago

Looks like the modern command is: llvm-symbolizer --obj=libflutter.so 12345 https://llvm.org/docs/CommandGuide/llvm-symbolizer.html

eseidel commented 2 days ago

Looks like I'm also wrong about which numbers are which 🤣

So for:

#01 pc 0x85e3e8 split_config.arm64_v8a.apk + 8163328 (BuildId: eba38b52a6fc1ee5de8fc3e6555f2fb60dc3225f)

#01 refers to which stack frame this is (this is the first from from the top) pc 0x85e3e8 is telling you that 0x85e3e8 is the pc (program counter), point of execution for this frame. split_config.arm64_v8a.apk + 8163328 I don't really know what this is. You see in other lines this doesn't have a plus (but rather just a library name). I'm not sure how (or even if we can) map from this to an offset within libflutter.so.

8163328 is 0x7C9000 in hex, which is a nice round number and likely a library load offset? Except the PCs reported are really small (smaller than that offset in some cases) which would suggest they're not library load offsets. Unless the PCs are already subtracted relative to their load offset, but that would be surprising.

eseidel commented 2 days ago

The information typically needed to understand a stacktrace are just two things:

  1. the pc
  2. the library name/build_id + load offset pairs.

In this case we're missing the load offset pairs (typically at the bottom of a stack trace), but maybe it's embedded into those lines in ways I don't understand.

The symbol lookup process is then just taking the pc - load offset to get a file offset, and then looking up the symbol corresponding to that file offset in the associated file for that build_id/name

eseidel commented 2 days ago

@canewsin Curious, are you able to provide us a more complete stack trace? Typically they include a line: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** They should appear in adb logcat as well as saved on device in /data/tombstones. Probably also provided in the play store console?

canewsin commented 2 days ago

@eseidel Attaching latest stack traces from playstore

stacktrace_2.log stacktrace_1.log stacktrace.log

bryanoltman commented 2 days ago

Working down the stack:

⑆ ./llvm-addr2line -e ~/Downloads/symbols/libflutter.so 0x85e3e8 -i -f -C -s --verbose
std::sys::pal::unix::futex::futex_wait::hef01d709ae0798d1
  Filename: futex.rs
  Function start filename: futex.rs
  Function start line: 20
  Line: 62
  Column: 21
std::sys::pal::unix::locks::futex_mutex::Mutex::lock_contended::hd58bbf4660e458cd
  Filename: futex_mutex.rs
  Function start filename: futex_mutex.rs
  Function start line: 33
  Line: 56
  Column: 13

bryanoltman@moltman ~/Library/Android/sdk/ndk/26.1.10909125/toolchains/llvm/prebuilt/darwin-x86_64/bin
⑆ ./llvm-addr2line -e ~/Downloads/symbols/libflutter.so 0x60b03c -i -f -C -s --verbose
updater::config::set_config::h0b14a9d6ed3bed3d
  Filename: updater.1870c0a65ae9aed2-cgu.09
  Line: 0
  Column: 0

bryanoltman@moltman ~/Library/Android/sdk/ndk/26.1.10909125/toolchains/llvm/prebuilt/darwin-x86_64/bin
⑆ ./llvm-addr2line -e ~/Downloads/symbols/libflutter.so 0x601c78 -i -f -C -s --verbose
updater::updater::init::hf447e29eb516d174
  Filename: updater.1870c0a65ae9aed2-cgu.06
  Line: 0
  Column: 0

bryanoltman@moltman ~/Library/Android/sdk/ndk/26.1.10909125/toolchains/llvm/prebuilt/darwin-x86_64/bin
⑆ ./llvm-addr2line -e ~/Downloads/symbols/libflutter.so 0x5e6c94 -i -f -C -s --verbose
shorebird_init
  Filename: ??
  Line: 0
  Column: 0

bryanoltman@moltman ~/Library/Android/sdk/ndk/26.1.10909125/toolchains/llvm/prebuilt/darwin-x86_64/bin
⑆ ./llvm-addr2line -e ~/Downloads/symbols/libflutter.so 0x892ab4 -i -f -C -s --verbose
flutter::ConfigureShorebird(std::_fl::basic_string<char, std::_fl::char_traits<char>, std::_fl::allocator<char>>, std::_fl::basic_string<char, std::_fl::char_traits<char>, std::_fl::allocator<char>>, flutter::Settings&, std::_fl::basic_string<char, std::_fl::char_traits<char>, std::_fl::allocator<char>> const&, std::_fl::basic_string<char, std::_fl::char_traits<char>, std::_fl::allocator<char>> const&, std::_fl::basic_string<char, std::_fl::char_traits<char>, std::_fl::allocator<char>> const&)
  Filename: shorebird.cc
  Function start filename: shorebird.cc
  Function start line: 83
  Line: 128
  Column: 5
flutter::FlutterMain::Init(_JNIEnv*, _jclass*, _jobject*, _jobjectArray*, _jstring*, _jstring*, _jstring*, _jstring*, _jstring*, _jstring*, long)
  Filename: flutter_main.cc
  Function start filename: flutter_main.cc
  Function start line: 81
  Line: 135
  Column: 3
eseidel commented 2 days ago

This is super helpful @canewsin! Looks like we might have a deadlock in our updater code. Looking now.

eseidel commented 2 days ago

We understand the bug.

The sequence is:

  1. shorebird_init is called.
  2. shorebird_check_for_update is called. shorebird_check_for_update grabs the config lock (and holds it) while making a network call to check for the update (this is a bug).
  3. while that network call is outstanding shorebird_init is called from another thread. shorebird_init doesn't have a fast-path to exit out if already inited w/o grabbing the config lock (another bug). shorebird_init now hangs waiting on the network call to return on the other thread (and thus allow check_for_update to release the config lock).

We could/should probably fix both bugs (although fixing either should fix this).

@bryanoltman is working on testing and fixing now. We'll likely release a fix on Weds with the expected Flutter update.

eseidel commented 2 days ago

My guess (although I don't know) of why you're seeing this and others are not is that you presumably are using either multiple Flutter engines, or using Firebase Cloud Messaging (or another service that wakes up the Flutter engine in the background), thus shorebird_init has already been called before the real "launch" of the app, and it just so happens that sometimes these phones are on slow networks and also happen to be making a background call to check_for_update while the launch is happening, and this shorebird_init call hangs.

If you're using package:shorebird_code_push to trigger an update check manually, that may be contributing to the frequency of the ANRs, but the real bug is ours and we'll have a fix soon.

We don't yet have the capacity to back-port fixes to older flutter revisions, with a new Flutter release expected this week, we expect this will only be fixed in 3.22.0 and later once we release it.

eseidel commented 2 days ago

Just wanted to thank you again for reporting @canewsin. I suspect we've had this bug for over a year now and you're the first to report it! Regardless, Bryan has a local fix now which should go out later this week.

bryanoltman commented 2 days ago

@canewsin Just to confirm/debunk a theory: are you using package:shorebird_code_push?

canewsin commented 2 days ago

@eseidel @bryanoltman yup

shorebird_code_push: 1.1.3

eseidel commented 1 day ago

Looks like Flutter 3.22 was released this morning, so I expect our Flutter 3.22 (with this fix) will be out later today.