realm / realm-core

Core database component for the Realm Mobile Database SDKs
https://realm.io
Apache License 2.0
1.01k stars, 155 forks

Crash in realm::SlabAlloc::attach_file #7740

Closed: nielsenko closed this issue 2 months ago

nielsenko commented 3 months ago

SDK and version

SDK: Dart, version 2.3.0-12-g25c79676 (using realm-core v14.7.0)

Observations

Crash log / stacktrace

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Termination Reason: SIGNAL 6 Abort trap: 6
Terminating Process: Runner [39693]

Triggered by Thread:  82

...

Thread 82 Crashed:
0   libsystem_kernel.dylib                 0x104f293b0 __pthread_kill + 8
1   libsystem_pthread.dylib                0x1050f3124 pthread_kill + 256
2   libsystem_c.dylib                      0x1801655c0 abort + 104
3   libc++abi.dylib                        0x1802a7778 abort_message + 128
4   libc++abi.dylib                        0x180298eb0 demangling_terminate_handler() + 272
5   libobjc.A.dylib                        0x1800634a8 _objc_terminate() + 140
6   libc++abi.dylib                        0x1802a6c50 std::__terminate(void (*)()) + 12
7   libc++abi.dylib                        0x1802a9954 __cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 32
8   libc++abi.dylib                        0x1802a9914 __cxa_throw + 132
9   realm_dart                             0x1068364e8 realm::SlabAlloc::attach_file(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, realm::SlabAlloc::Config&, realm::util::WriteObserver*) + 2748
10  realm_dart                             0x106872770 realm::DB::open(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, realm::DBOptions const&) + 2152
11  realm_dart                             0x106879fb8 realm::DB::create(std::__1::unique_ptr<realm::Replication, std::__1::default_delete<realm::Replication>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, realm::DBOptions const&) + 164
12  realm_dart                             0x106645cb8 realm::_impl::RealmCoordinator::open_db() + 948
13  realm_dart                             0x106646c38 realm::_impl::RealmCoordinator::do_get_realm(realm::RealmConfig&&, std::__1::shared_ptr<realm::Realm>&, std::__1::optional<realm::VersionID>, realm::util::CheckedUniqueLock&, bool) + 76
14  realm_dart                             0x106646b54 realm::_impl::RealmCoordinator::get_realm(realm::RealmConfig, std::__1::optional<realm::VersionID>) + 460
15  realm_dart                             0x106697114 realm::Realm::get_shared_realm(realm::RealmConfig) + 120
16  realm_dart                             0x1066d27c8 (anonymous namespace)::PersistedSyncMetadataManager::create_file_action(realm::SyncFileAction, std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::basic_string_view<char, std::__1::char_traits<char>>) + 104
17  realm_dart                             0x1066bd828 realm::app::User::create_file_action(realm::SyncFileAction, std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::optional<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>) + 360
18  realm_dart                             0x1066e7358 realm::SyncSession::update_error_and_mark_file_for_deletion(realm::SyncError&, realm::SyncSession::ShouldBackup) + 492
19  realm_dart                             0x1066ea364 realm::SyncSession::handle_error(realm::sync::SessionErrorInfo) + 1196
20  realm_dart                             0x1066e9780 realm::SyncSession::handle_fresh_realm_downloaded(std::__1::shared_ptr<realm::DB>, realm::Status, realm::sync::ProtocolErrorInfo::Action, std::__1::optional<realm::sync::SubscriptionSet>) + 436
21  realm_dart                             0x1066e9460 realm::SyncSession::download_fresh_realm(realm::sync::ProtocolErrorInfo::Action) + 8056
22  realm_dart                             0x1066f40bc realm::util::UniqueFunction<void (realm::sync::ConnectionState, std::__1::optional<realm::sync::SessionErrorInfo>)>::SpecificImpl<realm::SyncSession::create_sync_session()::$_10>::call(realm::sync::ConnectionState&&, std::__1::optional<realm::sync::SessionErrorInfo>&&) + 692
23  realm_dart                             0x10679705c realm::sync::SessionWrapper::on_suspended(realm::sync::SessionErrorInfo const&) + 120
24  realm_dart                             0x1067eb2c0 realm::sync::ClientImpl::Session::suspend(realm::sync::SessionErrorInfo const&) + 256
25  realm_dart                             0x1067e6510 realm::sync::ClientImpl::Session::receive_error_message(realm::sync::ProtocolErrorInfo const&) + 1224
26  realm_dart                             0x1067e5e00 realm::sync::ClientImpl::Connection::receive_error_message(realm::sync::ProtocolErrorInfo const&, unsigned long long) + 100
27  realm_dart                             0x1067e4b04 void realm::_impl::ClientProtocol::parse_message_received<realm::sync::ClientImpl::Connection>(realm::sync::ClientImpl::Connection&, std::__1::basic_string_view<char, std::__1::char_traits<char>>) + 3440
28  realm_dart                             0x1067edc00 realm::sync::ClientImpl::Connection::WebSocketObserverShim::websocket_binary_message_received(realm::util::Span<char const, 18446744073709551615ul>) + 116
29  realm_dart                             0x1067bfd60 (anonymous namespace)::WebSocket::frame_reader_loop() + 124
30  realm_dart                             0x1067b2d1c void realm::sync::network::Service::AsyncOper::do_recycle_and_execute<realm::util::UniqueFunction<void (std::__1::error_code, unsigned long)>, std::__1::error_code&, unsigned long&>(bool, realm::util::UniqueFunction<void (std::__1::error_code, unsigned long)>&, std::__1::error_code&, unsigned long&) + 192
31  realm_dart                             0x1067b32a4 realm::sync::network::Service::BasicStreamOps<realm::sync::network::Socket>::BufferedReadOper<realm::util::UniqueFunction<void (std::__1::error_code, unsigned long)>>::recycle_and_execute() + 212
32  realm_dart                             0x1067b951c realm::sync::network::Service::Impl::run_impl(bool) + 388
33  realm_dart                             0x1067a9bb4 realm::sync::websocket::DefaultSocketProvider::event_loop() + 212
34  realm_dart                             0x1067abe94 void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (realm::sync::websocket::DefaultSocketProvider::*)(), realm::sync::websocket::DefaultSocketProvider*>>(void*) + 72
35  libsystem_pthread.dylib                0x1050f3414 _pthread_start + 104
36  libsystem_pthread.dylib                0x1050ee5e0 thread_start + 8

Steps & Code to Reproduce

Assuming you have Xcode 15.4 installed and selected, a simulator running iOS 17.5, and BAAS_URL pointing to a running BaaS instance:

git clone https://github.com/realm/realm-dart.git
cd realm-dart
curl -o- https://puro.dev/install.sh | PURO_VERSION="1.4.6" bash
puro use -g stable
puro pub global activate melos
melos bootstrap
melos setup
melos build:native
cd packages/realm/tests
flutter test integration_test/all_tests.dart --dart-define=BAAS_URL=$BAAS_URL --dart-define=BAAS_DIFFERENTIATOR=something --file-reporter=json:test-results.json --suppress-analytics -d iphone

sync-by-unito[bot] commented 3 months ago

➤ PM Bot commented:

Jira ticket: RCORE-2139

nielsenko commented 3 months ago

~This does not happen with Xcode 14.3.1 for some reason.~ This was a faulty observation; I can make this fail with Xcode 14.3.1 as well.

nirinchev commented 3 months ago

This looks like an assertion failure - can we get the message of the assertion?

nielsenko commented 3 months ago

@nirinchev Recreated with a debug build. I don't get more info.

nicola-cab commented 3 months ago

Are you able to identify which assertion is throwing? @nielsenko

nielsenko commented 3 months ago

@nicola-cab It is not one of the assertions; otherwise the stack trace would have included realm::util::terminate_with_info.

nicola-cab commented 3 months ago

@nielsenko I suspect it is one of these guys: REALM_ASSERT_EX(), they throw. I'll try to repro locally.

nielsenko commented 3 months ago

> @nielsenko I suspect it is one of these guys: REALM_ASSERT_EX(), they throw. I'll try to repro locally.

Why don't we see realm::util::terminate_with_info in the stack trace then?

nielsenko commented 3 months ago

Running again under the Xcode debugger, I get a slightly different stack trace:

#0  0x00000001054f93b0 in __pthread_kill ()
#1  0x0000000105367124 in pthread_kill ()
#2  0x00000001801655c0 in abort ()
#3  0x00000001802a7798 in abort_message ()
#4  0x0000000180298ed0 in demangling_terminate_handler() ()
#5  0x00000001800634a8 in _objc_terminate() ()
#6  0x00000001802a6c70 in std::__terminate(void (*)()) ()
#7  0x00000001802a98dc in __cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) ()
#8  0x00000001802a989c in __cxa_throw ()
#9  0x0000000108f70230 in realm::util::File::open_internal(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, realm::util::File::AccessMode, realm::util::File::CreateMode, int, bool*) at /Users/kasper/Projects/mongodb/realm-dart/packages/realm_dart/src/realm-core/src/realm/util/file.cpp:559
#10 0x000000010894f7fc in realm::util::File::open(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, realm::util::File::AccessMode, realm::util::File::CreateMode, int) at /Users/kasper/Projects/mongodb/realm-dart/packages/realm_dart/src/realm-core/src/realm/util/file.hpp:1058
#11 0x000000010894f570 in realm::util::File::open(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, realm::util::File::Mode) at /Users/kasper/Projects/mongodb/realm-dart/packages/realm_dart/src/realm-core/src/realm/util/file.hpp:1053
#12 0x0000000108d42ff0 in realm::util::InterprocessMutex::set_shared_part(realm::util::InterprocessMutex::SharedPart&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) at /Users/kasper/Projects/mongodb/realm-dart/packages/realm_dart/src/realm-core/src/realm/util/interprocess_mutex.hpp:178
#13 0x0000000108d40c2c in realm::DB::open(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, realm::DBOptions const&) at /Users/kasper/Projects/mongodb/realm-dart/packages/realm_dart/src/realm-core/src/realm/db.cpp:1116
#14 0x0000000108d44504 in realm::DB::open(realm::Replication&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, realm::DBOptions const&) at /Users/kasper/Projects/mongodb/realm-dart/packages/realm_dart/src/realm-core/src/realm/db.cpp:1503
...

and the following context:

libc++abi: terminating due to uncaught exception of type realm::FileAccessError: Failed to open file at path '/Users/kasper/Library/Developer/CoreSimulator/Devices/BC728C97-01D8-4EC5-9E3A-284EB93486A1/data/Containers/Data/Application/7CC281EB-4AF5-489F-9000-2D2793CE3390/tmp/realm_test_iHROtQ/mongodb-realm/flexible-local-uviabmj/server-utility/metadata/sync_metadata.realm.management/access_control.versions.mx': Too many open files

So the crash appears to be caused by an uncaught exception. Obviously, the exception should not go uncaught and crash the process, but it is also odd that we end up in this "Too many open files" situation at all.
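For reference, "Too many open files" is the message for the POSIX error EMFILE, which open() reports once a process reaches its per-process descriptor limit (RLIMIT_NOFILE). The following is a minimal C++ sketch, assuming a POSIX system; exhaust_descriptors is a hypothetical helper, not part of Realm. It reproduces the same error outside Realm by temporarily lowering the soft limit and opening /dev/null until open() fails:

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: temporarily lowers the soft RLIMIT_NOFILE to
// `soft_limit`, opens /dev/null until open() fails, then cleans up.
// Returns the errno of the first failed open() (expected: EMFILE),
// or the errno of a failed getrlimit/setrlimit call.
int exhaust_descriptors(rlim_t soft_limit) {
    rlimit original{};
    if (getrlimit(RLIMIT_NOFILE, &original) != 0) return errno;

    rlimit lowered = original;
    lowered.rlim_cur = soft_limit;  // must not exceed the hard limit
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) return errno;

    std::vector<int> fds;
    int failure = 0;
    for (;;) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            failure = errno;  // EMFILE once the per-process limit is reached
            break;
        }
        fds.push_back(fd);
    }

    for (int fd : fds) close(fd);         // release every descriptor we opened
    setrlimit(RLIMIT_NOFILE, &original);  // restore the previous soft limit
    return failure;
}
```

For example, exhaust_descriptors(64) should return EMFILE. The helper closes everything it opened and restores the previous limit before returning, so it leaves the process as it found it.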

ironage commented 3 months ago

@nielsenko can you see if a simulator or test process is leaving open files on your system? Try lsof "/Volumes/Macintosh HD".

(Note: REALM_ASSERT_EX() doesn't throw)
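lsof answers this question system-wide. As an in-process complement, here is a sketch, assuming a POSIX system and using a hypothetical helper name, that counts the descriptors the current process holds. It probes each descriptor number below the soft RLIMIT_NOFILE with fcntl(F_GETFD), which fails with EBADF for numbers that are not open:

```cpp
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

// Hypothetical helper: number of descriptors currently open in this process.
// Newly opened descriptors are allocated below the soft RLIMIT_NOFILE, so
// scanning [0, soft limit) covers them, unless the limit was lowered after
// higher-numbered descriptors had already been opened.
long count_open_descriptors() {
    rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return -1;

    rlim_t upper = lim.rlim_cur;
    if (upper == RLIM_INFINITY) upper = 1 << 20;  // arbitrary cap (assumption)

    long open_count = 0;
    for (rlim_t fd = 0; fd < upper; ++fd) {
        // fcntl(F_GETFD) succeeds only for descriptor numbers that are open.
        if (fcntl(static_cast<int>(fd), F_GETFD) != -1) ++open_count;
    }
    return open_count;
}
```

Calling it before and after a batch of tests would show whether the test process's own descriptor count keeps growing.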

nielsenko commented 3 months ago

@ironage There are two things here:

  1. Why does raising an exception crash?
    • It seems this can happen if an error is raised during a client reset.
  2. Why are we running low on file descriptors?

nirinchev commented 3 months ago

It crashes because the exception is unhandled on the sync client thread. I don't believe we actively try to handle all exceptions in Sync, because they're unexpected or extremely rare, and in most cases it's best to just crash the process rather than leave it in some weird state.

nielsenko commented 3 months ago

For situations where we cannot recover, I think it would be good to make it explicit that we are terminating on purpose, and why. For example, on any uncaught exception, first log what the exception is using the static logger (so that the SDK is notified), and then abort explicitly?
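A minimal C++ sketch of that proposal follows, under these assumptions: the names are hypothetical (not Realm or Dart SDK APIs), std::cerr stands in for the static logger, and the handler is installed once at startup. Instead of catching exceptions on the sync thread, it installs a std::terminate handler. In the crash log above, __cxa_throw is still on the stack when abort() is called, so libc++abi did not unwind before terminating. A terminate handler runs at that same point, which means it can log the message without discarding the throwing frames. The standard leaves pre-terminate unwinding implementation-defined, so this relies on the runtime behaving as the trace shows. The sketch also chains to the previously installed handler (here, _objc_terminate) so existing behavior is kept:

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical helper: describe the exception currently being handled.
// Once std::terminate has been entered because of an uncaught throw,
// std::current_exception() refers to that exception.
std::string describe_current_exception() {
    std::exception_ptr ep = std::current_exception();
    if (!ep) return "no active exception";
    try {
        std::rethrow_exception(ep);  // caught just below, in this frame only
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "non-std::exception object";
    }
}

// Handler that was installed before ours (e.g. _objc_terminate on Apple platforms).
std::terminate_handler g_previous_handler = nullptr;

[[noreturn]] void logging_terminate_handler() {
    // Stand-in for the SDK's static logger: record why we are dying.
    std::cerr << "fatal: terminating due to uncaught exception: "
              << describe_current_exception() << std::endl;
    if (g_previous_handler) g_previous_handler();  // keep existing behavior
    std::abort();                                  // in case it returned
}

void install_logging_terminate_handler() {
    if (std::get_terminate() == logging_terminate_handler) return;  // already installed
    g_previous_handler = std::set_terminate(logging_terminate_handler);
}
```

If std::terminate is called for some other reason, with no exception in flight, the handler logs "no active exception" rather than a misleading message.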

nielsenko commented 3 months ago

Hmm... If I downgrade the GitHub runner to macos-12, the Flutter iOS tests pass without hitting this issue. In that setup realm-core is still built with Xcode 15.4, but the tests run on a simulator on macos-12 (x64), as opposed to this run, which used a macos-14 runner (arm64).

nielsenko commented 3 months ago

Okay. It seems something changed in later versions of macOS (or maybe in the simulators that ship with later versions of Xcode).

Since the simulator is just a process on macOS, we used to be able to raise the available file descriptors on the host with ulimit and have that carry over to the simulator. We have used that trick for our tests, since they juggle a lot of files, but it no longer seems to work.

I have tried increasing the limit with setrlimit from within librealm_dart, and then it works just fine.
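The exact call was not posted, so what follows is only a sketch, assuming a POSIX host and using a hypothetical helper name, of raising the soft RLIMIT_NOFILE from inside the process. It raises the soft limit toward a requested value but never above the hard limit. On Apple platforms it also caps the value at OPEN_MAX, following the macOS setrlimit(2) man page, which notes that RLIM_INFINITY is no longer accepted for RLIMIT_NOFILE and recommends min(OPEN_MAX, rlim_max):

```cpp
#include <algorithm>
#include <sys/resource.h>
#if defined(__APPLE__)
#include <sys/syslimits.h>  // OPEN_MAX
#endif

// Hypothetical helper: raise (never lower) the soft RLIMIT_NOFILE toward
// `desired`, capped at the hard limit. Returns the soft limit in effect
// afterwards, or 0 if the limit could not be queried.
rlim_t raise_open_file_limit(rlim_t desired) {
    rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;

    rlim_t target = std::min(desired, lim.rlim_max);
#if defined(__APPLE__) && defined(OPEN_MAX)
    // macOS rejects larger soft limits for RLIMIT_NOFILE with EINVAL.
    target = std::min<rlim_t>(target, OPEN_MAX);
#endif
    if (target > lim.rlim_cur) {
        lim.rlim_cur = target;
        setrlimit(RLIMIT_NOFILE, &lim);  // best effort; re-read the result below
    }

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;
    return lim.rlim_cur;
}
```

For example, raise_open_file_limit(10240) early in a test run would request up to 10240 descriptors, and the return value shows what the kernel actually granted.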

@nirinchev is probably right that there is little we can do if this happens during a client reset, but I still think we should bail out in a more informative way, e.g. by logging a fatal error before aborting.

ironage commented 3 months ago

Normally, if there is an error during client reset, we do indeed log the error and forward it to the user's sync error handler. However, this is an unusual case of error-upon-error: while handling a client reset error, we hit a second error while updating the metadata realm to track the recovery path for the manual client reset. You could check the logs for the original reason the client reset failed, but I suspect it also has to do with maxing out the file handles. I'm not sure what further graceful steps we could take when we cannot access the file system; at some point we just need to fail the process. But perhaps logging the additional failure would be a good place to start.

ironage commented 3 months ago

Note that if we do implement a high-level try...catch for cases like these in order to log the exception message, we risk losing the stack trace, which can sometimes be more valuable for figuring out what actually happened.

nirinchev commented 3 months ago

Can't we just rethrow the original exception after logging, thus not modifying the stack trace?

tgoyne commented 3 months ago

No, C++ exceptions don't work that way. The exception doesn't capture the stack trace, and the stack trace you get is the result of the process being killed without unwinding the call stack.
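Here is a small C++ illustration of that point (the helper names are made up for the example). The throwing function owns a local object whose destructor records when its frame is torn down. By the time the catch block runs, that destructor has already fired. So logging and then rethrowing with a bare throw; would crash from the catch site, with the original frames already gone. std::exception itself has no member that stores a stack trace.

```cpp
#include <stdexcept>

// RAII marker that records when its stack frame is torn down.
struct FrameMarker {
    bool& destroyed;
    ~FrameMarker() { destroyed = true; }
};

void throw_from_deep_frame(bool& frame_destroyed) {
    FrameMarker marker{frame_destroyed};
    throw std::runtime_error("Too many open files");
}

// Returns true if the throwing frame had already been unwound by the time
// the catch block started running, i.e. before any logging could happen.
bool frame_gone_when_catch_runs() {
    bool frame_destroyed = false;
    try {
        throw_from_deep_frame(frame_destroyed);
    } catch (const std::exception&) {
        // A bare `throw;` here would re-raise the same exception object, but
        // the frames between the throw site and this handler no longer exist.
        return frame_destroyed;
    }
    return false;
}
```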

nirinchev commented 3 months ago

I see, that's unfortunate. In that case, I agree that preserving the stack trace is probably more valuable than logging the error.

sync-by-unito[bot] commented 2 months ago

➤ ironage commented:

I'm closing this as I do not see any further actions to take.