groue / GRDB.swift

A toolkit for SQLite databases, with a focus on application development
MIT License
6.83k stars 705 forks

Unexplained background crash that _might_ be related to GRDB #1085

Closed freak4pc closed 2 years ago

freak4pc commented 2 years ago

What did you do?

We noticed a few unexplained crashes on App Store Connect that seem to happen while the user is in the background. The crashing thread doesn't point to GRDB specifically, but GRDB-related code is visible in Thread 2 of the crash log (attached), and since we added GRDB relatively recently, it makes some sense to suspect it.

crash_report.txt (Some of the symbols have been starred-out for obfuscation)

The part with upsertEntity in the stack trace has code that looks similar to the following

database.asyncWrite(
    work: { db in 
        try Entity.updateAll(db, Entity.Column.recencyIndex.set(to: nil))

        for recent in recents {
            try recent.upsertEntity(db) // -> Trace highlights this line
        }
    },
    completion: { result in ... }
)

And upsertEntity itself looks like this:

extension Entity {
    func upsertEntity(_ db: GRDB.Database) throws {
        guard let localEntity = try Entity.fetchOne(db, ...) else {
            try insert(db)
            return
        }

        let updatedEntity = Entity(
        ... // Updated model based on `localEntity`, with some modified values
        )

        try updatedEntity.update(db)
    }
}

What did you expect to happen?

No crashes while user is in background

What happened instead?

A background crash that isn't necessarily directly related to GRDB, but does have it "near the surface" and close to the crashing thread. I also specifically wanted to share some of the code above to see if anything "fishy" catches your eye that could cause it.

I did suspect Data Protection, but even if that were the case, I'd expect it to just throw an error (caught by the asyncWrite) and not crash the app.

Environment

GRDB flavor(s): GRDB
GRDB version: 5.10
Installation method: CocoaPods
Xcode version: 13.0
Swift version: 5.5
Platform(s) running GRDB: iOS
macOS version running Xcode: 11.6

Demo Project

Unfortunately I can't share an actual project and also can't directly reproduce this. I've shared the crash log and relevant code portions above, which I hope can help with figuring this out.

groue commented 2 years ago

Hello @freak4pc, thanks for the report!

A background crash that isn't necessarily directly related to GRDB, but does have it "near the surface" and close to the crashing thread. I've also specifically wanted to share some of the code above to see if there's anything "fishy" that pops to your eyes that could cause it.

Nothing yet. I understand that you have several similar crashes, and that while it is always the main thread that happens to crash, you also always have sqlite3_step running in some other thread.

I do not spot anything fishy in your code or the stack trace. Identifying the Cause of Common Crashes and Understanding the Exception Types in a Crash Report list cases that I could not match with your crash report (you may want to take a second look).

I did suspect Data Protection, but even if that was the cast, I'd expect it to just throw an error (catched by the asyncWrite) and not crash the app.

It's been a few years since I wrote the Data Protection documentation chapter, so it may now be out of date. I'm positive I could observe, catch, and recover from regular errors with code SQLITE_IOERR/SQLITE_AUTH during accesses to an unavailable database in the background.

We may have to check if this chapter is still accurate, but first I recommend grabbing more information about your crash. If you can reproduce the crash on a device in your hands, you'll have access to the raw device console with Console.app. This console is awfully verbose, but it is very likely to give us precious contextual information.
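For reference, the recoverable-error behaviour described above would look roughly like this. This is a sketch assuming GRDB 5's DatabaseError API; `dbPool`, `Entity`, and the retry helper are hypothetical placeholders based on the snippets in this thread:

```swift
import GRDB

// Hypothetical placeholder: reschedule the write for when the device unlocks.
func scheduleRetryWhenProtectedDataIsAvailable() { /* app-specific */ }

func resetRecencyIndexes(in dbPool: DatabasePool) {
    do {
        try dbPool.write { db in
            try Entity.updateAll(db, Entity.Column.recencyIndex.set(to: nil))
        }
    } catch let error as DatabaseError
        where error.resultCode == .SQLITE_IOERR || error.resultCode == .SQLITE_AUTH {
        // The device is locked and the file is protected by Data Protection:
        // defer the work instead of crashing.
        scheduleRetryWhenProtectedDataIsAvailable()
    } catch {
        // Any other database error.
        print("Unexpected database error: \(error)")
    }
}
```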

groue commented 2 years ago

Hello @freak4pc, do you have any update?

freak4pc commented 2 years ago

Hey @groue - we're still seeing these crashes, but they all come from users in different countries, so we don't have access to their physical devices (and definitely not quickly enough to get a console dump).

I'm not entirely sure what to do. I guess we can close this and wait to see if I can get any new findings later on.

groue commented 2 years ago

OK, let's close this. Feel free to report any finding, in case it could help other users 👍

freak4pc commented 2 years ago

Hey @groue, I'd appreciate it if you could re-open this; I have more clarity about what is (probably) happening.

We've noticed this happening a lot more lately, and looking at the crash logs, I believe the watchdog is terminating the app because the main thread stays occupied for too long (over 10s, which is the watchdog limit).

We ran a "test" where we nil out the database pool whenever the app goes to the background and re-initialize it when it comes back to the foreground, and the crashes seem to stop.

This is confirmed by some of the crash logs:

Exception Type:  EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note:  EXC_CORPSE_NOTIFY
Termination Reason: FRONTBOARD 2343432205 
<RBSTerminateContext| domain:10 code:0x8BADF00D explanation:scene-update watchdog transgression: application<com.monday.monday>:5492 exhausted real (wall clock) time allowance of 10.00 seconds
ProcessVisibility: Background
ProcessState: Running
WatchdogEvent: scene-update
WatchdogVisibility: Background
WatchdogCPUStatistics: (
"Elapsed total CPU time (seconds): 44.010 (user 28.430, system 15.580), 67% CPU",
"Elapsed application CPU time (seconds): 10.220, 15% CPU"
) reportType:CrashLog maxTerminationResistance:Interactive>

As you can see, the code is 0x8BADF00D (0x "Ate Bad Food").

Looking at the symbolicated crash log's main thread, I saw two GRDB frames:

Thread 0 name:   Dispatch queue: com.apple.main-thread
Thread 0 Crashed:
0   libsystem_kernel.dylib                 0x1b803166c __psynch_mutexwait + 8
1   libsystem_pthread.dylib                0x1f1c1b3cc _pthread_mutex_firstfit_lock_wait + 84
2   libsystem_pthread.dylib                0x1f1c2250c _pthread_mutex_firstfit_lock_slow + 240
3   Foundation                             0x18284d904 -[NSRecursiveLock lock] + 28
4   GRDB                                   0x1051fa2cc 0x105050000 + 1745612
5   GRDB                                   0x1050b5790 0x105050000 + 415632
6   libdispatch.dylib                      0x180d01914 _dispatch_call_block_and_release + 32
7   libdispatch.dylib                      0x180d03660 _dispatch_client_callout + 20
8   libdispatch.dylib                      0x180d11b60 _dispatch_main_queue_callback_4CF + 944
9   CoreFoundation                         0x181049cd4 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 16
10  CoreFoundation                         0x181003eac __CFRunLoopRun + 2540
11  CoreFoundation                         0x1810173b8 CFRunLoopRunSpecific + 600
12  GraphicsServices                       0x19c9a738c GSEventRunModal + 164
13  UIKitCore                              0x1839b76a8 -[UIApplication _run] + 1100
14  UIKitCore                              0x1837367f4 UIApplicationMain + 2092
15  monday                                 0x103518460 0x102428000 + 17761376
16  dyld                                   0x104bbda24 start + 520

I was able to symbolicate it using atos and it seems to be related to the value observer:

(screenshot attached)

So I'm assuming this means:

  1. I'm observing something
  2. This something keeps firing when I'm in the background (not sure if it's supposed to?)
  3. It emits the values on the main thread
  4. Something (probably in my code) takes more than 10s upon that emission, causing the watchdog to terminate the app.

The weird thing is that the crash log doesn't point to any of my code, only GRDB-specific code.

Does this make sense to you? Did you ever bump into this? Any thoughts on what I could do to mitigate the issue, aside from closing and re-opening the database pool on background/foreground?

Thank you so much!
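For reference, the background teardown "test" described earlier (nil the pool on background, rebuild on foreground) can be sketched like this. The UIKit notification names are real; `makeDatabasePool()` and the path are hypothetical placeholders:

```swift
import GRDB
import UIKit

// Hypothetical factory; a real app would derive the path from its container.
func makeDatabasePool() throws -> DatabasePool {
    try DatabasePool(path: "/path/to/db.sqlite")
}

final class DatabaseManager {
    private var dbPool: DatabasePool?

    init() {
        dbPool = try? makeDatabasePool()
        let center = NotificationCenter.default
        center.addObserver(
            forName: UIApplication.didEnterBackgroundNotification,
            object: nil, queue: .main) { [weak self] _ in
                self?.dbPool = nil // release connections while backgrounded
        }
        center.addObserver(
            forName: UIApplication.willEnterForegroundNotification,
            object: nil, queue: .main) { [weak self] _ in
                self?.dbPool = try? makeDatabasePool() // reopen on return
        }
    }
}
```

Note that nilling the pool only closes connections once nothing else (e.g. a running observation) retains it.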

freak4pc commented 2 years ago

P.S. As you can see, this isn't the same crash as the original, but it seems similar (at least from the EXC_CRASH (SIGKILL) / EXC_CORPSE_NOTIFY exception angle). If I have any more thoughts on the original crash, I'll share those as well.

P.S.2 We also sometimes see these in the logs around the crash:

Message from debugger: Terminated due to signal 9
Background task still not ended after expiration handlers were called

groue commented 2 years ago

Thanks for the update, @freak4pc

So it looks like a deadlock. Which GRDB version are you using?

freak4pc commented 2 years ago

Thanks, we're on 5.15

groue commented 2 years ago

Thanks, we're on 5.15

OK. ValueObserver.notifyChanges does indeed use an NSRecursiveLock, and this may be the one that is deadlocked. It may also be a lock owned by your app, but let's suppose it's the GRDB one.

I hope your crash log contains the stack trace of all of your application threads. Can you notice other threads that are busy running some GRDB method?

freak4pc commented 2 years ago

Thanks for your comment :) I'm seeing a few more operations on Thread 18 (Dispatch queue: com.monday.database.writer):

Thread 18 name:   Dispatch queue: com.monday.database.writer
Thread 18:
0   libswiftCore.dylib                     0x185db180c tryCast(swift::OpaqueValue*, swift::TargetMetadata<swift::InProcess> const*, swift::OpaqueValue*, swift::TargetMetadata<swift::InProcess> const*, swift::TargetMetadata<swift::InProcess> const*&, swift::TargetMetadata<swift::InProcess> const*&, bool, bool) + 2516
1   libswiftCore.dylib                     0x185db180c tryCast(swift::OpaqueValue*, swift::TargetMetadata<swift::InProcess> const*, swift::OpaqueValue*, swift::TargetMetadata<swift::InProcess> const*, swift::TargetMetadata<swift::InProcess> const*&, swift::TargetMetadata<swift::InProcess> const*&, bool, bool) + 2516
2   libswiftCore.dylib                     0x185db1454 tryCast(swift::OpaqueValue*, swift::TargetMetadata<swift::InProcess> const*, swift::OpaqueValue*, swift::TargetMetadata<swift::InProcess> const*, swift::TargetMetadata<swift::InProcess> const*&, swift::TargetMetadata<swift::InProcess> const*&, bool, bool) + 1564
3   libswiftCore.dylib                     0x185db1454 tryCast(swift::OpaqueValue*, swift::TargetMetadata<swift::InProcess> const*, swift::OpaqueValue*, swift::TargetMetadata<swift::InProcess> const*, swift::TargetMetadata<swift::InProcess> const*&, swift::TargetMetadata<swift::InProcess> const*&, bool, bool) + 1564
4   libswiftCore.dylib                     0x185db0d2c swift_dynamicCastImpl(swift::OpaqueValue*, swift::OpaqueValue*, swift::TargetMetadata<swift::InProcess> const*, swift::TargetMetadata<swift::InProcess> const*, swift::DynamicCastFlags) + 72
5   libswiftFoundation.dylib               0x1856f2004 __JSONDecoder.container<A>(keyedBy:) + 540
6   libswiftFoundation.dylib               0x1856e62bc protocol witness for Decoder.container<A>(keyedBy:) in conformance __JSONDecoder + 24
7   libswiftCore.dylib                     0x185d7c248 dispatch thunk of Decoder.container<A>(keyedBy:) + 32
8   monday                                 0x10254f2d4 0x102428000 + 1209044
9   monday                                 0x10254f7d8 0x102428000 + 1210328
10  libswiftCore.dylib                     0x185d7c124 dispatch thunk of Decodable.init(from:) + 32
11  libswiftFoundation.dylib               0x18570b87c _JSONUnkeyedDecodingContainer.decode<A>(_:) + 3308
12  libswiftFoundation.dylib               0x18570cdf4 protocol witness for UnkeyedDecodingContainer.decode<A>(_:) in conformance _JSONUnkeyedDecodingContainer + 16
13  libswiftCore.dylib                     0x185aa3984 Array<A>.init(from:) + 580
14  libswiftCore.dylib                     0x185aa3a48 protocol witness for Decodable.init(from:) in conformance <A> [A] + 32
15  libswiftCore.dylib                     0x185d7c124 dispatch thunk of Decodable.init(from:) + 32
16  libswiftFoundation.dylib               0x1856e8394 __JSONDecoder.unbox_(_:as:) + 1268
17  libswiftFoundation.dylib               0x1856d3fc4 JSONDecoder.decode<A>(_:from:) + 1548
18  libswiftFoundation.dylib               0x18570bfc4 dispatch thunk of JSONDecoder.decode<A>(_:from:) + 56
19  MondayPersistence                      0x105c2eb10 0x105c24000 + 43792
20  MondayPersistence                      0x105c2e644 0x105c24000 + 42564
21  monday                                 0x103aab214 0x102428000 + 23605780
22  monday                                 0x103aab5dc 0x102428000 + 23606748
23  MondayPersistence                      0x105c2ee9c 0x105c24000 + 44700
24  monday                                 0x103aab6a8 0x102428000 + 23606952
25  GRDB                                   0x1050ead60 0x105050000 + 634208
26  GRDB                                   0x1050ec458 0x105050000 + 640088
27  GRDB                                   0x10506dcfc 0x105050000 + 122108
28  GRDB                                   0x1050eb2d0 0x105050000 + 635600
29  GRDB                                   0x1050eb634 0x105050000 + 636468
30  GRDB                                   0x1050ec158 0x105050000 + 639320
31  monday                                 0x102e36820 0x102428000 + 10545184
32  monday                                 0x102e368fc 0x102428000 + 10545404
33  monday                                 0x102e36944 0x102428000 + 10545476
34  monday                                 0x102e369e4 0x102428000 + 10545636
35  GRDB                                   0x1050ea884 0x105050000 + 632964
36  GRDB                                   0x1050ea920 0x105050000 + 633120
37  GRDB                                   0x1051faf0c 0x105050000 + 1748748
38  GRDB                                   0x1051f9810 0x105050000 + 1742864
39  GRDB                                   0x1051fae70 0x105050000 + 1748592
40  GRDB                                   0x105077dc8 0x105050000 + 163272
41  GRDB                                   0x1051fa04c 0x105050000 + 1744972
42  GRDB                                   0x1051f9994 0x105050000 + 1743252
43  GRDB                                   0x1051fae4c 0x105050000 + 1748556
44  GRDB                                   0x1051f0f00 0x105050000 + 1707776
45  GRDB                                   0x105077a08 0x105050000 + 162312
46  GRDB                                   0x10507b118 0x105050000 + 176408
47  GRDB                                   0x105081388 0x105050000 + 201608
48  GRDB                                   0x10507a57c 0x105050000 + 173436
49  GRDB                                   0x10507b304 0x105050000 + 176900
50  GRDB                                   0x10507afdc 0x105050000 + 176092
51  GRDB                                   0x1051f9038 0x105050000 + 1740856
52  GRDB                                   0x1050b8e00 0x105050000 + 429568
53  GRDB                                   0x10516cb54 0x105050000 + 1166164
54  GRDB                                   0x1050b5790 0x105050000 + 415632
55  libdispatch.dylib                      0x180d01914 _dispatch_call_block_and_release + 32
56  libdispatch.dylib                      0x180d03660 _dispatch_client_callout + 20
57  libdispatch.dylib                      0x180d0ade4 _dispatch_lane_serial_drain + 672
58  libdispatch.dylib                      0x180d0b958 _dispatch_lane_invoke + 392
59  libdispatch.dylib                      0x180d161a8 _dispatch_workloop_worker_thread + 656
60  libsystem_pthread.dylib                0x1f1c1a0f4 _pthread_wqthread + 288
61  libsystem_pthread.dylib                0x1f1c19e94 start_wqthread + 8

I'll try symbolicating these and add it to this thread.

freak4pc commented 2 years ago

Here's the full symbolicated log:

2022-01-2022-29-24.6190+0200-4788e7062cae210b2b52bc1e4f77d4dc2dafcec8.txt

I see that there's some of our code in between the GRDB-related code, which makes me think it's some conflict we're not aware of?

(screenshot attached)

UsersProvider.users(sql:) looks like this (database.rx.observe is a simple RxSwift wrapper that uses ValueObservation):

func users(sql: String? = nil) -> Observable<[UserModel]> {
    database.rx.observe { db in
        var query = UserModel
            .filter(UserModel.Column.isDisabled == false)
        if let sql = sql, !sql.isEmpty {
            query = query.filter(sql: sql)
        }
        return try query
            .order(UserModel.Column.name.asc)
            .fetchAll(db)
    }
}
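As an aside: since the main-thread frames above are waiting inside the value observer's notification path, one mitigation is to deliver fresh values off the main queue. This is a sketch assuming GRDB 5's `ValueObservation.start(in:scheduling:onError:onChange:)` API; `dbPool` and `UserModel` stand in for the app's pool and record type:

```swift
import Foundation
import GRDB

let observation = ValueObservation.tracking { db in
    try UserModel
        .filter(UserModel.Column.isDisabled == false)
        .order(UserModel.Column.name.asc)
        .fetchAll(db)
}

// Deliver fresh values on a background queue instead of the default main
// queue, so a slow consumer can't keep the main thread busy.
let queue = DispatchQueue(label: "com.example.user-observation")
let cancellable = observation.start(
    in: dbPool,
    scheduling: .async(onQueue: queue),
    onError: { error in print("Observation failed: \(error)") },
    onChange: { users in
        // Heavy post-processing now runs off the main thread; hop back to
        // the main queue only for the final UI update.
    })
```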

freak4pc commented 2 years ago

Sorry for another separate reply here; I just wanted to share another crash report where GRDB isn't on the main thread, but it's still in that same "writer queue", with monday stuff in between some GRDB operations.

The similarity is there, and I'm wondering if it's somehow related. Perhaps I'm doing something I'm not supposed to, even though I'm always performing work inside transactions, etc.

Here it is: crash2.txt

groue commented 2 years ago

Focusing on the first crashlog, I don't understand why the main thread appears locked as it tries to notify a database change to your app. The second thread that involves GRDB is starting a new observation (and is not waiting).

May I ask how your application is using value observations? How frequently are they started? What is your app doing when fresh database values are notified? Does it happen that observations feed themselves? For example, are observations started from change notifications? Is the database modified as a result of change notifications? Anything that looks "fancy" in any way (please have a low threshold for "fancy")?

freak4pc commented 2 years ago

We are using them pretty frequently in some places in the app. We have about 13 instances of them throughout the app.

Since the observations are wrapped in an Observable, it might be a bit tricky to track down which consumers might be writing to the database, but those writes are (or should be) still isolated.

Two big things that I think could be related:

  1. We have a relatively big insert/upsert close to the start of the app, about 20k records, possibly, for a large user. Different areas in the app observe changes to that table (the last 200 items, but still the table as a whole). I'm not sure if this is related; it seems like a valid use case?
  2. We have another area that stores a list of users using asyncWrite, and a different area observing them. Still, don't think this is too fancy ...

Is there anything invalid about doing a separate write transaction inside a value observation? I don't think we have this case, but you did mention it as an option.

groue commented 2 years ago

Nothing really rings a bell for me.

Is there any thing invalid with doing a separate write transaction inside a value observation?

No, it is not invalid. I was looking for anything that could have been overlooked.

freak4pc commented 2 years ago

Is there any additional information I could gather to debug this? I'm happy to try anything... I could share all 13 ValueObservations in the app, but I'm not sure it'd actually help :)

groue commented 2 years ago

I really don't know, @freak4pc... You're in the best position for reducing and reproducing, I'm afraid. I'd love to help, but I'm no wizard.

freak4pc commented 2 years ago

Sure, I agree, but the crash in the original message does seem to revolve around locking inside GRDB, so I thought you might have some thoughts (of course I didn't think you were a wizard ;-) ). I'll keep poking at this. Thanks.

groue commented 2 years ago

Sure. I'll leave the issue open for a while.

freak4pc commented 2 years ago

Two updates on this:

  1. We've started filtering emissions from ValueObservations while the app is in the background - this seems to help with the crashes (we'll monitor)

  2. We noticed the app complains about a UIBackgroundTask not being completed within the expiration threshold in the background (Background task still not ended after expiration handlers were called). The relevant background task is most likely here. Could you share a bit about what releaseMemory does? Is there anything that could cause it to halt? It might help me debug further on our end.

https://github.com/groue/GRDB.swift/blob/master/GRDB/Core/DatabasePool.swift#L214-L224

Thanks!
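For reference, the filtering approach in (1) can be sketched like this. This assumes RxSwift/RxCocoa; `usersProvider` is the wrapper shown earlier in this thread, and `isActive` is a hypothetical stream derived from UIKit lifecycle notifications:

```swift
import RxSwift
import RxCocoa
import UIKit

// Tracks whether the app is in the foreground.
let isActive: Observable<Bool> = Observable.merge(
    NotificationCenter.default.rx
        .notification(UIApplication.didBecomeActiveNotification)
        .map { _ in true },
    NotificationCenter.default.rx
        .notification(UIApplication.didEnterBackgroundNotification)
        .map { _ in false }
).startWith(true)

// Drop observation emissions that arrive while the app is backgrounded,
// so nothing heavy runs on the main thread under the watchdog's clock.
let visibleUsers = usersProvider.users()
    .withLatestFrom(isActive) { users, active in (users, active) }
    .filter { _, active in active }
    .map { users, _ in users }
```

One trade-off worth noting: emissions dropped this way are lost, so consumers may want to re-read the database on foregrounding.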

groue commented 2 years ago

Could you share a bit what releaseMemory does?

Sure:

Is there anything that could cause it to halt? Might help me further debug on our end?

DatabasePool.releaseMemory calls Pool.barrier, and waits until all current reads are completed (so that it can close all reader connections).

groue commented 2 years ago

DatabasePool.releaseMemory calls Pool.barrier, and waits until all current reads are completed (so that it can close all reader connections).

More precisely, it waits until no one is waiting for a read access. It then closes all reader connections.
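This is not GRDB's actual implementation, but the barrier semantics described above can be sketched with a concurrent dispatch queue. A barrier block runs only after every previously submitted read has finished, which is why releaseMemory blocks for as long as an in-flight read keeps running:

```swift
import Dispatch

// Reads run concurrently on this queue.
let readerQueue = DispatchQueue(label: "readers", attributes: .concurrent)

func releaseMemorySketch(closeReaderConnections: @escaping () -> Void) {
    readerQueue.async(flags: .barrier) {
        // No reads are running at this point; it is safe to close the
        // reader connections. A read that never returns (e.g. one stuck
        // on a lock) keeps this barrier, and the background task, waiting.
        closeReaderConnections()
    }
}
```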

groue commented 2 years ago

I'm not sure it is useful to keep this issue open, since no progress is being made. Let's wait for more actionable input!