groue / GRDB.swift

A toolkit for SQLite databases, with a focus on application development
MIT License
6.73k stars 696 forks source link

Migrating existing DB to an AppGroup #643

Closed foxware00 closed 4 years ago

foxware00 commented 4 years ago

Question

I'm trying to create a Share extension that has access to my host application's database. The problem is that users already have the SQLite file created and populated. I know the path of the existing database and where the new one is.

My question is there a clean and easy way to migrate the SQLite file to another path in a one-off operation. I'm guessing there are a few files that all need to be copied and duplicated cleanly.

My initial thoughts would be to open the current DB, create a new on in a new path and copy all the data over. This might take a few seconds the first run but seems like the cleanest way to do it. I'd delete the old DB, and switch over to the new DB once the migration has completed. Or would a quicker direct file copy make more sense?

I've found a CoreData method that migrates data from one URL to another, is this something that's easy to achieve in GRDB?

Environment

GRDB flavor(s): GRDB GRDB version: 4.1.1 Installation method: CocoaPods Xcode version: 11.1 Swift version: 5.1 Platform(s) running GRDB: iOS macOS version running Xcode: 10.14.5

groue commented 4 years ago

Hello @foxware00,

My advice is to use the backup API.

This sounds to me like the safest way to migrate your app in a context where several processes are attempting at using it. I hope that SQLite is able to deal with on-progress backups correctly. This is not something I have personally tested.

Now, I suggest that you:

If you have an interesting story to tell after this, please come back and share your experience! I'll answer other questions if you have any. But will have to grasp a few things about SQLite: click on the links I gave you.

foxware00 commented 4 years ago

@groue Thanks as always for a prompt and detailed response.

The backup API looks ideal, I will do some investigation. I didn't want to manually copy files if there was a better way to achieve it. It looks like a simple way to easily transfer the data. I'll be sure to open the existing connection I want to migrate in read-only mode to prevent data changing before the backup has finished. Given it's a blocking call, I'm hoping it'll be a simple as follows.

  1. Open existing connection in read-only mode
  2. Open new connection in App Group
  3. Run back up into the new connection.
  4. If successful point app at the new connection
  5. Delete and close the old connection.

I'll give those links a good read over. My current thought was to use the app extensions mainly to just query data the main app has. Given the easiest way to achieve this is by sharing my base target framework (Containing my data layer), I'm getting a lot of the data I need for free without having to re-write the queries and syncing logic.

I'm going to have a proper play at sharing the connection, but given all writes are transactional it'll be interesting to see if there are any issues. I'm currently using a database pool in WAL mode. I might consider also opening the extension's connection in readOnly mode to prevent multiple writes to the same SQL file. From the looks of it, WAL mode is designed for this purpose, with the small unknown of how IPC and shared memory working with App Extensions and the host app on iOS. So off for some in-depth testing of how it all works. From 2.2 in this link "Readers can exist in separate processes"

Is there something different in GRDB from raw SQLite that is abstracting some of the normal connection locks and transactions away from native SQLite? I guess my point is, isn't SQLite designed for this scenario when using transactions and WAL mode? Or does each process get a different WAL?

Anyway sorry for my rambling. Feel free to close the issue, and thank you for taking the time to give me a detailed response!

groue commented 4 years ago

Is there something different in GRDB from raw SQLite that is abstracting some of the normal connection locks and transactions away from native SQLite? I guess my point is, isn't SQLite designed for this scenario when using transactions and WAL mode? Or does each process get a different WAL?

GRDB is indeed a very thin layer over SQLite.

As you say, one writer and several readers are totally OK for a WAL database. A read-only pool or queue in the extension should not impact the read-write pool in the main app at all.

But we also have to deal with iOS. This and this tell us that... your rambling is normal, and I don't have a definitive answer because I haven't experienced this setup yet.

foxware00 commented 4 years ago

But we also have to deal with iOS. This and this tell us that... your rambling is normal, and I don't have a definitive answer because I haven't experienced this set up yet.

Fortunately, the Database is not encrypted nor do I target iOS 8 and below.

I'm going to have a good play today and test a few things. Many thanks for all the help.

groue commented 4 years ago

Yes, I didn't want to sound alarming, sorry :-) I just wanted to say that iOS has its word, so... yeah, test your stuff. I'm closing this issue now. Please open a new one if you have any further question!

klaas commented 4 years ago

@foxware00 I would really be interested in your long time experience and if you find some edge cases that have to be taken into account. Even a thumbs up or down would be great :-)

I'm facing a similar issue and right now I'm using GRDB only in the main app. The share extension and the main app communicate with one-way-only JSON files (e.g. every shared item creates a small unique file that is integrated in the database as soon as the main app gets a chance to do so).

foxware00 commented 4 years ago

@klaas I've chosen a slightly different route from the one you mention. I'm migrating the existing database into the app group so I can access my device/user information required to complete the share without having to keep it in sync and fetch it again.

I make a simple check during the startup of the Share extension to check whether the migration has happened. Directing them to the main app first so I can perform the migration. I can also use the migrated data to validate a user being logged in being able to perform functions.

This was I can write straight to the database from the extension.

As mentioned higher up in this thread, I'm using a WAL file (database pools) to prevent read's and writes being blocked by either app. I'm keeping the logic and writing to the DB to a minimum, simply writing the state of the shared item into a pending state so when the app opens I can show the progress of the upload awaiting the real post from the server.

I've also had to migrate my keychain and user defaults.

Shout if you have any more questions or would like me to explain something in more detail.

groue commented 4 years ago

I'd like to distinguish a few cases:

  1. Read/write from the app, read/only from the extension. The extension does NOT observe the database for changes performed by the app.

    This is the most simple case. If the app uses the WAL mode (with a DatabasePool), then the app should always be able to read and write, and the extension should always be able to read, without any error, even if both processes concurrently access the database.

  2. Read/write from the app, read/write from the extension. The app and the extension do NOT observe the database for changes performed by the other process.

    Because both processes want to write in the database, they both may see SQLITE_BUSY errors, even if they use the WAL mode (with a DatabasePool). The WAL mode still does not allow two connections to write in the same db at the same time.

    SQLITE_BUSY errors are a pain, because they are a threat to all writes.

    Fortunately, SQLite has busy handlers. Those are configured in GRDB using the Configuration.busyMode property.

    There is also the threat of SQLITE_LOCKED, but I'm less familiar with this one, and I'm not sure it can happen due to concurrent writes.

  3. Read/write from the app, read/write from the extension. The app or the extension does observe the database for changes performed by the other process.

    On top of the handling of SQLITE_BUSY errors seen above, we now have to kick some database observations.

    This is a topic that nobody has ever shared any hint about. And I never had to implement it. This recent article https://www.avanderlee.com/swift/core-data-app-extension-data-sharing/ contains useful information. It is about Core Data, but it does not make it less relevant.

    The observation APIs currently provided by GRDB are all about Not! Missing! A! Single! Impactful! Transaction! Sorry for the shout, but this is very important. Being sure that not a single modification is missed is crucial for some apps, and currently all GRDB database observation tools provide this very strong guarantee. Apps process absolutely all individual transactions that have performed changes, with access to the changed data, without any coalescing or glitch, ever.

    To this end, all GRDB observations insert a "stop the world" handler that is run after each transaction that is deemed relevant has been committed on disk. During the execution of this handler, the app can't write, and the database can be queried (DatabaseQueue), or a concurrent reader can spawn and enter snapshot isolation right from the last written database state (usually faster, available only for database pools, used under the hood by ValueObservation, for example).

    This "stop-the-world" guarantee has to be dropped if we talk about change notifications between two processes. This is because when process P1 wants to perform change C1 and then change C2, we don't want to guarantee that process P2 has processed the notification of change C1 (including fetching those changes from a known database content), before P1 is allowed to proceed on to change C2.

    We'll thus need another kind of database observation, a relaxed one. This is totally uncharted territory for GRDB.

    I'd love that this topic would become better explored. 🤗

foxware00 commented 4 years ago

@groue An update on this.

I've done a bunch of testing and on the surface, everything is working perfectly. I'm able to migrate the DB, clean up the old one and move users onto the new db version.

However, I've been scratching my head for quite a while. I'm getting a bunch of crashes from our internal testers on testflight. The stack traces don't shed any light, are often random, and often the only code that crashed is "main". However after analysing a large number of crash reports. I've noticed almost always there is another thread in the background running a transaction. (I pre-fetch data, and sync small bits of data off the back of an APNS notification).

This is where it gets difficult. It's only happening on release builds as far as I can tell and only after a couple of hours. No reproduction steps, no explicit code has crashed according to the crashes being uploaded. It seems almost all of the crashes have a background thread that begun a transaction but never finished it. I don't see the line for .commit ever being called.

For additional context, the App Extension isn't actually being run. It's just the "app group" database that's being accessed.

Is it all just a coincidence that that background thread is as such. Crashes are always happening in the background, never when a user is using that app. And the only reason we'd be running in the background is to perform the sync so maybe it is just that.

I wonder if in your little reading around the subject you know of anything that might explain this?

public func inTransaction(transaction: @escaping (Database) -> Void) {
    if let pool = blinkDatabase.dbPool as? DatabasePool {
        try! pool.writeInTransaction { db in
            transaction(db) <--- last call here
            return .commit
        }
    }
}

My database config

var config = Configuration()
config.maximumReaderCount = 10
config.busyMode = .timeout(5)
groue commented 4 years ago

@foxware00, do you have any information about the crash? Could it be related to Data Protection? iOS won't let the app access the database file if it is locked according to its FileProtectionType.

foxware00 commented 4 years ago

Not really, it gets a bunch of stack traces which have no reference to my code other than main

I don't even have an error code or reason as these aren't visible in the organiser. I can share some symbolicated crashes, but no actual code is being marked as an error only 'main' or apple internal APIs leading to more confusion.

I have thought it was something related to that but as far as I can tell I have the correct access. It is accessible whilst locked in the background just sometimes it crashes. I'm getting the URL via https://developer.apple.com/documentation/foundation/filemanager/1412643-containerurl

groue commented 4 years ago

All right @foxware00, thanks for the report. I can only hope you can eventually gather or Google more information.

foxware00 commented 4 years ago

@groue I've found some more information by inspecting the crash reports manually. It seems the root cause is

Exception Type:  EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note:  EXC_CORPSE_NOTIFY
Termination Reason: Namespace RUNNINGBOARD, Code 0xdead10cc

Which according to Apple 0xdead10cc is to do with the file lock. Interestingly this wasn't a problem before using App groups. Now to find out why. 🤔

The exception code 0xdead10cc indicates that an application has been terminated by the OS because it held on to a file lock or sqlite database lock during suspension. If your application is performing operations on a locked file or sqlite database at suspension time, it must request additional background execution time to complete those operations and relinquish the lock before suspending.

groue commented 4 years ago

Thanks @foxware00. Could it help if GRDB would wrap its transactions around a request for additional background execution? There is a related comment in the FMDB repo, even if they were not chasing after the same crash: https://github.com/ccgus/fmdb/issues/754#issuecomment-528011814.

foxware00 commented 4 years ago

Possibly. However, it looks like you can't request it in the same way from an app extension. I wonder if there is much of an overhead to doing what you suggest. I'm going to have a quick play and see if there is anything I'm doing wrong inside my performFetchWithCompletionHandler method from AppDelegate. Fortunately, I only have one block of code which runs in the background so wrapping it with a request for additional execution time might be the easiest thing to do. Although this seems odd from within performFetchWithCompletionHandler.

The beginBackgroundTaskWithName:expirationHandler: method cannot be called from an app extension. To request extra execution time from your app extension, call the performExpiringActivityWithReason:usingBlock: method of NSProcessInfo instead.

groue commented 4 years ago

Fortunately, I only have one block of code which runs in the background so wrapping it with a request for additional execution time might be the easiest thing to do.

Sure. Please tell us if this fixes the crash! This will be a very precious information if someone someday decides to wrap all of this is an easy-to-grasp high-level API.

foxware00 commented 4 years ago

Absolutely. Currently, I have a timeout of 29 seconds on the background task, which may be a little close to comfort to the supposed 30 seconds. I'm going to try lowering this to 20 seconds so I can cancel any pre-fetches that might be close to being overrun. My theory is that a request is in flight, and Apple kills the process just before I timeout and callback. At which point my URLSession returns and tries to hold the transaction open to write the response. If this lowers the crashes substantially. I'm going to additionally try wrapping the call in a background execution request and hopefully, the crashes will subside completely. Stay tuned!

groue commented 4 years ago

I surely stay tuned.

I was thinking several things, like:

But there is no rush: we need your feedback first ;-)

foxware00 commented 4 years ago

They both sound like great additions. Dealing with extensions is always tricky as you don't have access to UIApplication.shared which is where you determine your state.

I'm pushing a build to my internal guinea pigs so I can collect some crash statistics on the two options I've implemented. I really like the idea of a generic solution within GRDB though.

Also an interesting tidbit. Logging out UIApplication.shared.backgroundTimeRemaining inside the pre-fetch method gives me different times. 29.962309416674543, 28.535144541674526and 27.565803166653495 all of which might explain why my 29 seconds was overrunning the alotted time.

foxware00 commented 4 years ago

Seems both of my approaches didn't fully fix the issue. Crashes are still occurring with the same error. I found this link giving an example to a similar issue to the one you linked around SQLCipher a while back. Given I'm not encrypting the DB, is there something we're missing? I'm now trying moving back to a DatabaseQueue rather than a pool. This obviously won't settle for release but might see if the issue lies somewhere around that logic. I'm thinking to have something wrapping around the transaction itself might be more bulletproof. Back to the drawing board, for now...

https://github.com/signalapp/SQLCipherVsSharedData

groue commented 4 years ago

Given I'm not encrypting the DB, is there something we're missing?

Maybe it just happens that our friends work with SQLCipher. This does not mean that the issue does not happen to regular unencrypted SQLite databases.

May I ask @charlesmchen-signal and @michaelkirk-signal if you were able to overcome this trouble?

foxware00 commented 4 years ago

I see they've forked GRDB with fixes for the state of the DB when SQLite cipher is used.

https://github.com/michaelkirk-signal/GRDB.swift/commit/0c582293b1f6e77733721a326cf8576648ab9db5

They do allude to it being a problem for only encrypted databases, but it could seem there is an issue with iOS recognizing the state of the files during a suspended state. As I've been experiencing though, debug builds and simulators don't exhibit the issue which is making it all the more difficult to fix.

Set `unencryptedHeaderConfiguration` if you have a SQLCipher database in a shared app
container to avoid 0x10deadcc crashes.

As a rule, when an app or extension is suspended, if that process holds a lock on a file in
a shared container, iOS will kill that process with the exception code: 0x10deadcc.

Because SQLite's ubiquitous WAL mode requires file locking to facilitate concurrency, iOS
provides a special exemption to this rule for SQLite files. However, because SQLCipher
databases are encrypted, iOS cannot recognize them as database files. The exemption does
not apply, and iOS kills the process.

The work around is to leave the first bytes of the SQLCipher database unencrypted.
This allows iOS to recognize that the locked file is a SQLite database and avoids the
crash.

The data in the first bytes of the database file are boilerplate like "SQLite Format 3\0",
not sensitive user data.

See more in `UnencryptedHeaderConfiguration` and at:
https://www.zetetic.net/sqlcipher/sqlcipher-api/#cipher_plaintext_header_size
groue commented 4 years ago

Yes, cipher_plaintext_header_size is about solving https://github.com/sqlcipher/sqlcipher/issues/255, and this is an SQLCipher-specific issue. Maybe the Signal team had to do something else in order to avoid other causes for 0x10deadcc.

foxware00 commented 4 years ago

More woes, it looks as though the closing the connection could be a candidate.

https://twitter.com/BigZaphod/status/1164924859954159618

https://bugzilla.mozilla.org/show_bug.cgi?id=1307822

https://bugzilla.mozilla.org/show_bug.cgi?id=1291304

This has a lot of interesting stuff in it. Not sure of any definitive solution that applies to GRDB

https://bugzilla.mozilla.org/show_bug.cgi?id=1317324#c10

Anyway, DatabaseQueue is uploading, so I'll see if there is something different when using WAL.

foxware00 commented 4 years ago

Sorry for bombarding this thread. More a place for a record than pestering you for a response. It seems WAL might be required anyway.

My train of thought is currently in two places. Before I end the background task, I call Database.close() directly to close all connections before reporting to iOS that I'm done.

Another thing, not sure how much of an issue it is. But I don't delete the old WAL file during migration. I never open a connection to it, but maybe system is looking at the app's directory finding the old (closed) wal and seeing it closed whilst the open connection is still valid.

UPDATE. For opening a connection inside app group it looks like it must be a WAL db. Not sure why mine isn't working. But will try out removing the WAL file as well to ensure no files are hanging around

michaelkirk-signal commented 4 years ago

👋

I see they've forked GRDB with fixes for the state of the DB when SQLite cipher is used.

michaelkirk-signal@0c58229

We are not currently using a forked version of GRDB. In the past we've briefly used forked versions as we wait for changes to get upstreamed. So far we've either been able to get our changes upstreamed or worked with @groue to find some better solution (thanks @groue!).

In this particular case, rather than the specific plaintextHeader api you linked to, we worked with @groue to introduce the more general prepareDatabase method, and use it like this:

https://github.com/signalapp/Signal-iOS/blob/master/SignalServiceKit/src/Storage/Database/GRDBDatabaseStorageAdapter.swift#L437

configuration.prepareDatabase = { (db: Database) in
    let keyspec = try keyspec.fetchString()
    try db.execute(sql: "PRAGMA key = \"\(keyspec)\"")
    try db.execute(sql: "PRAGMA cipher_plaintext_header_size = 32")
}

But, back to the 10deadcc's, just to be clear about what I think we're talking about:

From https://developer.apple.com/library/archive/technotes/tn2408/_index.html

Processes using database in WAL journal mode will only be jetsam'd if they attempt to hold a write transaction open at the time of suspension.

So yes, this will affect encrypted and unencrypted db's alike. Without the cipher_plaintext_header_size configuration it will affect encrypted db's much more.

The short version is we've never completely eliminated these kinds of crashes.

Additionally, in our codebase there is some amount of non-obvious async writes happening as side effects of other code. So we might be at second 29.5 of a background task when some async thing kicks off another write.

We have had a long tail of tracking down this kind of code and, where possible, ensure it doesn't run in the share extension. But sometimes these writes are critical to the functioning of the share extension. e.g. in our case sending a message through the share extension on a shoddy network might fail several times in the background and be ready for another retry at second 29.

We still want to try to get it out in that last second if we can, even though it risks running afoul of the locking rules. So to some degree we've adjusted our expectations to the rules of the platform and accepted that some level of crashing is worth it if it means we'll be more likely to get messages through. We've continued to see improvements here as we've optimized our writes to be faster.

Another thing, since it sounds like you will be doing writes from the share extension, you probably want to look at: configuration.defaultTransactionKind = .immediate

groue commented 4 years ago

Thank you @michaelkirk-signal :-)

I did setup a sample app which reproduces the crash on my device (iOS 13.2.3). A simple app, without any extension. When I open an transaction (immediate so that the lock is acquired) and send the app to background, here is what I witness in the debugger:

When the app runs without debugger, Xcode is able to import a crash log which mentions 0xdead10cc:

Exception Type:  EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note:  EXC_CORPSE_NOTIFY
Termination Reason: Namespace RUNNINGBOARD, Code 0xdead10cc
Triggered by Thread:  0

Thread 0 name:  Dispatch queue: com.apple.main-thread
Thread 0 Crashed:
0   libsystem_kernel.dylib          0x00000001b2ef8634 mach_msg_trap + 8
1   libsystem_kernel.dylib          0x00000001b2ef7aa0 mach_msg + 72
2   CoreFoundation                  0x00000001b309f04c __CFRunLoopServiceMachPort + 216
3   CoreFoundation                  0x00000001b309a16c __CFRunLoopRun + 1444
4   CoreFoundation                  0x00000001b30998a0 CFRunLoopRunSpecific + 464
5   GraphicsServices                0x00000001bcff1328 GSEventRunModal + 104
6   UIKitCore                       0x00000001b718a740 UIApplicationMain + 1936
7   AppGroup                        0x0000000100534e9c 0x100528000 + 52892
8   libdyld.dylib                   0x00000001b2f24360 start + 4

What bugs me is that I could never have ProcessInfo.processInfo.performExpiringActivity tell that the process is about to be suspended.

groue commented 4 years ago

This note contains interesting information: https://forums.developer.apple.com/message/253232

I could log UIApplication.shared.backgroundTimeRemaining and see it decrease to 0 right before the crash. However backgroundTimeRemaining is not available to extensions, and it must be called from the main thread (so it can't be used to abort database transactions that block the main thread).

foxware00 commented 4 years ago

Many thanks, both @michaelkirk-signal and @groue for your response and investigation. I feel like I have a much better understanding of what is going on and why I am seeing it.

The short version is we've never completely eliminated these kinds of crashes.

The interesting thing is I'm running a relatively small pool of test devices. Around 8 and over the course of 24 hours can expect upwards for 30-40 crashes during background syncs. This seems awfully high for something that is and seemingly small window to hit. I feel something else might be going on. I do wonder if there is a leak of something causing the transaction not to close quickly enough.

My code around background processing is relatively simple. I just pre-fetch conversations/messages off the back of an APNS notification and from performFetchWithCompletionHandler.

For clarity, these problems are from the main app, not the app extension. I'm accepting of the fact app extensions don't have the same privileges and I can run into these issues. For this, I'm using a background URLSession which, if the app is terminated, gets picked up by the containing app to resume the session and finish sending the messages.

I could log UIApplication.shared.backgroundTimeRemaining and see it decrease to 0 right before the crash. However backgroundTimeRemaining is not available to extensions, and it must be called from the main thread (so it can't be used to abort database transactions that block the main thread).

@groue On this I wonder if we could still do something along these lines from the main app, and something more primitive if those APIs aren't available. According to the link you sent backgroundTimeRemaining is an estimate so basing it directly off that value might still show the problem, hopefully, it won't.

Are the applicationDidEnterBackground callbacks called when a background task is being executed given it never technically enters the foreground? If so can we forcefully interrupt the transaction when in the background and about to expire from the main app?

background tasks ask for but are not guaranteed any particular amount of time to continue execution. Documentation and experience say that in the usual case this is about 30s. But you can't count on it.

Interestingly I have a timeout on my background processing of around 20s or backgroundTimeRemaining - 2 which ever is smallest and I'm still seeing the crash. Leaving me to believe something is erroneously keeping the transaction open.

After all this, I'm still puzzled this only effects SQLite transactions when the DB in opened in an app container and not using a local path to the containing app.

It's also possibly small and not related but there is noexpirationHandler on the background task used to clear up memory when backgrounding application.beginBackgroundTask(expirationHandler: nil)

foxware00 commented 4 years ago

A little update, my latest release includes a little change around backgrounding transactions. Not production worthy but if the crashes drop off this could be a nice simple addition.

if application?.applicationState == .background || application?.applicationState == .inactive {
    ProcessInfo().performExpiringActivity(withReason: "transaction") { expired in
        try! pool.writeInTransaction { db in
            if expired {
                return .rollback
            } else {
                transaction(db)
                return .commit
            }
        }
    }
} else {
    try! pool.writeInTransaction { db in
        transaction(db)
        return .commit
    }
}

ProcessInfo.processInfo.performExpiringActivity always give false (not about to be suspended)

I am still intrigued by your investigation @groue and why expired wasn't being called. Interestingly performExpiringActivity work just as well from the background so not 100% sure you need to go via the backgroundTask route. One for some thought if performExpiringActivity solves the issue.

Let's wait for my guinea pigs to report back on the crashes with this change.

groue commented 4 years ago

I am still intrigued by your investigation @groue and why expired wasn't being called.

@pierlo showed me that I misread the documentation for performExpiringActivity. I was missing this second call because I wasn't actually waiting for it. I'll redo my experiments soon :-)

foxware00 commented 4 years ago

That's good to hear the API isn't broken. Its been 3 hours and counting and no crashes which is promising. I will likely leave it over the weekend to see how we're getting on.

groue commented 4 years ago

Hello all, @foxware00, @michaelkirk-signal, I can't reproduce the 0xdead10cc crash in a share extension. Generally speaking, did anyone of you witness a db-related crash in any kind of extension? If yes, do you know any reproducible scenario to produce it?

foxware00 commented 4 years ago

@groue I haven't had the 0xdead10cc crash from within the share extension. Given the way the extension hands off the URL session into the main app if it needs more time makes me think this is a little unlikely. At least to happen as much as the background syncing issues I've been facing. So, unfortunately, I have no repro steps from within an extension. I'm guessing the behaviour around killing processes from the extension are entirely different.

I think in my case, the background process way holding a transaction that does multiple things, when I'm writing to the DB from the extension it's merely updating one record which is much quicker than my backgrounded write. That combined with how much less the extension is being used compared with the background sync means I have no data at all on whether it was crashing from that state.

In an interesting development. My wrapping of the transaction in performExpiringActivity seems to be working a treat. Not a single crash over the past 36 hours. Progress. I wonder if there is something we can come up with for GRDB directly, such that transactions, when initiated from a backgrounded/inactive state, could wrap in the same way I've demonstrated above.

groue commented 4 years ago

In an interesting development. My wrapping of the transaction in performExpiringActivity seems to be working a treat. Not a single crash over the past 36 hours. Progress.

That's great: we're moving in a good direction.

I wonder if there is something we can come up with for GRDB directly, such that transactions, when initiated from a backgrounded/inactive state, could wrap in the same way I've demonstrated above.

There surely is something to do here. GRDB is "a toolkit for SQLite databases, with a focus on application development". This means that it is definitely in the scope of the lib to help avoiding 0xdead10cc.

I'm currently working on interrupting and panicking a database. A panicked database interrupts its current operations, and won't accept any further database access until the end of the panic. We'll start panicking the database when the app is about to be suspended, end end the panic when the app leaves the suspended state. This should help preventing 0xdead10cc. Instead, the application will get some DatabaseError.

State of current developments:

michaelkirk-signal commented 4 years ago

After all this, I'm still puzzled this only effects SQLite transactions when the DB in opened in an app container and not using a local path to the containing app.

To help understand why the OS terminates the process, consider what would happen if it didn't.

You'd have a suspended process (maybe the main app, maybe an extension), holding an exclusive lock on a shared file. Since the process holding the lock is suspended, it certainly can't run any code to release the lock - the file would be locked indefinitely. So what happens in the meanwhile if another process wants to use that shared file? The OS's mechanism for enforcing that files remain available in group containers is to terminate any process rather than suspend it, if it retains a lock (*in the case of sqlite files (edit: in WAL mode), only exclusive/write locks* invoke the crash).

Now consider the case of a file in the App container. You might reasonably assume the same logic applies. If the app is similarly suspended while holding the lock, again no other process could access that file. But what other process could access a file in your apps private sandbox? So what would be gained by the OS killing a process holding a lock if that process is the only one that could ever lock the file in the first place?

WAL mode creates some long lived advisory non-exclusive locks even when reading. It appears that there is a special exception iOS makes only* for sqlite files. Only exclusive (write) locks invoke the wrath of 10deadcc for sqlite databases. The reason we added the "un-encrypted header" business for sqlcipher was to expose just enough* of the database file for iOS to be able to identify it as a sqlite database and extend the more lenient version of the lock vs. suspend rules. This was the conclusion of some research done by the good sqlcipher folks. See @sjlombardo's helpful comment here: https://github.com/sqlcipher/sqlcipher/issues/255#issuecomment-355063368

groue commented 4 years ago

@michaelkirk-signal

in the case of sqlite files, only exclusive/write locks invoke the crash

That's what I was initially thinking as well. But in the DELETE journal mode (the default, and what we get with a GRDB DatabaseQueue), a simple read, which only acquires a SHARED lock according to the doc, definitely triggers 10deadcc. SHARED locks are not exclusive, but they indeed prevent writes if they are not released.

In my tests so far, only reads in the WAL journal mode (what we get with a GRDB DatabasePool) do not trigger 10deadcc.

groue commented 4 years ago

The experiments app: https://github.com/groue/10deadcc

foxware00 commented 4 years ago

Thank you @michaelkirk-signal for your breakdown that makes a lot of sense. I think my train of thought was that a transaction on the database in WAL Journal mode should be allowed to complete in suspended scenarios, which isn't true 100% of the time, similar to what you mention. It looks like we've made some real headway in understanding the problem in more depth, so thank you for taking the time to explain.

So what would be gained by the OS killing a process holding a lock if that process is the only one that could ever lock the file in the first place?

I was thinking it was a performance consideration, a stricter, cancellation and interrupt policy which was only applying to app group containers. Your explanation clarifies it greatly.

There surely is something to do here. GRDB is "a toolkit for SQLite databases, with a focus on application development". This means that it is definitely in the scope of the lib to help avoiding 0xdead10cc.

This is fantastic! Thank you, is there anything I can help with? I've had a quick look through those two PR's and look great. I need to do some more bedtime reading on SQLite locking to understand the second one better and the problems you've mentioned. Happy to test anything with my internal testers.

groue commented 4 years ago

Thanks @foxware00. Yes, working on GRDB is a great opportunity to learn many things about SQLite. Like almost everybody, I used to know nothing about this database. The technique is somewhat simple: find a funny puzzle to solve, read a lot of documentation, perform experiments, discover... stuff, and eventually ship an API that is robust and makes sense for other developers ;-)

michaelkirk-signal commented 4 years ago

In my tests so far, only reads in the WAL journal mode (what we get with a GRDB DatabasePool) do not trigger 10deadcc.

Ah, that very well could be. All of our use cases and testing were against WAL mode.

groue commented 4 years ago

Ah, that very well could be. All of our use cases and testing were against WAL mode.

I completely understand ;-) Just in case you'd use a plain dbQueue one day ;-)

foxware00 commented 4 years ago
  • [ ] Manage panic with performExpiringActivity and other iOS tools

Out of interest what did you have in mind for this? I've noticed application?.applicationState == .background should only be called from the main thread. Xcode gives me warnings about calling it from a background thread. I know transactions can be requested from any thread, thus checking whether we need to start a request for background processing and having to jump back onto the main thread to check our state seems like an issue. Interestingly none of the documentation suggests what happens if you call these methods from the foreground, but it's implied that they're background only.

groue commented 4 years ago

What I have in mind is:

  1. Forbid locking the database once background task expiration has been notified, meaning that the app may enter suspended state at any time. This means interrupting currently executed database statements (#659), and throwing an error each time the app is attempting at using the database in a way that creates a lock (#660). It's better getting an error than crashing. Keep the database in this paranoid state until...
  2. ... we authorize locking the database again, once it is safe to do so.

First item is easy.

Second item is not easy, or rather, full of shadows. For more detail: https://forums.developer.apple.com/thread/126438

foxware00 commented 4 years ago

Second item is not easy, or rather, full of shadows. For more detail: https://forums.developer.apple.com/thread/126438

Some interesting reading. Is this because you're trying to block further access to the database when a suspension is happening? It sounds like the API's are designed as wrappers around individual functionality rather than reporting the state of the system. For point 2, if we want to know when the app is allowed to authorize locks. Isn't on the successful completion of a transaction or when the background task expires? At this stage, we're being killed anyway and any other transactions wouldn't have been allowed due to the panic. And further ones would again wrap in the same background handlers. Is it this stage, the further transactions, immediately after a handler expiring that we're not notified?

I might be missing the point, however, it's a fascinating topic and lesson in learning how these things are working.

groue commented 4 years ago

Some interesting reading. Is this because you're trying to block further access to the database when a suspension is happening? It sounds like the API's are designed as wrappers around individual functionality rather than reporting the state of the system.

Welcome to the world of modern applications.

After a background task has expired, the application is not suspended yet. Many things can happen until it gets suspended for good. Core Location can notify a location update. An application timer can fire. A network request can end. All of those events may trigger a database access.

And if we would start a background task then, we would not be notified of its expiration, because it was started too late. Yes, that means 0xdead10cc.

groue commented 4 years ago

After a background task has expired, the application is not suspended yet. Many things can happen until it gets suspended for good. Core Location can notify a location update. An application timer can fire. A network request can end. All of those events may trigger a database access.

Imagine what it would mean if your app had to deal with this, with your little hands. Your app would had to setup a background task. Wait for its expiration. At that moment, your app would have to notify all other places in your application that they must no longer access the database. Cancel timers. Cancel Core Location updates. Cancel network requests. Or let them all go, but inject somehow that database must no longer be used.

... and of course this state must be lifted eventually, because your app won't block access to its database forever, will it?

Imagine the pollution in the code of your app.

Imagine what it means if you forget to handle properly the background task expiration in one place (No 0xdead10cc from timer and core location, but you forget the network request).

foxware00 commented 4 years ago

That makes a lot of sense thank you for explaining.

And if we would start a background task then, we would not be notified of its expiration, because it was started too late. Yes, that means 0xdead10cc.

This is the interesting bit, it seems like a bug on iOS's part not notifying you the second time. With performExpiringActivity states that "When your process is in the background, the method tries to take a task assertion to ensure that your block has time to execute". Sounding like in the scenario you request additional time after the first expiration it would call straight though with expired true.

Is this not what you're seeing? Or background tasks are behaving differently?

groue commented 4 years ago

This is the interesting bit, it seems like a bug on iOS's part not notifying you the second time.

It doesn't fit our need. But I'm sure Apple people will tell it behaves as expected. If you think about it, it is exactly what you intended as well with your sentence above:

It sounds like the API's are designed as wrappers around individual functionality.

The use case they want to support is exactly this one. You wrap tasks. What happens after task expiration is none of their business, and errs on the side of programmer error. What happens if an app starts a task too late is none of their business, and errs on the side of programmer error as well.

Now we are after another use case, a use case that is explicitly not supported by Apple APIs. What do we know? GRDB is surely benevolent. But maybe Apple engineers did want to prevent aggressive applications from turning the feature we are after into real dark patterns.

groue commented 4 years ago

OK we have a clear answer now: https://forums.developer.apple.com/thread/126438.

Fun fact, Core Data is not immune against 0xdead10cc: https://blog.iconfactory.com/2019/08/the-curious-case-of-the-core-data-crash/

I believe we can do better.