groue / GRDB.swift

A toolkit for SQLite databases, with a focus on application development
MIT License
6.78k stars 702 forks source link

Reusing a database with costly migrations #1563

Closed anlaital-oura closed 3 months ago

anlaital-oura commented 3 months ago

What did you do?

We have databases that have a lot of migrations. Running the migrations takes ~400ms on an M2 Max for each database. This is a problem in unit tests because these migrations for each database get run almost two thousand times, making the tests sluggish.

What did you expect to happen?

I would like to be able to run the costly migrations once and then use the newly migrated database as a clean slate for each test case. However, I couldn't figure out a good and performant way to do this with GRDB.

What happened instead?

N/A

Environment

GRDB flavor(s): GRDB GRDB version: 6.27.0 Installation method: SPM Xcode version: 15.4 Swift version: 5.10 Platform(s) running GRDB: iOS macOS version running Xcode: 14.5

Demo Project

N/A

groue commented 3 months ago

Hello, @anlaital-oura,

Please check the Foreign Key Checks section of the migrations documentation.

It explains the reason why migrations can take time, and gives some mitigations techniques.

Take care that the mitigation techniques requires some care, because the application becomes responsible for the integrity of its database. Please ask a question if the documentation leaves some unanswered question.

groue commented 3 months ago

Oops, I missed this other sentence of yours:

I would like to be able to run the costly migrations once and then use the newly migrated database as a clean slate for each test case. However, I couldn't figure out a good and performant way to do this with GRDB.

So my previous answer is more about reducing the duration of migrations for everybody, tests, but also application users. Let's focus on tests and answer your question below.

One technique you could use is to:

  1. Each test looks for a database at some location on disk. If it does not exist, it creates an empty database. In all cases, it migrates it with the eraseDatabaseOnSchemaChange option.

    This database is the template for other tests.

    Take care that this setup must not run in parallel from multiple tests, including tests that run from multiple processes, so you probably need to use a NSFileCoordinator or another file-system-level synchronization technique.

    The template database should probably not be committed in your git repo. Thanks to the eraseDatabaseOnSchemaChange options, you shouldn't have to delete it whenever you modify a migration during development.

  2. Now each test should work on a private copy of the template database, so that the test can not alter the behavior of other tests. You have multiple options for copying databases.

    If your tests can run with a DatabaseQueue, you will enjoy one of those:

    If your tests require a DatabasePool, then any database copy technique will work: vacuum(into:), or backup(to:).

anlaital-oura commented 3 months ago

I think using DatabaseQueue.inMemoryCopy(fromPath:configuration:) has a lot of promise for our use case. Thank you. I will try implementing this in our tests to speed them up. 🙏

I noticed that if I have an in-memory database that is migrated and when I call backup(to:) on it to another clean database (without any migrations in the new database), then the migrations are not copied. I assume it only copies data and not the empty table definitions resulting from the migrations themselves, meaning that the database that is being copied to must also have compatible schema defined already?

groue commented 3 months ago

I noticed that if I have an in-memory database that is migrated and when I call backup(to:) on it to another clean database (without any migrations in the new database), then the migrations are not copied. I assume it only copies data and not the empty table definitions resulting from the migrations themselves, meaning that the database that is being copied to must also have compatible schema defined already?

I'm not sure I understand, because an SQLite backup performs a full copy, replacing any original contents of the target database. In particular, SQLite backup is not aware of GRDB migrations, so they are copied with everything else.

anlaital-oura commented 3 months ago

Hmm, this is very strange then indeed! I implemented the backup(to:) approach earlier and in subsequent tests I started getting an error about tables not existing in the newly-minted backup. I can investigate this further if indeed backup(to:) should also copy the full schema, including migrations and the (empty) tables that are created as a result of those migrations.

groue commented 3 months ago

I think using DatabaseQueue.inMemoryCopy(fromPath:configuration:) has a lot of promise for our use case. Thank you. I will try implementing this in our tests to speed them up. 🙏

Now that I think more about it, maybe your tests could share an in-memory template database, and copy its content in a new in-memory database with backup. This would avoid all the file-system fuss I described.

groue commented 3 months ago

Hmm, this is very strange then indeed! I implemented the backup(to:) approach earlier and in subsequent tests I started getting an error about tables not existing in the newly-minted backup. I can investigate this further if indeed backup(to:) should also copy the full schema, including migrations and the (empty) tables that are created as a result of those migrations.

Yes. Something is fishy here. I wouldn't expect SQLite backup to be buggy, so maybe you found a GRDB bug, or there is a bug in your tests. Please report the results of your investigations when you have time! 👍

anlaital-oura commented 3 months ago

Now that I think more about it, maybe your tests could share an in-memory template database, and copy its content in a new in-memory database with backup.

This is exactly the approach I originally tried to use, but then our tests started failing due to tables being missing in the database that was the target of backup(to:). I will try investigating this further as it seems that it should work as I originally envisioned it. Most likely our bug since so far GRDB has been rock solid and I don't believe it for a second that we found a bug in SQLite itself. 😭

groue commented 3 months ago

OK @anlaital-oura, so it looks like the topic of the issue is shifting 🙂 Maybe share the technique used by the failing tests that backup an in-memory template database?

anlaital-oura commented 3 months ago

I agree. Now that I implemented the original idea I had again, it seems to work flawlessly! I must've messed something up earlier. 😅 Test suite execution time went from 262s to 154s with this optimization, so that's not a bad improvement at all! 🥳

Thank you once more @groue for your help and your amazing work on GRDB!

Here is the overall approach for posterity that I used in case someone else encounters a similar issue:

@MainActor private var migratedDatabaseCache = [String: DatabaseWriter]()

final class SomePseudoTestDatabaseHandler {

    @MainActor init() throws {
        database = try DatabaseQueue()

        let key = String("\(type(of: self))") // There are multiple different types of databases that we need to work with.
        if let cachedDatabase = migratedDatabaseCache[key] {
            try cachedDatabase.backup(to: database)
            return
        }

        let cachedDatabase = try DatabaseQueue()
        try Adapter.migrator().migrate(cachedDatabase)
        migratedDatabaseCache[key] = cachedDatabase

        try cachedDatabase.backup(to: database)
    }
}

I am closing this as the issue is fully resolved. Thanks again!

groue commented 3 months ago

That's much better indeed! Thank you @anlaital-oura for sharing this experience and exploration 👍 Happy GRDB!