simolus3 / drift

Drift is an easy to use, reactive, typesafe persistence library for Dart & Flutter.
https://drift.simonbinder.eu/
MIT License
2.62k stars 365 forks source link

SQLite - Migrator Bug? Regression from 2.18.0 => 2.19.x when adding nullable columns to table with data #3112

Closed dballance closed 2 months ago

dballance commented 3 months ago

Background

I'm in the process of migrating a codebase from 2.18.0 to 2.19.x (2.19.1+1 specifically).

We run a matrix of migration tests - effectively testing every schema version and it's upgrade to to all future schema versions. For example, if the current schema is 5, we generate tests which test migration from 1:2, 1:3, 1:4, 1:5. For schema 2, we test 2:3, 2:4, 2:5, and so on for all schema versions.

We use step-by-step migrations - and I previously added documentation showing how to migrate to step-by-step if you started without it. Our migration implementation follows this guidance.

On 2.18.0, our migration tests run without issue. They run in both our CI environment running Ubuntu w/ SQLite 3.37.2, and on MacOS running SQLite 3.43.2.

Starting in 2.19.x, our migration tests fail, but not in all environments.

I'm trying to pin down WHAT changed and WHERE - but I'm fairly certain at this point the change originating these differences is somewhere in drift.

I'm going to use this issue as I investigate to see if I can narrow down where the issue lies.

Behavior

There were 2 behavior changes that I noticed.

  1. String literals MUST now be escaped with single quotes ' vs double quotes ".
  2. Adding a NOT NULL column migration seems to fail, but not in all cases and not on all platforms.

SQL String literal escape behavior change

We have a table defined in drift files as:

  id    TEXT    NOT NULL    PRIMARY KEY,
  foo   TEXT    NOT NULL    DEFAULT 'bar' MAPPED BY `const SomeTypeConverter()`,

We previously used the below to migrate.

await db.transaction(() async {
  // Create table
  await migrator.createTable(schema.table);

  // Insert into table
  await db.customStatement('''
    INSERT INTO table (id, foo) 
    VALUES ("0", "bar") ON CONFLICT (id) DO NOTHING
  ''');
});

This works fine in 2.18.0, but fails in 2.19.1+1 with:

SqliteException (SqliteException(1): while executing, no such column: 0, SQL logic error (code 1)
  Causing statement:             INSERT INTO table (id, foo) 
            VALUES ("0", "bar") ON CONFLICT (id) DO NOTHING
              , parameters: )

After some noodling - it seems we had incorrect string escapement in our SQL for string literals. We modified the statement to use single quotes and this resolved the issue for us.

await db.transaction(() async {
  // Create table
  await migrator.createTable(schema.table);

  // Insert into table
  await db.customStatement('''
    INSERT INTO table (id, foo) 
    VALUES ('0', 'bar') ON CONFLICT (id) DO NOTHING
  ''');
});

This seems fair to me, as it's really an issue with the underlying SQL statement.

NOT NULL column behavior change

The more nefarious issue seems to be a behavior change in drift that correlates to some specific SQL versions.

In 2.18.0, we have a table that adds a column in a migration. The column is defined in our drift file below:

  foo              BOOLEAN NOT NULL    CHECK (foo IN (0, 1))   DEFAULT 1

And we migrate this column with:

await db.transaction(() async {
  await migrator.addColumn(
      schema.table, schema.table.foo);
});

I will also note that this table, at time of migration, WILL have one row in it. It's configuration data that is seeded in a previous migration (and on CREATE for a new database).

in 2.18.0, in ALL environments, this succeeds.

in 2.19.0, this succeeds in environments running SQLite 3.43.2, but fails in the CI environment running SQLite 3.37.2.

Digging some on SQLite, I see that in 3.37.0, there IS a discussion around this issue in SQLite 3.37.0 release notes (https://www.sqlite.org/changes.html).

When adding columns that contain a [CHECK constraint](https://www.sqlite.org/lang_createtable.html#ckconst) or a [generated column](https://www.sqlite.org/gencol.html) containing a [NOT NULL constraint](https://www.sqlite.org/lang_createtable.html#notnullconst), the [ALTER TABLE ADD COLUMN](https://www.sqlite.org/lang_altertable.html#altertabaddcol) now checks new constraints against preexisting rows in the database and will only proceed if no constraints are violated.

However my questions are:

dballance commented 3 months ago

I managed to work around this in our code by manually migrating without the CHECK constraint, then applying a full table migration to pick up the constraints in the drift file.

await db.transaction(() async {

    await db.customStatement('''
        ALTER TABLE "table" ADD COLUMN "foo" INTEGER NOT NULL DEFAULT 1;
    ''');

    await db.customStatement('''
        UPDATE "table" SET "foo" = 1 WHERE "foo" IS NULL OR "foo" NOT IN (0, 1);
    ''');

    await migrator.alterTable(TableMigration(schema.table));
});

After this - the table is successfully migrated and validated by the migration test.

Why is this needed in 2.19.x and not in 2.18.0? Further, why is it only needed for SQLite 3.37.2 and not 3.43.2?

dballance commented 3 months ago

Once I got past the changes above to migrate the boolean column, I then ran into the same issue on an enum column added in a future migration on the same table.

In the drift file:

  baz       INTEGER   NOT NULL    CHECK (baz IN (0, 1, 2, 3, 4)) DEFAULT 2 MAPPED BY `const BazConverter()`

The migration:

await db.transaction(() async {
    await migrator.addColumn(schema.table, schema.table.baz);
});

This fails, likely due to the same issue above (check constraint), however, these test pass on 2.18.0.

Gonna dig into changes between 2.18.0 and latest to see if I can pinpoint what might be causing this.

simolus3 commented 2 months ago

Thanks for the detailed report!

The double-quoted strings behavior was changed in fde04bb154aeaeb9eaf03d196df62085478fd179 to be consistent with the feature being disabled in the builds you get from sqlite3_flutter_libs. I see that this is a potentially breaking change, but it was done as a bugfix to make the environment unit tests run in more consistent with the actual behavior Flutter apps would get. Without it, it's possible for migrations succeeding in unit tests but failing in actual apps.

The change around CHECK constraints looks puzzling to me as well, I'm not aware of anything that should have changed here. If you're already taking a look, perhaps its useful to enable statement logging in migration tests:

diff --git a/drift_dev/lib/src/services/schema/verifier_impl.dart b/drift_dev/lib/src/services/schema/verifier_impl.dart
index fe85f842..919c0777 100644
--- a/drift_dev/lib/src/services/schema/verifier_impl.dart
+++ b/drift_dev/lib/src/services/schema/verifier_impl.dart
@@ -92,7 +92,7 @@ class VerifierImplementation implements SchemaVerifier {

     return InitializedSchema(dbForUse, () {
       final db = _setupDatabase(uri);
-      return DatabaseConnection(NativeDatabase.opened(db));
+      return DatabaseConnection(NativeDatabase.opened(db, logStatements: true));
     });
   }

Is there any difference in the generated ALTER TABLE statement between drift 2.18 and 2.19?

dballance commented 2 months ago

The double-quoted strings behavior was changed in fde04bb to be consistent with the feature being disabled in the builds you get from sqlite3_flutter_libs. I see that this is a potentially breaking change, but it was done as a bugfix to make the environment unit tests run in more consistent with the actual behavior Flutter apps would get. Without it, it's possible for migrations succeeding in unit tests but failing in actual apps.

Yep - I'm all good with this change. We updated the two spots where we had issues. Agree with the intent to ensure consistency.

I should note, we are using sqlcipher_flutter_libs in our apps.

The change around CHECK constraints looks puzzling to me as well, I'm not aware of anything that should have changed here. If you're already taking a look, perhaps its useful to enable statement logging in migration tests:

diff --git a/drift_dev/lib/src/services/schema/verifier_impl.dart b/drift_dev/lib/src/services/schema/verifier_impl.dart
index fe85f842..919c0777 100644
--- a/drift_dev/lib/src/services/schema/verifier_impl.dart
+++ b/drift_dev/lib/src/services/schema/verifier_impl.dart
@@ -92,7 +92,7 @@ class VerifierImplementation implements SchemaVerifier {

     return InitializedSchema(dbForUse, () {
       final db = _setupDatabase(uri);
-      return DatabaseConnection(NativeDatabase.opened(db));
+      return DatabaseConnection(NativeDatabase.opened(db, logStatements: true));
     });
   }

Is there any difference in the generated ALTER TABLE statement between drift 2.18 and 2.19?

No real difference here. I see the changes I made related to the above escapement changes, but that's it. Biggest difference is SAVEPOINT s0 vs SAVEPOINT s1.

The ALTER TABLE statements are identical.

2.18.0
ALTER TABLE "table" ADD COLUMN "foo" INTEGER NOT NULL CHECK (foo IN (0, 1)) DEFAULT 1; with args []

2.19.1
ALTER TABLE "table" ADD COLUMN "foo" INTEGER NOT NULL CHECK (foo IN (0, 1)) DEFAULT 1; with args []
Sanitized result of diffing the output of `logStatements: true` ``` 7c7 < SAVEPOINT s0 with args [] --- > SAVEPOINT s1 with args [] 13c13 < RELEASE s0 with args [] --- > RELEASE s1 with args [] 49,50c49,50 < INSERT INTO table (id, a) < VALUES ("0", "b") ON CONFLICT (id) DO NOTHING --- > INSERT INTO table (id, a) > VALUES ('0', 'b') ON CONFLICT (id) DO NOTHING 57,58c57,58 < INSERT INTO table_2 (type, a, b) < VALUES ("A", 0, 1) --- > INSERT INTO table_2 (type, a, b) > VALUES ('A', 0, 1) 61,62c61,62 < INSERT INTO table_2 (type, a, b) < VALUES ("B", 1, 1) --- > INSERT INTO table_2 (type, a, b) > VALUES ('B', 1, 1) 72,77c72,77 < INSERT INTO table_3 < ( < id, < a, < b, < c, --- > INSERT INTO table_3 > ( > id, > a, > b, > c, 83,85c83,85 < INSERT INTO table_4 < ( < id, --- > INSERT INTO table_4 > ( > id, 106c106 < SAVEPOINT s0 with args [] --- > SAVEPOINT s1 with args [] 116c116 < RELEASE s0 with args [] --- > RELEASE s1 with args [] ```

I generated this by running a test from our first supported "stepwise" migration version to our latest version. The only difference between these two logs is changing the pubspec versions.

test('migrates from min to max', () async {
  final connection = await verifier.startAt(min);
  final db = MyDatabase.withConnection(connection);
  addTearDown(db.close);

  // Use this to run a migration and then validate that the database
  // has the expected schema.
  await verifier.migrateAndValidate(db, max);
});
  dependencies:
    ...
    drift: ^2.18.0
    drift: ^2.19.1+1

  dev_dependencies:
    ...
    drift_dev: ^2.18.0
    drift_dev: ^2.19.1
dballance commented 2 months ago

Wanted to add this in a separate comment. This almost feels like an underlying lib change - like maybe a change in sqlite3_flutter_libs that bumped the query executor version to one that is beyond 3.37.0?

I'm definitely not a SQLite expert, this is just me trying to reason through the change. I do know that nothing changed in the migration executor for addColumn between these versions - so the query that is executed SHOULD be the exact same. This is why I'm leaning towards some lower level query / binary version change.

The SQLite docs on 3.37.0 almost explicitly states that this type of migration will fail if the column is NOT NULL and contains a CHECK constraint and being added to a table containing data. In my case, this is the exact table where this fails, and I have other addColumn calls adding NOT NULL columns that pass, because the table in this case doesn't have data. However, this is concerning, because it means addColumn could FAIL if adding a NOT NULL column with a CHECK constraint to some table that has data in a production device. In other words, a newly created, empty database would pass tests on an empty database, but it would fail in prod if a device had data in the table. If this is the case, it may be that the addColumn migrator method needs to be expanded to apply the migration more like the workaround described above in cases where both NOT NULL and CHECK are in the column definition. Something like addColumn(TableInfo table, GeneratedColumn column, {DriftAny defaultValue}) OR addColumnWithConstraints(TableInfo table, GeneratedColumn column, DriftAny defaultValue) maybe.

Potentially, this could explain the issue with the environments. I'm going to check the state of the database during the test in both environments to see if there is a difference in the actual content in the database. It could be that locally, for whatever reason, the database table is empty so the migration succeeds. This especially could be the case when running matrix migrations, as some of my migrations seed data in the same transaction, and if they're skipped then the data will not appear in the database under test, I think.

dballance commented 2 months ago

SQLite docs on 3.37.0.

Thinking about this, it would mean that simple things like adding an enum to the database on some table and including a check constraint (like my comment above) could fail, and the addColumn migrator method would need to handle this OR at the very least some documentation to help people understand why addColumn may not always be the correct choice.

I use drift files and craft the SQL tables myself, so this could be part of the issue. Generated schemas for enums may not use CHECK to constrain enums.

dballance commented 2 months ago

I iterated through the changesets, and the error DID start at the DQS commit.

https://github.com/simolus3/drift/commit/fde04bb154aeaeb9eaf03d196df62085478fd179

dballance commented 2 months ago

Yes, the tests successfully run in 9a22adf but fail in fde04bb

simolus3 commented 2 months ago

Thanks for taking the time to investigate this. It's very weird that disabling DQS would cause this.

I wonder if sqlite 3.37.0 added the CHECK constraint validation for existing rows but didn't consider the default constraint and then later sqlite3 versions fixed that? But still, this being related to DQS is bizarre.

dballance commented 2 months ago

yeah, I'm trying to dig to see if I can get some form of my insert to validate on the DQS commit. Might point us to what is going on.

simolus3 commented 2 months ago

I wonder if sqlite 3.37.0 added the CHECK constraint validation for existing rows but didn't consider the default constraint and then later sqlite3 versions fixed that? But still, this being related to DQS is bizarre.

Huh, I think that's it! I've built a sqlite 3.37.2 CLI and

sqlite> create table foo (bar text);
sqlite> insert into foo values ('existing row');
sqlite> alter table foo add column baz not null check (baz < 10);
Error: stepping, Cannot add a NOT NULL column with default value NULL (1)
sqlite> alter table foo add column baz not null check (baz < 10) default 10;
Error: stepping, CHECK constraint failed (1)
sqlite> alter table foo add column baz not null check (baz < 10) default 9;
sqlite> .q

sqlite> .dbconfig dqs_ddl off
            dqs_ddl off
sqlite> .dbconfig dqs_dml off
            dqs_dml off
sqlite> create table foo (bar text);
sqlite> insert into foo values ('existing row');
sqlite> alter table foo add column baz not null check (baz < 10);
Error: in prepare, SQL logic error (1)
sqlite> alter table foo add column baz not null check (baz < 10) default 10;
Error: in prepare, SQL logic error (1)
sqlite> alter table foo add column baz not null check (baz < 10) default 9;
Error: in prepare, SQL logic error (1)
dballance commented 2 months ago

I'm a neophyte here on the DQS stuff - what does the above mean?

Somehow the DQS off changes the validation on the ALTER TABLE?

Seems very... uh... dangerous?

dballance commented 2 months ago

I think maybe this also means that with DQS off, addColumn needs more docs OR a separate API to handle these cases?

I would thinking adding an enum column with a CHECK constraint would be common.

simolus3 commented 2 months ago

Revisiting that, I remember that I have actually reported that particular bug to the SQLite maintainers.

Unfortunately, I think the resolution in the end is that sqlite 3.37 is just broken in that regard? Given that it's fixed in subsequent versions I don't think it's appropriate to add a fallback for the only version where that would happen.

dballance commented 2 months ago

Ahh, I understand - so it was a documented change in 3.37.0, but had a bug around the default value that was not fixed until a future SQLite version. Makes me feel much better, because calls without a DEFAULT while setting NOT NULL on addColumn WOULD fail, which is what we want.

dballance commented 2 months ago

I think this can be closed given the above. Greatly appreciate your help here - does seem like SQLite 3.37.2 is just broken with DQS_DML set to false.

simolus3 commented 2 months ago

Yes. So, to summarize:

What I haven't considered is that disabling double-quoted string literals is actually a regression for users of sqlcipher_flutter_libs, because those libraries are not compiled with double-quoted strings disabled. Testing with them disabled and then running these queries on a database that still has them enabled is safe if your queries are valid. Broken queries that accidentally do SELECT "missing_column" FROM tbl are problematic because they return nonsense instead of failing. So, for safety, I recommend disabling double-quoted string literals manually if you're using sqlcipher_flutter_libs. I have documented that in c2b5e42b3174a65af58456959e309e202248c8d6.

bdlindsay commented 3 weeks ago

Appreciate all the detail here. I actually ran into a very similar issue not using sqlcipher_flutter_libs and on sqlite3_flutter_libs: 0.5.24 that looks like it's past the problematic sqlite versions, again just on CI test runs.

I'm able to use the workaround dballance mentioned to fix failing migration tests in our CI though 👍. I took the approach of attempting the addColumn during the migration step and falling back to the custom statements if that addColumn call throws.