groue / GRDB.swift

A toolkit for SQLite databases, with a focus on application development
MIT License
6.74k stars 697 forks source link

Reading column values behaves differently depending on how a Row was fetched #504

Closed nesium closed 5 years ago

nesium commented 5 years ago

Hi there. Thanks for this great library!

I've recently got almost bitten by a behavior which I don't quite understand and which feels like a bug to me.

It was triggered by a mistake of mine, where I've migrated an existing table, adding a Double column in order to save timestamps, but set the default value of that column to Date().databaseValue which wrote a String to that new column.

I'm using RxGRDB as well and noticed that when my Record is decoded (basically trying to transform a String to a Double) by means of the Rx observation, the code crashes whereas when I fetch the record with MyRecord.fetchAll() it does not. This happens in version 3.7.0 of the library.

I understand that trying to convert the String to a Double is an error on my side, but I would expect the conversion to behave consistently, either always returning a garbage value or always crashing.

I've reduced it to a small test case:

class FetchRowVsRecordTests: GRDBTestCase {
    override func setup(_ dbWriter: DatabaseWriter) throws {
        try dbWriter.write { db in
            try db.create(table: "foo") { t in
                t.column("updated_at", .double).notNull()
            }

            let timestamp = Date(timeIntervalSince1970: 1183111200)
            print(timestamp) // 2007-06-29 10:00:00 +0000
            try db.execute(
                "INSERT INTO foo (updated_at) VALUES (?)",
                arguments: [timestamp.databaseValue]
            )
        }
    }

    func testFetchRecord() throws {
        class Foo: Record {
            var updatedAt: Double

            required init(row: Row) {
                self.updatedAt = row["updated_at"]
                super.init(row: row)
            }
        }

        let queue = try! makeDatabaseQueue()

        try queue.inDatabase { db in
            let foo = try Foo.fetchOne(db, "SELECT * FROM foo")
            XCTAssertNotNil(foo)
            print(foo!.updatedAt) // 2007.0
        }
    }

    func testFetchRow() throws {
        let queue = try! makeDatabaseQueue()

        try queue.inDatabase { db in
            let row = try Row.fetchOne(db, "SELECT * FROM foo")!
            let readTimestamp: Double = row["updated_at"] // 💣
            // never gets here
        }
    }
}

As I see it, the problem occurs in DatabaseValueConversion

static func fastDecode(from row: Row, atUncheckedIndex index: Int) -> Self {
    if let sqliteStatement = row.sqliteStatement {
        // does not crash
        return fastDecode(from: sqliteStatement, index: Int32(index))
    }
    // crashes
    return row.impl.fastDecode(Self.self, atUncheckedIndex: index)
}

Is this expected behavior?

groue commented 5 years ago

Hello @nesium,

I've recently got almost bitten by a behavior which I don't quite understand and which feels like a bug to me.

It sure feels odd 😅

Is this expected behavior?

Yes, it's how things are currently coded.

Let's say that because of the migration that mixed doubles and strings in a single column, you had a frightening glimpse into the abyss.

Can it be avoided? Yes, but I decided that it was counter-productive.


SQLite allows heterogeneous columns, and provides implicit conversions between its data types. This is a little know "feature" of SQLite, fully described at Datatypes In SQLite Version 3.

You can witness those implicit conversions with GRDB:

try dbQueue.read { db
    let a = try Int.fetchOne(db, "SELECT 20")!                // Success: 20
    let b = try Int.fetchOne(db, "SELECT 20.0")!              // Success: 20
    let c = try Int.fetchOne(db, "SELECT '20 small cigars'")! // Success: 20
}

Those implicit conversions are applied whenever GRDB has direct access to SQLite apis. They apply in the three lines above, but also when you decode records. They explain why you are able to decode your records with fetchAll.

Why doesn't GRDB check if the conversions from raw database values are "reasonable"? For performance reasons.

Given the SQLite behavior, there are three possible conversion situations:

  1. Valid: The developer decodes a reasonable type from the raw database value (int to int, int to double, int to bool...)
  2. Valid: The developer performs an "odd" conversion in full consciousness and on purpose (string to blob, int to string...)
  3. Invalid: The developer performs an "odd" conversion by mistake.

SQLite favors cases 1 and 2 with high performance APIs, and says that case 3 has to be fixed in application code or in database content (and has not to be fixed in SQLite itself).

GRDB follows the same track. I have to because I don't know how to provide a diagnostic for case 3 without ruining performance for case 1 or disallow the valid case 2.


The reason why you have a crash during Rx observation is that we can't always decode with a direct access to SQLite:

// Let's extract three database rows:
let (a, b, c) = try dbQueue.read { db
    try (Row.fetchOne(db, "SELECT 20")!,
         Row.fetchOne(db, "SELECT 20.0")!,
         Row.fetchOne(db, "SELECT '20 small cigars'")!)
}

// No direct access to SQLite
a[0] as Int // Success: 20
b[0] as Int // Success: 20
c[0] as Int // Crash!!!

The three rows above have been extracted from the database. They do not provide direct access to SQLite, but they contain copies of database values instead. Precisely speaking, they contain an Int, a Double, and a String.

Decoding is now fully performed in Swift by GRDB. And there still are implicit conversions, but not all of them:

Integer Double String Blob
Integer X X
Double X X
String X X
Blob X X

Particularly, decoding a String into a Double is not supported: this is your crash.

It is questionable GRDB ought to support the missing conversions, for three reasons:


I hope you have a better understanding of the situation!

Paranoid people can use DatabaseValue in order to perform really strict and explicit decoding of database values with full datatype information. But this is really boring, and useless in the common cases. This API is only suitable when you have to deal with a database you do not trust.

nesium commented 5 years ago

Hi @groue.

Thanks for taking the time to answer my question so thoroughly. I wasn't aware of the consequences when GRDB has direct access to SQLite or rather when it hasn't.

The reason why this was even a problem for me is because I noticed a crash in my app and wasn't immediately able to reproduce the crash in a testcase. Going forward I will update my testcase to test the path where GRDB doesn't have access to SQLite.

The testcase is rather simplistic in that I write a Record after the initial migration, perform all following migrations and try to read the record again and expect it not to crash and have the default values applied correctly (as long as they are not dynamic, like eg. the current date).

Having said that, I wouldn't wish for a reproduction of the (in my case ill-advised) SQLite conversion, but rather an emulation of the type-safe Swift model to have a consistent behavior. But sure, performance considerations come into play and not being familiar with the internals of GRDB I wouldn't want to suggest a direction to take.

Might be left to ask if there is a way to help RxGRDB to be able to decode with direct access to SQLite. Or is this not possible?

groue commented 5 years ago

Hi @groue.

Thanks for taking the time to answer my question so thoroughly. I wasn't aware of the consequences when GRDB has direct access to SQLite or rather when it hasn't.

This is indeed not publicly documented, because one very rarely meets those implementation details. This is not hidden, though: you took a bite from the apple of knowledge 😉

The reason why this was even a problem for me is because I noticed a crash in my app and wasn't immediately able to reproduce the crash in a testcase. Going forward I will update my testcase to test the path where GRDB doesn't have access to SQLite.

The testcase is rather simplistic in that I write a Record after the initial migration, perform all following migrations and try to read the record again and expect it not to crash and have the default values applied correctly (as long as they are not dynamic, like eg. the current date).

Having said that, I wouldn't wish for a reproduction of the (in my case ill-advised) SQLite conversion, but rather an emulation of the type-safe Swift model to have a consistent behavior. But sure, performance considerations come into play and not being familiar with the internals of GRDB I wouldn't want to suggest a direction to take.

I understand that this inconsistency is not ideal. I have tried to explain why it is here. But this does not mean we can't search for some way to avoid it in the future. What is important to me is that performances in production for bug-free apps are not impacted.

Might be left to ask if there is a way to help RxGRDB to be able to decode with direct access to SQLite. Or is this not possible?

This is possible but not what we want.

RxGRDB does not notify consecutive identical values. This means that it compares values (the old one vs. the new one) in order to discard duplicates. The comparison is not performed on record objects (which are not required to adopt the Equatable protocol). Instead, the comparison is performed on database rows (which are Equatable for free). Later on, after direct database access has been lost, new rows are decoded into records.

If records were directly decoded, we'd have two decodings: one into record properties (for app consumption), and one into DatabaseValue (so that we can perform comparisons). This would be much less efficient. Besides, decoding records without direct database access is a nicety of RxGRDB which does not lock a database connection longer than necessary. It just quickly fetches raw rows, releases database access as soon as possible, and performs row comparison and decoding in some private background queue. This does help performances of multi-threaded applications.

Maybe we could avoid SQLite conversion, and always perform GRDB conversion, so that we have a unique path and consistent results?

This slow path could be activated in debug builds? But this would not prevent your bug from happening in production.

This slow path could be activated with a configuration flag? But nobody would use it, because nobody expects such bug to happen, and thus would not protect themselves.

I don't quite see any solution right now, @nesium. GRDB already prevents a heck a lot of application bugs - not all of them, not yours.

nesium commented 5 years ago

Thanks again, @groue for the explanation. That helped improve my understanding of both libraries.

So a bit of loud thinking…

What do you think of introducing eg. a SQLiteBackedRow in addition to Row? There wouldn't be a real win, but it would expose the leak of the abstraction and make it clearer what happens under the hood. It would help to debug such an issue, as the behavior could be documented in those classes.

With regards to the configuration flag, this would help making behave intermediate libraries (RxGRDB) consistently where one has no means of intercepting the fetching, reading and comparing of values from the database. Currently GRDB runs with -Ounchecked if you will. So that concept would be familiar to people of the Swift community?

groue commented 5 years ago

Thanks again, @groue for the explanation. That helped improve my understanding of both libraries.

So a bit of loud thinking…

Yes, please :-)

What do you think of introducing eg. a SQLiteBackedRow in addition to Row?

I agree that Row is a weird type, with multiple facets, and various behaviors. And its behavior is not fully consistent.

GRDB has a unique Row type right from the beginning. But it didn't optimize for direct SQLite access until I attempted at fixing the bad performance of early GRDB releases. I remember feeling like I was hitting a red nuclear button when I shipped v0.17, the one which introduced the inconsistency we are talking about. I was fearing many bugs would surge.

But eventually several years have passed, and nobody died (your app is the first).

The unique Row type helps the library expose a remarkably uniform api surface, regardless of the data types you are fetching or observing in the database. Values, records, raw rows... Very few distinct apis rule them all:

Player.fetchOne(...) // Player?
Player.fetchAll(...) // [Player]
Player.fetchCursor(...) // Cursor of Player

String.fetchOne(...) // String?
String.fetchAll(...) // [String]
String.fetchCursor(...) // Cursor of String

Row.fetchOne(...) // Row? (DatabaseValueBackedRow)
Row.fetchAll(...) // [Row] (DatabaseValueBackedRow)
Row.fetchCursor(...) // Cursor of Row (SQLiteBackedRow)

let request = Player.all()
request.fetchOne(...) // Player?
request.fetchAll(...) // [Player]
request.fetchCursor(...) // Cursor of Player
request.rx.fetchAll(...).subscribe(onNext: { players in // [Player]
    ...
})

let request = Player.all().select(Column("name"), as: String.self)
request.fetchOne(...) // String?
request.fetchAll(...) // [String]
request.fetchCursor(...) // Cursor of String
request.rx.fetchAll(...).subscribe(onNext: { strings in // [String]
    ...
})

let request = Player.all().asRequest(of: Row.self)
request.fetchOne(...) // Row? (DatabaseValueBackedRow)
request.fetchAll(...) // [Row] (DatabaseValueBackedRow)
request.fetchCursor(...) // Cursor of Row (SQLiteBackedRow)
request.rx.fetchAll(...).subscribe(onNext: { rows in // [Row] (DatabaseValueBackedRow)
    ...
})

This is very good for learnability.

The unique Row type also allows FetchableRecord.init(row:) to accept all rows:

struct Player: FetchableRecord {
    init(row: Row) { ... }
}

let player = try Player.fetchOne(...) // Decodes SQLiteBackedRow
let row = try Row.fetchOne(...)
let player = Player(row: row) // Decodes DatabaseValueBackedRow
let player = Player(row: ["id": 1, "name": "Arthur"]) // Decodes DictionaryBackedRow

This is also good for learnability. And it is needed by delayed decoding used internally by observation tools, as we have discussed above.

Public GRDB concepts are as simple as possible. And implementation is as complex as it needs to be (hopefully not too much).

There wouldn't be a real win, but it would expose the leak of the abstraction and make it clearer what happens under the hood. It would help to debug such an issue, as the behavior could be documented in those classes.

I agree that Row leaks (in applications that suffer from bugs similar to yours), and that your debugging story wasn't easy.

But introducing SQLiteBackedRow would make the API more complex.

And introducing SQLiteBackedRow would not bring any benefit to bug-free apps, because it would have exactly the same behavior as other row types.

This is a net loss. I'm not sold yet to the idea that everybody needs to know what happens under the hood. You have live perfectly well without this information until this misstep.

I'm not closing the door: I just haven't seen a suitable door to open yet.

With regards to the configuration flag, this would help making behave intermediate libraries (RxGRDB) consistently where one has no means of intercepting the fetching, reading and comparing of values from the database. Currently GRDB runs with -Ounchecked if you will. So that concept would be familiar to people of the Swift community?

Do you suggest that optimized direct SQLite access would be only activated with the -Ounchecked flag? Ha, ha :-) No.

Seriously, I understand that you feel frustrated, and that this crash was a very bad news, and that it took you maybe several days until you figured it out. I do understand, really. It happens to me all the time.

But I can't rip off years of careful design because of one application bug. And it's not quite fair to hold GRDB responsible for the incorrect migration in the first place.

Remember: this library is based on SQLite. One of the most successful libraries on Earth. I respectfully DO NOT FIGHT such a tool. The goal of GRDB is to make SQLite easier to use. Not to redefine how it should work.

It happens that SQLite ships with all those weird conversion rules. As if those weird conversion rules were more important than dealing with a few applications that misuse them. Isn't it interesting to at least wonder a few minutes why they chose such a design?


Oh, a new idea!

Add checks to your tables, so that SQLite prevents storing wrong types in:

let dbQueue = DatabaseQueue()
try dbQueue.inDatabase { db in
    try db.create(table: "player") { t in
        t.column("id", .integer).primaryKey()
        t.column("name", .text).check(sql: "TYPEOF(name) == 'text'")
        t.column("score", .integer).check(sql: "TYPEOF(score) == 'integer'")
    }

    // OK
    try db.execute(
        "INSERT INTO player (name, score) VALUES (?, ?)",
        arguments: ["Arthur", 1])
    // Error
    try db.execute(
        "INSERT INTO player (name, score) VALUES (?, ?)",
        arguments: ["Arthur", "Foo"])
    // Error
    try db.execute(
        "INSERT INTO player (name, score) VALUES (?, ?)",
        arguments: [1, 2])
}
nesium commented 5 years ago

No hard feelings, I'm not even frustrated. But I wanted to surface the problem and see if it might be an honest bug.

I also don't want you to complicate the API and I'm very happy with and thankful for the design you did with the library, which is why I'm writing here in the first place. Also I don't hold GRDB responsible for the problem, but am merely pondering if there might be a solution to take, what I experienced as, a sharp edge off of the library.

I was seeing the SQLiteBackedRow not really as a user-facing feature. In Objective-C Row could have been a class-cluster. From the outside you're always interacting with NSString but you might be handed a NSConcreteString or what have you. You'd only notice while debugging.

The check looks really cool and it would only affect insertion performance.

Thanks again for your work on this project and I appreciate the discussion!

groue commented 5 years ago

I appreciate it too, @nesium.

You opened my eyes on the fact that what I used to call "untrusted databases" (not to say "rogue databases"), are not only the consequence of a sloppy db management. They may be the consequence of an "honest" bug.

And thanks for that. You have shed some light on a blind spot.

You currently pay a rather radical decision, which has made debugging difficult. I suspect the actual crash to contain a very explicit message like Could not convert "2019-03-21 16:01:36.123" to Double. But since the crash happens during late decoding, I'm not sure the source SQL is printed in the error message: it becomes very difficult to spot the origin of the error. #384 did greatly improve debugging, but there are still corner cases.

Some other people have tried another path, and have asked for records that can throw during decoding, instead of crashing: #437. You'll see that I also declined the feature request, and provided yet another pool of alternative solutions.

None of those are good answers to your case. You do not want a new feature. Ideally, you'd keep the same setup, but the bug would have been much more quickly recognized and fixed.

I'm sure we'll eventually figure out the best option. I don't intend to sound negative: it's just that I know that a good solution will need more work. Meanwhile, an issue like this one is very very precious. Please never hesitate chiming in again.