groue / GRDB.swift

A toolkit for SQLite databases, with a focus on application development
MIT License

Using RowConvertible et al with a commitment to unique objects #291

Closed sobri909 closed 6 years ago

sobri909 commented 6 years ago

Hi! Loving GRDB so far :) Thanks for the work you've done on it!

I'm in the process of adding a persistence layer to ArcKit. I'd initially planned on building a custom ORM with no dependencies other than SQLite, but dealing directly with the SQLite C interface would require a lot of time-consuming wheel reinvention, so I've been depending on GRDB for the basic db layer needs.

I'm getting close to finished, but the more time I spend with GRDB, the more comfortable I feel with having it as a long term dependency. So I'm looking into consuming more of its protocols and potentially ditching my equivalent reinventions. The catch is ArcKit is offering a quite different contract. ArcKit's persistence layer provides unique, mutable, lazy loading, thread safe instances.

The first candidate where I'd like to ditch my reinvention is RowConvertible. But I would need the fetch methods to go through my own persistence layer, to potentially return the existing instance. It looks like I could achieve this by providing my own RecordCursor.next(), although RecordCursor is final and I'm not immediately seeing a way to inject a replacement.

I think all I need inside next() is a call out to something along the lines of store.instanceFor(row), with store being my own cache/factory, that can then return either an existing instance or create a new one.

But I'm guessing I might also run into similar hurdles with TableMapping and Persistable? So I'm uncertain whether this is a wise path to go down or not. I'd like to ditch a bunch of my ORM reinventions and hand those responsibilities to GRDB, but am keen to hear your thoughts on whether the contract differences will end up being fatal blockers or not.

Thanks!

groue commented 6 years ago

Hello @sobri909.

GRDB built-in records are unmanaged, can cross threads, foster immutability, do not auto-update, are not uniqued, and don't do any lazy fetching.

But GRDB does not want to prevent managed, uniqued, lazy-loading records. It just happens that this use case hasn't been put on the table yet. So I'll try to answer your questions the best I can.

Give me a little time.

groue commented 6 years ago

(and thanks for the compliments and your trust 😉)

sobri909 commented 6 years ago

Thanks :) It's precisely because of how well GRDB manages records / transactions / etc across threads that I chose it. It appears to have the best model for my multi-threaded needs.

My gut instinct is that the only serious blocker is at record creation time. As long as all record creation goes through a persistent store, uniqueness can be assured. (My own persistence layer already manages this, as long as all instance creation and fetching goes through it. But it's forcing the creation of a parallel world of protocols/stores/etc outside of GRDB.)

For the thread safety of the mutable instances once they exist, that's my responsibility to provide, given that I've been foolish enough to commit to it ;) But I'm uncertain what other hurdles I might run into at the GRDB level, if I adopt more protocols.

Change tracking is a big one I'd like to gain from GRDB. I was just about to start writing it into my own layer today, but I saw your commits the other day about RecordBox, and it feels like there's almost enough common ground to make it happen.

sobri909 commented 6 years ago

Here's my current PersistentTimelineStore class.

The two key examples of what I need to be able to do are:

So an attempt to fetch a sample from the store will return either the existing instance or a fresh instance, by virtue of all sample instance creation going through the PersistentStore.sample(for row: Row) method. All instance creation and fetching goes through the store, to ensure uniqueness.
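In rough sketch form, that uniquing factory looks something like this (a paraphrase of the pattern described above, not ArcKit's actual code; the `PersistentSample` type and its initializer are stand-ins):

```swift
class PersistentStore {
    // Cache of live instances, keyed by primary key.
    // NSCache may evict entries under memory pressure.
    let cache = NSCache<NSNumber, PersistentSample>()

    // All sample creation and fetching funnels through here,
    // so at most one live instance exists per database row
    func sample(for row: Row) -> PersistentSample {
        let id: Int64 = row["id"]
        if let cached = cache.object(forKey: NSNumber(value: id)) {
            return cached
        }
        let sample = PersistentSample(row: row, in: self)
        cache.setObject(sample, forKey: NSNumber(value: id))
        return sample
    }
}
```

As long as no other code path constructs a `PersistentSample`, two fetches of the same row can never yield two distinct instances.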

groue commented 6 years ago

OK, let's shed some light.

First of all, you have understood well that the word "record" is quite imprecise. What really exists are the three protocols RowConvertible, TableMapping, and Persistable.

Those protocols have core requirements, on top of which GRDB provides plenty of convenience methods.

To paraphrase the Record Protocols Overview: RowConvertible grants fetching (core requirement: init(row:)), TableMapping grants query interface requests (core requirement: databaseTableName), and Persistable grants persistence methods (core requirement: encode(to:)).

It is important to understand that most of the convenience methods can be built from public GRDB APIs. More on that later.

The catch is ArcKit is offering a quite different contract. ArcKit's persistence layer provides unique, mutable, lazy loading, thread safe instances.

RowConvertible may not be your friend, then, because it provides an initializer. It thus can't build reused instances. It can still build new values that wrap reused instances of some other type, though. This may be an option.

But you have to understand that RowConvertible is not the end of the story.

For example, let's take a type that only adopts TableMapping:

class ManagedPlayer: TableMapping {
    static let databaseTableName = "players"
}

From ManagedPlayer, you can build query interface requests, if you like those. Those requests won't be able to fetch anything (because ManagedPlayer does not adopt RowConvertible), but they can still be expressed:

// QueryInterfaceRequest<ManagedPlayer>
let request = ManagedPlayer.all()

// Compiler error
try request.fetchAll(db)

If you want to load/update managed players from this request, you can first turn the request into a request of raw rows:

// AnyTypedRequest<Row>
let rowRequest = request.asRequest(of: Row.self)

Then turn this request into a cursor:

try dbQueue.inDatabase { db in
    // RowCursor
    let rowCursor = try rowRequest.fetchCursor(db)

Turn this cursor of rows into a cursor of players managed by a store:

    // MapCursor<RowCursor, ManagedPlayer>
    let playerCursor = rowCursor.map { row in
        // Gross pseudo code
        let player = store.findPlayer(id: row["id"])
        player.update(from: row)
        return player
    }

And finally turn this cursor into an array:

    // [ManagedPlayer]
    let players = try Array(playerCursor)
}

This full pattern can be wrapped in a custom protocol that would let you write your own convenience methods:

extension ManagedPlayer: ManagedRecord { ... }
try dbQueue.inDatabase { db in
    // Convenience method added on ManagedRecord
    let players = try ManagedPlayer.fetchAll(db)

    // Convenience method added on TypedRequest where RowDecoder: ManagedRecord
    let player = try ManagedPlayer.filter(Column("name") == "Arthur").fetchOne(db)
}
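Such a ManagedRecord protocol could be sketched as follows (a sketch only: the `store` requirement, its `find(id:)` method, and the `update(from:)` requirement are assumptions about how the managed layer plugs in):

```swift
protocol ManagedRecord: TableMapping {
    // The store that guarantees instance uniqueness
    static var store: Store { get }
    init(row: Row)
    func update(from row: Row)
}

extension ManagedRecord {
    /// Convenience method built entirely from public GRDB APIs:
    /// fetch raw rows, then route each one through the store.
    static func fetchAll(_ db: Database) throws -> [Self] {
        let rowCursor = try all().asRequest(of: Row.self).fetchCursor(db)
        let records = rowCursor.map { row -> Self in
            let record = Self.store.find(id: row["id"]) ?? Self(row: row)
            record.update(from: row)
            return record
        }
        return try Array(records)
    }
}
```

Every convenience method you want can be layered on in the same way, without GRDB needing to know anything about the store.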

This is exactly what GRDB itself does: take the core protocol requirements, and add tons of convenience methods on top.

You have also seen in the example above that the built-in RecordCursor, RowCursor, etc. are base cursors on top of which you can build derived cursors. GRDB cursors are a lot like the lazy sequences of the Swift standard library. You can even build your own cursor that wraps, for example, a RowCursor:

class ManagedCursor<T>: Cursor {
    let store: Store
    let rowCursor: RowCursor

    init(store: Store, rowCursor: RowCursor) {
        self.store = store
        self.rowCursor = rowCursor
    }

    func next() throws -> T? {
        guard let row = try rowCursor.next() else {
            return nil
        }
        // Gross pseudo code, as above
        let record = store.findRecord(id: row["id"])
        record.update(from: row)
        return record
    }
}

try dbQueue.inDatabase { db in
    let request = ManagedPlayer.all()
    let rowRequest = request.asRequest(of: Row.self)
    let rowCursor = try rowRequest.fetchCursor(db)
    let playerCursor = ManagedCursor<ManagedPlayer>(store: store, rowCursor: rowCursor)
    let players = try Array(playerCursor)
}

This should keep you from missing a non-final RecordCursor.

The amount of convenience you want to add to your GRDB extra layer is up to you. Remember that the fundamental database access object is the cursor of raw rows. You can't be closer to sqlite3_step(). From it you can go very, very far.

But I'm guessing I might also run into similar hurdles with TableMapping and Persistable?

As far as I know, no. TableMapping is just a bunch of static properties, and does not hit the database. Persistable has plenty of customization options.

Change tracking is a big one I'd like to gain from GRDB

Change tracking is provided by the Record and RecordBox classes. Both require the full set of GRDB record protocols. If it appears that RowConvertible is not your friend, then you won't be able to use the change tracking provided by Record.

But what is change tracking? It's just a comparison between a reference row and the values output by the encode(to:) method. It's not that hard to copy. OK, you can't copy it right away, because the PersistenceContainer type has no public initializer (and it will remain so). But you just have to provide your own container for change tracking, and a default implementation of encode(to:) that feeds from this container of your own.
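A home-grown version of that comparison could look something like this (the protocol name, the `referenceValues` snapshot, and the `persistentValues` container are all made up for the sketch, since PersistenceContainer itself can't be instantiated):

```swift
protocol TracksChanges: AnyObject {
    // Snapshot of values captured when the record was last fetched or saved
    var referenceValues: [String: DatabaseValue]? { get set }
    // Your own container: the values encode(to:) would output
    var persistentValues: [String: DatabaseValueConvertible?] { get }
}

extension TracksChanges {
    /// A record has changes when its current values differ from the reference.
    var hasChanges: Bool {
        guard let reference = referenceValues else { return true } // never saved
        return persistentValues.contains { column, value in
            reference[column] != (value?.databaseValue ?? .null)
        }
    }

    /// Call after each fetch or successful save.
    func updateReference() {
        referenceValues = persistentValues.mapValues { $0?.databaseValue ?? .null }
    }
}
```

The default encode(to:) implementation then simply copies `persistentValues` into the PersistenceContainer it is handed, so persistence and change tracking feed from the same source.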

For the thread safety of the mutable instances once they exist, that's my responsibility to provide, given that I've been foolish enough to commit to it ;)

Excuse the mandatory warning: YES it is difficult. I hope that you have good reasons for not using existing robust managed solutions like Core Data and Realm :-)

Anyway. Your thread safety will certainly be made difficult if you use database pools:

Database pools have multiple reader threads, which run concurrently with writes, and thus don't all have the same view of the database: two concurrent reads may fetch different values. See the SQLite documentation on WAL mode and snapshot isolation for more details. GRDB database pools unleash the full power of SQLite, and rely heavily on unmanaged records to avoid scheduling headaches. So beware :-)

groue commented 6 years ago

I missed your previous comment with sample code. I'll check that.

sobri909 commented 6 years ago

Fantastic! You've given me a lot to go on there. Thanks so much :)

I was seeing RowConvertible as the first step, and assuming everything else would depend on that. So because I saw a blocker there I hadn't looked deeper. But it sounds like I can just skip over that, and potentially consume a bunch of other protocols safely.

I also didn't see that it was possible to make my own cursor class. That looks like a silver bullet for the item creation problem I was seeing. I'll sift my way through the docs/code on how to do that, and that should get me over the first hurdle.

Excuse the mandatory warning: YES it is difficult. I hope that you have good reasons for not using existing robust managed solutions like Core Data and Realm :-)

Heh. I'm actually migrating the bulk of this code from Core Data (ArcKit is basically chunks of Arc App being gradually open sourced, and Arc App's db layer is/was Core Data).

I want to keep ArcKit's dependencies to an absolute minimum, so Core Data is overkill. And it doesn't provide the mutable thread safety I want anyway.

I built my first ORM almost 20 years ago, and have built a few more since then. So unfortunately I have the scars to show that I know what sort of nightmare I've got myself into ;) Though mostly I can get away with unleashing this nightmare because ArcKit's object graph is fairly simple, and the mutating is almost always going to be done internally to ArcKit, so I can internalise most of the risk.

Anyway. Your thread safety will certainly be made difficult if you use database pools:

Interesting! I'm using pools at the moment, but I'm also doing all writes through writeInTransaction. Would the pool still be a risk in that case? I'll have a read through the SQLite docs you linked anyway.

Thanks again :)

groue commented 6 years ago

Here's my current PersistentTimelineStore class.

I understand better now. And especially how you avoid inconsistent reads from the database pool, by ignoring the database content if an item is already in the cache.

I don't see when the cache gets invalidated, so I guess it happens somewhere else.

And since both the cache and the database have information about a record, the risk of conflict and data race exists. Again, I guess you know how to handle that, since you have plenty of ORM experience :-)

I also didn't see that it was possible to make my own cursor class.

I know that all the concrete cursor types that ship with GRDB (RowCursor, RecordCursor, DatabaseValueCursor, etc.) may make one want to build their own. But those types merely exist for microbenchmarks, and gaining a few CPU cycles. Previous GRDB versions had a single concrete DatabaseCursor type (close to the current type-erased AnyCursor). What is fundamental is the Cursor protocol and all its built-in cursor derivation methods (filter, map, etc.), which are public and here to stay. You can define your own cursor type if you want to expose a nice cursor that hides GRDB guts, as in my ManagedCursor sample code above.

Would the pool still be a risk in that case? [...] but I'm also doing all writes through writeInTransaction

In the example below, you can see how two threads, 1 and 3, see the database at different points in time. At the end, they see two different states at the same time:

thread 1   | thread 2             | thread 3
--------   | --------             | --------
read {     |                      |
  state A  |                      |
           |                      |
           | writeInTransaction { |
           |   write state B      | read {
           |   return .commit     |   state A
           | }                    | }
           |                      |
           |                      | read {
  state A  |                      |   state B
}          |                      | }

That's how SQLite WAL mode works.

So, yeah, I suggest that you have a very tight control on how the pool is used ;-)

groue commented 6 years ago

A database queue has much stronger guarantees (no two threads can access the database at the same time), so in effect, all database accesses are serialized just as blocks in a serial dispatch queue. It does not suffer from the WAL mode subtleties:

thread 1   | thread 2             | thread 3
--------   | --------             | --------
read {     |                      |
  state A  |                      |
}          |                      |
           | writeInTransaction { |
           |   write state B      |
           |   return .commit     |
           | }                    |
           |                      | read {
           |                      |   state B
           |                      | }
           |                      |
           |                      | read {
           |                      |   state B
           |                      | }

groue commented 6 years ago

Side note: congrats on using afterNextTransactionCommit! I'm happy that this convenient tool finds its users 🤓

sobri909 commented 6 years ago

I don't see when the cache gets invalidated, so I guess it happens somewhere else.

It happens somewhere in the future ;) I haven't written that code yet.

Aside: There seems to be a Swift / NSCache bug where you can't create a valid NSCacheDelegate in Swift 4. So I've deferred the cache purge handling until I have time to find a workaround for that. But basically when the NSCache tells me that an instance is about to be purged, I'll mark the instance as invalid, which will make it immutable (or some such) for anyone still holding onto it. Perhaps something vaguely similar to how Core Data disconnects instances from their context.

That's how SQLite WAL mode works.

Interesting! I hadn't read that far at all, so I was in the dark on those WAL details.

I'm probably going to ignore that risk though ;) Given that ArcKit's persistent objects are only infrequently mutated, and mostly mutated internally, it's probably still a safe enough risk to take when weighed off against the read performance benefits.

I'm also bundling up the writes into bulk transactions, for energy consumption reasons. (ArcKit / Arc App stays alive 24 hours a day, so every tiny thing that consumes even the slightest bit of battery equals another email in my inbox saying "battery life is terrible!") Deferring and grouping writes gives me precious energy gains when measured over the hours.

The downside to that is that database queries aren't aware of the new state. But again, this isn't a big issue, both due to the limited mutability of the classes, and because most of it happens internally to ArcKit, so I can know when to do a save(immediate: true) as necessary.
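The deferred-write pattern described above might be sketched like this (a paraphrase of the description, not ArcKit's actual API; the `SampleStore` class, `pendingSaves` set, and `save(immediate:)` signature are assumptions):

```swift
class SampleStore {
    let pool: DatabasePool
    private var pendingSaves: Set<PersistentSample> = []

    init(pool: DatabasePool) {
        self.pool = pool
    }

    func save(_ sample: PersistentSample, immediate: Bool = false) {
        pendingSaves.insert(sample)
        if immediate { flushPendingSaves() }
        // otherwise a periodic timer calls flushPendingSaves(),
        // trading read freshness for energy savings
    }

    func flushPendingSaves() {
        let samples = pendingSaves
        pendingSaves.removeAll()
        try? pool.writeInTransaction { db in
            // one transaction for the whole batch:
            // one disk sync instead of one per record
            for sample in samples {
                try sample.save(db)
            }
            return .commit
        }
    }
}
```

Because unflushed state lives only in the uniqued instances, any read that goes through the store's cache still sees the latest values, even before they hit the database.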

I'm seeing a lot of possibilities now, after reading your insights. Thanks again! I'm going to have some fun tomorrow, putting these pieces together :)

groue commented 6 years ago

That's pretty cool, @sobri909. I hope ArcKit takes the best of GRDB. Mind if we close this issue until you come back with good news, or a new question?

sobri909 commented 6 years ago

Yep! All good. You've given me plenty to go on :) I should be able to take it from here.

groue commented 6 years ago

Thinking of your attempt at managing records, I have added database snapshots, shipped in v2.9.0.
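For reference, a snapshot guarantees an unchanging view of the database, which sidesteps the multi-reader WAL subtleties discussed above (using the ManagedPlayer type from the earlier examples):

```swift
let snapshot = try dbPool.makeSnapshot()

// Every read from the snapshot sees the same database state,
// no matter what concurrent writers do in the meantime
let count1 = try snapshot.read { db in try ManagedPlayer.fetchCount(db) }
// ... concurrent writes may happen here ...
let count2 = try snapshot.read { db in try ManagedPlayer.fetchCount(db) }
// count1 == count2: the snapshot's view never changes
```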

sobri909 commented 6 years ago

Interesting! Yet another thread safety tool to make use of. I'm already thinking up ways to use and abuse it.

Unfortunately I've been sidetracked by another project for the next month, so can't dive in and play just yet. But looking forward to getting back to ArcKit / GRDB soon, and finding new ways to make you yell "don't do that!"

groue commented 6 years ago

Unfortunately

No, we're both happy developers :-)