groue / GRDB.swift

A toolkit for SQLite databases, with a focus on application development
MIT License
6.84k stars 705 forks source link

Best Practices to implement DAO/Repository pattern within GRDB? #240

Closed cody1024d closed 7 years ago

cody1024d commented 7 years ago

Hey @groue! I'm hoping we can have a discussion on the best-practices if I wanted to go with a DAO/Repository pattern surrounding my data-access layer. Specifically, I'd like to only expose protocols, similar to these:

protocol TrackedTaskFilterRepository {
    func getAll() -> [TrackedTaskFilter]
    func getAllFor(user: User) -> [TrackedTaskFilter]
}

All interactions to fetch TrackedTaskFilters (in this case) would go through that repository/protocol. I find this the best practice in my opinion, as far as encapsulating data access behind a singular protocol (per domain concept).

Within GRDB, I think I have two options, atleast based on my understanding of the library.

  1. Use the raw SQL language in the implementations of the above methods. This would keep everything hidden behind the protocol as I liked. The con to this, is that I'd be writing raw SQL to perform the operations I wanted (and would not be getting the ORM-esque benefits of the library).

  2. Make TrackedTaskFilter implement the record protocols provided by GRDB. This would allow me to interact with the query DSL that you've created (which is very clean btw), which is a huge upside to me. The downside of this, though, is that it would expose those query methods outside of the protocol, which I don't want.

Ok, having just thought about this some more; I think I found a solution that I hadn't thought about before. I think I would be able to, inside of the implementation of the repository, create fileprivate protocol extensions to the persistable Protocols, to solve my scoping concerns? Something like this perhaps:

class GRDBTrackedTaskFilterRepository: TrackedTaskFilterRepository {
    ...
}

fileprivate extension TrackedTaskFilter: Persistable {
    ...
}

(TrackedTaskFilter is a struct in my project; and thus I could not just extend Record)

In this case, I would still perform the queries through the static functions on TrackedTaskFilter, but my main concern, that those methods would be exposed outside the protocol-entrance would be solved, correct?

groue commented 7 years ago

Hey @groue! I'm hoping we can have a discussion on the best-practices if I wanted to go with a DAO/Repository pattern surrounding my data-access layer.

Sure, @cody1024d, let's go!

Within GRDB, I think I have two options, atleast based on my understanding of the library.

Use the raw SQL language in the implementations of the above methods. This would keep everything hidden behind the protocol as I liked. The con to this, is that I'd be writing raw SQL to perform the operations I wanted (and would not be getting the ORM-esque benefits of the library).

Yes.

Make TrackedTaskFilter implement the record protocols provided by GRDB. This would allow me to interact with the query DSL that you've created (which is very clean btw), which is a huge upside to me. The downside of this, though, is that it would expose those query methods outside of the protocol, which I don't want.

Yes.

Ok, having just thought about this some more; I think I found a solution that I hadn't thought about before. I think I would be able to, inside of the implementation of the repository, create fileprivate protocol extensions to the persistable Protocols, to solve my scoping concerns?

This sounded like a brilliant idea to me until I tried it and Swift compiler told me that "fileprivate modifier cannot be used with extensions that declare protocol conformances". This page gives an understandable rationale for this limitation.

Now back to your question. I haven't though about DAOs much, because I'm not quite familiar with them. I used to see them as a painfully verbose pattern which doubles each and every type. Since "Active Records" are handy and cover quite well so many use cases, they were the obvious API to support first in GRDB. Now I do agree with you: sometimes active records don't quite fit the bill. It happens every now and then that some database APIs would better be hidden, and this usually means when some data contracts can't be expressed as SQLite contraints.

So. How can GRDB help here?

First, some generalities:

First, the DAO pattern itself may not be able to profit from Swift: it's likely you'll need just as many types, protocols, concrete types, whatever, I mean the whole bestiary required by the pattern. I'm not sure, I haven't studied the subject in details. GRDB shouldn't prevent you from adopting it, though, and you may just miss some GRDB conveniences. Yet, since GRDB records are built on top of public GRDB low-level APIs, you'll be able to write your own convenience methods.

Next, the two "dangerous" record protocols are TableMapping and Persistable, the two protocols that come with methods that can modify the database. See the list of record methods.

RowConvertible is innocuous, since it can only read. But it is only able to consume SQL requests.

Now, some concrete advice:

Methods that access the database should have a Database parameter.

// GOOD PRACTICE
protocol TrackedTaskFilterRepository {
    func getAll(_ db: Database) throws -> [TrackedTaskFilter]
    func getAll(_ db: Database, forUser user: User) throws -> [TrackedTaskFilter]
}

The bad idea is the following:

// UNWISE
protocol TrackedTaskFilterRepository {
    func getAll() -> [TrackedTaskFilter]
    func getAllFor(user: User) -> [TrackedTaskFilter]
}

The reason is safe concurrency. I don't know if you have already read the GRDB concurrency guide. Please check it first. Check also How to build an iOS application with SQLite and GRDB.swift.

Oh, and don't hide errors: just let them bubble up. That's because some errors have to be handled at a very high application level.

Methods that perform several database updates in a row should group them in a savepoint.

// GOOD PRACTICE
struct ConcreteTrackedTaskFilterRepository : TrackedTaskFilterRepository {
    func doStuff(_ db: Database) throws {
        try db.inSavepoint {
            try db.execute(...)
            try db.execute(...)
            return .commit
        }
    }
}

The bad idea is as below:

// MISGUIDED
struct ConcreteTrackedTaskFilterRepository : TrackedTaskFilterRepository {
    func doStuff(_ db: Database) throws {
        try db.execute(...)
        try db.execute(...)
    }
}

This is because savepoints guarantee that all changes are performed, or none of them. Don't use a transaction, because your method would then become impossible to wrap in another transaction.

Finally, when you have a method that performs database updates and also interacts with other resources such as files or system sensors, you may be interested in the After Commit Hook.

There you go! I hope those advices will help you when you eventually give GRDB a try :-)

cody1024d commented 7 years ago

Firstly @groue, thanks for taking the time to write that up. It was incredibly informative, and definitely useful. A couple of responses:

I prefer the repository pattern mostly for unit testing purposes. Being able to de-couple the business rules from the DB, and implement a "Mock" repository (that I perhaps set the return values on) has been life-saving for focused unit testing. To a lesser extent, also, it provides the flexibility for future data-storage changes (in case I want to save the entities in some strange new different way in the future).

My protocol wasn't exactly showing the whole picture. The implementation of the TrackedTaskFilterRepository (which I call SQLiteTrackedTaskFilterRepository) takes a DB in it's constructor, which would allow me to inject the same DB instance across all DAO instances, to ensure concurrency safety (if this is in fact safe, and I haven't misunderstood something).

As for the throwing an error, I actually tend to disagree here, a bit, but I think it's definitely fertile grounds for a discussion! In my opinion, I'd much rather capture the error within the DAO/repository layer, log it, and return a pure domain-result. (in this case an empty one) Depending on the situation, I could even see returning a result, that contains some information letting hte consumer know an error occurred, without throwing the error up. This way, other implementations (let's say, in the future, I want an InMemoryTrackedTaskFilterRepository) can handle the errors their own way. In this way, I encapsulate "database-related" information (including errors) into my DAO layer.

I'm going to give this problem more thought, because I want all the PROs, and none of the CONs of using your library in the future :P Reading back through the readme, I think my best course of action is to implement the protocols as normal; and just live with the fact that there are methods on the entities that expose a request. At the very least, the "consumer" of the entities will not be able to do anything with the request, without a DB instance (which will only be injected into the Repository/DAO classes). I'm trying to convince myself that that's "Ok", because I really, really don't want to write SQL even inside of my repository classes hahah.

groue commented 7 years ago

My protocol wasn't exactly showing the whole picture. The implementation of the TrackedTaskFilterRepository (which I call SQLiteTrackedTaskFilterRepository) takes a DB in it's constructor, which would allow me to inject the same DB instance across all DAO instances, to ensure concurrency safety (if this is in fact safe, and I haven't misunderstood something).

You're "safe", but just only "safe" as in "no crash". You're still exposed to data corruption.

As tong as you prevent high-level objects from grouping fetching methods together in a database high-level method such as DatabaseQueue.inDatabase, you prevent them from fetching consistent data with the absolute guarantee that no concurrent thread can change the database between two related fetches. You may not understand why you need such guarantee today, because you've been brain-washed by libraries that pretend SQLite concurrency is a non-subject. I suggest you put more thought in it - if you really want to establish a reusable and safe pattern.

You may try to think of it in terms of composability. Those methods are not safely composable:

func getPlayers() -> [Player]
func getTeams() -> [Team]

For example, the following code may load inconsistent data, because the loaded teams may be different from the teams that existed in the database at the time the players were loaded:

let players = repo.getPlayers()
// <-- here other threads may change the database in unexpected ways
let teams = repo.getTeams()

For example, you may sometimes have players that do not belong to any team:

for player in players {
    if let team = teams.first(where: { $0.id = player.teamId }) {
        print("\(player.name) in \(team.name)")
    } else {
        // bug: all players should have a team
        print("meh the database has a bug")
    }
}

The database has no bug, but the application has.

To fix this lack of composability, you have first to spot the problem (which is not obvious because this kind of bug only happen sometimes, due to their multi-threading ground), and then introduce a third method getPlayersAndTeams that provides the necessary isolation. But you see how quickly this pattern leads to a combinatorial explosion of convenience methods. And the many opportunities for such bugs to remain unnoticed, or hard to reproduce, and fix.

Instead, you can simply make your data access methods composable right from the start:

let (players, teams) = try dbQueue.inDatabase { db in
    try (repo.getPlayers(db), repo.getTeams(db))
}

Now you're guaranteed with consistent data. And you don't have to think much more about it.

As for the throwing an error, I actually tend to disagree here, a bit, but I think it's definitely fertile grounds for a discussion! In my opinion, I'd much rather capture the error within the DAO/repository layer, log it, and return a pure domain-result.

Do as you wish.

The more I read you, the more I understand your app has a single "repository" object which is a little like a "god object". Usually developers call it "DatabaseManager". And pour into it all database-related methods. Those objects are very often poorly designed, don't understand error handling, pretend there is no operating system, ignore concurrency, and provide a locked API that quickly forbids application progress, and hide sporadic bugs behind rigid "patterns". This is usually the sign of an over-design of a misunderstood domain. Those "manager" objects don't actually manage anything, and actively prevent actual management. The APIs they expose are nice but unrealistic, and do not suit the reality of application development. I'm not saying you're about to write such code. But your goals bring you very very close to it.

You have been asking for best practices, and you plan to go against every one of them. I'm not sure this means that GRDB does not fit your needs. On the contrary, I think it means GRDB is there for you to improve your understanding of the relationship between an application and its database(s). When eventually you get why database queues and pools should not be hidden inside a "manager" object, you'll be happy that GRDB has all the tooling you really need.

groue commented 7 years ago

Now of course I certainly won't advise against your attempt at setting up your ideal database layer. And it's possible that I have sometimes sounded mysterious in my comments above. My rule of thumb on "patterns" is that they should pass the rule-of-three test: until they have proven relevant on at least three different kinds of apps, a pattern errs on the experimental side of development, and can't be considered solid enough. It's not yet, strictly speaking, a "pattern". The best practices I recommend above are proven, rock-solid, and yet grant the amount of flexibility that allows an app to evolve at its own pace without requiring sudden heavy refactoring. It's difficult to "prove" it, though ;-)

cody1024d commented 7 years ago

@groue, just because we disagree does not mean one person is right, or wrong. The repository pattern is extremely well defined and used in many standard architectures throughout many different languages. Clean Architecture, is mostly the one I'm thinking of.

I am definitely not using one global God object. You're right that is bad design, and leads to tight coupling across unrelated domain objects. Also extremely low cohesion.

You are definitely right though, that I hadn't considered that case with players and teams. I had not experienced that issue before, but it is apparent that that would be an issue.

Unfortunately I don't think this conversation is really helping anyone at this point, so I'll close it.

groue commented 7 years ago

I'm sorry @cody1024d if my way of writing has sounded unhelpful. I'm happy I could tell you about the latent bugs behind subsequent fetches, and I hope you will not only avoid experiencing this issue, but also prevent it absolutely. Surely a goal that is compatible with the best patterns of the literature.

groue commented 6 years ago

A new guide has shipped in GRDB 3: Good Practices for Designing Record Types.