simolus3 / drift

Drift is an easy to use, reactive, typesafe persistence library for Dart & Flutter.
https://drift.simonbinder.eu/
MIT License
2.45k stars 354 forks source link

Recommended approach for fetching/storing multiple different relationships #569

Open fgblomqvist opened 4 years ago

fgblomqvist commented 4 years ago

To my understanding, it seems like the recommended approach so far when working with objects that can have relationships is to create extra data classes that bundle them together, similar to the example here with the shopping carts.

However, in a more realistic scenario, an object might have several different relationships. For example:

Car {
  Owner
  Wheels
  Passengers
  Garage
}

Some functions (especially render-ones) might want all, or some, or none of those children. I don't think it makes much sense to fetch them all separately in case you're dealing with many objects (e.g. 100 cars in the example above).

How should one go about structuring this stuff?

Should you create several different data classes? E.g. CarWithOwner, CarWithGarageAndPassengers, CarWithOwnerAndWheelsAndPassengersAndGarage etc. That seems to get messy and unmaintainable fast since you also need completely different code-paths for the fetching due to the different return types.

Should you create one extra data class (per main object) that has the ability to hold all the children? E.g. CarExt that extends Car with the fields owner, wheels, passengers, garage that are all nullable and can be loaded when needed (e.g. pass flags to the DAO). Naturally, you could run into issues with forgetting to toggle the right flags and therefore e.g. owner is null when you need it, but those can easily be caught when testing in development.

The second approach seems saner to me but isn't mentioned anywhere. It's still not ideal since it requires me to declare an extra class for every data class, just for those extra fields. At that point I might as well rename Car to CarDB or something and only use the bare types when directly interacting with moor. Now, if #114 gets implemented the mixin way we have more control over the data class and can therefore add extra fields if we want to without needing an extra data class. However, I still wanted to open a separate issue for this since I haven't seen it properly discussed anywhere (only seen the CarWithPassengers approach suggested).

What are peoples' thoughts about this? Has anyone used moor in a bigger codebase and solved this problem in some clever way? I don't like approaches that manually modify the generated code since you then have to commit it and remember to fix it after each generation. I would love to get some official insight on this and perhaps hear what the long-term plan/solution is. To me, it feels like moor should just add these fields for us since it must be quite common to want to load certain relationships right away (e.g. with joins).

simolus3 commented 4 years ago

It's not immediately helpful, but I can outline my plans for this in the long-term from the perspective as a library author.

The problem you mentioned is that, especially with many relationships, you never know which to load. For n possible relationships we can't possibly generate 2ⁿ classes and automatic mapping methods. We could have fields like Future<Owner> owner (or stream?) on the generated model, so that we'd lazily load the entity the first time the future is awaited. I'm not sure what the usability impact of that would be, but I like how it exposes that another async query is necessary.

I've been a bit reluctant to add ORM-like features to moor because it feels like I'd open a pandora's box of feature creep into the project. Moor and it's generator are complex enough on their own, and to me it feels like a proper ORM with the features you mentioned would be even more complex. Moor is designed to be a convenient, but thin wrapper around sql. I don't see how features like these could realistically fit into moor without negatively impacting users who prefer the relational style of it.

I'm considering writing an ORM around moor as a mostly-independent project. Initially it would allow something like #114 with a Room-like style of generating tables from models instead of the other way around as in moor (see what I did there? :laughing:). I'd say that this suggestion would fit quite nicely into that:

@Entity
abstract class Car { // implementation of this would be generated
  final String model;
  @PrimaryKey
  final String vin;

  @mapToOne
  Future<Person> get owner;

  @mapToMany
  Stream<List<Wheel>> get wheels;

  // ...
}

Since it would be built on top of moor (rather than being a part of it), I will be able to experiment with those features more freely and I don't have to deal with the added complexity of integrating these features into moor. I've thought about this for some time now, I want to get started when I have some time.

North101 commented 4 years ago

What about using https://pub.dev/packages/tuple for the time being?

fgblomqvist commented 4 years ago

@North101 It simplifies the naming, but you'd still end up with tons of different types (even if they are all just a tuple at the end of the day). I think ideally I'd want one single method for fetching things, e.g. getCars(loadWheels, loadPassengers, ...) that would conditionally load the desired data.

@simolus3 Thanks for the lengthy response. I agree with you that ORM features would be a pandora's box and I don't think adding them is a good idea. Whether I can call car.wheels or have to call WheelsService.getByCar(car) doesn't really matter since I still have to deal with the async headache. I think a framework like Flutter (or most UI frameworks for that matter) requires/encourages you to plan your data loading better and therefore convenient lazy-loading methods become less useful/needed.

Building an ORM on top of moor sounds like a fine plan (and yes, great pun 😆). Though, with that being said, it looks like floor is already building an ORM that looks a lot like what you're describing, and it sounded like he would utilize a part of moor already (the SQL parser to begin with). It also looks like you guys could benefit a lot from each-others help (due to the size of these projects), but I'll leave it at that.

I think, ultimately, if you feel like #114 is too much for this project, then the way forward would be to treat moor as the thin wrapper it is and only use the data classes it generates to interact with moor. Probably name them something different as I suggested above (this could obviously be done by the user) and then inherit them and define more fields/methods as one likes. This could be done on an as-needed basis (e.g. not needed for simple small projects, or not for simple tables).

Alternatively, if you think #114 would still be useful here (as a flexible bridge between the wrapper -> orm world), one could treat the data classes generated by moor as a flexible foundation that the user could do whatever they wish with (e.g. build orm-like functionality, or something more bare/simple). One could certainly argue that it might look a tad awkward to have to define both a table AND an empty data class (that mixes in the mixin) to get started, but tbh I don't think it's a real problem. Sure, in the ORM world one would not expect it, but yeah, this ain't an ORM.

Personally, I'm starting to lean towards the first approach. Regardless, I think it would make sense to clarify in the README or somewhere else what the purpose/goal is with moor and what it isn't. I think what you wrote above is great and will help people who find their way to this issue.

Mike278 commented 4 years ago

At the moment we’re trying two different approaches.

The first approach is to simply not do any joins and resolve relationships as late as possible - often with switchMaps close to the UI, or components that lazily fetch data as they appear on screen. The n+1 problem isn’t as scary with sqlite, but we mainly use this approach on form-like screens as opposed to long lists.

The second approach is to do very specific queries to select the exact columns we need for the screen/use case. We mainly take this approach when combining query results in dart would be more complicated/expensive than just writing the join in sql - often for long detailed lists.

There’s some other interesting discussion here. I’ve been meaning to experiment with using views but haven’t had a chance.

fgblomqvist commented 4 years ago

@Mike278 For the second approach, what type does the DAO return? A custom one for that specific query or a custom one that is also used generally elsewhere?

Mike278 commented 4 years ago

Yep a custom one for that specific query.

micimize commented 3 years ago

I'm trying to evaluate the viability of aggressively using the lazy getter approach in my app, and am wondering if anyone knows/has experience with how expensive it really is?

Like, if all of my getters are .watch() streams, that's not just n+1 queries, it's n+1 watchers, streams, StreamBuilders, etc. How expensive are those abstractions, and... should I care? Does it even matter?

/// Lazy getters approach.
/// 
/// Simple, defers evaluation until render, but results in:
/// * n+1 queries and db watchers (or no db watchers with futures) 
/// * the need for lots of StreamBuilders or FutureBuilders
extension TaskMethods on Task {
  Stream<List<CalendarEvent>> get calendarEvents =>
      db.calendarEvents.query((ce) => ce.whereTaskIs(this)).watch();
}

/// Upfront complex query approach.
/// 
/// Only 1 query + stream, but:
/// * more complex query code (not a big deal)
/// * either awkward or redundant domain models (kind of annoying)
/// * _or_ TupleN results (also annoying/awkward)
extension AppDatabaseMethods on AppDatabase {
  Stream<List<Tuple2<Task, List<CalendarEvent>>>> tasksWithEvents() => tasks
      .query()
      .join([leftOuterJoin(calendarEvents, calendarEvents.onTaskRef(tasks))])
      .watch()
      .map(
        arrayAgg(
          (row) => row.readTable(tasks),
          (row) => row.readTableOrNull(calendarEvents),
        ),
      )
      .map(tuplify2);
}
arrayAgg and tuplify2 for those interested ```dart /// Aggregate the results of [readValue] into a `List` under the unique keys from [readKey]. /// /// Uses [LinkedHashMap] so order is maintained. LinkedHashMap> Function(List) arrayAgg( K Function(TypedResult) readKey, V? Function(TypedResult) readValue, ) => (rows) { final agg = LinkedHashMap>(); for (final row in rows) { final key = readKey(row); agg[key] ??= []; final value = readValue(row); if (value != null) { agg[key]!.add(value); } } return agg; }; /// Convert aggregates' entries into tuples of (key, value) List> tuplify2(Map aggregates) => aggregates.entries.map((e) => Tuple2(e.key, e.value)).toList(); ```
It is also possible to use a generic record type like Tuple* to compose convenience getters, offsetting the pain of rewriting all your models for each view, but... it is still worse DX than just lazily streaming methods from already-generated models IMO ```dart extension TaskWith on Tuple2 { Task get task => first; String get title => first.title; String get description => task.description; } extension TaskWithEventsGetters on Tuple2> { List get calendarEvents => second; } /// The below uses both extensions String _firstStart(Tuple2> t) => t.title + ' ' + t.calendarEvents.first.start.toString(); ```
simolus3 commented 3 years ago

That's an interesting question! In general, I think the cost of a sqlite query tends to be overestimated because most of us are coming from database servers where each query requires a network roundtrip. That's not an issue with sqlite, where simple queries are quite cheap. We could consider implementing a cache for prepared statements in moor to improve performance for queries with the same sql code but different variables. I didn't do any benchmarking to see if that's actually a bottleneck though.

But intuitively, I'd recommend optimizing for DX and then open issues in moor if you run into specific scenarios that the library could optimize. Stream queries don't have a high overhead, you just need to be aware that moor always invalidates streams on a per-table basis. So if you have say hundreds of independent streams on a table that all have to update for a tiny change affecting one row, that might cause a performance hit. In those cases, a single stream with joins might be a better solution.