lykahb / groundhog

This library maps datatypes to a relational model, similarly to what ORM libraries do in OOP. See the tutorial at https://www.schoolofhaskell.com/user/lykahb/groundhog for an introduction.
http://hackage.haskell.org/package/groundhog

Streaming interface for queries? #41

Open · hvr opened this issue 9 years ago

hvr commented 9 years ago

Currently, I use select as in e.g.

            coms <- runDbConn (select ((ComBoardField ==. BoardIdKey bid) &&.
                                       (ComArtNumField >=. lo') &&.
                                       (ComArtNumField <=. hi'))) cm

to fetch up to about 20000 rows from a table. However, according to heap profiling this seems to result in about 60 MiB of ARR_WORDS being allocated, which is far more than the actual data the underlying (SQLite) database contains.

Is there a way to consume result values as soon as they get returned, rather than all at once? Maybe e.g. via a fold-like API?

lykahb commented 9 years ago

The size of the data loaded into memory may be 2-4 times bigger than in the database because of the boxed representation in Haskell. Adding a streaming interface involves two decisions:

  1. Where to put the streaming functions so they are separate from the functions that return lists. This can be done by adding selectStream and projectStream to PersistBackend; alternatively, they may live in another class or module. I am open to name suggestions.
  2. Choosing the interface. There are two popular streaming libraries, Pipes and Conduit, and both have solid infrastructure. Groundhog currently has a type RowPopper, which existed in Persistent before its conduit interface. The function selectStream can use either RowPopper or Source/Producer from Conduit/Pipes respectively. In the first case we can have adapter libraries like groundhog-pipes that transform a RowPopper into the actual streaming type (see the sketch after this list); in the second case Groundhog will be tied to one of those libraries.
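
For illustration, here is a rough sketch of the popper-shaped option. The names below are hypothetical; the only assumption is a popper of shape `m (Maybe a)` in the spirit of RowPopper, not groundhog's actual definition.

```haskell
-- Hypothetical names; the only assumption is a popper of shape m (Maybe a):
-- the caller pulls one row at a time until Nothing is returned.
type Popper m a = m (Maybe a)

-- Draining a popper gives back exactly what select returns today.  An adapter
-- package (e.g. groundhog-pipes or groundhog-conduit) could wrap the same
-- popper in a Producer or Source without groundhog itself depending on either
-- streaming library.
popperToList :: Monad m => Popper m a -> m [a]
popperToList pop = go
  where
    go = do
      mx <- pop
      case mx of
        Nothing -> return []
        Just x  -> do
          xs <- go
          return (x : xs)
```

The popper shape keeps groundhog free of a streaming dependency; only the adapter package would tie it to pipes or conduit.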
hvr commented 9 years ago

> The size of the data loaded into memory may be 2-4 times bigger than in the database because of the boxed representation in Haskell.

Yeah, but I'm surprised that the majority of heap objects are ARR_WORDS rather than (boxed) constructors...

I've got no strong opinion about the API. At the end of the day I just need a fold-style (and/or foldM-style) API. I work mostly with Builders, and sometimes I want to interface with simple low-overhead ByteString.Builder -> IO ()-style sinks, for which I deliberately avoid any high-level streaming framework. So if I can "dumb down" a pipes tuple source to a trivial fold-style API without paying too much overhead, I'll be happy :-)
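
As a concrete illustration of that fold shape (the names foldStreamM and writeRows are made up for this sketch, not an existing groundhog API), a foldM-style driver over a popper can feed a plain Builder sink directly:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Builder as B

-- Left fold over a popper-shaped source (m (Maybe row)).
foldStreamM :: Monad m => (acc -> row -> m acc) -> acc -> m (Maybe row) -> m acc
foldStreamM step acc0 pop = go acc0
  where
    go !acc = pop >>= maybe (return acc) (\row -> step acc row >>= go)

-- Rendering each row straight into a low-overhead Builder sink, with no
-- streaming framework involved.
writeRows :: (B.Builder -> IO ()) -> (row -> B.Builder) -> IO (Maybe row) -> IO ()
writeRows sink render = foldStreamM (\() row -> sink (render row)) ()
```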

hansonkd commented 9 years ago

@lykahb

I'm currently writing a RethinkDB backend and am particularly interested in this issue. Having a streaming interface opens up change subscriptions in RethinkDB and tailable cursors in MongoDB (although that would probably be another class, StreamingSubscribe or something).

I believe the Cursor only needs to be implemented in terms of two functions: close (this is debatable, since cursors should close automatically, but it could be handy for subscriptions) and next, which returns the next entity wrapped in Maybe. From those two we can build mapping/folding implementations. However, this requires a cursor object, which sql-simple doesn't expose.

We could also implement it in terms of a fold (which I think all the sql-simple packages already provide). A fold over PersistEntity v -> m () would be in the spirit of the library, but it could also use a type family associated with Cursor a, say CursorRow a, where the fold takes CursorRow a -> m () and CursorRow a can be converted to an Entity v. That would open up two functions, foldRows and foldEntityRows (a sketch of the next/close idea follows below).
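
A minimal sketch of that next/close idea, using a plain record instead of a class; Cursor and foldEntityRows as written here are hypothetical names, not an existing groundhog or sql-simple API.

```haskell
-- A cursor is just a way to pull the next row plus a way to release it.
data Cursor m row = Cursor
  { next  :: m (Maybe row)  -- Nothing once the result set is exhausted
  , close :: m ()           -- arguably automatic, but handy for subscriptions
  }

-- Mapping/folding can be built from those two functions alone.
foldEntityRows :: Monad m => (acc -> row -> m acc) -> acc -> Cursor m row -> m acc
foldEntityRows step acc0 cur = go acc0
  where
    go acc = next cur >>= maybe (close cur >> return acc)
                                (\row -> step acc row >>= go)
```

A foldRows variant would look the same, differing only in where the row type comes from.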

I don't think groundhog needs to be tied to any particular streaming library, although a clean implementation would leave open the possibility of a lightweight groundhog-conduit down the road.

EDIT: Actually, a conduit library would be a pretty cool combination with websockets or other network code, letting you publish DB changes or events easily. With a StreamingSubscribe it could resemble something like Meteor.

lykahb commented 9 years ago

I have just committed a streaming interface. I wrote it a while ago and was going to push it only after writing at least one of groundhog-pipes or groundhog-conduit. It has a bracket-like interface that passes next to the consumer and closes automatically. Please let me know what you think about ca2e901ae38774040721a0b85f974cab9e301130
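
For readers following along, the general shape of such a bracket-like interface looks roughly like the sketch below; the names are illustrative, not the committed API (see the commit above for the real interface).

```haskell
import Control.Exception (bracket)

-- The consumer only ever sees "next"; opening and closing happen around it,
-- even if the consumer throws an exception.
withStream :: IO cursor                   -- open the cursor
           -> (cursor -> IO ())           -- close the cursor
           -> (cursor -> IO (Maybe row))  -- fetch the next row
           -> (IO (Maybe row) -> IO a)    -- consumer receives only "next"
           -> IO a
withStream open closeCur nextRow consume =
  bracket open closeCur (\cur -> consume (nextRow cur))
```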

hvr commented 9 years ago

@lykahb What's the ETA for groundhog-pipes btw?

hansonkd commented 9 years ago

This looks like it should work! Thanks.

When I finish up my RethinkDB backend I'll take a look at implementing groundhog-conduit.

lykahb commented 9 years ago

Sorry, @hvr, I've just noticed your comment. I got back to groundhog-pipes yesterday. It is harder than I thought to use the bracket-like function there. Perhaps, as @hansonkd suggested, I will use next and close instead.
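
For context on why next/close maps more directly onto pipes than a bracket-like function, here is an illustrative sketch (not groundhog-pipes' actual code):

```haskell
import Control.Monad.Trans.Class (lift)
import Pipes (Producer, yield)

-- With explicit next/close the Producer is a straight loop:
cursorToProducer :: Monad m => m (Maybe row) -> m () -> Producer row m ()
cursorToProducer nextRow closeCur = go
  where
    go = lift nextRow >>= maybe (lift closeCur) (\row -> yield row >> go)

-- With a bracket-style interface (roughly (m (Maybe row) -> m r) -> m r) the
-- Producer cannot escape the callback, so the whole pipeline has to run
-- inside the callback instead, which is what makes it awkward to package.
```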

lykahb commented 9 years ago

I've committed groundhog-pipes. After logging is fixed, the packages will be ready for release!