Support for pull query

metasoarous commented 5 years ago

I see that support for pull queries is on the README query feature map, but I didn't see an issue for it, so I thought I'd start one. Of all the pending query features, this one feels to me like the biggest hole relative to my day-to-day usage of Datomic & DataScript.

There's potentially a bit of a discussion here about interface and scoping. There seem to be three clear targets:

- individual pull queries
- pull queries on a given collection of ids (pull-many, basically)
- pull queries within a Datalog query like [:find (pull ?e [...]) :where ...]

In all of these cases, it would seem reasonable to send diffs corresponding to the relevant [e a v] triples. What this misses compared with a conventional pull is, obviously, where in the nested structure a given triple belongs. It would probably be fine to ignore this, but it is interesting to consider that with some kind of Reagent-like API, you could return reactions which resolve to maps, which themselves might point to nested reactions.
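To make the idea of [e a v] diffs concrete, here is a minimal sketch in Python of a client that folds such diffs into per-entity maps. The names `new_state`, `apply_diffs`, and `snapshot` are hypothetical and not part of clj-3df; the sketch assumes each update arrives as a triple plus a signed multiplicity (+1 for an assertion, -1 for a retraction).

```python
from collections import defaultdict


def new_state():
    # entity id -> attribute -> {value: multiplicity}
    return defaultdict(lambda: defaultdict(dict))


def apply_diffs(state, diffs):
    """Fold ((e, a, v), diff) updates into state; diff is +1 or -1 (assumed)."""
    for (e, a, v), diff in diffs:
        values = state[e][a]
        values[v] = values.get(v, 0) + diff
        if values[v] == 0:
            del values[v]  # value fully retracted
        if not values:
            del state[e][a]  # attribute has no values left
        if not state[e]:
            del state[e]  # entity has no attributes left
    return state


def snapshot(state):
    """Flat view: entity id -> attribute -> sorted list of current values."""
    return {e: {a: sorted(vs) for a, vs in attrs.items()}
            for e, attrs in state.items()}
```

Note that the result is flat: nothing in the triples says where a value sits in a nested pull result, which is exactly the gap described above.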

The problem I see is a query like [:find (pull ?e [...]) (pull ?d [...]) :where ...], which is effectively a relationship between pull structures. This is legal in both Datomic and DataScript, but I'm not sure how you would interpret it here, because you don't just have a collection of facts; you have a relation between collections of facts. So maybe this just isn't supported. Or maybe you can come up with some clever indexing scheme that pairs the pull diffs with a concept of where they are in the outer relation. What's interesting is that if we again consider the Reagent model, this fits quite nicely into the idea of returning a reaction of nested reactions.

Again, thanks for the great work!

comnik commented 5 years ago

There are no decisions made regarding pull, but I have a few thoughts and I'm very happy for input on this as my mind is currently occupied with Datalog-related stuff.

Scope

At work, pull suffices for many of the inter-service data dependencies we have. Our version of pull differs from that in Datomic in that we optimize for pulling across all entities, and we offer the ability to include simple constraints. We use this style of pull query at every layer of the stack, down to the individual components in a web frontend. In a large-scale setting it probably makes sense to enforce at least one constraint on root entities and to forbid the use of attribute wildcards. Wildcards for interactive documentation purposes would be handled separately (we can talk more about this).

Implementation

What seems most interesting from an implementation perspective is that relations constrain only the codomain, whereas a Datalog clause like [?parent :parent/child ?child] would constrain both. Given a pull expression

[:parent/name 
 {:parent/child [:child/name]}]

we'd need to create separate output streams for each path, each with different constraints applied to them. E.g.:

'(?parent) <- [:parent/name] | _ ;; no constraints
'(?parent :parent/child ?child) <- [:child/name] | [?parent :parent/child ?child]
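As a rough illustration of splitting a pull expression into one stream per path, the following Python sketch walks a pattern and lists each path with the attributes requested at that level. It models the pull expression as Python lists and dicts, and `pull_paths` is a hypothetical helper, not something from declarative-dataflow.

```python
def pull_paths(pattern, path=()):
    """Return [(path, attributes)] for every level of a nested pull pattern.

    `pattern` is a list whose items are attribute names (str) or
    single-key dicts mapping a reference attribute to a sub-pattern.
    """
    attrs = [item for item in pattern if isinstance(item, str)]
    levels = [(path, attrs)] if attrs else []
    for item in pattern:
        if isinstance(item, dict):
            for ref_attr, sub_pattern in item.items():
                # Each nested level is constrained by the reference
                # attributes leading to it from the root.
                levels.extend(pull_paths(sub_pattern, path + (ref_attr,)))
    return levels
```

For the expression above, the empty path carries :parent/name with no constraint, and the path through :parent/child carries :child/name.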

The joins involved in the nested path would then produce [?parent ?child a v] tuples, whereas the top-level path produces [?parent a v] tuples. Such outputs could then be merged into nested maps on the client.
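A minimal sketch of that client-side merge, in Python, might look as follows. It assumes cardinality-one attributes and a single nested reference attribute; `merge_pull_outputs` and its `ref_attr` parameter are hypothetical names introduced only for this example.

```python
def merge_pull_outputs(top_level, nested, ref_attr=":parent/child"):
    """Merge [parent a v] and [parent child a v] tuples into nested maps."""
    result = {}
    for parent, a, v in top_level:
        result.setdefault(parent, {})[a] = v
    for parent, child, a, v in nested:
        # Group nested tuples by child id under the reference attribute.
        children = result.setdefault(parent, {}).setdefault(ref_attr, {})
        children.setdefault(child, {})[a] = v
    # Present children as a list of maps, as a pull result would.
    return {
        parent: {k: (list(val.values()) if k == ref_attr else val)
                 for k, val in attrs.items()}
        for parent, attrs in result.items()
    }
```

Because the nested tuples carry ?parent, the client never has to guess which parent a child's attributes belong to.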

I did not quite follow your comment that

"it is interesting to consider that with some kind of Reagent like api, you could return reactions which resolve to maps, which themselves might point to nested reactions."

Could you explain?

Similarly for your last example: if the root constraints don't share symbols, I don't see how there would be a constraining relation between the two pull queries. Implemented as separate dataflows, Differential would still update them consistently. One could also join the outputs back together afterwards; we do something similar for multiple aggregations within a :find.
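To illustrate joining the outputs back together, here is a small Python sketch over plain snapshots. It is not how Differential performs joins; it only shows the shape of the result, assuming the :where clauses yield (e, d) pairs and each pull has already been materialized into a map keyed by entity id. `join_pulls` is a hypothetical name.

```python
def join_pulls(relation, pulled_e, pulled_d):
    """Pair up pulled maps along the (e, d) tuples of the outer relation."""
    return [
        (pulled_e[e], pulled_d[d])
        for e, d in relation
        if e in pulled_e and d in pulled_d  # skip ids with no pulled data
    ]
```

Each row of the result corresponds to one tuple of the outer relation, which is one way to read a query with two pull expressions in its :find.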

But once these things are cleared up it shouldn't be that much effort to come up with a proof-of-concept (I hope?).

comnik commented 5 years ago

A first step is done in https://github.com/comnik/declarative-dataflow/commit/4a2c8dff1a6e793d9bfacc93e02bc0502945dd19. That commit introduces a PullLevel operator which can be used to implement each of the individual streams mentioned above.

sixthnormal / clj-3df

Support for pull query #30