tonsky / datascript

Immutable database and Datalog query engine for Clojure, ClojureScript and JS
Eclipse Public License 1.0
5.45k stars 304 forks source link

Querying nested data structures with datascript #217

Open lvh opened 7 years ago

lvh commented 7 years ago

I am turning deeply nested trees with datalog via something akin to intension. (I was previously using intension but I've replaced it with some specter, which got me a neat 30% performance boost.)

This means my queries contain stuff like:

[:a ?acct :gizmos _ :gizmo-id ?id]
[:a ?acct :gadgets _ :id ?gadget-id]
[:a ?acct :gadgets _ :color ?color]
[:a ?acct :gizmo-configuration ?cid :gizmo ?id]
[:a ?acct :gizmo-configuration ?cid :gadget ?gadget-id]

It was pretty surprising to see a (working) datascript db input with data terms in the query with more than three entries. When I read the datascript source a while ago, I seem to remember a lot of highly optimized code that expects EAVT/AVET/VEAT.

Is reducing a nested data structure to a bunch of facts the most efficient way to query this kind of data structure? I would prefer to use something declarative with reusable rules like datascript or datamaps over something like specter for most of these queries, although specter is fine for many of them too. As you can see, the queries get repetitive fast, so I need something to describe the behavior.

What are the downsides for not reducing it to a more traditional db of facts? (Presumably there's at least a performance consequence.) Is there a way I can make the fact set faster, even if e.g. it's just using a sorted set?

tonsky commented 7 years ago

DataScript is only faster if it queries its indexes. If you query a collection it doesn’t really matter how many positions are in your patterns. So if you put it all into DataScript DB you could probably win some speed on queries (by the expense of creating such indexes, of course, and you’ll have to change your queries)

lvh commented 7 years ago

Gotcha. But I did understand correctly that datascript can only use those indexes if the data is really in EAVT form, right?

(Unfortunately, I don't think I can actually get it in that form, because it's a pretty gnarly-shaped tree. Maybe one day.)

Thanks!

lvh commented 7 years ago

(Oh, and also: if I understood your comment correctly, datascript also can't accidentally do anything with a sorted set that's smarter than just having a vec, right?)

tonsky commented 7 years ago

Gotcha. But I did understand correctly that datascript can only use those indexes if the data is really in EAVT form, right?

Not only in EAVT form, but also stored in native DataScript indexes.

datascript also can't accidentally do anything with a sorted set that's smarter than just having a vec

No, not really.

lvh commented 7 years ago

Ah, I assumed something called (d/datoms db :eavt) or equivalent, hence needing EAVT-ish structure. How do I index a set of facts? (I am volunteering to write prose docs.)

tonsky commented 7 years ago

(datascript.core/db-with (datascript.core/empty-db) [<tx-data>])

lvh commented 7 years ago

Cool, thanks. I tried that {:db.fn/add *facts*}, but that pegged my CPU at 100% for a while and eventually threw an exception so big Emacs crawled to a halt and I had to kill -9 it. What does that look like when you have a seq of facts, and not tx-data?

Does the pull API similarly only work with something in a db? Does it work even if it's not in EAVT form?