halgari / odin

An embedded extensible logic DSL for Clojure.
183 stars 11 forks source link

How do queries work? #7

Closed idmitrievsky closed 7 years ago

idmitrievsky commented 7 years ago

Hello,

I have a question about the accounts example.

(def data {:fred {:credits 1000 :debits 500}
           :sam {:credits 220 :debits 300}
           :sue {:credits 3300 :debits 100}
           :jane {:credits 2000 :debits 1000}})

(into {}
  (o/for-query
    (o/and
      (d/query data ?account :credits ?credits)
      (d/query data ?account :debits ?debits)
      (d/query data _ ?name ?account))
    [?account (- ?credits ?debits)]))

#=> {[:jane] 1000,
     [:sue] 3200,
     [:fred] 500,
     [:sam] -80}

So the output is not the same as in the README, but that is not the question.

In general I thought that for the query

(d/query data ?path ?key ?val)

the following holds for any data:

(-> data
    (get-in ?path)
    ?key
    (= ?val))

#=> true

for all tuples of ?path, ?key, ?val returned by the query.

And it seems to be the case for non-empty paths:

(into {}
  (o/for-query
    (d/query data ?account :credits ?credits)
    [?account ?credits]))

#=> {[:jane] 2000,
     [:sue] 3300,
     [:fred] 1000,
     [:sam] 220}

but not for the empty ones:

(into []
  (o/for-query
    (d/query data [] ?key ?val)
    [?key ?val]))

#=> [[:fred [:fred]]
     [:jane [:jane]]
     [:sam [:sam]]
     [:sue [:sue]]]

I was expecting to get this because with an empty path we operate on the outer data map:

#=> [[:fred {:credits 1000 :debits 500}]
     [:jane ...]
     [:sam ...]
     [:sue ...]]

Can you explain the intuition behind this behaviour, please?

halgari commented 7 years ago

Thanks for the question. The behavior change is not dependent on the path, but on the structure of the data being queried. When Odin indexes data it finds all collections and flattens them into tuples. So the data: {:foo {:bar 42}} gets indexed as:

[[] :foo [:foo]]
[[:foo] :bar 42]

The vectors [] and [:foo] have metadata on them of :path true this signifies that these are paths. It is a bit counter intuitive, so I should figure out a way to clarify it in the docs/code. But collections are referenced by path. This allows for the engine to keep these different paths through the data separate. Imagine the following data:

{:bill {:total 400}
 :jane {:total 400}}

If we referred to the values as {:total 400} we would have no way, based purely on the value, to tell if we were talking about Bill or Jane's account.

idmitrievsky commented 7 years ago

Thanks for the example!

I feel like I need some time to check out the paper you reference in the README because it still doesn't click for me why

{:bill {:total 400}
 :jane {:total 400}}

is indexed by Odin as (if I understand correctly)

[[] :bill [:bill]]
[[:bill] :total 400]
[[] :jane [:jane]]
[[:jane] :total 400]

and not

[[] :bill {:total 400}]
[[:bill] :total 400]
[[] :jane {:total 400}]
[[:jane] :total 400]
halgari commented 7 years ago

So this approach would work fine if we only wanted to move "down" through the tree of data. However if we move down, then back up, we would loose some context. This is to say, {:total 400} provides no information as to where this data exists in the tree of our input data. Worse yet, two accounts have this exact same data, so the engine would probably think they are the same.

However, if we refer to the account via a get-in path, we have [:fred] that not only tells us how we got here, but also how to move "up" by simply popping off one element in the vector.

idmitrievsky commented 7 years ago

Thank you for the explanation!