replikativ / datahike

A fast, immutable, distributed & compositional Datalog engine for everyone.
https://datahike.io
Eclipse Public License 1.0

parameterized query with lookup ref returns different result to inlined lookup ref #118

Open xificurC opened 4 years ago

xificurC commented 4 years ago

Might be related to #75. An invalid inline lookup ref throws an exception, but a parameterized one does not; it silently returns an empty result.

bus.scheduler> (def c (db/connect (doto "datahike:mem://test" db/create-database)))
#'bus.scheduler/c
bus.scheduler> (do (db/transact c [{:db/ident :id :db/valueType :db.type/keyword :db/cardinality :db.cardinality/one :db/unique :db.unique/identity}]) nil)
nil
bus.scheduler> (do (db/transact c [{:id :a} {:id :b}]) nil)
nil
bus.scheduler> (db/q '[:find ?id :where [[:id :c] :id ?id]] @c)
clojure.lang.ExceptionInfo: Nothing found for entity id [:id :c]
bus.scheduler> (db/q '[:find ?id :in $ ?idsel :where [?idsel :id ?id]] @c [:id :c])
#{}

xificurC commented 4 years ago

This seems to be the same in datascript, @tonsky

% clj -Sdeps '{:deps {datascript {:mvn/version "0.18.8"}}}'
Downloading: datascript/datascript/0.18.8/datascript-0.18.8.pom from https://repo.clojars.org/
Downloading: datascript/datascript/0.18.8/datascript-0.18.8.jar from https://repo.clojars.org/
Clojure 1.9.0
user=> (require '[datascript.core :as db])
nil
user=> (def s {:id {:db/unique :db.unique/identity}})
#'user/s
user=> (def c (db/create-conn s))
#'user/c
user=> (db/transact c [{:id :a}])
#object[datascript.core$transact$reify__3612 0x7f2c995b {:status :ready, :val #datascript.db.TxReport{:db-before #datascript/DB {:schema {:id #:db{:unique :db.unique/identity}}, :datoms []}, :db-after #datascript/DB {:schema {:id #:db{:unique :db.unique/identity}}, :datoms [[1 :id :a 536870913]]}, :tx-data [#datascript/Datom [1 :id :a 536870913 true]], :tempids #:db{:current-tx 536870913}, :tx-meta nil}}]
user=> (db/q '[:find ?id :where [[:id :b] :id ?id]] @c)
ExceptionInfo Nothing found for entity id [:id :b]  clojure.core/ex-info (core.clj:4739)
user=> (db/q '[:find ?id :in $ ?idsel :where [?idsel :id ?id]] @c [:id :b])
#{}

xificurC commented 4 years ago

After 40 minutes of digging I didn't find a quick fix, so I hope someone more at home in the codebase will take a crack at this. I'll open an issue in datascript as well, since this code seems to be lifted from there.

jjtolton commented 4 years ago

@xificurC what is the expected behavior? Should they both throw, or should they both return the empty set?

xificurC commented 4 years ago

After thinking about it, I think of [[:id :b] :foo ?foo] as shorthand for [?e :id :b] [?e :foo ?foo], which doesn't throw when nothing matches.
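
To make the expanded form concrete, here is a minimal sketch, assuming datascript is on the classpath as in the REPL session above. The `:foo` attribute and its value are made up for illustration.

```clojure
(require '[datascript.core :as d])

;; Same schema as in the datascript session above.
(def conn (d/create-conn {:id {:db/unique :db.unique/identity}}))

;; :foo is a hypothetical attribute, added only so the join has something to return.
(d/transact! conn [{:id :a :foo 1}])

;; No entity has :id :b, so the join simply finds no match.
(d/q '[:find ?foo :where [?e :id :b] [?e :foo ?foo]] @conn)
;; => #{}

;; For an existing :id the same shape returns the joined value.
(d/q '[:find ?foo :where [?e :id :a] [?e :foo ?foo]] @conn)
;; => #{[1]}
```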

xificurC commented 4 years ago

More discussion of the same issue in datascript

whilo commented 4 years ago

It is true that Datomic does not allow pull or transactor syntax inside of Datalog syntax, as @tonsky said.

https://docs.datomic.com/on-prem/query.html#query

But I think your idea can be handy to avoid introducing additional Datalog clauses for symbols you do not use. At least a few times I was annoyed by the boilerplate needed in this case as well. Since the equivalent Datomic Datalog expression can be arrived at by reasoning on the syntactic level, we can support such extensions just by applying a function to the query DSL surface like this:

(defn with-lookup-ref?
  "True if the entity position of `clause` holds a vector (a lookup ref)."
  [clause]
  (vector? (first clause)))

(with-lookup-ref? '[[:id :b] :foo ?foo]) ;; => true

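(defn expand-lookup-ref-clause
  "Rewrites [[attr value] a ...] into [?e attr value] [?e a ...],
  where ?e is a fresh logic variable."
  [clause]
  (let [[[attr value] & r] clause
        isym (gensym "?lookup-ref")]
    [[isym attr value]
     (into [isym] r)]))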

(expand-lookup-ref-clause '[[:id :b] :foo ?foo])
;; => [[?lookup-ref7674 :id :b] [?lookup-ref7674 :foo ?foo]]

(mapcat (fn [clause]
          (if (with-lookup-ref? clause)
            (expand-lookup-ref-clause clause)
            [clause]))
        '[[[:id :b] :foo ?foo] [?foo :bar "bar"]])
;; => ([?lookup-ref7684 :id :b] [?lookup-ref7684 :foo ?foo] [?foo :bar "bar"])

(defn lookup-ref-syntax-trafo [query]
  (let [not-where #(not= % :where)
        head (take-while not-where query)
        new-clauses (->> query
                       (drop-while not-where)
                       rest
                       (mapcat (fn [clause]
                                 (if (with-lookup-ref? clause)
                                   (expand-lookup-ref-clause clause)
                                   [clause]))))]
    (vec (concat head [:where] new-clauses))))

(lookup-ref-syntax-trafo '[:find ?id :where [[:id :b] :foo ?foo] [?foo :bar "bar"]])
;; => [:find ?id :where [?lookup-ref7690 :id :b] [?lookup-ref7690 :foo ?foo] [?foo :bar "bar"]]
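
A slightly more defensive variant of the same idea could look like the sketch below. It is not part of datahike or datascript, and all names in it (`lookup-ref-pattern?`, `expand-lookup-refs`, `q-with-lookup-refs`) are hypothetical. It only matches two-element vectors with a keyword head, does not require `:where` to be the last section, and takes the query function as an argument. It assumes a vector-form query in which every section keyword is followed by at least one non-keyword form, and it only rewrites top-level clauses, not ones nested in `not` or `or`.

```clojure
(defn lookup-ref-pattern?
  "True if `clause` is a data pattern whose entity position holds a
  literal lookup ref, i.e. a two-element vector with a keyword head."
  [clause]
  (and (vector? clause)
       (let [e (first clause)]
         (and (vector? e)
              (= 2 (count e))
              (keyword? (first e))))))

(defn expand-lookup-ref-pattern
  "Rewrites [[attr v] a ...] into [?e attr v] [?e a ...] with a fresh ?e."
  [[[attr v] & more]]
  (let [e (gensym "?lookup-ref")]
    [[e attr v] (into [e] more)]))

(defn expand-lookup-refs
  "Expands top-level lookup-ref patterns in the :where section of a
  vector-form query; other sections are kept in their original order."
  [query]
  (->> (partition-by keyword? query)   ; ((:find) (?x) (:where) (clauses...))
       (partition 2)                   ; pair each keyword with its forms
       (mapcat (fn [[[section] forms]]
                 (cons section
                       (if (= :where section)
                         (mapcat #(if (lookup-ref-pattern? %)
                                    (expand-lookup-ref-pattern %)
                                    [%])
                                 forms)
                         forms))))
       vec))

(defn q-with-lookup-refs
  "Calls `q-fn` (for example datahike's or datascript's `q`) with the
  expanded query and the unchanged inputs."
  [q-fn query & inputs]
  (apply q-fn (expand-lookup-refs query) inputs))
```

Passing the query function in keeps the rewrite independent of any particular Datalog engine, which fits the idea of a general purpose library on top of the query syntax.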

Datomic Datalog does not yet define a syntax case that uses a vector in the first position of a clause, and therefore this syntax transformation should not clash with existing queries. But there is no guarantee that this will stay the same in the future.

The question is what syntactic extensions would make sense and how we can keep them compatible with each other. I am open to playing around, and we can collect the best ideas in a general purpose library that supports all Datomic Datalog variants, and maybe crux if that is possible. But I would be reluctant to directly commit to syntactic extensions without some hammock time :).