replikativ / datahike

A fast, immutable, distributed & compositional Datalog engine for everyone.
https://datahike.io
Eclipse Public License 1.0
1.62k stars 95 forks source link

[Bug]: Inconsistent treatment of invalid constant values #650

Closed jonasseglare closed 7 months ago

jonasseglare commented 10 months ago

What version of Datahike are you using?

0.6.1551

What version of Java are you using?

openjdk version "20.0.1" 2023-04-1

What operating system are you using?

Ubuntu 22

What database EDN configuration are you using?

(let [schema {:name   {:db/unique :db.unique/identity}
                :friend {:db/valueType :db.type/ref}}
        db (d/db-with (db/empty-db schema)
                      [{:db/id 1 :id 1 :name "Ivan" :age 11 :friend 2}
                       {:db/id 2 :id 2 :name "Petr" :age 22 :friend 3}
                       {:db/id 3 :id 3 :name "Oleg" :age 33}])]
....

Describe the bug

I discovered an inconsistency in how queries are evaluated. In the file lookup_refs_test.cljc, there is the following test:

 (is (= (d/q '[:find ?e
                  :in $ [?e ...]
                  :where [?e :friend 3]]
                db [1 2 3 "A"])
           #{[2]}))

which passes.

Based on that test, I wrote this test, which also passes:

(is (= (d/q '[:find ?e
                  :in $ [?e ...]
                  :where [?e :friend 3]]
                db ["A"])
           #{}))

What is potentially problematic here is that ?e cannot be bound to the value "A" because it is not a valid entity id reference. The query engine chooses to handle this by ignoring the invalid binding and the result of the query becomes #{}. However, if I write a similar test,

(is (= (d/q '[:find ?e
                    :in $ ?e
                    :where [?e :friend 3]]
                  db "A")
             #{}))

I get this stack trace:

1. Unhandled clojure.lang.ExceptionInfo
   Expected number or lookup ref for entity id, got "A"
   {:error :entity-id/syntax, :entity-id "A"}
                utils.cljc:  123  datahike.db.utils$entid/invokeStatic
                utils.cljc:   97  datahike.db.utils$entid/invoke
                utils.cljc:  127  datahike.db.utils$entid_strict/invokeStatic
                utils.cljc:  126  datahike.db.utils$entid_strict/invoke
                query.cljc:  920  datahike.query$resolve_pattern_lookup_refs/invokeStatic
                query.cljc:  913  datahike.query$resolve_pattern_lookup_refs/invoke
                query.cljc: 1073  datahike.query$_resolve_clause_STAR_/invokeStatic
                query.cljc:  975  datahike.query$_resolve_clause_STAR_/invoke
                query.cljc: 1086  datahike.query$_resolve_clause$fn__36624/invoke
          query_stats.cljc:   31  datahike.query_stats$update_ctx_with_stats/invokeStatic
          query_stats.cljc:   19  datahike.query_stats$update_ctx_with_stats/invoke
                query.cljc: 1085  datahike.query$_resolve_clause/invokeStatic
                query.cljc: 1081  datahike.query$_resolve_clause/invoke
                query.cljc: 1083  datahike.query$_resolve_clause/invokeStatic
                query.cljc: 1081  datahike.query$_resolve_clause/invoke
                query.cljc: 1095  datahike.query$resolve_clause/invokeStatic
                query.cljc: 1088  datahike.query$resolve_clause/invoke
     PersistentVector.java:  343  clojure.lang.PersistentVector/reduce
                  core.clj: 6885  clojure.core/reduce
                  core.clj: 6868  clojure.core/reduce
                query.cljc: 1099  datahike.query$_q/invokeStatic
                query.cljc: 1097  datahike.query$_q/invoke
                query.cljc: 1234  datahike.query$raw_q/invokeStatic
                query.cljc: 1223  datahike.query$raw_q/invoke
                query.cljc:   80  datahike.query$q/invokeStatic
                query.cljc:   74  datahike.query$q/doInvoke
               RestFn.java:  439  clojure.lang.RestFn/invoke
     lookup_refs_test.cljc:  248  datahike.test.lookup_refs_test$fn__66025$fn__66058/invoke
     lookup_refs_test.cljc:  248  datahike.test.lookup_refs_test$fn__66025/invokeStatic
     lookup_refs_test.cljc:  186  datahike.test.lookup_refs_test$fn__66025/invoke
                  test.clj:  304  cider.nrepl.middleware.test/test-var/fn
                  test.clj:  303  cider.nrepl.middleware.test/test-var
                  test.clj:  293  cider.nrepl.middleware.test/test-var
                  test.clj:  336  cider.nrepl.middleware.test/test-vars/fn/fn/fn

What is the expected behaviour?

For the sake of consistency, I think this test should pass without throwing an exception when added to the test test-lookup-refs-query:

    (is (= (d/q '[:find ?e
                    :in $ ?e
                    :where [?e :friend 3]]
                  db "A")
             #{}))

The first step to make it pass would probably be to to remove the branch where :consts is accumulated in the function resolve-in.

How can the behaviour be reproduced?

See this commit on the branch jonasseglare/inconsistent-invalid-binding-demo for a unit test that demonstrates the issue. If you like I can open a PR from that branch so that we have this issue demonstrated in the source code.

Add this code to test-lookup-refs-query in lookup_refs_test.clj:

    ;; This test works, despite the fact that `?e` cannot be bound to `"A"`.
    (is (= (d/q '[:find ?e
                  :in $ [?e ...]
                  :where [?e :friend 3]]
                db ["A"])
           #{}))

   ;; This test currently throws an exception. Shouldn't it pass since the test above passes?
    (is (= (d/q '[:find ?e
                    :in $ ?e
                    :where [?e :friend 3]]
                  db "A")
             #{}))
whilo commented 10 months ago

That makes sense, I also think it is generally better to return nil than to throw errors for parametrized queries even though it is a bit concerning if you use the wrong type in the entity position, it is likely a bug from the user. So maybe throwing an error here is actually better for the user. Not sure. Either way it should be consistent.

whilo commented 7 months ago

As discussed this only occurs when someone makes an inconsistent query and therefore does not need a fix. It is nicer when the query engine just skips incorrect inputs than throwing exceptions.