tonsky / datascript

Immutable database and Datalog query engine for Clojure, ClojureScript and JS
Eclipse Public License 1.0

Indexes on vectors return surprising results #470

Open mainej opened 4 months ago

mainej commented 4 months ago

I'm seeing some odd behavior with indexed vectors. I thought this might have been introduced in 1.6.4, but it also happens in 1.6.3. Given:

(require '[datascript.core :as d])

(let [db (-> (d/empty-db {:path {:db/index true}})
             (d/db-with [{:path [1 2]}
                         {:path [1 2 3]}]))]
  (for [v [;; variations on 1, 2
           [1 2]
           (list 1 2)
           (butlast [1 2 3])

           ;; variations on 1, 2, 3
           [1 2 3]
           (list 1 2 3)
           (butlast [1 2 3 4])]]
    [v
     (->> (d/datoms db :avet :path v)
          (mapv :e))]))

I'd expect this to find entity ids only for the first and fourth values of v, i.e. for the vectors, not the lists or sequences. However, this is what is returned:

;; => ([[1 2] [1]]
;;     [(1 2) [1]]
;;     [(1 2) []]
;;     [[1 2 3] [2]]
;;     [(1 2 3) []]
;;     [(1 2 3) [2]])

It's particularly odd that the first list, (list 1 2), returns an entity id but (list 1 2 3) doesn't, and, conversely, that the first sequence, (butlast [1 2 3]), doesn't return an entity id but (butlast [1 2 3 4]) does.
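For context, the three spellings of each value are equal under Clojure's `=` but are not all vectors, which may be relevant to how an index compares them. The following is a small check using only `clojure.core`; it says nothing about DataScript's internals, and the names `candidates`, `all-equal?`, and `vector-flags` are made up for illustration.

```clojure
;; The three "variations on 1, 2" from the example above.
(def candidates
  [[1 2] (list 1 2) (butlast [1 2 3])])

;; All three are equal as sequential collections...
(def all-equal? (apply = candidates))         ; true

;; ...but only the first one is actually a vector.
(def vector-flags (mapv vector? candidates))  ; [true false false]
```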

While playing with this, I also ran across another odd variation: the presence of other datoms seems to affect the result.

(let [db (-> (d/empty-db {:path {:db/index true}
                          ;; this attribute is new
                          :children {:db/valueType :db.type/ref
                                     :db/cardinality :db.cardinality/many}})
             (d/db-with [{:db/id "a"
                          :path [1 2]}
                         {:db/id "b"
                          :path [1 2 3]}
                         ;; this datom is new
                         [:db/add "a" :children "b"]]))]
  (for [v [;; variations on 1, 2
           [1 2]
           (list 1 2)
           (butlast [1 2 3])

           ;; variations on 1, 2, 3
           [1 2 3]
           (list 1 2 3)
           (butlast [1 2 3 4])]]
    [v
     (->> (d/datoms db :avet :path v)
          (mapv :e))]))

;; => ([[1 2] [1]]
;;     [(1 2) [1]]
;;     this result has changed
;;     [(1 2) [1 2]]
;;     [[1 2 3] [2]]
;;     [(1 2 3) []]
;;     [(1 2 3) [2]])

Here the result is almost the same, with one exception: before, when v was (butlast [1 2 3]), d/datoms returned nothing; now, with the extra datom, it returns both entity ids. Very odd!

The easiest fix on the calling side is to be careful to pass vectors, not sequences. That's what we'll do in our project, but I wanted to report the issue anyway.
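The calling-side workaround described above could look roughly like the sketch below. `as-path` is a hypothetical helper, not part of DataScript; it simply coerces whatever sequential value the caller has into a vector before the lookup. The usage in the `comment` form assumes `d` aliases `datascript.core` and `db` is a database like the ones in the examples.

```clojure
;; Hypothetical helper (not a DataScript function): normalize a lookup
;; value to a vector so it has the same concrete type as the stored value.
(defn as-path [v]
  (if (vector? v)
    v
    (vec v)))

(comment
  ;; Assumed usage; `d` and `db` come from the surrounding examples.
  (->> (d/datoms db :avet :path (as-path (butlast [1 2 3])))
       (mapv :e)))
```

Coercing at the call site keeps the stored data and the schema unchanged, at the cost of remembering to wrap every lookup value.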

tonsky commented 3 months ago

Thanks! Should be better now