replikativ / datahike

A fast, immutable, distributed & compositional Datalog engine for everyone.
https://datahike.io
Eclipse Public License 1.0

[Bug]: `pull-many` query with 3 attr-ids on a range of 500 entities takes ~2,900 ms #652

Closed by chilip3pp3r 10 months ago

chilip3pp3r commented 10 months ago

What version of Datahike are you using?

0.6.1531

What version of Java are you using?

openjdk 18.0.1.1

What operating system are you using?

macOS

What database EDN configuration are you using?

{:store {:backend :jdbc 
         :dbtype "postgresql"
         :host "localhost"
         :port 5432}}

Describe the bug

This question was also posted on Stack Overflow and in the Datahike channel of the Clojurians Slack.

Running the following `pull-many` call with a three-attribute pattern (including :db/id) over a range of roughly 500 entity ids takes ~2,900 ms:

(require '[datahike.api :as d]) ; version 0.6.1531
(d/pull-many @conn [:db/id :book-name :notable?] 
  (range 1 500))

The schema is as follows:

[{:db/ident       :book-name
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one}
 {:db/ident       :notable?
  :db/valueType   :db.type/long
  :db/cardinality :db.cardinality/one}]

Is the slow query time an inherent trade-off of EAV databases, or am I missing an obvious optimization?
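
One way to narrow this down is to time the same pull pattern against an in-memory store, so that PostgreSQL/JDBC round trips are taken out of the measurement. The following is only a sketch under assumptions: the `:mem` backend config, the `elapsed-ms` helper, the generated sample data and the name `mem-conn` are illustrative and not part of the original setup.

```clojure
(require '[datahike.api :as d])

;; Assumption: an in-memory store, used only for comparison with the
;; JDBC-backed store from the report. The :id value is arbitrary.
(def mem-cfg {:store {:backend :mem :id "pull-many-bench"}})

;; Same schema as in the report.
(def schema
  [{:db/ident       :book-name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident       :notable?
    :db/valueType   :db.type/long
    :db/cardinality :db.cardinality/one}])

(defn elapsed-ms
  "Hypothetical helper: runs thunk once and returns wall-clock milliseconds."
  [thunk]
  (let [start (System/nanoTime)]
    (thunk)
    (/ (- (System/nanoTime) start) 1e6)))

(d/create-database mem-cfg)
(def mem-conn (d/connect mem-cfg))
(d/transact mem-conn {:tx-data schema})

;; Made-up sample data: 499 books, matching the size of (range 1 500).
(d/transact mem-conn
            {:tx-data (vec (for [i (range 1 500)]
                             {:book-name (str "book-" i)
                              :notable?  (mod i 2)}))})

;; Look up the real entity ids instead of assuming they start at 1,
;; because schema entities also occupy ids in a fresh database.
(def eids
  (d/q '[:find [?e ...] :where [?e :book-name _]] @mem-conn))

(println "in-memory pull-many took"
         (elapsed-ms #(doall (d/pull-many @mem-conn
                                          [:db/id :book-name :notable?]
                                          eids)))
         "ms")
```

Wrapping the original call in the same helper, for example `(elapsed-ms #(doall (d/pull-many @conn [:db/id :book-name :notable?] (range 1 500))))`, gives a number to compare against. If the in-memory run is much faster, the time is more likely spent in the store backend than in the pull itself; a single run is noisy, so repeating the measurement a few times is advisable.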

What is the expected behaviour?

Much faster query speed? (Apologies.)

How can the behaviour be reproduced?

Query:

(require '[datahike.api :as d]) ; version 0.6.1531
(d/pull-many @conn [:db/id :book-name :notable?] 
  (range 1 500))

Schema:

[{:db/ident       :book-name
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one}
 {:db/ident       :notable?
  :db/valueType   :db.type/long
  :db/cardinality :db.cardinality/one}]