tonsky / datascript

Immutable database and Datalog query engine for Clojure, ClojureScript and JS
Eclipse Public License 1.0
5.46k stars 304 forks source link

performance problem with a rule #151

Open freakhill opened 8 years ago

freakhill commented 8 years ago

[(chain ?a ?attr1 ?attr2 ?b) [?a ?attr1 ?i0 ?i0 ?attr2 ?b]]

I have a query that can use this rule 4 times but for which I currently manually expands every potential call.

Execution time when used: 0 rules 4 expands: ~hundred ms 1 rule 3 expands: ~hundred ms 2 rules 2 expands: ~90s 3 rules 1 expand ; JVM OOP

Any idea of what could cause that?

(datascript 0.15.0)

lvh commented 8 years ago

Could you post the expansions as well? (Some sample data would also be helpful, basically something to easily reproduce the problem)

neverfox commented 7 years ago

I'm having a similar problem with a recursive ancestor rule. The following query never returns, but if I leave off the last anc clause it returns almost instantly (though that does me no good because it cannot be guaranteed to be right). Also , if I'm dealing <= 4 tokens, the query is fast even if complete (i.e. all necesary anc clauses). I don't know why it would suddenly get hosed at 5 tokens since by the time it gets to that last clause, shouldn't it have narrowed down the range of possible bindings for ?p?

(def schema
  {:child      {:db/valueType   :db.type/ref
                :db/cardinality :db.cardinality/many
                :db/isComponent true
                :db/index       true}
   :token/text {:db/index true}})

(def rules
  '[[(anc ?par ?child)
     (?par :child ?child)]
    [(anc ?anc ?child)
     (?par :child ?child)
     (anc ?anc ?par)]])

(d/q '[:find ?p
            :in $ %
            :where
            [?t0 :token/text "central"]
            [?t1 :token/text "square"]
            [?t2 :token/text "of"]
            [?t3 :token/text "Djemma"]
            [?t4 :token/text "El-Fna"]
            [?t0 :token/index ?i0]
            [?t1 :token/index ?i1]
            [?t2 :token/index ?i2]
            [?t3 :token/index ?i3]
            [?t4 :token/index ?i4]
            [(- ?i1 ?i0) ?diff0]
            [(= ?diff0 1)]
            [(- ?i2 ?i1) ?diff1]
            [(= ?diff1 1)]
            [(- ?i3 ?i2) ?diff2]
            [(= ?diff2 1)]
            [(- ?i4 ?i3) ?diff3]
            [(= ?diff3 1)]
            [?p :sentence/index _]
            (anc ?p ?t0)
            (anc ?p ?t1)
            (anc ?p ?t2)
            (anc ?p ?t3)
            (anc ?p ?t4)]
          @conn rules)

Consider the following data:

[{:sentence/index 0
                      :child [{:parse/tag "ROOT"
                               :child [{:parse/tag "S"
                                        :child [{:parse/tag "NP"
                                                 :child [{:token/index 0
                                                          :token/text "central"}
                                                         {:token/index 1
                                                          :token/text "square"}
                                                         {:token/index 2
                                                          :token/text "of"}
                                                         {:token/index 3
                                                          :token/text "Djemma"}
                                                         {:token/index 4
                                                          :token/text "El-Fna"}]}]}]}]}
 {:sentence/index 1
                      :child [{:parse/tag "ROOT"
                               :child [{:parse/tag "S"
                                        :child [{:parse/tag "NP"
                                                 :child [{:token/index 0
                                                          :token/text "central"}
                                                         {:token/index 1
                                                          :token/text "square"}
                                                         {:token/index 2
                                                          :token/text "of"}
                                                         {:token/index 3
                                                          :token/text "Foo"}
                                                         {:token/index 4
                                                          :token/text "El-Fna"}]}]}]}]}]

If you try the query above, it will run with anc clauses through ?t1 but hangs if you add ?t2 or beyond.

This hangs as you might expect:

(d/q '[:find ?t0 ?t1 ?t2
                                  :in $ %
                                  :where
                                  [?p :sentence/index _]
                                  (anc ?p ?t0)
                                  (anc ?p ?t1)
                                  (anc ?p ?t2)]
                                @conn rules)

So that suggests to me that the bindings are not narrowing from the top down and the hang is because those anc clauses are running independently (to be bound after it completes).

neverfox commented 7 years ago

Wait, is this because, like Datomic, variables need binding at invocation time through the use of [ ] in the rule definition?

tonsky commented 7 years ago

Wait, is this because, like Datomic, variables need binding at invocation time through the use of [ ] in the rule definition?

No, DataScript doesn’t support that. Also I believe it’s just validation check in Datomic, not an actual perf optimization

neverfox commented 7 years ago

I see. Thanks. If you do get a chance, I would appreciate your insight on my example. To get around it, I basically had to run the query without

[?p :sentence/index _]
(anc ?p ?t0)
(anc ?p ?t1)
(anc ?p ?t2)
(anc ?p ?t3)
(anc ?p ?t4)

returning tuples of ?t bindings and then loop through the set of tuples using the above as the where clause. It just doesn't like doing it all at once.