threatgrid / asami

A graph store for Clojure and ClojureScript
Eclipse Public License 1.0
636 stars 29 forks source link

Expand predicate resolution to handle sets #119

Open quoll opened 3 years ago

quoll commented 3 years ago

This is to allow the following patterns in queries:

[?a #{:m :n} ?b] This will be equivalent to: (or [?a :m ?b] [?a :n ?b])

Alternatively: [?a #{:m :n}+ ?b] This will track a path of either :m: or :n across a graph. There is currently no query equivalent to this. It can be done by updating index lookups for a predicate into mapcat operations across the predicates.

noprompt commented 3 years ago

I suggest splitting this ticket into two tickets: this one for the first portion, [?a #{:m :n} ?b] patterns, and another for the second, [?a #{:m :n}+ ?b] patterns. Implementing the first portion amounts to adding a new step to the parse/expand pipeline (see GH-124), theres not much to it. The second portion requires a bit more effort because it needs to look ahead but I also have some thoughts I think are worth considering.


It would be nice if (+ pattern) was a proper predicate pattern in the language. This would allow for a very clean set of rewrite rules and we could add it without breaking existing syntax. Consider the following.

Distribute + over set predicate

[?a (+ #{p1 p2}) ?b]
;; => DistributePlus
[?a #{(+ p1) (+ p2)} ?b]
;; => ExpandSetPredicate
(or [?a (+ p1) ?b]
    [?a (+ p2) ?b])

Desugar keywords/symbols (allows existing syntax to remain)

[?a :m+ ?b]
;; => DesugarKeyword
[?a (+ :m) ?b]

[?a ?m+ ?b]
;; => DesugarSymbol
[?a (+ ?m) ?b]

+ is idempotent

[?a (+ (+ p)) ?b]
;; => IdempotentPlus
[?a (+ p) ?b]

Rewriting postfix +

[?a p + ?b]
;; => PostfixPlus
[?a (+ p) ?b]

Example application of rules

[?a #{:m+ :b+} + ?b]
;; => PostfixPlus
[?a (+ #{:m+ :b+}) ?b]
;; => DistributePlus
[?a #{(+ :m+) (+ :b+)} ?b]
;; => DesugarKeyword, DesugarKeyword
[?a #{(+ (+ :m)) (+ (+ :b))} ?b]
;; => IdempotentPlus, IdempotentPlus
[?a #{(+ :m) (+ :b)} ?b]
;; => ExpandSetPredicate
(or [?a (+ :m) ?b]
    [?a (+ :b) ?b])

These rules could also be applied for *.

We also need the following rules.

+ over *

[?a (+ (* p)) ?b]
;; => PlusStar
[?a (* p) ?b]

* over +

[?a (* (+ p)) ?b]
;; => StarPlus
[?a (+ p) ?b]

Set absorption

[?a #{#{p} q} ?b]
;; => AbsorbSet
[?a #{p q} ?b]

Note, this rule implied by

[s #{#{p} q} o]
;; => ExpandSet
(or [s #{p} o]
    [s q o])
;; => ExpandSet
(or [s p o]
    [s q o])
quoll commented 3 years ago

I suggest splitting this ticket into two tickets: this one for the first portion, [?a #{:m :n} ?b] patterns, and another for the second, [?a #{:m :n}+ ?b] patterns. Implementing the first portion amounts to adding a new step to the parse/expand pipeline (see GH-124), theres not much to it. The second portion requires a bit more effort because it needs to look ahead but I also have some thoughts I think are worth considering.

While there needs to be some support in the parser, I really do want the postfix * and + syntax. There's no reason why they can't both be done (indeed, it might be nice to convert #{:m :n}+ into (+ #{:m :n})). However, while syntax support is necessary this is not something that can be done with term rewriting.

It would be nice if (+ pattern) was a proper predicate pattern in the language. This would allow for a very clean set of rewrite rules and we could add it without breaking existing syntax. Consider the following.

Distribute + over set predicate

[?a (+ #{p1 p2}) ?b]
;; => DistributePlus
[?a #{(+ p1) (+ p2)} ?b]

While I think this can be interpreted to have a valid semantic, it's odd. It would definitely need to be transformed back into the first expression in order to be processed.

;; => ExpandSetPredicate
(or [?a (+ p1) ?b]
    [?a (+ p2) ?b])

This is not an equivalent expression.

Desugar keywords/symbols (allows existing syntax to remain)

[?a :m+ ?b]
;; => DesugarKeyword
[?a (+ :m) ?b]

[?a ?m+ ?b]
;; => DesugarSymbol
[?a (+ ?m) ?b]

These work for me.

+ is idempotent

[?a (+ (+ p)) ?b]
;; => IdempotentPlus
[?a (+ p) ?b]

I'm more inclined to treat this as a syntax error. (Syntax, rather than semantic, since it can be triggered by [?a :p++ ?b])

Rewriting postfix +

[?a p + ?b]
;; => PostfixPlus
[?a (+ p) ?b]

Example application of rules

[?a #{:m+ :b+} + ?b]

This doesn't make sense as an initial statement, meaning that the subsequent transformations don't have meaning.

Set absorption

[?a #{#{p} q} ?b]
;; => AbsorbSet
[?a #{p q} ?b]

Assuming that this was syntactically allowed, then yes, but I don't see it.

noprompt commented 3 years ago

My initial understanding of the set predicate syntax expansion coupled with the + operator was confused. @quoll and I had a conversation and now I understand the syntax a bit more: it expresses alternation as outlined here. In summary, the attribute-element syntax, assuming we allow both post and prefix notation of + and * is approximately

attribute-element :=
    element[ '*' | '+' ]
  | '#{' element  + '}' [ '*' | '+' ]
  | '(' ( '*' | '+' ) ( element | '#{' element + '}' ) ')'

The first two forms of attribute-element can be desugared into the third with

'(' ( '*' | '+' ) element ')'

forms being convertible to

'(' ( '*' | '+' ) '#{' element '}' ')'
noprompt commented 3 years ago

I'm more inclined to treat this as a syntax error. (Syntax, rather than semantic, since it can be triggered by [?a :p++ ?b])

Consider for a moment that if we allow the grammar to be, approximately,

star-or-plus := ( '*' | '+' )

prefix-attribute-element :=
    '(' star-or-plus prefix-attribute-element-argument ')'

prefix-attribute-element-argument :=
    element[ star-or-plus + ]
  | '#{' element + '}'
  | prefix-attribute-element

attribute-element :=
    element[ star-or-plus + ]
  | '#{' element  + '}' [ star-or-plus + ]
  | prefix-attribute-element

and reduce

'(' star-or-plus '#{' element '}' ')'

to

'(' star-or-plus element ')'

then :m++, (+ (+ :m)), etc. could be legal syntax and the proposed idempotent, (* (+ prefix-attribute-element)), and (+ (* prefix-attribute-element)) rules could be applied.

:m++
;; =>
(+ :m+)
;; =>
(+ (+ :m))
;; =>
(+ :m)
(+ #{:m}) +
;; =>
(+ (+ #{:m}))
;; =>
(+ (+ :m))
;; =>
(+ :m)

While many of these forms are unlikely to be written by hand, they are likely to be programmatically generated. In this case, I think if the grammar could be expanded to include these forms and a reasonable semantic given to them, then it would shift burden away from consumers constructing queries programmatically.

quoll commented 3 years ago

This ticket is focusing on the syntax and semantics of query forms. While a new query form is required (specifically, sets as predicates), this ticket is to implement resolutions of patterns where the predicate (or attribute) of a pattern is a set.

I'm concerned that some of the discussion is implying that some of this work could or should be done in query algebra. However, this would cause issues for efficiency when managing standard patterns, as well as leading to managing the sets at multiple abstraction levels since transitive patterns can only be handled by the resolver.

We should open a new ticket on query syntax, and link it to the above discussion.