Revisiting condition binding analysis

Problem statement

Currently, analyze-condition takes conditions, including arbitrarily nested ones, and returns information on bindings that the condition creates and ones that it needs to consume from a preceding condition. This information is used to sort conditions into the order in which they will appear in the rules network. I believe this functionality currently has a number of shortcomings and that we need to rethink it in order to resolve them.

In the case of an :or condition, since we iterate over the leaf conditions produced by to-dnf and then return the union of all variables bound in any leaf the variable data returned may not be valid for every branch of the :or. For example:

(defquery myquery "" []
                    [:or
                     [:Type1 (= ?x (:field this))]
                     [:Type2 (= ?y (:field this))]])

clara.test-rules> (-> myquery :lhs first com/analyze-condition)
{:bound #{?x ?y},
 :unbound #{},
 :condition
 [:or
  {:type :Type1, :constraints [(= ?x (:field this))]}
  {:type :Type2, :constraints [(= ?y (:field this))]}],
 :is-accumulator false}

This poses problems if it causes this condition to be sorted before conditions that need to consume ?x or ?y since it will only create one of ?x or ?y, not both.

Or take a more complicated case, where each branch of an :or needs to consume a binding, but a different binding in each branch:

 (defquery myquery "" []
                    [:or
                     [:Type1 (= ?x (:field this))
                      (some-predicate? ?y this)]
                     [:Type2 (= ?y (:field this))
                      (some-predicate? ?x this)]])
clara.test-rules> (-> myquery :lhs first com/analyze-condition)
                      {:bound #{?x ?y},
                       :unbound #{},
                       :condition
                       [:or
                        {:type :Type1,
                         :constraints [(= ?x (:field this)) (some-predicate? ?y this)]}
                        {:type :Type2,
                         :constraints [(= ?y (:field this)) (some-predicate? ?x this)]}],
                       :is-accumulator false}

I don't see any way to do a topological sort on something where there are multiple possibilities for what sets of bindings must be consumed from predecessors and which can be created.

Furthermore, it is possible that nested conditions need to be sorted with respect to each other, and we're only doing a sort on the top level. This isn't exactly a bug since it won't cause problems if the user arranges conditions correctly so that all variables are bound before they are consumed, but it does mean we're not really agnostic to ordering of expressions currently.
In the case of :not conditions, the to-dnf transformation performed in analyze-condition is invalid for determining bindings for the same reason that we moved away from it in issue 149. Take the following example:

(-> (dsl/parse-query []
                        [[:not [:and
                            [:x (= ?z (:z this))]
                            [:y (= ?z (:z this))]]]])
                      :lhs first com/to-dnf)
(:or
    [:not {:type :x, :constraints [(= ?z (:z this))]}]
    [:not {:type :y, :constraints [(= ?z (:z this))]}])

This will then fall into the :or branch of the leaf condition analysis in analyze-condition and the :not conditions will be iterated over. Since they are :not conditions, they will not be considered eligible to bind variables. However, their variables will still be considered unbound. As a result, ?z will be considered an unbound variable of the parent negation.

clara.test-rules> (-> my-neg-query :lhs first com/analyze-condition)
{:bound #{},
 :unbound #{?z},
 :condition
 [:not
  [:and
   {:type :x, :constraints [(= ?z (:z this))]}
   {:type :y, :constraints [(= ?z (:z this))]}]],
 :is-accumulator false}

Since there is an unbound variable that isn't bound by another condition in the query, sorting its conditions will throw an exception.

 clara.test-rules> (-> my-neg-query :lhs com/sort-conditions)
ExceptionInfo Using variable that is not previously bound. This can happen when an expression uses a previously unbound variable, or if a variable is referenced in a nested part of a parent expression, such as (or (= ?my-expression my-field) ...).
Note that variables used in negations are not bound for subsequent
                                    rules since the negation can never match.
Production:

Unbound variables: #{?z}  clojure.core/ex-info (core.clj:4593)

Proposed solution

I propose that:

We reimplement condition analysis on boolean expressions to walk the tree of expressions, analyzing the bindings at each level and taking information on available bindings from preceding and ancestor expressions and returning newly bound bindings to inform analysis of subsequent expressions. This would contrast with the current approach of attempting to coerce conditions into a flat structure. Since we would be able to understand the bindings necessary at all levels of the tree of boolean expressions we would have the necessary information for constraining on bindings in a fix for issue 343.
A boolean condition will be considered to bind a variable only if every possible path through its descendants binds that variable. It will be considered to have an unbound variable if any possible path through is descendants requires that variable to come from a predecessor condition. To clarify the notation:

[A (= ?1 ...)]
[:or [B (= ?1 ...)]
      [C  (= ?2 ....)]]

The "A" condition is a "predecessor" of the :or condition, and the B and C conditions are both descendants of the :or condition. The effect here is that, viewing the top-level conditions as "black boxes" without concern for their contents, their bound variables will be all variables that are certain to be bound if the condition matches and the unbound variables are all variables for which there is any way the condition would require them.

As a result of the constraints in the point above, I believe we could continue to sort conditions at the top level as currently. We would, however, be unable to sort conditions inside a top-level boolean condition. This is a current shortcoming. We could explore ways to do this at another time, but I haven't come up with any good algorithms for how to do this, since the ordering of nested conditions would impact the ordering of sibling conditions of their ancestor.
We would be breaking an existing pattern where the user could just expect bindings that might not be populated to simply be nil. For example:

(defrule nil-binding-rule
                    [:or
                     [First (= ?x 1)]
                     [Second (= ?y 2)]]
                    =>
                    (println "X: " ?x " --- " "Y: " ?y))
clara.test-rules> (-> (mk-session [nil-binding-rule]) (insert (->First)) fire-rules)
X:  1  ---  Y:  nil
#object[clara.rules.engine.LocalSession 0x7ac42172 "clara.rules.engine.LocalSession@7ac42172"]
clara.test-rules> (-> (mk-session [nil-binding-rule]) (insert (->Second)) fire-rules)
X:  nil  ---  Y:  2
#object[clara.rules.engine.LocalSession 0x5dc53c9d "clara.rules.engine.LocalSession@5dc53c9d"]

This seems like an accident of implementation that doesn't make much sense to me as a rule-writing pattern though. However, we could support patterns where a binding might not be created from a fact condition. I'm envisioning something like the following:

[?info <- [OtherInformation]]
[?newest <- (newest-fact-accum) :from [SensorReading]]
[:let [?sensor-reading-calculation (.... ?newest ... ?info ....)]]

IMO that kind of notation has a clear and intuitive meaning, especially since core Clojure construct such as for, doseq, etc. use the same notation. I chose an accumulator example since creating bindings for later use from an accumulator result is a pattern that isn't easily supported currently, but you could use such a construct in other ways as well e.g. to explicitly create a nil binding for a binding missing in one branch of an :or condition. My initial instinct wouldn't be to block these changes on this enhancement since the existing patterns it would support are strange and probably not common, but I could be convinced otherwise or change my mind on further thought.

Issues that this approach solves or unblocks

Issue 357
Issue 347
Issue 343
Issue 329
The issue above with bindings potentially being nil (nothing specific logged yet)

Related issues not solved

Issue 218: This is the one about ordering of accumulator conditions when either one could potentially create a binding. If you try to allow either ordering in nested accumulator conditions, particularly in negations since you can't use a DNF transform reliably, it seems to me that you'd run into the same problems as with sorting nested non-accumulator conditions relative to each other. We may need to say that the latter case is not supported. However, for the top level, I don't see any reason why the use of the alternative analyze-condition implementation would inhibit the use of the solution described at https://github.com/cerner/clara-rules/issues/102#issuecomment-229105450.

I think the goal here should be to establish a logical framework in which our various oddities in the handling of boolean expressions in rules can be resolved rather than simply solving a single case, although implementation may come in stages. The axioms around bound vs unbound variables I described seem consistent to me, but if anyone can poke holes in them I'd rather deal with any inconsistencies I'm not considering now than after implementing a significant change to the compiler like this. Similarly, I'd be interested to hear any thoughts on potential negative impacts on rule-writing patterns of these proposals.

Some background on the current behavior. We do currently only sort the top level, since that's the only level where users (or generating code) specifies a collection of conditions, as opposed to an explicit boolean expression as done in children. I liked the property of the rule engine being order independent in general, so the process generating a collection of conditions wouldn't need to be concerned with their ordering. That said, I'm not necessarily opposed to weakening this guarantee if it's creating troublesome corner cases. I don't think attempting to sort nested conditions is really worthwhile, since they are an always a structured expression rather than a bag of conditions.

On another topic: the use of nil for un-bindable values was a conscious decision because I felt that it was the Least Surprising Behavior to someone not familiar with the system. Copying the example from above, I think a new user who sees the below post will expect it to compile and the unbound value to be nil, rather than throwing some "Unable to bind value" exception. I think there is precedent for the current behavior in other engines as well.

(defrule nil-binding-rule
   [:or [First (= ?x 1)]
          [Second (= ?y 2)]]
   =>
  (println "X: " ?x " --- " "Y: " ?y))

Of course, my interpretation of Least Surprising Behavior could be debated here.

I do think the central idea of this proposal of reducing the amount of transformation we do with nested expressions has merit, particularly in the binding issues seen in the transformation of [:not [:and ...]] -type structures. So I might lean towards something like this:

We can abandon some or all condition sorting if there isn't a clean way to keep it while resolving these issues
Any time we encounter a negation we immediately extract that into its own rule with a NegatedResult, prior to any processing, and do so recursively.

I think this covers most of the unbound errors described here.

@rbrush

I liked the property of the rule engine being order independent in general, so the process generating a collection of conditions wouldn't need to be concerned with their ordering

Maybe I should know this, but I can't think of what reason there is to re-order the conditions as we do now. Do you have a specific reason?

I know that if the rule was written in way that a binding was used prior to being unified to something like:


(defrule rule
  [B (< ?x y)]
  [A (= ?x x)]
  =>
  :foo)

Then the order that the network propagated should have these conditions reversed. However, to me, I always considered the above rule to be written incorrectly.

I've actually never considered rules written in the "wrong order" like this example to be valid. I haven't even tried to write them that way much to test out that theory.

A separate point that relates to this discussion:

We have had it come up a few times in Slack discussions, etc, about Clara potentially being a bit "lazier" in how it evaluates constraints. I think that in particular this is in reference to the alpha network evaluation.

Consider the example:

(defrule r1
  [A (= ?x x)]
  [B (= ?x x) (expensive-thing y)]
  =>
  (insert! (->C)))

In this case, if only facts of type B were inserted (no A facts), the expensive-thing function would be executed anyways. This could lead to a large waste of time if there were a lot of B and perhaps never any A.

You can refactor the rule manually as:

(defrule r1
  [A (= ?x x)]
  [B (= ?x x) (= ?y y)]
  =>
  (insert! (->ABJoin ?x ?y)))

(defrule r2
  [ABJoin (expensive-thing y)]
  =>
  (insert! (->C)))

However, that isn't always obvious. Also, a more complex rule is harder to break up like that.

Note that this lazier evaluation of conditions idea is implemented in Drools 6+ (the Phreak algorithm as they call it). There is more complexity in how it is done there, but the general idea is similar.

To keep some record of conditions that have prior on the appropriate network nodes. These become the "dependency" conditions for a node. With that information, the alpha network (or anywhere else that it applies) could delay execution until all dependencies were satisfied. The caveat to this is that all facts would need to be held that may still match the unevaluated logic. I think this would result in the trade-off of the alpha network not filtering as many facts as it possibly could, for the sake of execution time.

In the presence of condition re-ordering and/or rule conditions being written in no particular order, I think something like "lazier" evaluation becomes more difficult.

the use of nil for un-bindable values was a conscious decision because I felt that it was the Least Surprising Behavior to someone not familiar with the system.

I've thought that this was a reasonable idea as well. When using an :or in a rule, there is more utility in it if you can use bindings from each branch. I think it is more confusing to get "unbound variable" errors. Binding to nil has a idiomatic Clojure "feel to it".

Any time we encounter a negation we immediately extract that into its own rule with a NegatedResult, prior to any processing, and do so recursively.

This seems like it would be the simplest solution to the problem with negation condition extraction. It doesn't make sense to do analysis on the conditions prior to extraction since analysis does a to-dnf on these conditions, which is invalid in the first place.

Edit: the rest of this was irrelevant so I updated the comment.

@WilliamParker

I chose an accumulator example since creating bindings for later use from an accumulator result is a pattern that isn't easily supported currently

I didn't understand this I don't think. Accumulator result bindings have been used in subsequent conditions quite a bit in rules I've seen. Are you referring to cases like #357 ? It seems strange to me that a user wouldn't be able to expect a rule like you have in that example to just work, without this :let construct etc.

@WilliamParker In your proposal, the first bullet that starts with

We reimplement condition analysis on boolean expressions to ..etc..

I don't understand how it relates to #343 , which is about negation conditions. As discussed in above, if all negations are (recursively) extracted prior to analysis, they will no longer show up during the analysis of conditions phase (not as nested negations that is, they'll be their own group of separate conditions).

Alright, I'm fine with dropping our top-level topological sort. There is one additional case where it was helpful, but we can probably solve it another way. Consider this rule:

(defrule accum-with-nothing    
    [?all-a <- (acc/all) :from [A (=?x x)]]
    [B (= ?x x))]
    =>
   (do-something-with ?x))

The current sorting behavior pushes the accumulator to be below [B (= ?x x))]. That way ?x will be bound even if the accumulator didn't match anything (in which case ?all-a is an empty sequence). If the accumulator is run first it may or may not bind ?x -- since acc/all will still activate if ?all-a is empty -- so we'd need to handle the possibility of valid-but-unbound ?x from parent nodes in the network.

So we'll need to handle that edge case if we drop the sort, but otherwise I think it'll be okay.

@rbrush Thanks for the example in https://github.com/cerner/clara-rules/issues/373#issuecomment-357256518

I see how that could be tricky. I'm not sure what the semantics should be there. If the accumulator created a result, ?all-a from no facts, then ?x doesn't have "meaning" for other conditions yet. So should the B condition be able to evaluate or should it prevent the rule from being satisfied?

e.g. If you inserted only B type facts, should the rule fire with ?x bound to whatever B happens to have on each match?

In the case of = that may make sense. However, what about:

(defrule accum-with-nothing    
    [?all-a <- (acc/all) :from [A (=?x x)]]
    [B (< ?x y))]
    =>
   (do-something-with ?x))

Here, the < test would fail on nil.

I wonder if this issue is fully isolated to accumulators. I think it is since they can allow propagation with no facts, and therefore not "respect their bindings".

Regarding the discussion about accumulator bindings, I thought we'd landed on semantics on that back in issue 102, particularly the examples from @rbrush in this comment and some discussion of the axioms in my comment here. I don't personally see a need to revisit those semantics, although we can of course discuss it. In my observation they've proven reasonably easy to understand.
Regarding the relation of this issue to issue 343, the reason I think they're related is so that full information on the bindings available and used at each level of nesting is available to implement constraints on the [:not [NegationResult]] condition. Basically how to ensure that the problem in issue 304 doesn't occur with the additional extraction logic that solving issue 343 would involve. Right now I don't think we generate the necessary information.. but I haven't looked at it closely enough to be quite certain.
Regarding https://github.com/cerner/clara-rules/issues/373#issuecomment-357247436 see issue 375

I can see an argument for having nil bindings not present in one branch just be nil when the user tries to use them when not present if we're talking about usage in the RHS. My concern is more around logical oddities that arise when a condition is sorted as being able to create variables, but actually doesn't, and later conditions actually dispatch on and can actually create that variable. I created one such scenario in issue 374. Perhaps we could create some idea of "things that definitely create a variable get sorted above those that might" or of sorting after the transformation into DNF is complete rather than before. I'll have to give that more thought.

So, some more thoughts:

The more I think about this, the more I wonder if the simplest thing to understand regarding sorting is simply to respect the order the user provides, in all cases. It may be inconvenient in some cases, as in the one Ryan described here, but has the advantage of being a fundamentally simple rule to communicate to users. In that particular case, there is a simple reordering that is arguably intuitive, but there are other cases where it isn't so clear how one would reorder conditions. Trying to establish rules for when Clara will reorder conditions and it won't seems like it might be a more complicated pattern for the user to understand than just understanding that they always need to order rules properly. Keep in mind that the reordering here does actually impact semantics; it isn't just a performance concern due the changes made in issue 102. When we're dealing with arbitrary nesting having a rigorous idea of "bound elsewhere" in the rule seems like a very slippery concept.

One example of a more complicated case:

[:or [?1 <- (acc/all) :from [A (= ?2 field)]]
     [?2 <- (acc/all) :from [A (= ?1 field)]]]
[?1 <- (acc/all) :from [C (= ?2 field)]]

Note that we could still apply performance optimizations like moving accumulators down in the rule conditions in many cases if we wrote logic to detect cases where doing so would be safe, which in practice is probably the majority of cases. That said, it would be a breaking change, which I'm not excited about.. but if it is the best option and will get us on a logically sound footing for the future might be worth it. FWIW, I can't recall communications or docs suggesting this as a pattern, and I suspect most of the time people would write this with the accumulator below the bindings it consumes since this is a more logically "top-to-bottom" flow IMO.

Having something like this populate nil bindings seems reasonable to me:

(defrule nil-binding-rule
   [:or [First (= ?x 1)]
          [Second (= ?y 2)]]
   =>
  (println "X: " ?x " --- " "Y: " ?y))

The troublesome aspect of this to me is what happens when ?x or ?y is used in subsequent conditions. For example, in the case described in issue 374. I'm toying with the idea of allowing users to use this sort of pattern in the RHS, but restricting when this pattern can be used in the LHS to cases where we can reliably determine a clear logical meaning. One possibility that has occurred to me is preventing a later condition from "creating" either ?x or ?y, although this is a pretty broad restriction and perhaps we could come up with something more narrow. Maybe even just restrict the "maybe-create, then use, then create" pattern in issue 374. I'd be interested in others thoughts on such possibilities.

oracle-samples / clara-rules