open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.
https://www.openpolicyagent.org
Apache License 2.0
9.57k stars 1.33k forks source link

Improve evaluator to detect and perform early-exit if possible #2092

Closed tsandall closed 2 years ago

tsandall commented 4 years ago

Currently the evaluator does not implement any kind of "early-exit" (or short-circuit or find-one) behavior that stops evaluation when results are no longer needed. For example, given a rule set where all of the rules return the same value (e.g., allow = true { ... }), OPA will evaluate every rule returned by the indexer. In some cases, it would be simple enough to detect and stop evaluation of subsequent rules once one result had been found. A simple heuristic to start would be complete rules where that assign the same value (under different conditions.) For example:

default allow = false

allow = true { # logic }
allow = true { # logic }
allow = true { # logic }
allow = true { # logic }
...

Similarly, the evaluator should also be able to short-circuit any unnecessary backtracking in the case of iteration on data:

allow { input.groups[_] == "admin" }

If a single match is found, no backtracking needs to be performed.

For debug/test purposes, there should be a flag that lets users disable this behavior.

Note this was raised long ago in #234 but never got implemented.

tsandall commented 4 years ago

This was raised in the recent GK security audit.

srenatus commented 3 years ago

The issue of side effects was brought up: technically, any one of the allow { # logic } rule bodies could do an http.send(). Short-circuiting the evaluation could make those go away.

Similarly, the evaluator should also be able to short-circuit any unnecessary backtracking in the case of iteration on data

šŸ‘ Those would be safe cases: anything iterated on from the base documents (the base doc portion of data, and all of input) would be OK to short-cut, there cannot be any side effects there.

For general short-cutting the evaluation of virtual document rules, a new keyword mirroring every (#2090) could be introduced:

allow {
  any x { input.container[x] }
  startswith(input.container[x].image, "example.com")
}

It would have the effect that after the first x was bound successfully, and the rule body expressions evaluated successfully, too, no further attempts would be made.

ā˜ļø Picked "any" here for illustration purposes, and because another some would probably be confusing.

tsandall commented 2 years ago

@srenatus I've filed an issue to improve the docs around http.send because I don't think we should have to assume that http.send has side effects--it should absolutely not be relied on for that. AFAIK, http.send is the only case that we have today like that--so if we assume everything is side effect free, we don't have to introduce any new syntax into the language.

If we make this assumption then the evaluator just has to know when it can short-circuit. Evaluation of complete docs where the value is constant is the simplest case though I could imagine others:

I'd think the analysis could be implemented in the compiler and the result could be cached. When the interpreter looks up rules during complete doc evaluation, it could check the compiler data structure to see if early-exit is allowed. If early-exit is allowed, it could stop iterating once a result was found.

Perhaps the early-exit bit could be stored on the rule index lookup result. That would work for both cases of early exit (i.e., early exit on rule sets and early exit on iteration within a single rule.)

srenatus commented 2 years ago

AFAIK, http.send is the only case that we have today like that--so if we assume everything is side effect free, we don't have to introduce any new syntax into the language.

Agreed! Let's add that warning and assume purity.

srenatus commented 2 years ago

šŸš§ #3898

srenatus commented 2 years ago

Membership tests on partial sets and set comprehensions

We could do this in a follow-up, but for my understanding -- I currently think that while complete rules and functions are pretty analogous (wrt. their code changes), these would be different.... šŸ¤” But I also don't have a clear idea what we're aiming for here, so an example/sketch would be appreciated.

srenatus commented 2 years ago

Thoughts on the partial set membership early exit: when we have multiple s[x] { ... } rules, and a query for a non-specific element, i.e.

p {
  s[x]
  x == 10
}

we could early exit after the first binding of x to a value that's making the overall rule body true, but we've made the evaluator too clever for that: seeing s[x], it'll evaluate the entire set, through the cache hinting introduced in #3492.

srenatus commented 2 years ago

First steps have been merged šŸŽ‰ #3898

I'll create a few follow-up tickets, and reference them here.

srenatus commented 2 years ago

šŸ‘‰ #4035 šŸ‘‰ #4036 šŸ‘‰ #4037

tsandall commented 2 years ago

I'm going to close this issue now that we have landed early-exit support on functions and complete rules.