Recursion or transitive closure use case: supergroups

open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.

https://www.openpolicyagent.org

Apache License 2.0

9.71k stars 1.35k forks source link

Recursion or transitive closure use case: supergroups #947

Closed eamsden closed 4 years ago

eamsden commented 6 years ago

I have some groups, some of which specify another group as their supergroup:

# auth/groups/data.json
[ { "gid": 0
   , "name": "$corp admin"
   , "members": ["sam", "fred", "lisa"]
  }
, { "gid": 1
   , "name": "client 1 admin"
   , "members": "lucy, jim"
   , "supergroup": 0
   }
, { "gid": 2
  , "name": "client 1 users"
  , "members": "steve, ralph"
  , "supergroup": 1
  }
]

I want to write a policy such that if a group has read permission on an entity, then their supergroup has read, write, and delete permission on that entity.

package auth.policies
import data.auth.groups

superPermissions = {"read","write","delete"}

groupHasPermission[[group,entity,permission]] {
  group = groups[_]
  subgroup = groups[_]
  subgroup.supergroup = group.gid
  groupHasPermission[[subgroup,entity,"read"]] # Error: recursive!
  permission = superPermissions[_]
}

This would be very difficult to encode clearly without recursion. Perhaps transitive closure would work, though I haven't thought how that might work yet.

srenatus commented 6 years ago

💭 I wonder if in some use cases, restricting the level of nesting could make sense. If so, we should be able to create simple rules enumerating the separate levels of nesting -- see this simplified example:

data.json:

{
"$corp admin": {
"members": ["sam", "fred", "lisa"]
},
"client 1 admin": {
"members": ["lucy", "jim"],
"supergroup":  "$corp admin"
},
"client 1 users": {
"members": ["steve", "ralph"],
"supergroup":  "client 1 admin"
}
}

pol.rego:


package auth.policies
import data.auth.groups

member[[group_id, user]] { groups[groupid].members[] = user # direct member } { groups[groups[groupid].supergroup].members[] = user # one level } { groups[groups[groups[groupid].supergroup].supergroup].members[] = user # two levels }

- `opa run auth.groups:data.json pol.rego`:
```console
OPA 0.9.3-dev (commit 73d46a85-dirty, built at 2018-10-04T07:02:04Z)
Run 'help' to see a list of commands.
> data.auth.policies.member[["client 1 users", u]]
+---------+------------------------------------+
|    u    | data.auth.policies.member[["client |
|         |           1 users", u]]            |
+---------+------------------------------------+
| "steve" | ["client 1 users","steve"]         |
| "ralph" | ["client 1 users","ralph"]         |
| "lucy"  | ["client 1 users","lucy"]          |
| "jim"   | ["client 1 users","jim"]           |
| "sam"   | ["client 1 users","sam"]           |
| "fred"  | ["client 1 users","fred"]          |
| "lisa"  | ["client 1 users","lisa"]          |
+---------+------------------------------------+

I'm not arguing that there's no use for recursion, just offering the simple way out for some use cases, maybe 😄

tsandall commented 5 years ago

Google recently published a related paper on one of their authorization systems: https://ai.google/research/pubs/pub48190.

charlieegan3 commented 5 years ago

Much of this is a little beyond me but I'm interested in this use case - partly for fun, but it's also one of the Datalog value adds over (most?) SQL-like query languages (as I understand it).

While Google can build a special indexing system (Leopard?) for set computations, I'm interested to know what the options (with some work) might be to get this functionality in OPA - even at the risk of poor performance.

On the 'What is Rego' page here Rego is introduced as being inspired by Datalog. Is there anywhere I can read about this and other differences with Datalog and the design decisions behind them?

harsimranb commented 4 years ago

I am evaluating OPA for a CRM like product that runs on Kubernetes. Missing recursion groups support is perhaps the only thing preventing us from using OPA as our authorization tool - to manage Rest API and cluster resources policies. Would've loved to see this supported. @tsandall Any idea on the roadmap for this feature.

tsandall commented 4 years ago

@harsimranb it's not on the short-term roadmap but it may be a feature that we work on later in the year. EDIT: whoops, accidentally clicked close.

jaspervdj commented 4 years ago

We also encountered this issue at Fugue, there are a few rules that we currently aren't able to write without custom add-hoc built-ins. I have been thinking about this for a bit after discussing it at the OPA biweekly meeting. I came up with a design that would work for us and the original problem in this issue. It is just one design -- I think it's worth exploring different alternatives and it is very much open to feedback.

We propose that rules may be self-recursive under a strict set of circumstances.

This is an example of a valid self-recursive rule:

dag = {
  "alpha": {"ref": "beta"},
  "beta": {"ref": "gamma"},
  "gamma": {"val": "Hello, world!"}
}

resolved[k] = v {
  v = dag[k].val
} {
  v = resolved[dag[k].ref]
}

The rule is a valid self-recursive rule because:

It refers to itself
It has a base case
The arguments in the recursive call are finite

The third item guarantees allows a class of rules that are a subclass of all rules that are recursive and terminating. There will be some false positives -- rules that would terminate but that are rejected nonetheless.

An example of a non-terminating rule that the finite check would rightfully reject is:

numbers[n] {
  n
} {
  numbers[m]
  n = m + 1
}

And here is an example of a terminating rule that the finite check would reject:

small_numbers[n] {
  n
} {
  small_numbers[m]
  n = m + 1
  n < 10
}

The check is based on the assumption that every rule (set-generating or object-generating) has a finite set of values.

This means that every anchored reference to another document is finite:

input.some_list[_]
data.some.package.rule[_].field[_]
rule_in_this_package[_]

Based on this, it's possible to implement a variable check similar to the safe (grounded) vars that marks variables and terms as being finite.

x = data.some.pkg[i]  # m and i are safe
y = x                 # x is safe
z = x + 1             # z is not safe
w = f(x)              # w is not safe

An importantely side effect of this "definition" which is stricter than the safe variable definition is that all terms that are finite are also grounded.

Once we performed this analysis and we know which terms are finite, we can allow recursive rules with arguments that are safe -- we know that the they will have a finite domain (the union of all the finite sets in the body, and base cases).

However, there's still one problem preventing us from guaranteeing termination. Consider the following alternative DAG:

dag = {
  "alpha": {"ref": "beta"},
  "beta": {"ref": "gamma"},
  "gamma": {"ref": "alpha"}
}

You can see that resolved would now be stuck in a cycle forever!

resolved[k] = v {
  v = dag[k].val
} {
  v = resolved[dag[k].ref]
}

We can prevent that by having an cycle check in the topdown evaluator (and WASM backend).

When we evaluate resolved, k is initially ungrounded. However, in a recursive call, k is finite and hence grounded. We create a set/object of keys we have visited. The values are either IN_PROGRESS or an actual grounded value. k is marked as in IN_PROGRESS. When we encounter another recursive call, we check that the new key is not marked as IN_PROGRESS.

If that is the case, we encountered a cycle, and we should abort evaluation (this branch returns the empty set).
If that is not the case, we continue.

As a side effect, this builds up an object that may be re-used as a cache to avoid recomputing the recursively generated items(?).

To summarize, here are the advantages of this approach:

No syntax changes
More general than transitive closure
Guarantees termination

I can also see some disadvantages:

Needs changes to both compilation as well as evaluation
There will be false negatives in the finite check
I haven't given mutually recursive rules much thought

timothyhinrichs commented 4 years ago

This reminds me of something we used to call 'safe, stratified recursion'. Arrange the rule names into a dependency graph and put them into "strata" so that any pair of recursive rule names belong to the same strata. To be safe/stratified recursion, in the body of every rule, every argument to a recursive rule invocation needs to also appear within another rule invocation from a lower stratum. That would give us semantic finiteness.

What's worrisome here is the observation that there's still the possibility of infinite cycles at evaluation time, so we don't have algorithmic finiteness/termination. Unless we add in a check. But how do we know that check suffices? When we did the work on safe, stratified recursion we also used the fact that we could order the rule body so it never recursed on anything that wasn't ground. Perhaps that's what you're saying above? That (a) we require every recursive call to be ground from a finite set and (b) never compute the same ground recursive call more than once.

It'd be great to see some of the concrete problems you're trying to address.

timothyhinrichs commented 4 years ago

Dug out the reference. Page 7 of http://logic.stanford.edu/reports/LG-2006-01.pdf

jaspervdj commented 4 years ago

Hey @timothyhinrichs; thanks for that reference! To be honest I haven't been able to spend any time on this after I wrote this up.

The idea I was proposed was loosely based on what I read about "classic" strata in Datalog. I think the main problem I had with applying it here is that stratification relies on being able to represent rules of lower-or-equal strata as external database of facts. I'm not sure if you can do with Rego (or SQL for that matter) since the primitives in either language allow you to "make up" new facts by, for example, adding numbers together.

The runtime check worries me as well and I would prefer to find a cleaner solution. I'll read up on that paper and I'll try to spend some more time in the coming weeks to see if there's better ways to do this.

For the concrete problems, I have encountered this problem in RBAC systems (very close to what the OP described, in fact), with IAM policy and most recently with terraform modules which can form an infinitely deep tree.

It's worth mentioning that all of the 3 concrete subproblems I encountered could be solved with support for some way of taking the transitive closure of an object/set. It is definitely not something we "urgently" need -- there are plenty of workarounds -- but I think this is an important problem to solve for the language in the long term so that's why I care :-)

timothyhinrichs commented 4 years ago

For that terraform example...what is the thing that invokes resolve trying to accomplish? The abstract notion of recursive substitution makes sense. It's whether there's another way to accomplish the same goal that I wonder about. Perhaps a version of 'walk()' that iterates over 2 trees at the same time?

benze commented 4 years ago

@timothyhinrichs I have a very simple case of what OPA is detecting as recursion, but technically is not b/c the arguments are bound in the fn definition. This is causing me some current headaches:

user_has_role( "Expert" ) {
    userInfo.roles[_] == "Expert"
}
#
user_has_role( "User" ) {
    userInfo.org.roles[_] == "Provider"
    not user_has_role( "Expert")
}

OPA is throwing a recursion error, claiming that there is a recursive call occuring, when clearly there is no chance of recursion due to the bound argments in the fn definition.

Moving things to helper methods is functional, but a little hackish, as someone could update the user_has_role() method without realizing there is a helper function that should be modified instead, thereby changing the unintended functionality of the delegate method.

timothyhinrichs commented 4 years ago

@benze thanks for the concrete example! That's a classic case of safe recursion; interestingly we need to look at more than just the dependency graph of rule names to detect it's safe. So it wouldn't even be allowed by 'safe, stratified recursion' that @jaspervdj and I were discussing earlier. It seems inevitable that once we support recursion we'll have a (growing) list of kinds of recursion that we support. Loop detection during evaluation should be enough to ensure termination in this case.

I wondered what it'd look like to write user_has_role today without the recursion.

user_is_expert { userInfo.roles[_] == "Expert" }
user_is_provider { userInfo.org.roles[_] == "Provider" }

user_has_role("Expert") { user_is_expert }
user_has_role("User") { user_is_provider; not user_is_expert }

I've started using helpers like this more often, especially if the logic is gnarly or I want the rules to be self-documenting.

jaspervdj commented 4 years ago

So, I had some more time to think about this and I think that stratification does help a little bit -- at least with providing a helpful formal framework for this.

The ideas of stratification generally seem to work with Rego. We can use the same restrictions around negation and recursion that classic Datalog uses; e.g. rules can be self-recursive but the recursion should not occur in a negative position.

However, the problems that I brought up unfortunately remain.

Stratified datalog on finite sets is guaranteed to terminate. However, a bunch of things in Rego are not finite (e.g. the set of strings, the set of ints...). This requires us to in addition to the stratification requirements do the finite check that I outlined in a previous comment in this thread.
I realised that the runtime cycle check is not needed when rules are evaluated in a bottom up way. This also requires a change to the evaluation package -- recursive rules need to be computed bottom-up rather than top-down; but at least it's easier to see that the evaluation terminates on finite sets.

If we look at the bottom-up algorithm for the problem I previously mentioned:
```
dag = {
  "alpha": {"ref": "beta"},
  "beta": {"ref": "gamma"},
  "gamma": {"val": "Hello, world!"}
}

resolved[k] = v {
  v = dag[k].val
} {
  v = resolved[dag[k].ref]
}
```
The bottom-up algorithm starts out with the empty set and iterates until it has built the full set/object. In this case:
- resolved = {}
- resolved = {"gamma": "Hello, World!"} (after 1 iteration)
- ```
resolved = {
    "gamma": "Hello, world!",
    "beta": "Hello, world!"
}
```
  (after 2 iterations)
- ```
resolved = {
    "gamma": "Hello, world!",
    "beta": "Hello, world!",
    "alpha": "Hello, world!"
}
```
  (after 3 iterations)
- No new results, so we've reached a fixed point.
This also works with mutually recursive rules (keeping multiple temporary sets as we calculate the fixed point), but it's hard to gauge for me how seamlessly this alternative evaluation strategy will fit into the topdown package.

timothyhinrichs commented 4 years ago

Bottom-up computation does pretty naturally handle fixed-point recursion.

One drawback is that bottom-up makes it less efficient to find a single solution--it's designed to find all solutions. For top-down we just need to add an argument to tell it to stop once its found an answer. Admittedly we don't do this today, but it's something we get requests for.

The memory profile of bottom-up and top-down are quite different. Top-down searches the space 1 branch at a time. Whereas bottom-up searches all branches simultaneously--1 level at a time from the bottom.

Printing out a trace with bottom-up is challenging.

Not sure how partial-evaluation would work with bottom-up.

More practically bottom-up is a completely different evaluation algorithm, which is a significant amount of work. It could use the same AST and hopefully the same builtins as top-down. Maybe the rule-index? But I'd expect all the run-time datastructures involved to be different, which means a completely different impl.

jaspervdj-luminal commented 4 years ago

After thinking about this a bit more, I think that we could tackle all use cases at Fugue with a new built-in function that does reachable set analysis.

I initially thought this wouldn't work because you need to agree on a format to represent the graph; and that's different for every use case, but the solution is simple -- you can use Rego to easily convert to a common outgoing edge set representation. The conversion seems elegant and idiomatic for the cases I tried.

The built-in I would like to add is:

output := graph.reachable(edges_map, initial_set)

(I am okay with dropping the graph. prefix since walk is also in the global namespace).

edges_map is an object that describes the outgoing edges for every vertex in the graph: simply put, it contains the (directional) neighbours of every vertex. If a neighbour is not a member of the edges_map itself, it is ignored.
initial_set is the initial set for reachable set search: we want to figure out what other vertices we can reach from this initial set.
output is a set of all reachable vertices following edges from initial_set.

I have two examples to make this a bit more clear.

Example 1: Org Chart

This is very similar to the example in the original issue and is related to all sort of hierarchical RBAC policies. Each entity in our silly org chart can only acces floors from sub-entities.

org_chart_data = {
  "ceo": {
    "floor": 10
  },
  "human_resources": {
    "owner": "ceo",
    "floor": 7
  },
  "staffing": {
    "owner": "human_resources",
    "floor": 6
  },
  "internships": {
    "owner": "staffing",
    "floor": 5
  }
}

Conversion to an edges_map requires us to "reverse" the owner relationship:

org_chart_edges[entity_name] = edges {
  org_chart_data[entity_name]
  edges = {neighbour | org_chart_data[neighbour].owner == entity_name}
}

This produces the following edges_map:

{
  "ceo": {"human_resources"},
  "staffing": {"internships"},
  "internships": {},
  "human_resources": {"staffing"}
}

We can then write an allow policy like this which should pass the tests below:

org_chart_allow {
  entities = graph.reachable(org_chart_edges, {input.entity})
  org_chart_data[entities[_]].floor == input.floor
}

test_org_chart_allow {
  org_chart_allow with input as {"entity": "ceo", "floor": 5}
  not org_chart_allow with input as {"entity": "human_resources", "floor": 10}
}

Example 2: Security Groups

Another example I encountered "in the wild" relates to Cloud security group resources. It's not currently possible to have a policy that relates arbitrary security groups in a graph. Imagine that we have a policy that (some) security groups need to be completely isolated from the internet.

security_groups_data = {
  "sg-a": {
    "internet_gateway": true,
    "egress": []
  },
  "sg-b": {
    "internet_gateway": false,
    "egress": ["sg-a"]
  },
  "sg-c": {
    "internet_gateway": false,
    "egress": ["sg-b"]
  },
  "sg-d": {
    "internet_gateway": false,
    "egress": []
  }
}

This is a completely different format (no edge reversal) but we can again convert it to an edges_map in an idiomatic way:

security_groups_edges[security_group_name] = egress_set {
  security_group = security_groups_data[security_group_name]
  egress_set = {egress | egress = security_group.egress[_]}
}

And then write a deny rule for anything that's not compltely isolated, followed by a test:

security_groups_deny {
  reachable_sgs = graph.reachable(security_groups_edges, {input})
  security_groups_data[reachable_sgs[_]].internet_gateway
}

test_security_groups_deny {
  security_groups_deny with input as "sg-b"
  not security_groups_deny with input as "sg-d"
}

Summary

This is not as general as some of the approaches I proposed before, but also requires a far less invasive change. I think I got kind of hung up on the idea that we needed recursion because the data and inputs can come in all sorts of shapes. But because we can generate the edges set in Rego, and then post-process the reachable set however we want, just adding a graph.reachable still feels general enough for all the use cases I can think of.

I'm happy take care of the implementation if the OPA authors think this is a good way forward.

timothyhinrichs commented 4 years ago

This seems like a good approach.

The proposed builtin is a variant of a common class of recursion: transitive-closure. Output-size is small. It's easy for people to understand. There are plenty of algorithms for computing it efficiently (if scale becomes an issue). It gives us the ability to see a wide range of real policies before looking at a deeper integration in the language.

All new builtins need prefixes for namespacing reasons so I'd say this should be graph.reachable as you propose.

On Wed, May 6, 2020 at 11:37 AM Jasper Van der Jeugt < notifications@github.com> wrote:

After thinking about this a bit more, I think that we could tackle all use cases at Fugue with a new built-in function that does reachable set analysis.

I initially thought this wouldn't work because you need to agree on a format to represent the graph; and that's different for every use case, but the solution is simple -- you can use Rego to easily convert to a common outgoing edge set representation. The conversion seems elegant and idiomatic for the cases I tried.

The built-in I would like to add is:

output := graph.reachable(edges_map, initial_set)

(I am okay with dropping the graph. prefix since walk is also in the global namespace).

edges_map is an object that describes the outgoing edges for every vertex in the graph: simply put, it contains the (directional) neighbours of every vertex. If a neighbour is not a member of the edges_map itself, it is ignored.

initial_set is the initial set for reachable set search: we want to figure out what other vertices we can reach from this initial set.

output is a set of all reachable vertices following edges from initial_set.

I have two examples to make this a bit more clear.

Example 1: Org Chart

This is very similar to the example in the original issue and is related to all sort of hierarchical RBAC policies. Each entity in our silly org chart can only acces floors from sub-entities.

org_chart_data = { "ceo": { "floor": 10 }, "human_resources": { "owner": "ceo", "floor": 7 }, "staffing": { "owner": "human_resources", "floor": 6 }, "internships": { "owner": "staffing", "floor": 5 } }

Conversion to an edges_map requires us to "reverse" the owner relationship:

org_chart_edges[entity_name] = edges { org_chart_data[entity_name] edges = {neighbour | org_chart_data[neighbour].owner == entity_name} }

This produces the following edges_map:

{ "ceo": {"human_resources"}, "staffing": {"internships"}, "internships": {}, "human_resources": {"staffing"} }

We can then write an allow policy like this which should pass the tests below:

org_chart_allow { entities = graph.reachable(org_chart_edges, {input.entity}) org_chartdata[entities[]].floor == input.floor }

test_org_chart_allow { org_chart_allow with input as {"entity": "ceo", "floor": 5} not org_chart_allow with input as {"entity": "human_resources", "floor": 10} }

Example 2: Security Groups

Another example I encountered "in the wild" relates to Cloud security group resources. It's not currently possible to have a policy that relates arbitrary security groups in a graph. Imagine that we have a policy that (some) security groups need to be completely isolated from the internet.

security_groups_data = { "sg-a": { "internet_gateway": true, "egress": [] }, "sg-b": { "internet_gateway": false, "egress": ["sg-a"] }, "sg-c": { "internet_gateway": false, "egress": ["sg-b"] }, "sg-d": { "internet_gateway": false, "egress": [] } }

This is a completely different format (no edge reversal) but we can again convert it to an edges_map in an idiomatic way:

security_groups_edges[security_group_name] = egress_set { security_group = security_groups_data[security_group_name] egress_set = {egress | egress = securitygroup.egress[]} }

And then write a deny rule for anything that's not compltely isolated, followed by a test:

security_groups_deny { reachable_sgs = graph.reachable(security_groups_edges, {input}) security_groups_data[reachablesgs[]].internet_gateway }

test_security_groups_deny { security_groups_deny with input as "sg-b" not security_groups_deny with input as "sg-d" }

Summary

This is not as general as some of the approaches I proposed before, but also requires a far less invasive change. I think I got kind of hung up on the idea that we needed recursion because the data and inputs can come in all sorts of shapes. But because we can generate the edges set in Rego, and then post-process the reachable set however we want, just adding a graph.reachable still feels general enough for all the use cases I can think of.

I'm happy take care of the implementation if the OPA authors think this is a good way forward.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/open-policy-agent/opa/issues/947#issuecomment-624819185, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACMVQKYBLFAXQWS72UXUNZDRQGVAJANCNFSM4FVERTQQ .

tsandall commented 4 years ago

Closing this now that we've had the graph.reachable built-in function in place for a little while. If we find that the built-in function is too limited, we can revisit in the future.