open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.
https://www.openpolicyagent.org
Apache License 2.0
9.68k stars 1.34k forks source link

Assignments, equality, matching - confusing patterns with dangerous design decisions #950

Closed aeneasr closed 6 years ago

aeneasr commented 6 years ago

Hi there. First of all, I really appreciate this product being open sourced and all the work, money and effort authors, maintainers, contributors and others have put into this project. This issue is a summary of my findings. They represent my opinion and are based on my experience from authoring and maintaining several OSS products related to security. This is a summary from my experiments with integrating OPA in the ORY stack, you can find the complete discussion here: https://github.com/ory/keto/issues/47

I think OPA has some very good design decisions, like testing policies and a go-ish query language. But I also think that there are some serious weaknesses. Once one gets used to them, it's a bit easier to work around them but that should not be the case for a product that has a central role in the access control taxonomy.

For me, the most confusing design decision is how the equality operator works:

The equality operator (=) is used to define expressions that assert that two values are the same. If the expression is defined in terms of one or more variables then the expression will evaluate to true if one of the variables is unbound. If the neither operand is an unbound variable, the expression is evaluated by comparing the values referenced by the operands.

Let's look at some examples:

> a = 3 # This is an assignment
> a
3
> a = 4 # This is an assertion which looks for a value in a which is 4 (none is found -> undefined)
undefined
> a = 3 # This is an assertion looking for value 3 in a which is defined -> true
true
> a == 3 # This is an equality check
true
> a # a is still 3
3

I find this very unintuitive and can imagine it being a serious cause for bugs in larger definitions. Rego is the first experience I had with DataLog (dialects) so this hit me really hard. While it is quite common that if can be used with equality (if (a = 3)), this will usually evaluate to true if the assignment succeeded and fail (usually with an exception) if it failed. I think that Go solves this perfectly as

package main

import (
    "fmt"
)

func main() {
    a := 2
    if a = 3 {
        fmt.Print("asdf")
    }
}

results in prog.go:9:11: syntax error: a = 3 used as value.

Deciding what the equality sign does based on context is - IMO - a serious cause of trouble. Let's look at this with a bit more detail, specifically in the context of global variables:

OPA attempts to bind variables to values when it encounters unbound variables in equality expressions. Binding a variable affects subsequent evaluation of expressions such that the variable will be treated as a constant (with the bound value) instead of a variable.

Let's take the example dataset from the tutorials:

sites = [
    {
        "region": "east",
        "name": "prod",
        "servers": [
            {
                "name": "web-0",
                "hostname": "hydrogen"
            },
            {
                "name": "web-1",
                "hostname": "helium"
            },
            {
                "name": "db-0",
                "hostname": "lithium"
            }
        ]
    },
    {
        "region": "west",
        "name": "smoke",
        "servers": [
            {
                "name": "web-1000",
                "hostname": "beryllium"
            },
            {
                "name": "web-1001",
                "hostname": "boron"
            },
            {
                "name": "db-1000",
                "hostname": "carbon"
            }
        ]
    },
    {
        "region": "west",
        "name": "dev",
        "servers": [
            {
                "name": "web-dev",
                "hostname": "nitrogen"
            },
            {
                "name": "db-dev",
                "hostname": "oxygen"
            }
        ]
    }
]

and let's define a rule:

> hostnames[name] { sites[_].servers[_].hostname = name }
> hostnames[x]
+-------------+--------------+
|      x      | hostnames[x] |
+-------------+--------------+
| "hydrogen"  | "hydrogen"   |
| "helium"    | "helium"     |
| "lithium"   | "lithium"    |
| "beryllium" | "beryllium"  |
| "boron"     | "boron"      |
| "carbon"    | "carbon"     |
| "nitrogen"  | "nitrogen"   |
| "oxygen"    | "oxygen"     |
+-------------+--------------+

So far so good. But what happens when we define name?

> name = "oxygen"
> hostnames[x]
+----------+--------------+
|    x     | hostnames[x] |
+----------+--------------+
| "oxygen" | "oxygen"     |
+----------+--------------+

We shadow a variable defined in a previous statement and by doing so modify its behavior. This was very unexpected to me and the cause for some serious confusion.

I found it equally confusing that hostnames[name] { sites[_].servers[_].hostname = name } and hostnames[name] { name = sites[_].servers[_].hostname } yield the same result. For most languages I know, the assignee is on the left and the value on the right. This is the case when using explicit assignment a := "foo" where it's not possible to do "foo" := b. This manifests when modifying the rule definition from above:

> hostnames[name] { sites[_].servers[_].hostname := name }
1 error occurred: 1:19: rego_compile_error: cannot assign to ref

There are other examples from the docs which I think are harder to read than they should be:

> region = "west"; names = [name | sites[i].region = region; sites[i].name = name]

sites[i].region = region is an assertion (because region = "west") while sites[i].name = name is an assignment (for name). The programmer must know the exact context of the whole program in order to decipher this. That should not be the case. For me, I really prefer easy to read and easy to "parse" code. For a language that aims at simplifying access control, I see this as a strange decision and maybe even the source of issues which may cause serious damage.

Another source of confusion is _. Can you tell me what the difference between 3 = _ and _ = 3 is? The parser at least sees a difference (the second statement evaluates to nothing):

> 3 = _
true
> _ = 3
>

If I define this as a rule, both evaluate to true:

> allowed { _=3 }
> allowed
true
> allowed2 {3=_}
> allowed2
true

Maybe I am completely off with my opinion here. That's fine for me and I have no issue with using a tool that's more aligned with my opinion. But I really think that you're onto something, and I enjoy the general direction OPA is going (alongside with being a CNCF project). But for me at least, there are some serious issues with Rego that need to be addressed before I would consider exposing developers with less experience or non-devs to it.

One last thing, I really tried to understand what's going on here

app_to_hostnames[app_name] = hostnames {
    apps[_] = app
    app_name = app.name
    hostnames = [hostname | name = app.servers[_]
                            sites[_].servers[_] = s
                            s.name = name
                            hostname = s.hostname]
}

but I just can't. Can someone who knows Rego better than me maybe rewrite this with == and := or elaborate a bit?

tsandall commented 6 years ago

@aeneasr thanks for the thorough write up. I appreciate you taking the time to share your thoughts.

I've split my response into three sections to cover the points you raise.

  1. REPL behaviour
  2. Variable shadowing
  3. Equality (e.g., _ = 3 versus 3 = _)

REPL behaviour

The REPL does not make it explicit when a rule is defined (versus when a query is evaluated.) I think some of the confusion may stem from this. The first time you run a = 3, the REPL defines a rule. You can see the rule by typing show. The next time you run a = 4 the REPL evaluates this as a query. This gets into the shadowing issue which is covered below.

Perhaps it would be less confusing if the REPL printed a message to indicate that a rule was just defined. For example:

> a = 3
Rule 'a' defined in package 'repl'. Type 'show' to see rules.
> a = 4
undefined
> a := 4
Rule 'a' re-defined in package 'repl'. Type 'show' to see rules.
> a = 4
true

What do you think?

One thing I'll note is that when you write Rego outside of the REPL (i.e., in a source file) it's much clearer when rules are being defined.

(Lack of) Variable shadowing

This is the more important issue in my mind.

Variables in = expressions do not shadow globals. The global value is captured and this can lead to issues. In practice we found this becomes an issue when you have very large policies (e.g., several hundred lines) within a single package.

To address these issues we introduced the assignment := and double equal == operators earlier this year. These behave like you would expect.

By using := and == the problems with = go away (mostly).

For example, in the hostnames example, using := prevents the subsequent name from being captured:

> hostnames[name] { name := sites[_].servers[_].hostname }
> hostnames[x]
+-------------+--------------+
|      x      | hostnames[x] |
+-------------+--------------+
| "hydrogen"  | "hydrogen"   |
| "helium"    | "helium"     |
| "lithium"   | "lithium"    |
| "beryllium" | "beryllium"  |
| "boron"     | "boron"      |
| "carbon"    | "carbon"     |
| "nitrogen"  | "nitrogen"   |
| "oxygen"    | "oxygen"     |
+-------------+--------------+
> name = "oxygen"
> hostnames[x]
+-------------+--------------+
|      x      | hostnames[x] |
+-------------+--------------+
| "hydrogen"  | "hydrogen"   |
| "helium"    | "helium"     |
| "lithium"   | "lithium"    |
| "beryllium" | "beryllium"  |
| "boron"     | "boron"      |
| "carbon"    | "carbon"     |
| "nitrogen"  | "nitrogen"   |
| "oxygen"    | "oxygen"     |
+-------------+--------------+

For reference here are the other two examples rewritten to use == and :=.

The regions example:

# Aggregate names of sites in the "west" region.
region := "west"; site_names := [name | region == sites[i].region; name := sites[i].name]

The app/hostnames example:

# Generates a mapping of app name to hostnames of servers where the
# app is running. For example: {"web": ["host1", "host2"], "db": ["host3"]}.
#
# The equivalent imperative code to get all app -> hostnames mappings be
# something like:
#
# def get_app_hostname_mapping():
#   result = {}
#   for app in apps:
#     hostnames = []
#     for server_name in app.servers:
#       for site in sites:
#         for server in site.servers:
#           if server.name == server_name:
#              hostnames.append(server.hostname)
#     result[app.name] = hostnames
#   return result
#
# Note: to support queries for all mappings OR a single mapping you would have
# to either implement separate functions or overload one implementation with
# some kind of filter predicate. On the other hand, with Rego, you get both for
# free.
app_to_hostnames[app.name] = hostnames {
    app := apps[_]
    hostnames := [hostname | server_name := app.servers[_]
                             server := sites[_].servers[_]
                             server.name == server_name
                             hostname := server.hostname]
}

Equality (e.g., _ = 3 versus 3 = _)

Within a query _ = 3 and 3 = _ are equivalent. As I mentioned earlier though, the REPL will try to define a rule for you in the _ = 3 case. Since they are equivalent in queries, allowed and allowed2 produce the same answer.

Generally speaking = behaves declaratively–you assert that A must equal B and OPA figures how to make the happen. On the other hand, := and == are more imperative. I think we should do a better job of documenting and promoting := and ==.

Summary

I think making rule definitions more explicit in the REPL would help and is relatively low-hanging fruit.

More importantly, we should refresh the documentation to focus on := and == so that people do not run into the global capturing problem. We could also consider adding something akin to pragmas or linting to prevent or warn on = usage.

tsandall commented 6 years ago

/cc @timothyhinrichs in case you have other thoughts.

aeneasr commented 6 years ago

Hej @tsandall , thank you for taking my input seriously and for the time you took in answering my questions/concerns.

For me, personally, the ambiguity of = is an issue. While it is logical once you understand the mechanics behind it, I find it unintuitive as its meaning depends on the context it runs in. Maybe I can make this a bit clearer with the following rego source (please excuse the pseudo code, still adopting to rego :) ):

foo {
  # This is an assignment
  a = 3

  # This is a type of equality check. But it depends on the context it runs in. If we omit the first one, it's an assignment. The meaning of `=` shifts with the surroundings it has.
  a = 3
}

I really appreciate that := and == have been introduced. I can imagine that = lead to some quirky behavior in large source files. For me personally, it has been much easier to write/read/understand rego code when completely avoiding = and only using := and ==.

While I do not have to maintain this project, and I am not the one responsible for potential complains you'll get when you do this - but I would really consider to change how = works. Either have it as assignment only, or as equality only. Not both, depending on context. You're still pre 1.0, I would certainly forgive you :)

I think the warning message could improve how REPL is used/understood, but it still requires knowledge of why it is that way and it doesn't change how the code behaves. Since I played around with OPA for some time, I certainly know now why this happens - but I don't feel that the warning message would have changed the initial "what is going on here" moment.

Personally, I would appreciate it if the documentation made primarily use of := and == and either avoided = or gave some examples where its use is useful and reduces code complexity and/or readability.

Also thank you for explaining the example, the last one is now much clearer to me, as := and == make all the difference - I no longer have to keep an index in my head of which variables are declared when in order to understand what's an assignment and what's an assertion!

With regards to the host name example hostnames[name] { name := sites[_].servers[_].hostname }, this should be the preferred way of defining it!

Assuming a 100 LoC file:

hostnames[name] { name = sites[_].servers[_].hostname }

// some other definitions and whatever

Now, a developer adds another code line and by accident redefines name at the end of the file - for some reason this isn't caught by test/CI (because dev lazy and no tests, definitions bad, I can name a thousand reasons why this could happen - usually "least effort"). This will seriously tamper with the authZ code. In my mind, this should not be possible. Instead, the language should be super strict, like we're used from Go. Something's phishy? Throw a compile error. The dev will definitively notice that one!

I think that security tools should make sources of error extremely hard, maybe impossible! What's the downside of having 2 LOC instead of 1 if you could prevent the next data leak or a serious production downtime/issue?

tsandall commented 6 years ago

I'm glad we're on the same page! I've filed #952 to track work for this. It's been on my mind for a while. I'd like to prioritize that and the #372 work as well.

I'm going to close this now as I think all the questions have been answered.

aeneasr commented 6 years ago

Ok cool. Just to confirm, = will not change in the foreseeable future then?

tsandall commented 6 years ago

That's correct -- maybe at a later date we can revisit = but for now it will remain the same.

aeneasr commented 6 years ago

Ok, thank you for the clarification!