Closed aeneasr closed 6 years ago
@aeneasr thanks for the thorough write up. I appreciate you taking the time to share your thoughts.
I've split my response into three sections to cover the points you raise.
_ = 3
versus 3 = _
)The REPL does not make it explicit when a rule is defined (versus when a query is evaluated.) I think some of the confusion may stem from this. The first time you run a = 3
, the REPL defines a rule. You can see the rule by typing show
. The next time you run a = 4
the REPL evaluates this as a query. This gets into the shadowing issue which is covered below.
Perhaps it would be less confusing if the REPL printed a message to indicate that a rule was just defined. For example:
> a = 3
Rule 'a' defined in package 'repl'. Type 'show' to see rules.
> a = 4
undefined
> a := 4
Rule 'a' re-defined in package 'repl'. Type 'show' to see rules.
> a = 4
true
What do you think?
One thing I'll note is that when you write Rego outside of the REPL (i.e., in a source file) it's much clearer when rules are being defined.
This is the more important issue in my mind.
Variables in =
expressions do not shadow globals. The global value is captured and this can lead to issues. In practice we found this becomes an issue when you have very large policies (e.g., several hundred lines) within a single package.
To address these issues we introduced the assignment :=
and double equal ==
operators earlier this year. These behave like you would expect.
:=
assigns variables on the left hand side. The variables will shadow globals. If a variable with the same name in the same scope was defined previously, this is an error.
==
compares two values but DOES NOT bind variables like =
.
By using :=
and ==
the problems with =
go away (mostly).
For example, in the hostnames
example, using :=
prevents the subsequent name
from being captured:
> hostnames[name] { name := sites[_].servers[_].hostname }
> hostnames[x]
+-------------+--------------+
| x | hostnames[x] |
+-------------+--------------+
| "hydrogen" | "hydrogen" |
| "helium" | "helium" |
| "lithium" | "lithium" |
| "beryllium" | "beryllium" |
| "boron" | "boron" |
| "carbon" | "carbon" |
| "nitrogen" | "nitrogen" |
| "oxygen" | "oxygen" |
+-------------+--------------+
> name = "oxygen"
> hostnames[x]
+-------------+--------------+
| x | hostnames[x] |
+-------------+--------------+
| "hydrogen" | "hydrogen" |
| "helium" | "helium" |
| "lithium" | "lithium" |
| "beryllium" | "beryllium" |
| "boron" | "boron" |
| "carbon" | "carbon" |
| "nitrogen" | "nitrogen" |
| "oxygen" | "oxygen" |
+-------------+--------------+
For reference here are the other two examples rewritten to use ==
and :=
.
The regions example:
# Aggregate names of sites in the "west" region.
region := "west"; site_names := [name | region == sites[i].region; name := sites[i].name]
The app/hostnames example:
# Generates a mapping of app name to hostnames of servers where the
# app is running. For example: {"web": ["host1", "host2"], "db": ["host3"]}.
#
# The equivalent imperative code to get all app -> hostnames mappings be
# something like:
#
# def get_app_hostname_mapping():
# result = {}
# for app in apps:
# hostnames = []
# for server_name in app.servers:
# for site in sites:
# for server in site.servers:
# if server.name == server_name:
# hostnames.append(server.hostname)
# result[app.name] = hostnames
# return result
#
# Note: to support queries for all mappings OR a single mapping you would have
# to either implement separate functions or overload one implementation with
# some kind of filter predicate. On the other hand, with Rego, you get both for
# free.
app_to_hostnames[app.name] = hostnames {
app := apps[_]
hostnames := [hostname | server_name := app.servers[_]
server := sites[_].servers[_]
server.name == server_name
hostname := server.hostname]
}
_ = 3
versus 3 = _
)Within a query _ = 3
and 3 = _
are equivalent. As I mentioned earlier though, the REPL will try to define a rule for you in the _ = 3
case. Since they are equivalent in queries, allowed
and allowed2
produce the same answer.
Generally speaking =
behaves declaratively–you assert that A must equal B and OPA figures how to make the happen. On the other hand, :=
and ==
are more imperative. I think we should do a better job of documenting and promoting :=
and ==
.
I think making rule definitions more explicit in the REPL would help and is relatively low-hanging fruit.
More importantly, we should refresh the documentation to focus on :=
and ==
so that people do not run into the global capturing problem. We could also consider adding something akin to pragmas or linting to prevent or warn on =
usage.
/cc @timothyhinrichs in case you have other thoughts.
Hej @tsandall , thank you for taking my input seriously and for the time you took in answering my questions/concerns.
For me, personally, the ambiguity of =
is an issue. While it is logical once you understand the mechanics behind it, I find it unintuitive as its meaning depends on the context it runs in. Maybe I can make this a bit clearer with the following rego source (please excuse the pseudo code, still adopting to rego :) ):
foo {
# This is an assignment
a = 3
# This is a type of equality check. But it depends on the context it runs in. If we omit the first one, it's an assignment. The meaning of `=` shifts with the surroundings it has.
a = 3
}
I really appreciate that :=
and ==
have been introduced. I can imagine that =
lead to some quirky behavior in large source files. For me personally, it has been much easier to write/read/understand rego code when completely avoiding =
and only using :=
and ==
.
While I do not have to maintain this project, and I am not the one responsible for potential complains you'll get when you do this - but I would really consider to change how =
works. Either have it as assignment only, or as equality only. Not both, depending on context. You're still pre 1.0
, I would certainly forgive you :)
I think the warning message could improve how REPL is used/understood, but it still requires knowledge of why it is that way and it doesn't change how the code behaves. Since I played around with OPA for some time, I certainly know now why this happens - but I don't feel that the warning message would have changed the initial "what is going on here" moment.
Personally, I would appreciate it if the documentation made primarily use of :=
and ==
and either avoided =
or gave some examples where its use is useful and reduces code complexity and/or readability.
Also thank you for explaining the example, the last one is now much clearer to me, as :=
and ==
make all the difference - I no longer have to keep an index in my head of which variables are declared when in order to understand what's an assignment and what's an assertion!
With regards to the host name example hostnames[name] { name := sites[_].servers[_].hostname }
, this should be the preferred way of defining it!
Assuming a 100 LoC file:
hostnames[name] { name = sites[_].servers[_].hostname }
// some other definitions and whatever
Now, a developer adds another code line and by accident redefines name
at the end of the file - for some reason this isn't caught by test/CI (because dev lazy and no tests, definitions bad, I can name a thousand reasons why this could happen - usually "least effort"). This will seriously tamper with the authZ code. In my mind, this should not be possible. Instead, the language should be super strict, like we're used from Go. Something's phishy? Throw a compile error. The dev will definitively notice that one!
I think that security tools should make sources of error extremely hard, maybe impossible! What's the downside of having 2 LOC instead of 1 if you could prevent the next data leak or a serious production downtime/issue?
I'm glad we're on the same page! I've filed #952 to track work for this. It's been on my mind for a while. I'd like to prioritize that and the #372 work as well.
I'm going to close this now as I think all the questions have been answered.
Ok cool. Just to confirm, =
will not change in the foreseeable future then?
That's correct -- maybe at a later date we can revisit =
but for now it will remain the same.
Ok, thank you for the clarification!
Hi there. First of all, I really appreciate this product being open sourced and all the work, money and effort authors, maintainers, contributors and others have put into this project. This issue is a summary of my findings. They represent my opinion and are based on my experience from authoring and maintaining several OSS products related to security. This is a summary from my experiments with integrating OPA in the ORY stack, you can find the complete discussion here: https://github.com/ory/keto/issues/47
I think OPA has some very good design decisions, like testing policies and a go-ish query language. But I also think that there are some serious weaknesses. Once one gets used to them, it's a bit easier to work around them but that should not be the case for a product that has a central role in the access control taxonomy.
For me, the most confusing design decision is how the equality operator works:
Let's look at some examples:
I find this very unintuitive and can imagine it being a serious cause for bugs in larger definitions. Rego is the first experience I had with DataLog (dialects) so this hit me really hard. While it is quite common that
if
can be used with equality (if (a = 3)
), this will usually evaluate to true if the assignment succeeded and fail (usually with an exception) if it failed. I think that Go solves this perfectly asresults in
prog.go:9:11: syntax error: a = 3 used as value
.Deciding what the equality sign does based on context is - IMO - a serious cause of trouble. Let's look at this with a bit more detail, specifically in the context of global variables:
Let's take the example dataset from the tutorials:
and let's define a rule:
So far so good. But what happens when we define
name
?We shadow a variable defined in a previous statement and by doing so modify its behavior. This was very unexpected to me and the cause for some serious confusion.
I found it equally confusing that
hostnames[name] { sites[_].servers[_].hostname = name }
andhostnames[name] { name = sites[_].servers[_].hostname }
yield the same result. For most languages I know, the assignee is on the left and the value on the right. This is the case when using explicit assignmenta := "foo"
where it's not possible to do"foo" := b
. This manifests when modifying the rule definition from above:There are other examples from the docs which I think are harder to read than they should be:
sites[i].region = region
is an assertion (becauseregion = "west"
) whilesites[i].name = name
is an assignment (forname
). The programmer must know the exact context of the whole program in order to decipher this. That should not be the case. For me, I really prefer easy to read and easy to "parse" code. For a language that aims at simplifying access control, I see this as a strange decision and maybe even the source of issues which may cause serious damage.Another source of confusion is
_
. Can you tell me what the difference between3 = _
and_ = 3
is? The parser at least sees a difference (the second statement evaluates to nothing):If I define this as a rule, both evaluate to true:
Maybe I am completely off with my opinion here. That's fine for me and I have no issue with using a tool that's more aligned with my opinion. But I really think that you're onto something, and I enjoy the general direction OPA is going (alongside with being a CNCF project). But for me at least, there are some serious issues with Rego that need to be addressed before I would consider exposing developers with less experience or non-devs to it.
One last thing, I really tried to understand what's going on here
but I just can't. Can someone who knows Rego better than me maybe rewrite this with
==
and:=
or elaborate a bit?