philipaconrad opened 1 year ago
Defining a schema for a `data.*` attribute at the package level, or possibly even sub-package level, works pretty well for this purpose today already:
```rego
# METADATA
# schemas:
# - data.policy.decoded: schema.claims
package policy

decoded := json.unmarshal(`{"exp": 1, "nbf": 2}`)

allow {
    decoded.exz == 1
}
```
The above will fail with:
```
❯ opa eval -f pretty --schema schemas -d policy.rego data.policy.allow
1 error occurred: policy.rego:9: rego_type_error: undefined ref: data.policy.decoded.exz
	data.policy.decoded.exz
	                    ^
	have: "exz"
	want (one of): ["exp" "nbf"]
```
So far, so good 🙂 I think the problem with the approach above, and one that we'll need to consider for defining schemas for outputs, is that of schema distribution. If the onus of importing schemas is still on the "end user" of a policy and not on the library author, it matters less whether the schema is defined on an "output" or an "input": the effort and outcome will be pretty much the same either way. Sure, I might not have to annotate my rules to type-check `data.foo.bar` obtained from some library, but I'd still need to find that library's schemas out-of-band and put them in my project's designated schemas folder, if such a folder exists. For schemas defined on outputs in library code to automatically enhance the type-checking experience of consuming policies, we'd need better ways of making schemas discoverable, and ideally automatically loaded (possibly with some way of opting out).
The distribution/load mechanism available today is file system-based only. URL-based distribution, as suggested in https://github.com/open-policy-agent/opa/issues/4478, would be interesting with schema-defined "outputs" in mind: a library author could simply point to where their schemas are found, and consumers of that library could have them retrieved automatically.
This would be helpful for us, so I'm eagerly watching this.
We're finding that, in an auth-z context with developers who are new-ish to Rego, the risk of coding errors, compounded by the potential for surprise at what happens when `undefined` is encountered, is a significant concern. Obviously an incorrect auth-z decision is a serious problem.
Doubling down on JSON schema is one way to introduce and expand type checking. I'm curious: have other approaches to strong typing been considered? For example, has the possibility of a gradual typing system with first class Rego syntax been considered?
I'd say the type system of Rego is pretty strong already, even to the point that people have complained about it 😅 The problem JSON schema solves is really how to type data, whether provided as `input` or as `data` from external sources. Perhaps it would be possible to do this in-policy, but since the data is defined outside of the policy, it kind of makes sense to me that the type definitions are too. But ideas are certainly welcome 😃
> The problem JSON schema solves is really how to type data, whether provided as `input` or as `data` from external sources.
Agree. That's the area my comment is focused on. It's the dynamic typing default around `input` and `data` that introduces the risk of incorrect auth-z decisions.
> Perhaps it would be possible to do this in-policy, but since the data is defined outside of policy, it kinda makes sense to me that the type definitions are too.
Here's an argument for doing it in-policy: the policy was written with certain preconditions in mind, so IMHO a contract-oriented mindset makes sense here. It would be very helpful if the policy could be defensive, by saying: "I was written with these assumptions about `input` and `data`. You have not adhered to those assumptions. Therefore I refuse to render a decision."
JSON schema plus decision-time schema validation almost gets there. But there's extra work to require that the JSON schema be opinionated enough (because, for example, a JSON schema could simply say `input` is an object and nothing else, which wouldn't apply type checking to enough surface area).
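To make that concrete (a sketch; the `user`/`role` field names are assumptions): a bare `{"type": "object"}` schema validates any input whatsoever, while something like the following actually constrains the fields the policy touches, so typos and missing keys are caught:

```json
{
  "type": "object",
  "required": ["user"],
  "additionalProperties": false,
  "properties": {
    "user": {
      "type": "object",
      "required": ["name", "role"],
      "additionalProperties": false,
      "properties": {
        "name": { "type": "string" },
        "role": { "type": "string", "enum": ["admin", "viewer"] }
      }
    }
  }
}
```

The `additionalProperties: false` and `required` keywords are what do the heavy lifting here; without them, a schema can be "valid JSON Schema" while still applying type checking to almost no surface area.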
So that line of reasoning leads to the desire to have "strong typing" around all elements of `input` and `data` that the policy refers to.
The ultimate goal is soundness around `allow` decisions in the presence of surprises/errors. Right now, there are many ways for an `allow` decision to accidentally be true: typos in `input`, unexpected nulls in `input` (due to mistakes in marshaling and the like), typos in references to `data`, etc.
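As a made-up illustration of that failure mode (package and field names are hypothetical), consider a deny-style policy where a single typo makes the deny rule undefined rather than producing an error:

```rego
package example

# "Allow unless denied" pattern. Risky: if deny is undefined for any
# reason, allow silently falls back to true.
default allow := true

allow := false {
    deny
}

# Typo: input.usr instead of input.user. The reference is undefined,
# so this rule never fires and every request is allowed.
deny {
    input.usr.role != "admin"
}
```

A schema covering `input` would turn this silent misbehavior into a compile-time `rego_type_error`, which is exactly the soundness property being asked for.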
> JSON schema plus decision-time schema validation almost gets there. But there's extra work to require that the JSON schema be opinionated enough (because for example a JSON schema could simply say input is an object, and nothing else, which wouldn't apply type checking to enough surface area).
I'm not sure I understand how this would be any different using a gradual typing system expressed in-policy. Why would you not be able to use that to say "`input` is an object"?
There has been some discussion in the past about allowing annotations to loosen or tighten the type checks for whatever code is annotated, so one thing we could do is allow "in-line" JSON schema definitions, which would probably be a bit closer to what you want.
> I'm not sure I understand how this would be any different using a gradual typing system expressed in-policy. Why would you not be able to use that to say "input is an object"?
Agreed, both gradual typing and JSON schema share that characteristic.
Was really just curious about the history of thought on adding type "hints" to Rego.
The main features needed to approach the goal I mentioned (strong typing around `input` and `data`) are:

(This has probably strayed far enough from @philipaconrad's thread that the conversation should move elsewhere ... apologies if I hijacked!)
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.
What is the underlying problem you're trying to solve?
Currently, the schemas/annotations feature in Rego can be applied to document values at `package`, `document`, `rule`, and `subpackages` scopes.

However, the `rule` scope currently only applies to input values to a rule -- there is no way to enforce schemas over the outputs of a rule.

Describe the ideal solution
I propose extending the `rule` scope / annotation system a bit further, to allow defining schema constraints for both rule-head variables and rule output values. This would be done by requiring fully-qualified names for each rule-head variable and/or rule name, and would look roughly like the following examples:
Partial Object definition:
Rule output value:
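A hypothetical sketch of what such an annotation might look like (the syntax below is illustrative only and not implemented; `schema.result` and the rule body are assumptions):

```rego
package policy

# METADATA
# scope: rule
# schemas:
# # Hypothetical: a fully-qualified rule name would constrain the
# # rule's output value against schemas/result.json.
# - data.policy.result: schema.result
result := {"allowed": input.ok, "reason": "checked"}
```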
Describe a "Good Enough" solution
Supporting schemas on the entire rule's output might be sufficient for many use cases, since it would check the final result of the rule at runtime, and is therefore a superset of applying schemas to individual rule-head variables.
Alternative Approaches Considered
Several other approaches could be considered for implementing this functionality, but most of them have issues, as detailed below:

- `rule-output`
- `$this.x`
Additional Context
Possibly relevant JSON Schema issues:

- #4478
- #3796
- #3476
- #3471
- #3406
- #3341
- #1449
Less relevant, but maybe related issues:

- #2132
- #4258