open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.
https://www.openpolicyagent.org
Apache License 2.0

Spec for compile response #2118

Open AlexGilleran opened 4 years ago

AlexGilleran commented 4 years ago

Hi OPA team - How do you feel about having some kind of spec that states exactly what can and can't come back as a response from compile for partial evaluation?

We're really loving the potential over at magda-io/magda, but it's a bit hard to have confidence in our ability to parse responses from the compile API, because everything we've written is based on reverse engineering what comes back, and I'm pretty sure we're making assumptions that won't necessarily hold up in the future - e.g. 6 months ago responses to certain policies came back with everything in support, but having upgraded to the latest OPA there's now nothing in support. This is a great step forward, but it breaks the assumptions we made previously when writing parsers, and I'd like to get it right this time around.

Or maybe there is something like this that I just haven't found? Maybe I could infer it from the source somewhere? Thanks :)

tsandall commented 4 years ago

@AlexGilleran We should put something together so that users that want to embark on consuming Compile API responses can do so without reverse engineering and trial-and-error. Can you share some detail on the use cases you have for the Compile API and partial evaluation? That would be quite helpful.

To be clear, Compile API responses (and partial evaluation which the Compile API is based on today) can contain arbitrary Rego statements. In many cases, I think consumers would not want/need to implement the entire language--they should only need a fragment. E.g., the linear-time fragment without any support rules. This fragment is relatively straightforward to translate into other languages like SQL.

AlexGilleran commented 4 years ago

Thanks for the response @tsandall !

So our use case is for Magda - it's a catalog for datasets where an organisation can list all their datasets (including storing arbitrary metadata against each dataset as JSON), and search it too.

The most ambitious use case we have is to allow administrators to write an OPA policy that uses the metadata of a dataset (potentially including that arbitrary JSON) to do authorisation. So for instance (simplifying some of the implementation details), if I as an administrator wanted to make it so that datasets could be exclusively viewed by the team the dataset is assigned to, I'd write a policy called team, set a teamId value in the JSON of each dataset, and assign the team policy to that dataset. Then when a user tries to retrieve all the datasets they have access to, our service calls OPA with something along the lines of:

// POST /compile
{
    "query": "data.object.dataset.team.read == true",
    "input": {
        "user": {
            // etc
            "teamId": "teamA"   
        }
    },
    "unknowns": ["input.object.dataset"]
}

Then we hope it comes back with something telling us to look for input.object.dataset.team == "teamA". We turn that into SQL for the purposes of getting data out of our Postgres database (we use JSONB so we can interrogate JSON), and we turn it into ElasticSearch query language for search - so we can have one policy, but it carries across multiple data stores.

The real magic (we hope) is that if the administrator changes the policy to something like "the user must be in the same team and also have a certain role", then they can simply update the OPA policy, and none of our code has to change... but this assumes that we've got a good handle on how to parse OPA partial compile responses, which brings me back to this issue :).
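For readers who want to try the request above, here is a minimal Python sketch of building and sending it. It assumes OPA's REST Compile API is reachable at `/v1/compile` under a base URL you supply; the helper names `build_compile_request` and `compile_policy` are hypothetical and not part of OPA or Magda.

```python
import json
import urllib.request


def build_compile_request(query, user, unknowns):
    """Build the JSON body for a partial-evaluation request."""
    return {"query": query, "input": {"user": user}, "unknowns": list(unknowns)}


def compile_policy(opa_url, body):
    """POST the body to the Compile API and return the decoded JSON response."""
    request = urllib.request.Request(
        opa_url.rstrip("/") + "/v1/compile",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same query, input and unknowns as the request shown above.
body = build_compile_request(
    "data.object.dataset.team.read == true",
    {"teamId": "teamA"},
    ["input.object.dataset"],
)
# compile_policy("http://localhost:8181", body)  # needs a running OPA; URL is an assumption
```

Keeping the payload builder separate from the HTTP call makes it easy to check the request shape without a running OPA instance.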

You can find more detailed information on what we're trying to do in these tickets, and some (still slightly confused) code for parsing here (scala) and here (TypeScript). I'm not sure any of it will make sense without a background on the project though!

We actually have this working at a proof-of-concept level right now... however a lot of the code previously depended on interpreting support, which was difficult to write and doesn't seem like the right way to go about it.

> To be clear, Compile API responses (and partial evaluation which the Compile API is based on today) can contain arbitrary Rego statements. In many cases, I think consumers would not want/need to implement the entire language--they should only need a fragment. E.g., the linear-time fragment without any support rules. This fragment is relatively straightforward to translate into other languages like SQL.

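Agree 100% - going from what's in queries to SQL or to ElasticSearch is great. Just need some guidance on:

- Exactly what to expect... e.g. can I always expect a term to be an array with 3 values, the first of which is the sign? Could the sign ever be in a different place? Could there sometimes be no sign if the test is for the presence of the value?
- How to write/query policies to ensure that the partial compilation response contains a nice queries object and no support.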

Hope that makes sense, happy to clarify more if not. Really loving the OPA project, it's got ridiculous amounts of potential :).

tsandall commented 4 years ago

@AlexGilleran Thanks for the write up and the links. This is very helpful.

> Exactly what to expect... e.g. can I always expect a term to be an array with 3 values, the first of which is the sign? Could the sign ever be in a different place? Could there sometimes be no sign if the test is for the presence of the value?

You're asking the right questions--we just don't have a good doc for external users to consume this right now. The two best examples are in the contrib repo (the sqlite/python example and the elastic/golang example). The ASTs are defined in Go:

The terms can either be a SINGLE term (representing an expression like 10 or ["foo"] or input.foo["bar"]) or an ARRAY of terms (representing any call expression like input.foo == "bar".)
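A consumer could make that single-term versus array distinction explicit before doing any translation. The sketch below is illustrative only: the JSON shape is taken from the sample response posted later in this thread rather than from a published spec, and `ref_to_path` and `classify_expr` are hypothetical helper names.

```python
def ref_to_path(ref_term):
    """Return the path parts of a ref term, e.g. ["input", "foo"]."""
    return [part["value"] for part in ref_term["value"]]


def classify_expr(expr):
    """Split an expression into ("call", operator, operands) or ("term", term)."""
    terms = expr["terms"]
    if isinstance(terms, list):
        # Call expression: the first term is a ref naming the operator.
        operator = ".".join(str(p) for p in ref_to_path(terms[0]))
        return ("call", operator, terms[1:])
    # Single term: a bare value or reference.
    return ("term", terms)


# input.foo == "bar", written in the serialized form seen in this thread.
call_expr = {
    "index": 0,
    "terms": [
        {"type": "ref", "value": [{"type": "var", "value": "eq"}]},
        {"type": "ref", "value": [
            {"type": "var", "value": "input"},
            {"type": "string", "value": "foo"},
        ]},
        {"type": "string", "value": "bar"},
    ],
}
# The bare expression 10.
single_expr = {"index": 1, "terms": {"type": "number", "value": 10}}

print(classify_expr(call_expr)[:2])   # ('call', 'eq')
print(classify_expr(single_expr)[0])  # term
```

Checking the type of `terms` first means the rest of a translator never has to guess whether an operator is present.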

> How to write/query policies to ensure that the partial compilation response contains a nice queries object and no support.

This area is lacking. We don't have any kind of linter that could warn authors when they "exit" the fragment supported by the tooling. One option would be to run partial eval and then whitelist/blacklist the outputs (e.g., if the output contains support then error, if the output refers to unsupported built-in functions then error.) It's not immediately obvious which way would be better. If you could codify the fragment supported by your query translation the whitelist approach would probably be best.
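The whitelist approach could be sketched roughly as follows. This is not an OPA feature. The set of allowed operators is a hypothetical example of a fragment a translator might support, and the response shape is assumed from the sample posted in this thread.

```python
# Hypothetical fragment: only simple comparisons are translatable.
ALLOWED_OPERATORS = {"eq", "neq", "lt", "lte", "gt", "gte"}


class UnsupportedPolicyError(Exception):
    """Raised when a compile response leaves the supported fragment."""


def check_fragment(compile_response, allowed=ALLOWED_OPERATORS):
    """Raise if the response has support rules or uses an operator outside `allowed`."""
    result = compile_response.get("result", {})
    if result.get("support"):
        raise UnsupportedPolicyError("response contains support rules")
    for query in result.get("queries", []):
        for expr in query:
            terms = expr["terms"]
            if not isinstance(terms, list):
                continue  # single term, so there is no operator to check
            operator = ".".join(str(part["value"]) for part in terms[0]["value"])
            if operator not in allowed:
                raise UnsupportedPolicyError(f"unsupported operator: {operator}")
```

Running a check like this when a policy is saved, rather than at query time, would let policy authors see the error before users do.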

AlexGilleran commented 4 years ago

Thanks for that @tsandall, gives me something to go off 👍 👍

emad7105 commented 4 years ago

We are also working on data access middleware which can make use of OPA's Compile API for partial evaluation. Having a spec would be extremely helpful.

eliw00d commented 4 years ago

Using the example here I was able to get the following response:

{
  "result": {
    "queries": [
      [
        {
          "index": 0,
          "terms": [
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "eq"
                }
              ]
            },
            {
              "type": "string",
              "value": "alice"
            },
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "data"
                },
                {
                  "type": "string",
                  "value": "pets"
                },
                {
                  "type": "var",
                  "value": "$02"
                },
                {
                  "type": "string",
                  "value": "owner"
                }
              ]
            }
          ]
        },
        {
          "index": 1,
          "terms": {
            "type": "ref",
            "value": [
              {
                "type": "var",
                "value": "data"
              },
              {
                "type": "string",
                "value": "pets"
              },
              {
                "type": "var",
                "value": "$02"
              }
            ]
          }
        }
      ],
      [
        {
          "index": 0,
          "terms": [
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "eq"
                }
              ]
            },
            {
              "type": "string",
              "value": "alice"
            },
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "data"
                },
                {
                  "type": "string",
                  "value": "pets"
                },
                {
                  "type": "var",
                  "value": "$13"
                },
                {
                  "type": "string",
                  "value": "veterinarian"
                }
              ]
            }
          ]
        },
        {
          "index": 1,
          "terms": [
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "eq"
                }
              ]
            },
            {
              "type": "string",
              "value": "SOMA"
            },
            {
              "type": "ref",
              "value": [
                {
                  "type": "var",
                  "value": "data"
                },
                {
                  "type": "string",
                  "value": "pets"
                },
                {
                  "type": "var",
                  "value": "$13"
                },
                {
                  "type": "string",
                  "value": "clinic"
                }
              ]
            }
          ]
        },
        {
          "index": 2,
          "terms": {
            "type": "ref",
            "value": [
              {
                "type": "var",
                "value": "data"
              },
              {
                "type": "string",
                "value": "pets"
              },
              {
                "type": "var",
                "value": "$13"
              }
            ]
          }
        }
      ]
    ]
  }
}

What do $02, $13, etc. mean for values? Should they just be filtered out?

I am trying to build something similar to the data filter example in Node.js and MongoDB. We would want to send a request to OPA to evaluate whether or not a user has permission to access a list of resources. With the response we would construct a MongoDB query. Documentation would be great but any suggestions on how to do this?
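One possible reading of the response above, offered as a sketch rather than a documented contract: names like $02 and $13 look like variables OPA generates for the iteration over data.pets, each inner array of expressions is a conjunction, and the outer array is a disjunction. Under those assumptions, and handling only `eq` against scalar constants, a MongoDB filter could be built like this (all function names are hypothetical):

```python
def ref_path(term):
    """Return the raw path parts of a ref term."""
    return [part["value"] for part in term["value"]]


def field_name(ref_term, prefix):
    """Map data.pets[$02].owner to "owner": drop the prefix and the generated variable."""
    path = ref_path(ref_term)
    if path[:len(prefix)] != prefix or len(path) < len(prefix) + 2:
        raise ValueError(f"unexpected ref: {path}")
    return ".".join(str(p) for p in path[len(prefix) + 1:])


def to_mongo_filter(response, prefix=("data", "pets")):
    """Translate queries into a MongoDB filter; None means nothing is allowed."""
    prefix = list(prefix)
    queries = response.get("result", {}).get("queries")
    if not queries:
        return None  # assumption: no queries means the policy can never hold
    alternatives = []
    for query in queries:
        conditions = []
        for expr in query:
            terms = expr["terms"]
            if not isinstance(terms, list):
                continue  # bare ref such as data.pets[$02]: adds no field constraint
            if len(terms) != 3 or ref_path(terms[0]) != ["eq"]:
                raise ValueError("only binary eq expressions are handled in this sketch")
            left, right = terms[1], terms[2]
            # The constant may appear on either side of the comparison.
            ref_term, const = (left, right) if left["type"] == "ref" else (right, left)
            conditions.append({field_name(ref_term, prefix): const["value"]})
        alternatives.append({"$and": conditions} if conditions else {})
    return {"$or": alternatives}


def pet_ref(var, *fields):
    """Build data.pets[var].fields... in the serialized form shown above."""
    parts = [{"type": "var", "value": "data"}, {"type": "string", "value": "pets"},
             {"type": "var", "value": var}]
    parts += [{"type": "string", "value": f} for f in fields]
    return {"type": "ref", "value": parts}


def eq(constant, ref_term):
    """Build the expression `constant == ref_term`."""
    op = {"type": "ref", "value": [{"type": "var", "value": "eq"}]}
    return {"terms": [op, {"type": "string", "value": constant}, ref_term]}


# Condensed version of the response posted above.
sample = {"result": {"queries": [
    [eq("alice", pet_ref("$02", "owner")), {"terms": pet_ref("$02")}],
    [eq("alice", pet_ref("$13", "veterinarian")),
     eq("SOMA", pet_ref("$13", "clinic")),
     {"terms": pet_ref("$13")}],
]}}

mongo_filter = to_mongo_filter(sample)
```

For the sample, `mongo_filter` matches pets owned by alice, or pets whose veterinarian is alice at the SOMA clinic. Anything outside plain equality raises an error instead of being silently dropped, which is the safer default for an authorization filter.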

ashutosh-narkar commented 4 years ago

Check this for an OPA-MongoDB data filtering example that leverages partial eval to translate Rego to MongoDB queries. @VineethReddy02 worked on that integration.

stale[bot] commented 2 years ago

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.