
# Policy decision layer: production-ready Rego policy definitions v1 #499

Closed: fhennig closed this issue 2 weeks ago

fhennig commented 7 months ago

As a user of the SDP, I want to be able to manage my authorization policies in a fairly simple, maintainable and flexible way.

## Current state

Currently we offer the UserInfoFetcher as well as OPA authorizers for a few products, but we do not have any guidance on how to actually write policies.

## Expected outcome

Possible outcomes are RegoRule templates that we recommend as a starting point for users' own rules, or a framework or library of RegoRules that we ship with the platform. We should also have a demo that showcases this. As we work on this, we should gain more knowledge about how to write sensible rules for the products, and find out what common policy definitions might look like.

## Step 1: Spikes, gather knowledge - plain OPA (no k8s)

We do not yet know enough about the products and their authorization models. We first want to spike some policies for each product to get a better understanding of how they all work, and afterwards see what we can abstract away. For now, we are starting with HDFS and Trino. We also want a demo scenario that we can use as a reference when thinking about authorization and what we need to model.

What should the Rego data structures look like? We want to go in with few preconceptions and think about what works best for each product. For Trino, for example, we found it useful to let the user specify a data structure similar to Trino's file-based access control. The policies should support assigning access to individual users and to groups, so that users can model their organization in groups.
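
A minimal sketch of what such a user/group-based structure could look like in pure Rego (the package name, rule set and field names are made up for illustration, loosely modeled on the Trino authorizer input):

```rego
package trino_sketch

import rego.v1

# Hypothetical access definitions in the spirit of Trino's file-based
# access control: each entry grants privileges on matching tables to a
# user or to a group. Names and shapes are made up for this sketch.
table_rules := [
    {"group": "analysts", "catalog": "lakehouse", "schema": "sales", "privileges": ["SELECT"]},
    {"user": "admin", "catalog": ".*", "schema": ".*", "privileges": ["SELECT", "INSERT", "DELETE"]},
]

default allow := false

# Simplified: only SELECT-style access on tables is checked here.
allow if {
    some rule in table_rules
    rule_applies_to_identity(rule)
    regex.match(rule.catalog, input.action.resource.table.catalogName)
    regex.match(rule.schema, input.action.resource.table.schemaName)
    "SELECT" in rule.privileges
}

# A rule applies if it names the requesting user ...
rule_applies_to_identity(rule) if rule.user == input.context.identity.user

# ... or one of the groups the user belongs to.
rule_applies_to_identity(rule) if rule.group in input.context.identity.groups
```

One appeal of this shape is that the data part (`table_rules`) could eventually be supplied separately from the rule logic.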

### Tasks
- [ ] https://github.com/stackabletech/opa-operator/pull/522
- [ ] https://github.com/stackabletech/issues/issues/500
- [ ] https://github.com/stackabletech/issues/issues/523

Questions that we should answer for each product:

For each product there is an OPA authorizer and we know the input that we get from the authorizer. Policy definitions should simply be tested in pure Rego.

UserInfoFetcher - we do not want to use the UIF yet. We can simply mock the UIF API.
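
A rough sketch of how that mocking could look in a pure-Rego test; the `data.userinfo` document, the package and the group names are made up, a real policy would get this information from the UserInfoFetcher:

```rego
package uif_mock_sketch

import rego.v1

# Hypothetical policy: group membership is read from a data document that
# stands in for the UserInfoFetcher response, so tests can override it.
allow if {
    "admins" in data.userinfo[input.user].groups
}

# Mocked UIF response used by the tests below.
mocked_userinfo := {"alice": {"groups": ["admins"]}}

test_admin_is_allowed if {
    allow with input as {"user": "alice"} with data.userinfo as mocked_userinfo
}

test_unknown_user_is_denied if {
    not allow with input as {"user": "bob"} with data.userinfo as mocked_userinfo
}
```

Such tests run with plain `opa test`, without Kubernetes or a running UserInfoFetcher.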

### Tasks
- [ ] https://github.com/stackabletech/issues/issues/524
- [x] look at all the RegoRules. What is similar, what can be common? Can group/role definitions be abstracted away and shared? Are the authz models between products compatible? --> We moved all of this into the abstraction layer
- [x] Document this knowledge in our [internal knowledge base](https://app.nuclino.com/Stackable/Engineering/Authorization-Mechanisms-across-platform-products-210a0804-4f2c-44de-afa8-5be65553967d). This should form the basis for an abstraction
- [x] ~~ADR~~ -> not needed here, only with the abstraction layer

## Intermediate acceptance criteria

## Step 2: Build a demo to showcase the rules (and other context: Kerberos, OpenID, UserInfoFetcher)

### Tasks
- [x] Update the demo to use the latest Trino rules
- [x] Implement some nice demo Trino rules
- [x] Add a Job to move the TPC-DS data into HDFS
- [x] Add some views on top of the HDFS data in Trino to make the 'ugly' parts of TPC-DS nicer (also shows how to deal with 'legacy' datasets like that)
- [x] Add nice superset dashboards
- [x] stretch goal: superset authorization (without OPA)
- [x] Add a Spark Job (https://github.com/stackabletech/issues/issues/530)
- [x] remove temporary 'hack' folder again: https://github.com/stackabletech/opa-operator/tree/main/hack - but add a similar diagram to the documentation of the demo -> https://github.com/stackabletech/opa-operator/pull/560

## Step 3: Deployment on the customer side

For now, since we only have two rule sets and no abstraction layer, we want to keep the rules as something users can deploy by themselves, and not automate the deployment. We can come back to automated deployment once we build an abstraction layer.

However, the rules are still great starting points for customers, so we should publish them for users to build on. We want to keep the source of truth in the kuttl tests and link to them from the documentation. There should also be some explanatory documentation around the rules.

### Tasks
- [ ] https://github.com/stackabletech/trino-operator/issues/580
- [ ] https://github.com/stackabletech/hdfs-operator/issues/516
- [ ] https://github.com/stackabletech/opa-operator/issues/422
- [ ] https://github.com/stackabletech/opa-operator/issues/558
- [ ] https://github.com/stackabletech/opa-operator/pull/557
- [ ] https://github.com/stackabletech/demos/pull/86
- [ ] https://github.com/stackabletech/opa-operator/issues/617

## Follow-up work

### Related tasks but out of scope for now
- [ ] https://github.com/stackabletech/issues/issues/497
- [ ] Write a RegoRule set for Kafka
- [ ] https://github.com/stackabletech/hbase-operator/issues/488
fhennig commented 7 months ago

Some notes:

It's a good idea to spike the policy work in pure Rego. We can use OPA's policy testing to run the rules against sample authorizer input.

This is sample input from the Kafka authorizer:

```json
{
  "action": {
    "logIfAllowed": true,
    "logIfDenied": true,
    "operation": "DESCRIBE",
    "resourcePattern": {
      "name": "alice-topic",
      "patternType": "LITERAL",
      "resourceType": "TOPIC",
      "unknown": false
    },
    "resourceReferenceCount": 1
  },
  "requestContext": {
    "clientAddress": "192.168.64.1",
    "clientInformation": {
      "softwareName": "unknown",
      "softwareVersion": "unknown"
    },
    "connectionId": "192.168.64.4:9092-192.168.64.1:58864-0",
    "header": {
      "data": {
        "clientId": "rdkafka",
        "correlationId": 5,
        "requestApiKey": 3,
        "requestApiVersion": 2
      },
      "headerVersion": 1
    },
    "listenerName": "SASL_PLAINTEXT",
    "principal": {
      "name": "alice-consumer",
      "principalType": "User"
    },
    "securityProtocol": "SASL_PLAINTEXT"
  }
}
```

Druid:

```json
{
  "user": "alice",
  "action": "READ",
  "resource": {
    "type": "DATASOURCE",
    "name": "foo-table"
  }
}
```

Trino (SelectFromColumns):

```json
{
  "context": {
    "identity": {
      "user": "foo",
      "groups": ["some-group"]
    },
    "softwareStack": {
      "trinoVersion": "434"
    }
  },
  "action": {
    "operation": "SelectFromColumns",
    "resource": {
      "table": {
        "catalogName": "my_catalog",
        "schemaName": "my_schema",
        "tableName": "my_table",
        "columns": [
          "column1",
          "column2",
          "column3"
        ]
      }
    }
  }
}
```

Trino (RenameTable):

```json
{
  "context": {
    "identity": {
      "user": "foo",
      "groups": ["some-group"]
    },
    "softwareStack": {
      "trinoVersion": "434"
    }
  },
  "action": {
    "operation": "RenameTable",
    "resource": {
      "table": {
        "catalogName": "my_catalog",
        "schemaName": "my_schema",
        "tableName": "my_table"
      }
    },
    "targetResource": {
      "table": {
        "catalogName": "my_catalog",
        "schemaName": "my_schema",
        "tableName": "new_table_name"
      }
    }
  }
}
```
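
For instance, the Druid input above could drive a plain `opa test` run against a minimal, made-up rule:

```rego
package druid_input_sketch

import rego.v1

# Made-up rule, only here to show how the sample authorizer input above
# can be exercised in a pure-Rego unit test.
allow if {
    input.user == "alice"
    input.action == "READ"
    input.resource.type == "DATASOURCE"
}

test_alice_can_read_datasource if {
    allow with input as {
        "user": "alice",
        "action": "READ",
        "resource": {"type": "DATASOURCE", "name": "foo-table"}
    }
}
```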

For the multi-tenancy issue, I had the idea that we could use an allow rule in a cluster-specific Rego package that then defers to a more generic package. The cluster-specific package can attach context information about which cluster the request came from, for example:

```rego
package myDruid

import rego.v1
import data.druid

allow if {
    druid.allow with input as {
        "product": "druid",
        "cluster": {  # the name and labels are taken from the kubernetes metadata
            "name": "my-druid",
            "labels": {
                "env": "dev"
            }
        },
        "user": input.user,
        "action": {
            "resource": {
                "type": concat("", ["druid-", lower(input.resource.type)]),
                "name": input.resource.name
            },
            "operation": lower(input.action)
        }
    }
}
```

The DruidCluster `my-druid` can then reference the `myDruid` OPA package, and a generic `druid` package handles the auth requests from all Druid clusters.
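
The generic `druid` package would then only ever see the normalized input. A sketch under the assumption that access is granted per environment label; the ACL structure and names are made up:

```rego
package druid

import rego.v1

# Hypothetical ACLs keyed by the environment label that the
# cluster-specific package injects from the Kubernetes metadata.
acls := {
    "dev": {"read": ["alice", "bob"], "write": ["alice"]},
    "prod": {"read": ["alice"], "write": []}
}

default allow := false

# Unknown operations or environments leave the lookup undefined,
# which means the request is denied.
allow if {
    input.product == "druid"
    input.user in acls[input.cluster.labels.env][input.action.operation]
}
```

Each cluster-specific package then only needs to normalize its product's input, while the shared rules stay in one place.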

fhennig commented 7 months ago

Related ticket: https://github.com/stackabletech/opa-operator/issues/494

sbernauer commented 2 weeks ago

We now have production-ready Rego rules for Trino and HDFS, closing this :rocket: