nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

Implement a filtering proxy for controlling access to Prometheus metrics #493

Closed larsks closed 6 months ago

larsks commented 8 months ago

A core goal of our observability work is to be able to expose logs and metrics for research purposes. In order to accomplish this, we need a mechanism for controlling access to logs and metrics such that we can grant access to some data without granting access to all data.

@computate has already started looking at this in the context of #453. We had a conversation earlier today and talked about a couple of different ways of implementing access control:

  1. Query filtering

    In this model, we filter the query provided by the user, for example by enforcing a specific label on queries. Existing projects that follow this model include prom-label-proxy and prometheus-filter-proxy.

  2. Response filtering

    In this model, we permit all queries, but then we filter the response to include only metrics that include specific labels. In some ways this is simpler than query filtering, because it doesn't require parsing or modifying the query, and the format of the return data is relatively straightforward.
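To make option 2 concrete, here is a minimal sketch of response filtering. It assumes the standard JSON body returned by the Prometheus query API (`status`, `data.resultType`, `data.result[].metric`). The function name `filter_response` and the `team-a`/`team-b` namespaces are made up for illustration and are not taken from any of the projects mentioned above.

```python
from typing import Any


def filter_response(body: dict[str, Any], required: dict[str, str]) -> dict[str, Any]:
    """Keep only series whose labels match every key/value in `required`."""
    data = body.get("data", {})
    if body.get("status") != "success" or data.get("resultType") not in ("vector", "matrix"):
        # Error bodies, scalars, and strings carry no labels; this sketch passes
        # them through unchanged. A real proxy would have to decide whether to
        # allow or reject them.
        return body
    kept = [
        series
        for series in data.get("result", [])
        if all(series.get("metric", {}).get(k) == v for k, v in required.items())
    ]
    # Build a new body instead of mutating the upstream response.
    return {**body, "data": {**data, "result": kept}}


# Hypothetical instant-query response with two series.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "namespace": "team-a"}, "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "namespace": "team-b"}, "value": [1700000000, "1"]},
        ],
    },
}

filtered = filter_response(sample, {"namespace": "team-a"})
print([s["metric"]["namespace"] for s in filtered["data"]["result"]])  # ['team-a']
```

One thing this sketch makes visible: a query that aggregates away the label (for example `sum(up)`) returns a series with no `namespace` label at all, so the filter drops it even though it was computed over everyone's data. That may be part of why the existing projects rewrite the query instead.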

I suggested to @computate that option 2 might be the better option, but given that all the prior art seems to favor option 1 I am rethinking my position :).

Neither of the above solutions implements any sort of authorization. To provide that, we could place an authorization proxy in front of the service -- something like kube-rbac-proxy or perhaps prom-authzed-proxy.

Our ultimate goal is to achieve something like the LBAC (label based access control) feature available in Grafana Enterprise.

schwesig commented 7 months ago

/cc @schwesig

schwesig commented 7 months ago

/cc @harshil-codes

computate commented 7 months ago

I have a working implementation of a Keycloak Authorization service where permissions can be granted. A single openid-connect token request can evaluate access for a user, a group, or a client (Client ID and Client Secret) to both OCP clusters and namespaces at the same time. I have been testing this Keycloak Fine-Grained Authorization Resource Permissions feature in a branch that I can contribute to nerc-ocp-config.

We start by configuring the CLIENT_ID, CLIENT_SECRET, AUTH_BASE_URL, AUTH_REALM, and a fresh ACCESS_TOKEN in environment variables in the terminal.

CLIENT_ID=ai4cloudops
CLIENT_SECRET=...
AUTH_BASE_URL=https://keycloak.apps-crc.testing
AUTH_REALM=NERC
ACCESS_TOKEN=$(curl -k -X POST -u "$CLIENT_ID:$CLIENT_SECRET" -d "grant_type=client_credentials" \
  "$AUTH_BASE_URL/realms/$AUTH_REALM/protocol/openid-connect/token" | jq -r .access_token)

Then we query the openid-connect token endpoint to request permissions for the ai4cloudops client to access all namespaces on the nerc-ocp-prod cluster.

curl -k -s -X POST -H "Authorization: Bearer $ACCESS_TOKEN" \
  "$AUTH_BASE_URL/realms/$AUTH_REALM/protocol/openid-connect/token" \
  -d "grant_type=urn:ietf:params:oauth:grant-type:uma-ticket" \
  -d "audience=$CLIENT_ID" \
  -d "response_mode=permissions" \
  -d "permission=cluster#nerc-ocp-prod" \
  -d "permission=namespace#all namespaces"
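For anyone calling this from code rather than the shell, the form body of the request above can be assembled as follows. This is only a sketch: `uma_ticket_form` is a hypothetical helper, and it builds the body without sending anything. The field names and values are the ones from the curl command.

```python
from urllib.parse import parse_qs, urlencode


def uma_ticket_form(client_id: str, permissions: list[str]) -> str:
    """Build the form-encoded body for a UMA ticket permission request."""
    fields = [
        ("grant_type", "urn:ietf:params:oauth:grant-type:uma-ticket"),
        ("audience", client_id),
        ("response_mode", "permissions"),
    ]
    # `permission` is repeated once per resource#scope pair, as with curl's -d.
    fields += [("permission", p) for p in permissions]
    return urlencode(fields)


body = uma_ticket_form(
    "ai4cloudops",
    ["cluster#nerc-ocp-prod", "namespace#all namespaces"],
)

# Decoding the body again shows both permissions survived the encoding.
print(parse_qs(body)["permission"])
```

Note that `urlencode` percent-encodes the `#` and the space, whereas `curl -d` sends the string as written; the decoded values are the same.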

The response shows that the ai4cloudops client has access to all namespaces only on the nerc-ocp-prod cluster.

[
  {
    "scopes": [
      "nerc-ocp-prod"
    ],
    "rsid": "38b9062e-653d-4d26-81cb-89679b4252a0",
    "rsname": "cluster"
  },
  {
    "scopes": [
      "all namespaces"
    ],
    "rsid": "67543e63-5545-4448-b9f7-e1aed228c544",
    "rsname": "namespace"
  }
]
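A proxy sitting in front of Prometheus could turn a response like this into an allow/deny decision. The sketch below is one way to do that, not the actual proxy code. The helper names `granted_scopes` and `can_read` are hypothetical, and treating the `all namespaces` scope as a wildcard (with individual namespace names as alternative scopes) is an assumption about how the scopes are meant to be read.

```python
import json


def granted_scopes(permissions: list[dict], rsname: str) -> set[str]:
    """Collect the scopes granted for the resource named `rsname`."""
    scopes: set[str] = set()
    for entry in permissions:
        if entry.get("rsname") == rsname:
            scopes.update(entry.get("scopes", []))
    return scopes


def can_read(permissions: list[dict], cluster: str, namespace: str) -> bool:
    """Allow only if both the cluster and the namespace are covered."""
    if cluster not in granted_scopes(permissions, "cluster"):
        return False
    ns_scopes = granted_scopes(permissions, "namespace")
    # Assumption: "all namespaces" acts as a wildcard scope.
    return "all namespaces" in ns_scopes or namespace in ns_scopes


# The response shown above, verbatim.
prod_response = json.loads("""
[
  {"scopes": ["nerc-ocp-prod"],
   "rsid": "38b9062e-653d-4d26-81cb-89679b4252a0",
   "rsname": "cluster"},
  {"scopes": ["all namespaces"],
   "rsid": "67543e63-5545-4448-b9f7-e1aed228c544",
   "rsname": "namespace"}
]
""")

# "my-project" is a made-up namespace name.
print(can_read(prod_response, "nerc-ocp-prod", "my-project"))   # True
print(can_read(prod_response, "nerc-ocp-infra", "my-project"))  # False
```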

If the ai4cloudops client tries to access the nerc-ocp-infra cluster, we will get no scopes for cluster, which means that the ai4cloudops client does not have cluster access to nerc-ocp-infra.

curl -k -s -X POST -H "Authorization: Bearer $ACCESS_TOKEN" \
  "$AUTH_BASE_URL/realms/$AUTH_REALM/protocol/openid-connect/token" \
  -d "grant_type=urn:ietf:params:oauth:grant-type:uma-ticket" \
  -d "audience=$CLIENT_ID" \
  -d "response_mode=permissions" \
  -d "permission=cluster#nerc-ocp-infra" \
  -d "permission=namespace#all namespaces"

[
  {
    "scopes": [
      "all namespaces"
    ],
    "rsid": "67543e63-5545-4448-b9f7-e1aed228c544",
    "rsname": "namespace"
  }
]

computate commented 7 months ago

Most of the Keycloak Authorization configuration (realms, clients, resources, scopes, and policies) can be handled through GitOps with the KeycloakRealmImport CRD, but the final piece, "permissions", is not available anywhere in the CRD. So we will need to create a Job that can be run with the Keycloak Admin password Secret, and create these permissions through the Keycloak REST API instead.

larsks commented 7 months ago

> So we will need to create a Job that can be run with the Keycloak Admin password Secret

@computate would it make sense to instead create a "keycloak permissions operator"? It would watch a ConfigMap/Secret (or a CRD) and update the permissions when the config changes.

computate commented 7 months ago

I like your thinking, @larsks. Since I am already setting up Ansible playbooks to apply permissions, I want to create additional Ansible tasks to deploy the Authorization resources, scopes, and policies. The annoying thing about the Red Hat Build of Keycloak Operator is that once you create a realm, client, and authorization data with the KeycloakRealmImport CRD, any later changes to it are completely ignored, which doesn't make much sense for a CRD. Without much more work, which I was planning to do anyway, we can add the additional Ansible tasks and also turn them into a separate Ansible operator.

computate commented 6 months ago

@larsks @schwesig I've created a new keycloak-permissions-operator with a new KeycloakAuthorization CRD that supports insert/update for resources, scopes, policies, and permissions. It also queries each of these objects 100 at a time, and deletes them if they are not configured in the CRD. I also have a working prom-keycloak-proxy. I plan to work on documentation and deployment of these services to the obs cluster. We can also review this anytime.
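The "query 100 at a time, delete what is not configured" behavior described above can be sketched roughly like this. The operator itself is Ansible-based, so this is an illustration of the logic only. `iter_all`, `plan_sync`, the `fetch_page(first, max_results)` callback, and the `scope-N` names are all hypothetical.

```python
from typing import Callable, Iterator

PAGE_SIZE = 100  # page size mentioned in the comment above


def iter_all(
    fetch_page: Callable[[int, int], list[dict]],
    page_size: int = PAGE_SIZE,
) -> Iterator[dict]:
    """Yield every object by requesting fixed-size pages until a short page."""
    first = 0
    while True:
        page = fetch_page(first, page_size)
        yield from page
        if len(page) < page_size:
            return
        first += page_size


def plan_sync(existing: list[str], desired: list[str]) -> tuple[list[str], list[str]]:
    """Return (names to insert or update, names to delete)."""
    wanted = set(desired)
    upsert = sorted(wanted)
    delete = sorted(name for name in existing if name not in wanted)
    return upsert, delete


# Fake server holding 250 scopes, standing in for a paged listing endpoint.
server = [{"name": f"scope-{i}"} for i in range(250)]


def fake_fetch(first: int, max_results: int) -> list[dict]:
    return server[first:first + max_results]


existing = list(iter_all(fake_fetch))
upsert, delete = plan_sync(
    [obj["name"] for obj in existing],
    ["scope-0", "scope-1", "scope-new"],
)
print(len(existing), upsert, len(delete))
```

The sketch collects the full listing before computing deletions. Deleting while paging would shift the offsets of the remaining objects and could skip some of them.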

joachimweyl commented 5 months ago

@computate does this close out the researcher access epic or are there more steps needed for this epic to close?