open-policy-agent / gatekeeper

šŸŠ Gatekeeper - Policy Controller for Kubernetes
https://open-policy-agent.github.io/gatekeeper/
Apache License 2.0
3.7k stars 758 forks source link

Dynamic Mutations based off Rego #1134

Closed ldbecker-zz closed 1 year ago

ldbecker-zz commented 3 years ago

Describe the solution you'd like Looking at the current state of gatekeeper mutations, mutations objects (Assign, AssignMeta) do not appear to be using Rego at all, unlike validation Constraints and ConstraintTemplates. Using kube-mgmt for mutations allows for more dynamic mutations defined using Rego. Are there plans to support Rego defined mutations? An example use case would be a mutation that assigns a label based off of some field in the object being mutatated.

Anything else you would like to add: The current syntax for mutations (Assign, AssignMeta) seem to be completely static. Am I correct in this assumption?

Environment:

maxsmythe commented 3 years ago

Mutation is in its early days (pre-alpha), so our biggest concern is to get something functional out there so we can understand people's use cases.

You mention the ability to assign a label based off of a field in the object being mutated. Do you have an example scenario where this would be useful?

The reason I ask is that per the analysis in the Mutation Dynamics doc, it is possible to have non-converging mutations if one field is able to be modified with respect to another field. See the "The ability to reference mutable data" section for an example. There may be safe ways of doing this, but figuring that out is non-trivial.

The Mutation Dynamics doc also gives background on why convergence and "plays well with eventual consistency" behaviors are important.

You may also find the "De-Risking the Future" section interesting, as it talks about ways we could take the syntax and move it to something more flexible (e.g. rego-backed mutation templates), if necessary.

The Mutation Design doc shows our current approach.

ldbecker-zz commented 3 years ago

Thanks for the insight @maxsmythe. There are lots of scenarios where desired mutations depend on the object being mutated. Off the top of my head I can think of:

1) Consider a case where namespace names are of the format "{app name}-{dev|test|prod}". Having namespaces be mutated to assign an immutable label called "environment" based off the suffix of the namespace name and an immutable label called "appName" based off the prefix. Based off the environment of an ingress object being mutated, one might want to assign certain annotations to the ingress. For example, assigning the kubernetes.io/ingress.class annotation based off environment. We may want to use nginx for "dev" and contour for "test" or "prod" namespaces.

2) Consider a scenario where a pod's resource requests/limits are not allowed to be defined explicitly, but rather are defined by the environment label mentioned above. All "dev" pods would be assigned certain resource requrests/limits, "test" pods get more, "prod" pods even more.

3) Consider a scenario where pod QOS is not allowed to be defined directly, but is rather assigned by mutations. QOS of "best-effort" is mutated into "dev" and "test", whereas "prod" pods get assigned QOS of "guaranteed".

Those are the first use cases that popped in my head. I will take a look at the design docs you posted.

All of the scenarios above I mentioned can be implemented using OPA with kube-mgmt. It would be great to use Gatekeeper for these type of mutations, but without rego support it does not seem possible. Is adding support for Rego in gatekeeper mutations on the roadmap going forward?

maxsmythe commented 3 years ago

If all the decisions are namespace-based, could this not be done using namespace-focused match criteria? It would require denormalization of the logic into discrete mutations, which could get cumbersome over time, but should be possible.

All of these use cases seem to focus on a kind of environment-based switch with some form of immutable metadata (in this case namespace name) providing the environment information. There may be something to that, as reading immutable data should be a "safe" operation as defined by those documents I linked. There may be something to that, though how that immutable data should be provided is an open question. Maybe using label values and/or namespace name is a way to address this without denormalization.

I think the design docs are important for continuing this discussion, as they give a lot of the "why" and "how" we're taking this as a first approach. In my earlier post, I mention the "De-Riskin the Future" section of the Mutation Dynamics doc. That would be a good reference for how we are approaching the question of integrating Rego with the current feature set, should it become necessary.

moandersson commented 2 years ago

A feature I would like in the Gatekeeper mutation is the ability to create new objects based on a match criteria, not only changing the incoming object. An example of this would be if a new namespace is created, a mutating webhook can create a resourceQuota and a roleBinding together with the namespace. Not sure if this is part of the scope of the mutating webhook, but it would be a really nice feature instead of writing a seperate operator for this.

dogzzdogzz commented 2 years ago

Our use case is that developers deploy their apps with HPA resource through CICD pipeline, devops team would then use scheduled-scaler to dynamically adjust HPA min/max settings according to the time window, for example, decrease min/max from 6/60 to 2/20 at midnight and change it back in the morning. So once the HPA resource created, developers can not update the hpa min/max settings via deployment pipeline anymore to avoid overwriting the hpa setting that scheduled-scaler patched. Currently we use our own mutating webhook to retrieve current min/max setting and replace the one in incoming object with it.

We are looking for if gatekeeper's mutator can retrieve not only metadata but also other field of specific existing object like spec when it is triggered. Audit cache might help but it causes very heavy loads on both audit pod and API server because it only can replicate ALL objects of specific kind of resource (e.g. pods) and we have more than 500 apps running in cluster, another downside is that there is a small chance that the data in cache has been outdated because the scheduled-scaler just updated new min/max settings and developer trigger the app's deployment before next replicate sync time, the mutator would refer to the outdated settings in cache and replace with it instead of the latest one scheduled-scaler just patched.

maxsmythe commented 2 years ago

@dogzzdogzz

Validation seems like a better use-case for this, since you want to prevent users from doing things. Mutation webhooks can be clobbered by later-applied mutations, from the K8s docs:

https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#what-are-admission-webhooks

Note: Admission webhooks that need to guarantee they see the final state of the object in order to enforce policy should use a validating admission webhook, since objects can be modified after being seen by mutating webhooks.

WRT caching, is the problem that there are a lot of HPA resources? An alternative would be to hardcode the allowed times/values in the constraint itself (though this is duplicating config). Rego allows you to write policies against the current time:

https://www.openpolicyagent.org/docs/latest/policy-reference/#time

This is assuming you have uniform times/caps in your HPA resources.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

ctrought commented 2 years ago

+1 for this. I also have some additional thoughts on the Mutation and Constraint apis on the Contraints. The mutation apis are standard and configured entirely in basic yaml, where as Constraints are obviously using rego. It's great, but it's not consistent experience, I lack the controls I need to Mutate objects and building every constraint in rego seems overkill. It would be ideal to have some middle ground for both imo, as opposed to having too much control (constraints) and not enough control (mutations).

In summary Iā€™d love to see mutation and constraints treated equally and offer a standard API to create basic policies in yaml and more advanced policies in rego to cater to both types of users to drive adoptionā€¦ ie. Mutate (yaml), Constraint (yaml), CustomMutate (rego), CustomConstraint (rego).

maxsmythe commented 2 years ago

WRT constraints using Rego, we have our eye on this KEP:

https://github.com/kubernetes/enhancements/pull/3492

Once work starts on it, I imagine we'll want to make sure we can integrate with it appropriately, which may add more options for writing templates.

What sort of mutation logic are you looking to write that isn't covered by the existing objects? Having convergence as a property allows for scaling authorship of mutation across large orgs safely, so we probably don't want to sacrifice that (using Rego for mutators would definitely make it impossible to guarantee convergence), but there are probably plenty of use cases that can be met without doing so.

jaredcurtis commented 2 years ago

I would like additional functionality in a Mutator around injecting a sidecar onto a deployment. This works currently but seems limited. The standard practice I've seen in most sidecar injectors is to use Annotations to trigger the mutation and provide parameters for the resulting sidecar. As an example:

apiVersion: v1
kind: Pod
metadata:
  name: annotations-demo
  annotations:
    logging.example.com/volume: "logs"
    logging.example.com/rotate: "1h"
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80
    volumeMounts:
     - name: logs
       mountPath: /logs
  volumes:
   - name: logs
     emptyDir: {}

The resulting pod would then have a sidecar similar to

- name: logging
   image: logging-agent:latest
   env:
    - name: LOGGING_ROTATION
      value: "1h"
    volumeMounts:
    - name: logs
      mountPath: /var/logs

For a complete example of what I would like, you can reference Hashicorp Vault or Consul

maxsmythe commented 2 years ago

Interesting. The idea of referencing metadata to define variables could work, so long as AssignMetadata keeps the property that it can only create-if-missing.

maxsmythe commented 2 years ago

And the idea of having variables in the YAML that get substituted is also workable, though there would probably be a performance impact.

tahirraza commented 2 years ago

Another scenario for a more dynamic mutation is replacing image registry per cluster env. For instance, you define kube manifests once, letā€™s say it uses image grafana/grafana .

Based on the kube cluster it deploys to, mutation webhook can change/prepend the image registry to use private one.

Alternatively user would deploy custom mutation webhook or worst case, clobber the deployment manifests for each env.

This is highly desirable and kyverno policy engine has lot of maturity on it.

maxsmythe commented 2 years ago

I think we definitely want to have a mutator that can alter images in set ways (e.g. change tag/registry/etc.)

ritazh commented 2 years ago

Lets discuss this more in this Wed's community call.

ctrought commented 1 year ago

Another use case we have. ArgoCD rbac policies are concatenated in a single string like below. I'd like to mutate these to add another group if it does not already exist.

Input

apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: argocd
  namespace: argocd-tenant1
spec:
  rbac:
    defaultPolicy: 'role:admin'
    policy: |
      g, system:tenant-group1, role:admin
      p, role:admin, repositories, create, *, allow
      p, role:admin, repositories, update, *, allow
      p, role:admin, repositories, delete, *, allow

Output

apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: argocd
  namespace: argocd-tenant1
spec:
  rbac:
    defaultPolicy: 'role:admin'
    policy: |
      g, system:tenant-group1, role:admin
      g, system:cluster-admins, role:admin
      p, role:admin, repositories, create, *, allow
      p, role:admin, repositories, update, *, allow
      p, role:admin, repositories, delete, *, allow
maxsmythe commented 1 year ago

Interesting. The set-like behavior of "add line X if line is missing" is definitely idempotent, though guaranteeing idempotence is definitely a sometimes unexpectedly hard programming problem.

We can definitely build idempotent mutators for common cases (see the ModifyImage mutator being written), but I'm not sure we can cover all cases.

On the other hand, allowing arbitrary mutation logic opens up the possibility of coding errors impacting the control plane, so I could see certain org structures not wanting to take the risk of delegating that.

ExternalData is an example where we allow users to self-attest that their code is idempotent. Maybe we can do something similar but for in-process evaluation, in a way that clusters can lock down. We'd also need to know what invariants we need to rely on for convergence (e.g. no modifying metadata that is also used for selection). This has knock-on impacts for other feature requests. E.g.:

Either it is possible to write code to arbitrarily mutate metadata.annotations, or we can use annotations for selecting (a-la #2087 ), but we wouldn't be able to do both.

Curious to hear people's thoughts.

maxsmythe commented 1 year ago

One thing, we still might need to lock down certain logic, like testing fields outside of what's being mutated, since those can combine with other mutators to be non-idempotent. IIRC even testing the value-to-be-mutated can have negative side-effects. It's not clear to me how we'd be able to subset some of these behaviors for privileged users without potentially breaking guarantees for everyone.

TL;DR I'm not sure we can actually do what I'm suggesting in a way that maintains the higher-order guarantees we're looking to provide.

tahirraza commented 1 year ago

@maxsmythe - I like the idea of building mutators on need bases. I suppose over a period of time if there is a pattern, we can always refactor and create more common mutators.

ctml91 commented 1 year ago

I'd like to add a label of backup=true to a persistent volume claim if these conditions are met

I don't believe this is currently possible in Gatekeeper, so right now I'm considering using a custom script or Kyverno. I would prefer to use Gatekeeper though.

nidamanuri091993 commented 1 year ago

I have a scenario where i want to add annotations for kubernetes deployment per each application. Earlier in OPA i use a json output for each app and update annotations using patch. Would i be able to do this dynamic annotations in GK mutation.

maxsmythe commented 1 year ago

Can you give an example of the type of annotation? Is it deployment-specific? Are the values relatively static? What criteria do you use to determine which resources get what annotation?

nidamanuri091993 commented 1 year ago

These annotations are application specific, currently with OPA we are making a GET call based on a lable in the namespace and use the JSON response to add the annotations using patch. We wanted to do this with gatekeeper.

ZhiminXiang commented 1 year ago

I have a use case that I want to read the value of a label from a namespace, and use the value to set a spec field of an object. The value of the label in the namespace is static, meaning that it is set when creating the namespace, and won't be change later. We would like to leverage gatekeeper mutation to handle this.

jwineinger commented 1 year ago

I haven't read all of the above, so apologies if this is a duplicate. I have a need to effectively correct the structure of a resource, and so I'd like to be able to move a key from one place on the input to another. Perhaps more generically, I'd like to be able to set a key path in a spec to a value referenced from the input object. As I understand it, none of the mutation CRDs for gatekeeper support this.

In this case, I don't really care about the values in each resource nor do I want to specify the values exactly. I just want to restructure the spec using whatever values the input resource has.

seh commented 1 year ago

I had the sameā€”or at least a similarā€”need recently: A vendor changed the schema used in their CRD, effectively renaming two fields (in Kong's IP Restriction plugin, per Kong/kong#8560, "config.blacklist" became "config.deny," and "config.whitelist" became "config.allow.") Not all of our manifest authors could change them easily enough, so wanted to rewrite them at admission time, with the following rules:

We could not use the Assign mutation for this, because it's neither allowed to read from sibling fields outside of the metadata nor delete fields.

Someone could claim that this desire threatens oscillation, or at least nondeterminism, since we could have a similar policy that swaps these fields back again. We would not do that, though. I'd prefer to take my chances and avoid needing to write my own Webhook to evade this abstract threat.

maxsmythe commented 1 year ago

Someone could claim that this desire threatens oscillation, or at least nondeterminism, since we could have a similar policy that swaps these fields back again. We would not do that, though. I'd prefer to take my chances and avoid needing to write my own Webhook to evade this abstract threat.

The assumption there is that all mutator authors are trusted and that they are coordinated enough to avoid this situation.

For platform providers offering a solution to tenants, the tenants may not be fully trusted.

For large organizations, where different departments own different sets of mutations, the authors may not be coordinated.

I'd like to find a way to balance the use cases of "I need a syntax that guarantees order in my system because of how delegation works in my clusters" vs. "I own everything on the cluster, I just need a tool that makes it simpler to run code I write without rolling my own webhook"

ctml91 commented 1 year ago

The assumption there is that all mutator authors are trusted and that they are coordinated enough to avoid this situation.

I think if one allows untrusted users to use any proposed solution that could conflict with other mutators that's just a risk they are willing accept. If one must guarantee order, they can choose to not implement these types of unpredictable mutators and they are no worse off than they are today. They could even be optionally installed, rather than bundled in with the current GK install.

If at some point more structured mutators come along for well defined use cases, those users that must guarantee order across untrusted authors can then take advantage of them.

seh commented 1 year ago

Perhaps the Webhook program could accept a posture-conveying command-line flag that could switch it out of its current wary, distrustful mode and activate other modes, such as "unsafe" or "supervised," in which more of this kind of juggling between intra-object fields and even inter-object fields would be tolerated.

For some of that, we already have a way to express the desired changes; Gatekeeper just precludes their use. For other desires, though, we'd need new fields in the schema. Perhaps this is where opening it up to Rego is the escape hatch we need.

notcool11 commented 1 year ago

Our use case is for setting up OpenTelemetry environment variables. Specifically there is the OTEL_RESOURCE_ATTRIBUTES variable that allows for setting what resources to add to telemetry data.

Ideally we would have the full scope of all resources available, that way we could pick and choose which should be added as attributes for our metrics and traces.

Container Resources Kubernetes Resources

This means we would need access to the full spec to set up the one environment variable to rule them all.

notcool11 commented 1 year ago

Having played with this a bit now, as a user the most obvious solution would be to just allow me to have multiple assigns in one file. As I only get one location currently, the parameters field is already tied to it. If that was done explicitly with the schema to push parameters under location, then could I have multiple locations?

My use case is specific to setting environment variables where the location looks like this: location: spec.containers[name:*].env[name:GK_POD_UID]

If I could have multiple locations and have them execute top down, it would allow me to set up the variables in the right order. Which is needed for kubernetes dependent environment variables. Thanks!

sepich commented 1 year ago

I believe there are a lot of examples could be found in a "wont-fix" closed issues: https://github.com/open-policy-agent/gatekeeper/issues/1176

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

cunningr commented 11 months ago

Reading through it seems that this project may have the idea of providing this functionality? My use is simple ...

I want to check for the presence of a pod annotation. If it is missing I want to add multiple annotations, each with the name of a container in the pod. The desire is to basically disable forwarding of logs for all containers in the pod when the annotation is missing.