skaven81 opened this issue 1 year ago
I think it's worthwhile noting that Kyverno is actively working on this, and has a default-and-overrides mechanism already in alpha: https://kyverno.io/docs/writing-policies/exceptions/. But it's rather bolted-on and I think my proposal would be a lot more elegant in Gatekeeper.
This is an interesting idea. I'm wondering how generalizable it is.
It seems like it depends on a set of exceptions fitting in to a hierarchy of permissiveness (e.g. sets of parameters that are increasingly more strict such that the first match will provide an exception that "short circuits" stricter scrutiny).
This definitely makes sense for numeric-type constraints (CPU, RAM limits and similar).
It could also make sense for something like pod security standards, where privileged > baseline > restricted.
How many other styles of policy fit into that mold? One example of a probably poor fit would be allowed container registries. It's possible to imagine a system of registries that are tiered by sensitivity, but I don't know if that's a common use case.
The simplicity also depends on the sets of exemption labels being disjoint. For example, let's say a user wants to use "baseline" but needs to have "host ports" disabled in their "networking" namespace. They also have to use "privileged" for their "system" namespace. Once the labels stop being disjoint, the hierarchical nature of the exceptions breaks down slightly (it's still expressible, but the complexity starts to creep back in).
I think knowing the range of use cases and how these exceptions grow in practice would be important for grounding/evaluating a design.
If the overall usefulness is limited to a few constraints (specifically the numeric ones), one possibility may be to do the override in Rego, taking the override directly from the label. Here is a (partially implemented, pseudocode) example of what that might look like:
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8smaxram
spec:
  crd:
    spec:
      names:
        kind: K8sMaxRAM
      validation:
        openAPIV3Schema:
          type: object
          properties:
            defaultMaxRAM:
              type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8smaxram

        # Override: read the per-namespace limit from a label on the namespace
        # (requires namespaces to be synced into data.inventory).
        get_max_ram_from_namespace(object) = max {
          ns := data.inventory.cluster.v1.Namespace[object.metadata.namespace]
          max := ns.metadata.labels["max-ram-exception"]
        }

        # Default: fall back to the Constraint's parameter when no label is set.
        get_max_ram(object) = max {
          not get_max_ram_from_namespace(object)
          max := input.parameters.defaultMaxRAM
        }

        get_max_ram(object) = max {
          max := get_max_ram_from_namespace(object)
        }

        violation[{"msg": msg}] {
          max_ram := get_max_ram(input.review.object)
          ram_used := input.review.object.spec.ram
          ram_used > max_ram
          msg := sprintf("ram used must be less than %v", [max_ram])
        }
The advantage of using the label value directly would be that admins can easily set whatever maximum they want on a per-namespace basis without needing to modify any constraints or label selectors. This should work for all constraints if you want a per-namespace override and is possible today.
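For illustration, a pairing under this pattern might look roughly like the following; the label key and parameter name are just the ones from the pseudocode sketch above, and the values are made up. Note that reading the namespace from data.inventory relies on namespaces being replicated into Gatekeeper via its sync config.

# Constraint carries the cluster-wide default (values illustrative)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMaxRAM
metadata:
  name: max-ram
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    defaultMaxRAM: "16"
---
# A per-namespace exception is expressed as a label on the namespace itself
apiVersion: v1
kind: Namespace
metadata:
  name: big-memory-team
  labels:
    max-ram-exception: "64"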
It seems like it depends on a set of exceptions fitting in to a hierarchy of permissiveness (e.g. sets of parameters that are increasingly more strict such that the first match will provide an exception that "short circuits" stricter scrutiny).
No, I wouldn't say that. Creating such a hierarchy is of course possible with the construct I've proposed, but the policy exceptions I'm describing have no need to be "tiered" or "ordered" in any way.
The key feature I'm looking for is a default. That just can't be done today with Gatekeeper without a lot of customization in the ConstraintTemplates themselves. What I would like to see is the ability to set a default policy enforcement behavior, and then punch out specific exceptions to the rule, without having to manually manage a collection of mutually-exclusive-collectively-exhaustive matching selectors.
In programming languages, this is just a case statement. That's all -- a case statement doesn't care if the decisions it is making are "tiered" or "ordered" or have a "priority", beyond the fact that it's "first match wins".
How many other styles of policy fit into that mold? One example of a probably poor fit would be allowed container registries.
Why would allowed container registries be a poor fit? Seems like a perfectly reasonable fit to me, if you think of it in terms of a multi-tenant cluster where you want a default behavior for most tenants with overrides for a few. Using my own proposed syntax, the "allowed container registries" policy would look something like this:
spec:
  match:
    kinds:
      - apiGroups: [ "" ]
        kinds: [ Pod ]
  parameterSelection:
    - match:
        namespaceSelector:
          field.cattle.io/projectId: special-team1
      parameters:
        allowed-registries:
          - gcr.io
          - hub.docker.io
    - match:
        namespaceSelector:
          field.cattle.io/projectId: special-team2
      parameters:
        allowed-registries:
          - hub.docker.io
  parameters:
    allowed-registries:
      - internal.artifactory.example.com
Observe that in this case the default is just spec.parameters. My proposal suggests that upon a parameterSelection match, the spec.parameterSelection[<match>].parameters object is merged over the spec.parameters object. So this results in the desired behavior of:
Special Team 1:

parameters:
  allowed-registries:
    - internal.artifactory.example.com
    - gcr.io
    - hub.docker.io

Special Team 2:

parameters:
  allowed-registries:
    - internal.artifactory.example.com
    - hub.docker.io

Everybody else:

parameters:
  allowed-registries:
    - internal.artifactory.example.com
That "everybody else" part is what is unique and desirable about this way of configuring a Constraint. In the absence of a default+overrides mechanism, I would have to create three separate Constraints, and very carefully ensure that the match
field of all three Constraints are mutually exclusive and collectively exhaustive, to get the desired behavior.
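For contrast, here's a rough sketch of just the "everybody else" Constraint under today's model (the kind name is illustrative): it has to enumerate every special-case Project, and two more Constraints are still needed for the teams themselves.

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRegistries   # illustrative kind
metadata:
  name: allowed-registries-default
spec:
  match:
    kinds:
      - apiGroups: [ "" ]
        kinds: [ Pod ]
    namespaceSelector:
      matchExpressions:
        - key: field.cattle.io/projectId
          operator: NotIn
          values: ["special-team1", "special-team2"]
  parameters:
    allowed-registries:
      - internal.artifactory.example.com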
Once the labels stop being disjoint, the hierarchical nature of the exceptions breaks down slightly (it's still expressible, but the complexity starts to creep back in).
That's only if you think of this strictly in terms of a hierarchy. It's not a hierarchy. It's just a default behavior, with the ability to express overrides. Who cares if the labels and selectors are disjoint, because it's still "first one wins" (like a case statement) or (as I suggest) optionally "all that match win", which still gives desirable behavior for certain cases. I get that a given team might need exceptions across multiple Constraints to achieve a certain customized security profile that differs from the default. But that's precisely why I think this syntax/model will work. And I know it will work, because I'm already implementing it at my company, just in Rego instead of as a baked-in feature of Constraints. I've written all my Constraint Templates with additional Rego that allows for a default-and-overrides syntax in spec.parameters. The proposed syntax in this RFE is directly inspired by my experience doing this exact same thing in production already.
If the overall usefulness is limited to a few constraints (specifically the numeric ones),
It would not be limited to just "orderable" or "sortable" things like numeric values. It will work for any set of parameters to a Constraint.
The challenge with your proposal of using labels on the namespace itself to create these sorts of exceptions is that it's limited to just a namespace. That only works for namespaced multi-tenancy. With Project-based multi-tenancy, where groups of namespaces are assigned automatically to tenants upon creation (this is how Rancher does things and it's glorious), we would need some kind of controller that synchronizes these exception labels across all the namespaces in a given Project. But Rancher already does that for us -- it provides field.cattle.io/projectId labels on every namespace (and has validating admission webhooks that prevent unprivileged tenants from modifying that label). So I would like to be able to declare exceptions to the default behavior in my Constraints, by leveraging that label. Not by creating a whole new ecosystem of labels on namespaces that have to both be protected from abuse and also synchronized among the namespaces in a Project.
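For context, every namespace that Rancher places in a Project already carries that label, along these lines (names illustrative):

apiVersion: v1
kind: Namespace
metadata:
  name: team1-app
  labels:
    field.cattle.io/projectId: special-team1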
I think knowing the range of use cases and how these exceptions grow in practice would be important for grounding/evaluating a design.
I can provide dozens of use cases if desired. As I mentioned earlier, we are already doing this in production across nearly 100 multi-tenant Kubernetes clusters with nearly 1000 different development teams. And it works really well...except for the fact that I have to add quite a bit of custom Rego to every Constraint Template I write, to patch in the default-and-overrides behavior I need.
I updated the top level description based on the discussion in #3081.
I should also clarify that my objective here is not to create a single, monolithic Constraint that handles everything in the cluster. I would still have dozens of Constraints. It's just that each Constraint would focus on a single policy/security "topic", with the parameters selected at runtime based on a ruleset (e.g. a case statement) that can match some aspect of the object under evaluation and its containing namespace.
I should also clarify that I'm not looking to uproot the current model -- I was quite careful in my proposal to ensure that existing Constraints with a typical match + parameters construction would continue to work just fine for any Gatekeeper users who don't want to use the parameterSelection model. It would just be an additional bit of flexibility granted to Gatekeeper users that would rather avoid having to maintain dozens of Constraints in a "group" that all do the same thing, just with MECE match fields and slightly different parameters.
For cluster administrators that have no need for a Constraint that has a "default" application across the cluster, or for those that don't do multi-tenancy, or for those that don't mind using the existing methods provided by match to create a MECE group, more power to them. I'm asking for this feature because it would make my life a lot easier and simplify the application, management, and testing of Gatekeeper policy in all my clusters.
I'll reinforce again that the question of whether the basic concept of "default+overrides" is applicable to Gatekeeper has already been answered by the Kyverno team, as they're already doing it. They've created CRDs for it and everything. I don't personally like their solution (and I find Kyverno rather obtuse), but clearly there is demand out there for the ability to establish some kind of default behavior and then punch out overrides to that default as needed.
No, I wouldn't say that. Creating such a hierarchy is of course possible with the construct I've proposed, but the policy exceptions I'm describing have no need to be "tiered" or "ordered" in any way.
That's fair, I got distracted by the cascading nature of the RAM example.
Why would allowed container registries be a poor fit? Seems like a perfectly reasonable fit to me, if you think of it in terms of a multi-tenant cluster where you want a default behavior for most tenants with overrides for a few.
Agreed. I assumed it was a poor fit due to the "ordered" lens I put on the initial idea.
That only works for namespaced multi-tenancy. With Project-based multi-tenancy, where groups of namespaces are assigned automatically to tenants upon creation (this is how Rancher does things and it's glorious), we would need some kind of controller that synchronizes these exception labels across all the namespaces in a given Project.
+1
An alternative API design with the same effect could be to create a "MultiConstraint" object (better name TBD). Something like below:
kind: MultiConstraint
metadata:
  name: memory-limit-enforcement
spec:
  constraintKind: K8sMemoryLimits
  match:
    kinds:
      - apiGroups: [ "" ]
        kinds: [ Pod ]
  constraints:
    - name: max-64g
      spec:
        match:
          namespaceSelector:
            matchLabels:
              field.cattle.io/projectId: 64g-team
        parameters:
          limit: 64Gi
    - name: max-32g
      spec:
        match:
          namespaceSelector:
            matchLabels:
              field.cattle.io/projectId: 32g-team
        parameters:
          limit: 32Gi
    - name: max-16g
      spec:
        parameters:
          limit: 16Gi
Or perhaps, to preserve schema validation, create a per-template GVK, something like K8sMemoryLimitsMultiConstraint (though max string length for the kind could become a concern, as would CRD proliferation).
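For illustration, the per-template flavor might look something like this (entirely hypothetical; it just restates the example above under a template-specific kind):

apiVersion: constraints.gatekeeper.sh/v1beta1   # group/version hypothetical
kind: K8sMemoryLimitsMultiConstraint            # one grouping kind per template
metadata:
  name: memory-limit-enforcement
spec:
  constraints:
    - name: max-32g
      spec:
        match:
          namespaceSelector:
            matchLabels:
              field.cattle.io/projectId: 32g-team
        parameters:
          limit: 32Gi
    - name: max-16g
      spec:
        parameters:
          limit: 16Gi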
Would need to think about the correct balance between expressiveness/DRY-ness, but doing it this way could allow for things like overriding enforcement actions, and would decouple the proposed default/exception model from how constraints may evolve in the future (e.g. multi-enforcement-point actions).
I'm guessing there are plenty of other roughly-logically-equivalent ways to express this MECE-achieving intent worth exploring.
There are a couple of ancillary concerns here:
K8s Validating Admission Policy is intended to have the API server itself enforce admission policies. One plan Gatekeeper has is to project templates (at least those which can be projected) onto VAP primitives.
I'm not sure VAP currently has enough expressiveness to make projection of this viable. I don't think this is a deal breaker -- there will be plenty of constraint templates that will not be compatible with VAP (for instance, anything that uses referential data). However, it's worth considering if there's anything to be done about that, or how to make the UX/asymmetry grok-able to users.
Right now we pre-cache constraints in Rego's data store. IIRC this is to avoid the serialization/deserialization penalty that injecting constraints at eval time would create. If we need to dynamically choose which constraint to use, that approach becomes less viable.
@davis-haba the work on iterable cached data may be useful here. I could imagine sideloading a "constraints" data storage object per-request, which could be dynamic AND bypass serialization. This would have the side benefit of allowing us to remove all of these methods from the Driver interface.
I agree that there is room for debate and discussion about the technical implementation, and also that whatever model is accepted should fit cleanly into the rest of the Gatekeeper code stack and not create unnecessary diversion from upstream K8s architectural direction.
Really the key thing that would satisfy my particular use case is a concise and flexible way to express MECE collections of Constraints. It's technically possible today but it's not very concise and requires repetition across multiple Constraints to achieve MECE.
Perhaps we can come up with a small but simple approach that would get us 80% of the way there with 20% of the effort.
The MultiConstraint resource I think is a reasonable approach, provided the way it would work is more or less like what I proposed, where Gatekeeper evaluates each of the embedded Constraints in order and stops once it finds a match (or falls through to a default condition). A simple tweak would also allow for MultiConstraint to function as a collection of traditional Constraints, just by having a flag that instructs Gatekeeper whether or not to continue evaluating the list once it finds a match. There could be some benefits to collecting a bunch of similar constraints (even if they are not intended to use the default-and-overrides MECE pattern) into a single resource, simply for the sake of clarity.
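For instance, that flag might look something like this (the evaluationBehavior field name and values are hypothetical):

kind: MultiConstraint
metadata:
  name: memory-limit-enforcement
spec:
  constraintKind: K8sMemoryLimits
  # Hypothetical flag: FirstMatch stops at the first matching entry
  # (default-and-overrides); AllMatches evaluates every matching entry,
  # behaving like a plain collection of traditional Constraints.
  evaluationBehavior: FirstMatch
  constraints:
    - name: max-32g
      spec:
        match:
          namespaceSelector:
            matchLabels:
              field.cattle.io/projectId: 32g-team
        parameters:
          limit: 32Gi
    - name: max-16g
      spec:
        parameters:
          limit: 16Gi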
If you're looking for suggestions, I like the names OrderedConstraint for a resource type that cannot be coerced into a simple group. For a resource that has a configurable behavior for evaluating the list of Constraints, GroupedConstraint or ConstraintGroup might be good. It's also possible to have both, with OrderedConstraint and GroupedConstraint being the exact same resource spec (just a different kind) but handled differently by the controller -- OrderedConstraint stops once it hits the first match, while GroupedConstraint evaluates all of them and processes any matches.
Perhaps we can come up with a small but simple approach that would get us 80% of the way there with 20% of the effort.
Curious if you have any thoughts on this direction, especially if it is an incremental move to a more fully-realized solution.
The MultiConstraint resource I think is a reasonable approach, provided the way it would work is more or less like what I proposed, where Gatekeeper evaluates each of the embedded Constraints in order and stops once it finds a match (or falls through to a default condition).
Yep! That was my thought.
There could be some benefits to collecting a bunch of similar constraints (even if they are not intended to use the default-and-overrides MECE pattern) into a single resource, simply for the sake of clarity.
Curious if there are benefits other than clarity here. E.g. useful evaluation methods beyond "first matched only".
Another question would be if all the constraints in a collection resource must be the same kind or not: mixed kinds may be more useful, but the accounting may be harder as templates are added/removed. Of course, mixed kinds could always be added later if room were left in the API.
One design feature of constraints I don't want to lose is composability (AFAICT nothing here threatens this). Basically that the consequences of adding/removing a constraint are well-defined and that constraints are non-interacting amongst each other. This allows G8r to play nicely with K8s eventual consistency and means users can mix constraints from various sources without worrying about a regression in enforcement.
If you're looking for suggestions, I like the names OrderedConstraint for a resource type that cannot be coerced into a simple group. For a resource that has a configurable behavior for evaluating the list of Constraints, GroupedConstraint or ConstraintGroup might be good. It's also possible to have both, with OrderedConstraint and GroupedConstraint being the exact same resource spec (just a different kind) but handled differently by the controller -- OrderedConstraint stops once it hits the first match, while GroupedConstraint evaluates all of them and processes any matches.
Definitely like the names for the ideas proposed. I think a good path forward would be to first know the shape of what we want (e.g. is there an 80% story in here somewhere); then we'll be in a good place to name things.
Curious if there are benefits other than clarity here. E.g. useful evaluation methods beyond "first matched only".
Really just for clarity. With traditional Constraints, grouping a bunch together requires something like a naming convention, to make it obvious which Constraints are in or out of the group. The only "built in" grouping mechanism is the kind of the Constraints (which, to be fair, is a pretty good grouping mechanism). But if I kubectl get constraint and get 50 results back, when really there are only five actual "policy constraints" -- it's just that each of them has ten different match selectors and parameters -- you can probably imagine that it would feel a lot tidier to be able to group those together so that a single intent maps to a single resource. I suppose what I'm proposing here includes some degree of expanding intent from a single (kind, match, parameters) tuple to a more expressive (kind, (match, parameters), [(match, parameters),...]) tuple.
Another question would be if all the constraints in a collection resource must be the same kind or not
I could see the value in allowing mixed kinds for a generic GroupedConstraint (one that doesn't break out of evaluation upon finding a match). But as noted above, other than visual/conceptual clarity, such a construct probably doesn't hold much additional value to begin with, so I think we can out-scope it.

Which leaves the MultiConstraint/OrderedConstraint model, which my gut tells me should stick to a single kind. I'm imagining the kind of trouble one could get themselves into trying to predict the behavior of a MultiConstraint that could actually end up enforcing different policies (e.g. activating different ConstraintTemplates) as it works through the evaluation looking for a match. Feels very "spooky action at a distance". Opaque and tough to troubleshoot. So I think we should stick with a single kind for this type of Constraint grouping/evaluation model.
I think a good path forward would be to first know the shape of what we want (e.g. is there an 80% story in here somewhere); then we'll be in a good place to name things.
Based on the discussion so far (which has been great!) I feel like we've landed on a few key architectural decisions:

- Stick to a single kind. This will ensure predictable behavior: an ordered Constraint still enforces a single "policy" just like traditional Constraints do, it's just that we're adding a more flexible way of determining whether a given resource violates that policy. I have to imagine this also has performance benefits, as Gatekeeper only needs to cache a single ConstraintTemplate and Rego program to evaluate an ordered Constraint.
- An ordered Constraint is a list of (match, parameters) tuples. Each match is evaluated in turn. The first match that matches the resource under evaluation uses its parameters to execute the ConstraintTemplate's Rego. The resulting violation (or admission) is then passed out.

Where it sounds like we still have some discussion is:

- In the list of (match, parameters) tuples, what does a missing or empty match mean? I believe it should mean "match anything (that has already matched the parent match)". Is there any value in allowing for a negative case (match nothing)? Since we're limiting ourselves to a linear "first one wins" approach, there is no practical value in a "match nothing" as it would just be skipped over every time. Even if a "match nothing" were made to trigger a halt to evaluation, it's not like there's a branch that could take evaluation to the items below the "match nothing". I'm rambling a bit, but suffice to say I think it's reasonable to have an empty match mean "match everything" and not provide for a symmetrical "match nothing".
- Do we create a new resource kind like OrderedConstraint? If we do, then this would deviate from the current model of how Constraints work, in the sense that there is no kind: Constraint at all. Constraints are instances of a resource kind that are a reflection of a ConstraintTemplate. And since we've noted above that an "ordered Constraint" will only be of a single Kind, I think it makes logical sense to maintain the current model, where implementing a policy enforcement is done by creating an instance of the kind created by the ConstraintTemplate. Instead, just add spec.parameterSelection (or a better name if we can think of one). This would be an array of objects, each object having an optional match (which Gatekeeper already has the spec for), and a required parameters (which is just copied and pasted from the spec.parameters spec).
- spec.match and spec.parameters are unchanged, and a Constraint with a missing spec.parameterSelection just behaves like a Constraint does today. But the new behavior is now available just by adding an attribute to implement the override.
- What about enforcementAction? Do we allow overrides to change the Constraint's enforcementAction? My gut again says no -- it would cause the Constraint's behavior to no longer be self-documenting and transparent. When one lists Constraints, it's nice to have the enforcementAction listed right there. If overrides can change that, then trying to express to an administrator what the current state of enforcement is for a given policy would get a lot more complicated. I can think of cases where it would be valuable to change enforcementAction in an override, but I think the loss in clarity and added complexity for troubleshooting makes it not worth it.

I feel like this is a way more elegant approach than Kyverno's, which implemented a special CRD just for expressing overrides. With what we're hopefully converging on here, one can start with a normal Constraint that behaves in some global fashion. Then when that first exception case comes along, instead of having to create an entirely new resource (which one then has to remember interacts with the original "default" resource), the administrator just has to add a handful of lines of YAML to the original Constraint to clearly express the exceptional condition. As more exceptions appear it's trivial to continue adding to the list. It's self-documenting (which after all is one of the hallmarks of Gatekeeper's approach over competing solutions) and transparent.
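Pulling those decisions together, a Constraint under this converged model might look roughly like this (the kind, labels, and values are illustrative):

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMemoryLimits            # illustrative kind
metadata:
  name: memory-limits
spec:
  enforcementAction: deny        # applies to every parameterSelection entry; not overridable
  match:
    kinds:
      - apiGroups: [ "" ]
        kinds: [ Pod ]
  parameterSelection:
    - match:
        namespaceSelector:
          matchLabels:
            field.cattle.io/projectId: 32g-team
      parameters:
        limit: 32Gi
    # an entry with no match would apply to anything that passed spec.match
  parameters:                    # the default when no entry above matches
    limit: 16Gi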
This is a very good feature; I hope it will be implemented soon.
My 2 cents: We have a similar use case, where I have a cluster-wide baseline policy for e.g. hostPorts.
The problem: Gatekeeper evaluates every Constraint that matches a resource, so a workload that legitimately needs to fall outside the baseline (like the Prometheus node exporter below) still gets flagged by the baseline Constraint, even if a more specific Constraint allows it.
Possible solution? A weight/priority annotation on Constraints, so that when several Constraints of the same kind match a resource, only the highest-priority one is applied (see the example annotations below).
Example:
Template:
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8spsphostnetworkingports
  ...
spec:
  crd:
    spec:
      names:
        kind: K8sPSPHostNetworkingPorts
  ...
Baseline constraint:
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPHostNetworkingPorts
metadata:
  name: psp-host-network-ports
  annotations:
    metadata.gatekeeper.sh/weight: "100" # example new feature
spec:
  enforcementAction: warn
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    hostNetwork: true
    min: 8000
    max: 9000
Specific constraint:
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPHostNetworkingPorts
metadata:
  name: psp-host-network-ports-prometheus-node-exporter
  annotations:
    metadata.gatekeeper.sh/weight: "90" # example new feature
spec:
  ...
  match:
    namespaces:
      - monitoring
    name: kube-prometheus-stack-prometheus-node-exporter
  parameters:
    hostNetwork: true
    min: 9100
    max: 9100
Describe the solution you'd like

I need the ability to define a default behavior for the cluster, and then selectively implement overrides to that default behavior. With the toolkit available in Constraints today (notably, the match field), combined with the fact that ALL matching Constraints are evaluated for a resource to make a policy decision, this leads to operationally infeasible complexity when trying to implement something that behaves with a default behavior.

Let's work through an example. I have a large multi-tenant cluster, using Rancher's Project model. This means I have hundreds of teams, each with the ability to create namespaces in their Project. Rancher Projects are just special labels on namespaces. All the namespaces with the same field.cattle.io/projectId label are in the same Project and subject to the same set of security and behavioral policies. Rancher has its own validating webhook that prevents unprivileged tenants from modifying that label, and only allows them to create namespaces in their own Project.

Now I, the Kubernetes administrator, want to impose some security or behavioral policy on the cluster. For this example I'll use memory limits, but the principle applies to virtually any Constraint I'd want to create in the cluster.
I start by creating a Constraint that declares that all Pods must have memory limits defined, which may not exceed 16GB.
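Something along these lines (the K8sMemoryLimits kind and its limit parameter are illustrative, not from an existing library template):

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMemoryLimits
metadata:
  name: memory-limits
spec:
  match:
    kinds:
      - apiGroups: [ "" ]
        kinds: [ Pod ]
  parameters:
    limit: 16Gi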
Immediately I have a problem...this enforcement has now applied to all of my administrative workloads as well, which is undesirable. So I need to exempt all those namespaces from the Constraint. What I need to do is exclude all namespaces with the "System" projectId, e.g. exclude namespaces with field.cattle.io/projectId=system or whatever, by updating the Constraint's match accordingly.

Ok great...now my administrative workloads are exempted. But I still have a problem -- my 16GB limit applies to ALL other namespaces in the cluster. Inevitably, some team is going to come to me and ask for an exception. They need to be able to run Pods with a 32Gi limit. And after evaluation I agree that it's OK. Now I need two Constraints that implement MECE, to ensure that every namespace in the cluster matches one (and only one) of these memory limit constraints.
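Roughly like this, sketched with the same illustrative kind; note the default Constraint now has to enumerate both the system Project and the 32Gi team:

# Default: everyone except the system Project and the 32Gi team
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMemoryLimits
metadata:
  name: memory-limits-default
spec:
  match:
    kinds:
      - apiGroups: [ "" ]
        kinds: [ Pod ]
    namespaceSelector:
      matchExpressions:
        - key: field.cattle.io/projectId
          operator: NotIn
          values: ["system", "32g-team"]
  parameters:
    limit: 16Gi
---
# Override: the 32Gi team only
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMemoryLimits
metadata:
  name: memory-limits-32g-team
spec:
  match:
    kinds:
      - apiGroups: [ "" ]
        kinds: [ Pod ]
    namespaceSelector:
      matchLabels:
        field.cattle.io/projectId: 32g-team
  parameters:
    limit: 32Gi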
This was a little awkward, because I had to make mention of the 32g-team label in the "primary" (default) Constraint as well as the override Constraint. And I don't like repeating myself. But a few weeks later, another team needs a limit of 64Gi... now we have three Constraints...

Now imagine this same system but with hundreds of tenants in the same cluster. And dozens of Constraints that each, as a group, have to be MECE-managed, where a single slip-up that breaks MECE in the match sections of a group of Constraints will cause a production outage. Not good.

What I would very much rather have is that each Constraint for a given security/policy enforcement can be authored with a built-in ability to match the resource being evaluated against a list of matching rules, where the first one that "wins" gets that set of parameters (or overrides to a default set of parameters) applied. So in the example above with 16, 32, and 64 gig teams, instead of needing three separate Constraints, I can instead author a single Constraint that guarantees MECE (because there's only one).
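Under the parameterSelection field proposed below, that single Constraint might look roughly like this (kind, labels, and values are illustrative):

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMemoryLimits
metadata:
  name: memory-limits
spec:
  match:
    kinds:
      - apiGroups: [ "" ]
        kinds: [ Pod ]
    # (the system Project could get its own entry below, or stay excluded here as before)
  parameterSelection:
    - match:
        namespaceSelector:
          matchLabels:
            field.cattle.io/projectId: 64g-team
      parameters:
        limit: 64Gi
    - match:
        namespaceSelector:
          matchLabels:
            field.cattle.io/projectId: 32g-team
      parameters:
        limit: 32Gi
  parameters:
    limit: 16Gi       # the default for everyone else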
My proposal is to add a new field to Constraints called parameterSelection. This would be a list of objects. Each object may optionally contain a match field (if match is not specified then it matches all resources), and a required parameters field.

The Constraint's top-level match still behaves as it does today, providing initial guidance to Gatekeeper on whether the Constraint should be evaluated for the resource at all. Assuming the resource matches the top-level spec.match field, Gatekeeper then evaluates each of the parameterSelection items in order. The first one that has a match object matching the object under review has its parameters object merged over the spec.parameters object (or sets it, if spec.parameters is not present). Upon the first match, further evaluation of parameterSelection is halted. If the entire list of parameterSelection is exhausted with no match, then the Constraint is evaluated with spec.parameters unmodified (or, another top-level field can be added to the Constraint to control the behavior -- I can imagine situations where aborting evaluation of the Constraint would be preferred if nothing in parameterSelection matches).

Additional attributes can be added to the top-level spec to further refine the behavior of parameterSelection. For example:

- spec.onParameterSelectionNoMatch: {proceed, allow, warn, deny} -- proceed continues, using spec.parameters unmodified. allow and warn skip evaluating the Constraint and assume allow (with warn issuing a warning in the log and an Event). deny skips evaluating the Constraint and assumes a denial.
- spec.parameterSelectionBehavior: {MatchFirst, MatchAny} -- MatchFirst will stop when the first match is encountered. MatchAny will continue and merge each matching item's parameters over spec.parameters. This would allow for far more expressive default-and-override behaviors.

Environment: