kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.

allow karpenter to filter and create nodes with non-well-known kubernetes labels #1046

Open · elmiko opened this issue 7 months ago

elmiko commented 7 months ago

Description

What problem are you trying to solve?

As an openshift user, i require new nodes to take the label node-role.kubernetes.io/worker. Although this label uses the kubernetes.io prefix, it is not an official well-known label. Currently, karpenter rejects NodePools that use a non-well-known label under the kubernetes.io prefix, both when creating new nodes and when checking scheduling predicates.
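
For illustration, a minimal NodePool of the kind that is rejected today might look like this (pool name and nodeClassRef are made-up placeholders, and the API version may differ on older releases):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: worker-pool            # hypothetical name
spec:
  template:
    metadata:
      labels:
        # rejected: kubernetes.io-prefixed but not a well-known label
        node-role.kubernetes.io/worker: ""
    spec:
      nodeClassRef:            # illustrative reference only
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
```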

How important is this feature to you?

This is fairly important for openshift as the label is used frequently. It is possible to work around this, but for long term usage on openshift a permanent solution would be better.

This issue was discussed at the Feb. 26, 2024 community meeting.

jonathan-innis commented 7 months ago

Looks like there are a couple other issues that were asking for similar functionality here: https://github.com/aws/karpenter-provider-aws/issues/1941 and https://github.com/kubernetes-sigs/karpenter/issues/660

jonathan-innis commented 7 months ago

What's the security implication of allowing users to set this custom label? From a little sleuthing, it doesn't seem to be any different than the node-restriction.kubernetes.io label that we already support today.

sftim commented 7 months ago

As Karpenter is a Kubernetes project, I'm wary of even indirectly encouraging use of unregistered labels.

If node-role.kubernetes.io/worker should become an official label, let's get that registered first - it's not a particularly difficult process. However, registration first, then adoption.

“OpenShift uses it” does not count as grounds to add support into Kubernetes code.


OpenShift folks, if you want a label to mark nodes as OpenShift workers, you can make one, e.g. openshift.example/worker: true
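
By way of a sketch, such a cluster-owned label is something Karpenter accepts today (the domain is sftim's example placeholder; the pool name is made up):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: openshift-workers      # hypothetical name
spec:
  template:
    metadata:
      labels:
        # a non-kubernetes.io domain, so no registration is required
        openshift.example/worker: "true"
```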

sftim commented 7 months ago

security implication

Nodes either can or can't set arbitrary labels on themselves, depending on what controls you've set. So long as we leave the kubelet to set that label, we're fine. If Karpenter starts setting a label on a node, it could provide a way to bypass restrictions on what labels nodes can have (although, letting someone write to your NodePools is already bad practice).
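
As a concrete sketch of the node-restriction.kubernetes.io prefix mentioned above, which Karpenter already accepts (pool name and label value are made up):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: restricted-pool        # hypothetical name
spec:
  template:
    metadata:
      labels:
        # NodeRestriction forbids kubelets from self-setting labels under
        # this prefix; only cluster-level actors such as Karpenter may.
        node-restriction.kubernetes.io/team: platform
```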

elmiko commented 6 months ago

As Karpenter is a Kubernetes project, I'm wary of even indirectly encouraging use of unregistered labels.

this makes me wonder if the kubelet should be producing warnings/errors when the user attempts to use an unregistered label in the kubernetes.io namespace?

karpenter ~~is~~ may be preventing something that the kubelet allows.

If node-role.kubernetes.io/worker should become an official label, let's get that registered first - it's not a particularly difficult process. However, registration first, then adoption.

total agreement from me, but this is historical as well. openshift has been using the label for a long time; i only mention this to help add context around the effort to change the value.

“OpenShift uses it” does not count as grounds to add support into Kubernetes code.

agreed, but let's take openshift out of the equation. as a user, i would like karpenter to create nodes with unregistered labels from the kubernetes.io prefix. i can do this today on the kubelet command line (see the sketch below) and i would like karpenter to do the same for me.

i think if this sounds like an unreasonable request at the karpenter level, then we should consider making the kubelet produce warnings when this happens because we are now asking karpenter to gate activity that the kubelet allows.
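
A minimal sketch of passing such a label on the kubelet command line via kubeadm, assuming the v1beta3 config API (whether the apiserver ultimately accepts the self-set label depends on NodeRestriction, discussed below):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
nodeRegistration:
  kubeletExtraArgs:
    # becomes kubelet --node-labels=node-role.kubernetes.io/worker=
    node-labels: "node-role.kubernetes.io/worker="
```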

OpenShift folks, if you want a label to mark nodes as OpenShift workers, you can make one, eg openshift.example/worker: true

no objection from me, but i think this is orthogonal to the true issue here. i only mentioned openshift to help illuminate the use case.

So long as we leave the kubelet to set that label, we're fine. If Karpenter starts setting a label on a node, it could provide a way to bypass restrictions on what labels nodes can have (although, letting someone write to your NodePools is already bad practice).

i think this question is important. if karpenter is adding the label after node creation, then i totally agree with it preventing unregistered labels from being added. but if karpenter is passing the labels down to the kubelet configuration, then i think we should reconsider.

thanks for the thoughtful discussion @sftim and @jonathan-innis

elmiko commented 6 months ago

one additional thought on this: even if karpenter cannot create nodes with the unregistered labels, it would be nice for it to know when a node will be created with that label so that it can properly predict which nodes should be created.

essentially, i am fine with karpenter not placing the unregistered label on the node, but i should be able to tell karpenter "hey, if you see these pods, put them on nodes with this label, and oh btw, these node classes will have that label" (see the sketch below).
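
Purely as a hypothetical sketch of the kind of hint being requested; no such field exists in the NodePool API today, and the name externalLabels is invented:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: worker-pool            # hypothetical name
spec:
  template:
    metadata:
      # hypothetical field: labels karpenter should schedule against,
      # expecting the node to arrive with them, without setting them itself
      externalLabels:
        node-role.kubernetes.io/worker: ""
```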

sftim commented 6 months ago

We have code in Kubernetes to prevent the node setting disallowed labels. You can use Karpenter to bypass that; it's your cluster.

However, should we encourage it? I say we should not. Upstream Kubernetes and in-project code should abide by our architecture standards, even if individual end users intend to disregard them.

TL;DR: offering an in-project security footgun feels like an obviously bad choice.

elmiko commented 6 months ago

However, should we encourage it? I say we should not.

agreed

TL;DR: offering an in-project security footgun feels like an obviously bad choice.

just to be clear, do you consider the footgun to be: karpenter adding disallowed labels after the node is created, karpenter adding disallowed labels to the command line flag list for kubelet, or both?

elmiko commented 6 months ago

reading this a little deeper, https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#noderestriction

i have a question about the interaction with the admission controllers: if a user has disabled the admission controller for node restriction, will that change be reflected in karpenter's restriction as well?

it seems like we would want karpenter to reflect what the admission controllers are doing, so if an update to a node object gets rejected wouldn't we want to surface that error to the user?
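
For reference, NodeRestriction is toggled on the API server; a static pod manifest excerpt might look like this (path and surrounding flags are illustrative):

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # drop NodeRestriction from this list to disable the admission plugin
    - --enable-admission-plugins=NodeRestriction
```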

sftim commented 6 months ago

I suppose it's an outright footgun if a compromised node can, in any way, set labels on itself it shouldn't be able to, or cause those to be set.

I'm also opposed to tacitly encouraging people to use Kubernetes-namespaced labels that Kubernetes doesn't recognize. We shouldn't staff work that encourages doing more of the wrong thing.

sftim commented 6 months ago

i have a question about the interaction with the admission controllers: if a user has disabled the admission controller for node restriction, will that change be reflected in karpenter's restriction as well?

No; in general, Karpenter cannot know what admission controllers are in use.

elmiko commented 6 months ago

No; in general, Karpenter cannot know what admission controllers are in use.

perhaps i phrased that poorly, i'll try to restate.

imagine a user has the noderestriction admission controller disabled. the user configures karpenter to make changes that would otherwise be prevented by the admission controller. karpenter refuses to allow the changes because it has duplicated the default rules for the noderestriction admission controller. should karpenter be the component responsible for rejecting the configuration?

i'm wondering if karpenter should be returning the errors from updating nodes/pods instead of interposing itself for admission.

We shouldn't staff work that encourages doing more of the wrong thing.

agreed, and this makes a lot of sense to me.

sftim commented 6 months ago

should karpenter be the component responsible for rejecting the configuration?

I think so, because it's much easier to maintain an important security guarantee this way. It sounds like you think Karpenter is an admission controller; it's not.

elmiko commented 6 months ago

It sounds like you think Karpenter is an admission controller; it's not.

i don't think i do, but i am trying to reconcile why the rules for the unregistered labels are encoded/validated in karpenter instead of deferring to the apiserver to return an error. (perhaps i am not understanding this flow properly).

sftim commented 6 months ago

i don't think i do, but i am trying to reconcile why the rules for the unregistered labels are encoded/validated in karpenter instead of deferring to the apiserver to return an error. (perhaps i am not understanding this flow properly).

The API server's restrictions (from the node restriction admission controller) apply exclusively to label changes made by nodes. Karpenter is not a node, so those restrictions wouldn't be enforced.

elmiko commented 6 months ago

exclusively to label changes made by nodes

that's what i was missing, thank you!

engedaam commented 6 months ago

/remove-label needs-triage

k8s-ci-robot commented 6 months ago

@engedaam: The label(s) /remove-label needs-triage cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to [this](https://github.com/kubernetes-sigs/karpenter/issues/1046#issuecomment-2016610529):

> /remove-label needs-triage

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

engedaam commented 6 months ago

/triage accepted

elmiko commented 6 months ago

happy to see this considered for inclusion, but given the conversation between @sftim and myself, i'm not sure this is a good idea to carry forward. it seems like a possible point for a security escalation.

sftim commented 2 months ago

I think we should ask the OpenShift folks to consider not using the wrong label, and leave it at that. OpenShift is entirely welcome to use OpenShift labels for node role, it just shouldn't bypass K8s processes for registering labels.

sftim commented 2 months ago

People using Karpenter who want to work around this could use a different label (e.g. internal-node-role: worker) and a contrib controller that copies internal-node-role to whatever other label they want set (see the sketch below). But we shouldn't make that contrib code part of Kubernetes either, because it'd be furthering a mistake.
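
A sketch of the first half of that workaround, using a label Karpenter accepts (pool name is made up; the mirroring controller itself is omitted):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: worker-pool            # hypothetical name
spec:
  template:
    metadata:
      labels:
        # unprefixed, cluster-private label; a separate contrib controller
        # would mirror it to whatever label the cluster actually wants
        internal-node-role: worker
```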

diranged commented 2 months ago

So given that Karpenter already applies Taints to Node objects upon creation, I can't see why it would be any different to allow Karpenter to apply labels (including node-role.kubernetes.io/<role>).
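
For context, taints are already a first-class part of the NodePool template (names are illustrative):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: dedicated-pool         # hypothetical name
spec:
  template:
    spec:
      taints:
      # applied to the Node at creation, much as labels would be
      - key: example.com/dedicated
        value: gpu
        effect: NoSchedule
```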

sftim commented 2 months ago

Kubernetes enforces the rules here; a node autoscaler, as a client of the Kubernetes API, isn't able by itself to bypass the security rule.

diranged commented 2 months ago

Kubernetes enforces the rules here; a node autoscaler, as a client of the Kubernetes API, isn't able by itself to bypass the security rule.

I don't believe that is true at all. For example, the Spot.io Ocean Cluster Controller absolutely labels our nodes with the node-role label for us on startup. The only rule here is that the node itself is unable to apply that label.

sftim commented 2 months ago

The label node-role is not reserved by Kubernetes (and as a cluster-private label, never would be).

There is no restriction on labelling nodes with private labels, only on adding registered public labels that belong to Kubernetes.


sftim commented 2 months ago

Should node autoscalers try to set unofficial labels in the Kubernetes namespace? It's possible to code up, but I (still) don't recommend it; in fact, I actively oppose making that change to Karpenter. See https://github.com/kubernetes-sigs/karpenter/issues/1046#issuecomment-2237753906


diranged commented 2 months ago

Should node autoscalers try to set unofficial labels in the Kubernetes namespace? It's possible to code up, but I (still) don't recommend it; in fact, I actively oppose making that change to Karpenter

I should fix my comment - I want to use the kubernetes.io/role=xxx label - specifically defined here https://github.com/kubernetes/kubernetes/blob/v1.30.3/staging/src/k8s.io/kubectl/pkg/describe/interface.go#L33-L34 as:

```go
// NodeLabelRole specifies the role of a node
NodeLabelRole = "kubernetes.io/role"
```

Can you elaborate on why you oppose Karpenter managing this label? I can't find anything in the Kubernetes code or documentation that indicates we shouldn't set this label.

elmiko commented 2 months ago

I think we should ask the OpenShift folks to consider not using the wrong label, and leave it at that. OpenShift is entirely welcome to use OpenShift labels for node role, it just shouldn't bypass K8s processes for registering labels.

i think this is sound advice and something i have advocated for internally at red hat. i only brought up openshift in my issue as it pertains to my specific problem here. i am not attempting to suggest that we need to change kubernetes core behaviors based on my needs, i only added it to help with context.

i appreciate the extended discussions from @sftim and the explanation for the security-related part of this issue.

my confusion is more around karpenter attempting to block everything under the kubernetes.io label prefix; i think this is similar to what @diranged is also pointing out. on this point, i'm fine to accept the community's decision on what karpenter should or should not allow, i was mostly looking to have an open discussion about it.

sftim commented 2 months ago

Maybe this is a missing detail: AIUI, Karpenter creates nodes that have labels by provisioning machines that will run a kubelet that then self-labels.

We could enhance Karpenter to also use a different mechanism, setting labels directly on Nodes based on the owning NodeClaim. I don't see a good case for adding complexity here, but a case could be made.


diranged commented 1 month ago

creating nodes that will run a kubelet that then self-labels

@sftim, so that is true if you are using an amiFamily like Bottlerocket, where Karpenter creates/merges the userData... however, we actually were running in Custom mode, where Karpenter is not mutating or managing the userData at all, and it still sets the labels on the nodes. Karpenter is (I think) pre-creating the Node object and populating the labels that way, which would allow it to set the kubernetes.io/role label just fine.