kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

Support taint propagation from MachinePool to the underlying nodes #11175

Open Archisman-Mridha opened 1 month ago

Archisman-Mridha commented 1 month ago

What would you like to be added (User Story)?

It would be very helpful to have support for this feature:

Declare what taints should be applied to the underlying node(s) in the Machine / MachineDeployment / MachinePool spec.

Detailed Description

Specifically, when creating a MachinePool, I want to specify taints in the MachinePool spec and have those taints applied to each node managed by that MachinePool. Those nodes would then be reserved for specific types of workloads that have tolerations for those node taints.

This will help manage node taints declaratively using GitOps.
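As a rough sketch of the requested UX, the manifest below shows where such a field could live. Note that the `taints` field is hypothetical (it does not exist in the current MachinePool API), and names like `gpu-pool` are illustrative; required fields such as the bootstrap and infrastructure references are omitted for brevity:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: gpu-pool
spec:
  clusterName: my-cluster
  replicas: 3
  template:
    spec:
      # Hypothetical field: taints to propagate to every node in this pool.
      taints:
        - key: dedicated
          value: gpu
          effect: NoSchedule
```

Workloads targeting this pool would then carry a matching toleration for the `dedicated=gpu:NoSchedule` taint.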

Anything else you would like to add?

I've come up with a plan of action here: https://github.com/Obmondo/cluster-api/commit/ab19c36069d220318e2e2caa7159a2a700ad9ec8

Label(s) to be applied

/kind feature

One or more /area labels. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

k8s-ci-robot commented 1 month ago

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
fabriziopandini commented 1 month ago

If it is only for machine creation, the current solution is to rely on the capabilities of bootstrap providers, but I'm aware that not all MachinePool implementations use a bootstrap provider.

If we are looking for a more generic story that does not apply only at creation time, this requires some design work, especially to figure out the interactions between the new "taint propagation" feature and taints defined by bootstrap providers.

In order to define next steps, it might be worth taking a look at https://cluster-api.sigs.k8s.io/developer/architecture/controllers/metadata-propagation and the underlying proposal:

cc @jackfrancis @mboersma @willie-yao

cc @elmiko this discussion might be useful to revive https://github.com/kubernetes-sigs/cluster-api/issues/7685

Archisman-Mridha commented 1 month ago

@fabriziopandini Thanks for the resources you linked. I wasn't aware of the JoinConfiguration.NodeRegistration.Taints field in the KubeadmConfigTemplate custom resource. I can use that to apply taints to newly minted worker nodes (managed by a MachinePool in my case). Currently, this is what my company needs.
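For reference, here is a minimal sketch of how that existing field is used (the template name is illustrative). kubeadm applies these taints when the node first joins; they are not reconciled afterwards, which is the limitation discussed below:

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: worker-template
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          # Applied once at node join time by kubeadm; later edits to this
          # template do not change taints on already-joined nodes.
          taints:
            - key: dedicated
              value: gpu
              effect: NoSchedule
```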

However, there is a downside: we can't change (add / remove) taints on those worker nodes in place. I'm interested in contributing a solution for this. We want the user to be able to specify the taints in the Machine spec; Cluster API would then sync those taints between the Machine resource and the corresponding Kubernetes node.

That's what I'm trying to implement here: https://github.com/Obmondo/cluster-api/commit/ab19c36069d220318e2e2caa7159a2a700ad9ec8. Could you please share your opinion on whether I'm going in the right direction?

JoelSpeed commented 1 month ago

However there will be a downside - we can't in-place change (add / remove) taints for those worker nodes

Removing taints becomes problematic if multiple actors can potentially add taints (which is always the case; think about cordoning a node using kubectl).

Typically, in cases where we've had fields like this before, you'd omit the removal side of things: taints are added if they are in the spec, and must be manually removed if removed from the spec.
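The additive-only behaviour described above can be sketched as a small merge function. This is not the actual Cluster API implementation, just an illustration (the `Taint` struct mirrors the relevant fields of `corev1.Taint`): taints from the spec are appended if missing, and taints added by other actors, or removed from the spec, are left untouched.

```go
package main

import "fmt"

// Taint mirrors the fields of corev1.Taint relevant to this sketch.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// mergeTaints returns the node's taints with any missing desired taints
// appended. It is additive only: it never removes a taint, so taints set by
// other actors (e.g. the unschedulable taint from cordoning) survive, and
// taints dropped from the spec must be removed manually.
func mergeTaints(nodeTaints, desired []Taint) []Taint {
	existing := make(map[string]bool, len(nodeTaints))
	for _, t := range nodeTaints {
		existing[t.Key+"/"+t.Effect] = true
	}
	merged := append([]Taint(nil), nodeTaints...)
	for _, t := range desired {
		if !existing[t.Key+"/"+t.Effect] {
			merged = append(merged, t)
		}
	}
	return merged
}

func main() {
	node := []Taint{{Key: "node.kubernetes.io/unschedulable", Effect: "NoSchedule"}}
	spec := []Taint{{Key: "dedicated", Value: "gpu", Effect: "NoSchedule"}}
	// The cordon taint is preserved and the spec taint is appended.
	fmt.Println(len(mergeTaints(node, spec))) // prints 2
}
```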

~~However, now that we have SSA, in theory we could make this work. If we were to use SSA to apply the taints, then whenever we apply the list of taints from our side, any that we applied (and therefore own, according to field managers) but no longer specify would be removed for us. If they are later re-added by something else, that actor would become the field owner and we wouldn't know that we previously owned the taint. So this could be safe if we use SSA.~~

Edit: Scratch that, taints are atomic 😢

fabriziopandini commented 1 month ago

@JoelSpeed good catch about taints being atomic and the fact that we cannot remove taints automatically.

@Archisman-Mridha

That's what I'm trying to implement here : https://github.com/Obmondo/cluster-api/commit/ab19c36069d220318e2e2caa7159a2a700ad9ec8. Can you please give any opinion on whether I'm going in the right direction or not ?

I don't have bandwidth to research this ATM, but at first sight

Considering all of this, I strongly suggest doing some design work before implementing: identifying API type changes, impacted controllers, etc.

It could start as a comment on this thread; then we can figure out whether we also need to amend some of the docs/proposals linked above.