kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0
16.2k stars 6.49k forks source link

Make kubespray authoritative on node taints #11697

Open VannTen opened 1 week ago

VannTen commented 1 week ago

What type of PR is this? /kind feature

What this PR does / why we need it: Currently, we only add/modify taints to nodes (not remove). This mean is users remove taints from their kubespray inventories, they also have to remove them manually from their clusters.

Switch to replacing the entire taints array by patching 'spec.taints'; we do preserve Kubernetes reserved taints (https://kubernetes.io/docs/reference/labels-annotations-taints/).

The string from for providing the annotations is more complicated to manipulate, in kubespray or in users inventory.

Deprecate the string form in favor of reusing the structure of the Kubernetes API. We keep a compatibily layer which parse the string on-the-fly, which we should remove in the N+1 release (N=next relase)

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer: Kinda follow-up to #10705 @maxime1907 would love to hear your thoughts

Regarding the nvidia bits. Setting a boolean is marginally easier, but why not just ask users to have the nvidia taints in their group_vars for their nvidia nodes and just scrap those variables ? => and just add that to documentation ? Wdyt ?

Does this PR introduce a user-facing change?:

action required
Using `node_taints` as a string array is deprecated. See roles/kubernetes/node-taint/defaults/main.yml for the new format. 
Kubespray will now be authoritative on node taints, which means taints not defined in kubespray will be removed (except kubernetes.io/k8s.io prefixed taints)
The variable `nvidia_gpu_nodes` is not used anymore to set node taints. Instead, directly [set them in your inventory](docs/manage_your_inventory/dedicated_node_groups.md)
k8s-ci-robot commented 1 week ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

k8s-ci-robot commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: VannTen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes-sigs/kubespray/blob/master/OWNERS)~~ [VannTen] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
VannTen commented 1 week ago

/ok-to-test

maxime1907 commented 1 week ago

nice finally a true config as code

for the nvidia part i would just remove it and add some documentation to avoid overengineering stuff

VannTen commented 1 week ago

/label tide/merge-method-merge

VannTen commented 1 week ago

Hum, I wonder if we could leverage server-side apply to handle "taints we don't manage" 🤔

VannTen commented 1 week ago

The answer to this last question is "not yet" kubernetes/kubernetes#117142

VannTen commented 1 week ago

/hold

I need to think a bit more about that server-side apply stuff