Closed: kwohlfahrt closed this issue 1 month ago
I would like to be able to configure the autoscaler to not assume that taints on running nodes will apply to newly created nodes in the ASG.
This is possible with the --ignore-taint flag:
Specifies a taint to ignore in node templates when considering to scale a node group
CA maintains a template of what an upcoming node will look like (a "node template") for every ASG. As long as we tell CA to ignore a certain taint in this template node, it won't consider the taint when scaling up.
The over-arching problem of how to safely roll out a new AMI without causing disruptions for customers might be better suited for cluster-api, but I wonder if we can also do something here, e.g., not let CA scale down a node unless a certain condition is met.
Maybe I'm just confused about the documentation then - I thought --ignore-taint was to ignore taints in the ASG tags like k8s.io/cluster-autoscaler/node-template/taint/<taint>, but you're saying it will also stop CA from adding taints on running nodes to the new node template?
Anyway, I'll test this flag (probably next week though), and see if it helps for an update rollout.
I wonder if we can also do something here e.g., don't let CA scale down a node unless a certain condition is met for example.
I think the scale-down logic is working OK, it's the lack of scale-up that was causing issues for me.
I thought --ignore-taint was to ignore taints in the ASG tags like k8s.io/cluster-autoscaler/node-template/taint/<taint>, but you're saying it will also stop CA from adding taints on running nodes to the new node template?
ASG tags are used to create a node template from scratch (ref1, ref2). This happens when no node template is present i.e., when CA has just started. Once CA scales up, a node for the ASG will be running in the cluster. This node will then be used as the node template (since it will have the taints from the ASG + anything else).
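To make the "from scratch" path concrete, here is a minimal, self-contained Go sketch of turning ASG node-template taint tags into template taints. This is not the autoscaler's actual code: the `Taint` struct and the `taintsFromASGTags` helper are made up for this example; the tag-value format `<value>:<effect>` follows the documented node-template tag convention.

```go
package main

import (
	"fmt"
	"strings"
)

// Taint is a minimal stand-in for the Kubernetes core/v1 Taint type.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

const taintTagPrefix = "k8s.io/cluster-autoscaler/node-template/taint/"

// taintsFromASGTags builds template taints from ASG tags of the form
// k8s.io/cluster-autoscaler/node-template/taint/<key> = "<value>:<effect>".
// Non-matching tags are ignored.
func taintsFromASGTags(tags map[string]string) []Taint {
	var taints []Taint
	for k, v := range tags {
		if !strings.HasPrefix(k, taintTagPrefix) {
			continue
		}
		key := strings.TrimPrefix(k, taintTagPrefix)
		// The tag value encodes "<value>:<effect>"; effect stays empty
		// if the separator is missing.
		value, effect, _ := strings.Cut(v, ":")
		taints = append(taints, Taint{Key: key, Value: value, Effect: effect})
	}
	return taints
}

func main() {
	tags := map[string]string{
		"k8s.io/cluster-autoscaler/node-template/taint/dedicated": "gpu:NoSchedule",
		"k8s.io/cluster-autoscaler/enabled":                       "true",
	}
	fmt.Println(taintsFromASGTags(tags)) // [{dedicated gpu NoSchedule}]
}
```

This sketch only covers the cold-start case; as described above, once a real node exists in the ASG, that node (taints and all) becomes the template instead.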
--ignore-taints scrubs the taints from the node template (whether it was created from scratch or based on an existing node in the cluster).
Think of nodeGroup as the ASG. We get the node template on line 41 and call SanitizeNode on line 48, which internally calls SanitizeTaints.
https://github.com/kubernetes/autoscaler/blob/e1b03fac9958791790bfc18eeba9fab5cac0ccc1/cluster-autoscaler/core/utils/utils.go#L119
The taintConfig you see above is passed to SanitizeTaints so that the node template is scrubbed of the taints specified in the --ignore-taint flag.
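A much-simplified sketch of that scrubbing step, with hypothetical types and a hypothetical `sanitizeTaints` signature (the real SanitizeTaints in the autoscaler handles more than a plain key filter):

```go
package main

import "fmt"

// Taint is a minimal stand-in for the Kubernetes core/v1 Taint type.
type Taint struct{ Key, Effect string }

// sanitizeTaints mimics, in much-simplified form, what the autoscaler does
// with the ignore list: drop every taint whose key appears in ignoredKeys
// before the template node is used in scale-up simulation.
func sanitizeTaints(taints []Taint, ignoredKeys map[string]bool) []Taint {
	var kept []Taint
	for _, t := range taints {
		if ignoredKeys[t.Key] {
			continue // scrubbed: CA will not consider this taint at scale-up
		}
		kept = append(kept, t)
	}
	return kept
}

func main() {
	// Taints copied from a running node into the node template.
	nodeTaints := []Taint{
		{Key: "charmtx.com/maintenance", Effect: "NoSchedule"},
		{Key: "dedicated", Effect: "NoSchedule"},
	}
	// Keys collected from the --ignore-taint flag.
	ignored := map[string]bool{"charmtx.com/maintenance": true}
	fmt.Println(sanitizeTaints(nodeTaints, ignored)) // [{dedicated NoSchedule}]
}
```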
P.S.: We don't have a FAQ entry for --ignore-taint. We need better documentation around this.
OK, it's been a long time, but I've now tested the --ignore-taints flag, and it does not do what I want. If I set it, I see logs like:
I1220 16:08:54.934581 1 taints.go:384] Overriding status of node i-0efa87576c37e0811, which seems to have ignored taint "charmtx.com/maintenance"
I1220 16:08:55.087296 1 klogx.go:87] Pod research/gpu-test-d694757df-7lg97 can be moved to template-node-for-k8s-kai-cluster-t3a.medium-eu-central-1b-e5fee97-4454553931225414682-upcoming-1
The cluster autoscaler seems to be completely ignoring the taint, and assuming that my pod can be scheduled to nodes that have this taint.
This is not what I want. I only want the autoscaler to not assume that new nodes will have the same taints as existing nodes; it should take the taints for new nodes only from the ASG tags.
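This behavior follows from the scrubbing: once the taint is removed from the template, the simulated scheduling check sees no taint at all. Here is a toy version of that taint/toleration check (hypothetical names; the real scheduler predicate also honors operators, values, and the other taint effects):

```go
package main

import "fmt"

type Taint struct{ Key, Effect string }
type Toleration struct{ Key string }

// podFitsTaints is a toy version of the check the scale-up simulation
// performs: a pod fits a node unless some NoSchedule taint is untolerated.
func podFitsTaints(taints []Taint, tolerations []Toleration) bool {
	for _, t := range taints {
		if t.Effect != "NoSchedule" {
			continue
		}
		tolerated := false
		for _, tol := range tolerations {
			if tol.Key == t.Key {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	taint := []Taint{{Key: "charmtx.com/maintenance", Effect: "NoSchedule"}}
	// With the taint present, a pod without a toleration is blocked.
	fmt.Println(podFitsTaints(taint, nil)) // false
	// With the taint scrubbed from the template, the pod appears to fit,
	// which is exactly the "can be moved to template-node-..." log above.
	fmt.Println(podFitsTaints(nil, nil)) // true
}
```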
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- /remove-lifecycle stale
- /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- /remove-lifecycle rotten
- /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- /reopen
- /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Which component are you using?: Cluster autoscaler
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
I would like to upgrade the base OS image in my cluster. The users of this cluster have some interactive workloads (e.g. Jupyter notebooks) running in the cluster that I can't easily kill - I have to wait for the users to terminate them on their own time, which may take a few days. My previous attempt involved, among other steps, tainting the nodes running the old AMI with a NoSchedule taint, to ensure no new pods are scheduled on nodes with an old AMI.

Once the taint was applied, any newly created replacement pods were unschedulable on existing nodes, due to the taint. I expected the CA to then scale up the ASG to make room for the new pods and scale in the old, tainted nodes once they were empty.
Unfortunately, this didn't happen. I don't have the exact logs anymore, but the error seemed to be that the CA was detecting that the existing nodes of the ASG were tainted, and assumed that any newly created nodes from the same ASG will also be tainted, and therefore did not create any new nodes. Existing old nodes were scaled down correctly when empty.
Describe the solution you'd like.:
I would like to be able to configure the autoscaler to not assume that taints on running nodes will apply to newly created nodes in the ASG. The autoscaler should consider only the taints in the ASG tags (e.g. k8s.io/cluster-autoscaler/node-template/taint/<taint>), and assume any freshly created nodes will match that spec.

I think this would solve the problem, as CA would then be able to create new nodes for the pending workloads that can't run on the existing tainted nodes.
Describe any alternative solutions you've considered.:
Additional context.:
Discussion in Slack with @vadasambar: https://kubernetes.slack.com/archives/C09R1LV8S/p1691051051375539