kubernetes / autoscaler

Autoscaling components for Kubernetes

Eviction of pods with safe-to-evict: false annotation - scale-down-delay-after-add issue #7269

Open leonelvargas opened 1 week ago

leonelvargas commented 1 week ago

Which component are you using?: Cluster autoscaler

What version of the component are you using?: v1.30.0

Component version: v1.30.0

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.3-eks-2f46c53

What environment is this in?: EKS - AWS

What did you expect to happen?:

When the configured scale-down-delay-after-add time expires, the autoscaler should mark the node and proceed with a scale-down once the node is no longer needed, unless it contains pods with the annotation "safe-to-evict: false".
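For reference, the annotation in question is applied like this (the pod name below is a placeholder, not one of our real pods):

# mark a pod so the cluster autoscaler should not evict it when draining a node for scale-down
$ kubectl annotate pod example-worker-pod cluster-autoscaler.kubernetes.io/safe-to-evict="false" --overwrite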

What happened instead?:

Once this delay expires, if new pods have been scheduled on that node, the autoscaler drains the node via the cluster API, evicting those pods. All of these pods carry the annotation "safe-to-evict: false", yet they are drained anyway.

How to reproduce it (as minimally and precisely as possible):

Reproducing it reliably is the problem: I cannot pin down the cause. In our environments, if we configure the autoscaler as follows:

./cluster-autoscaler
--v=4
--stderrthreshold=info
--cloud-provider=aws
--skip-nodes-with-local-storage=false
--expander=least-waste
--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/{{ .Values.aws.clusterName }}
--balance-similar-node-groups
--skip-nodes-with-system-pods=false
--ignore-daemonsets-utilization=true
--scale-down-delay-after-add=30m
--scale-down-utilization-threshold=0.01

With this configuration, the number of evicted pods increases. But if we configure the autoscaler with a longer delay, evictions are markedly reduced:

./cluster-autoscaler
--v=4
--stderrthreshold=info
--cloud-provider=aws
--skip-nodes-with-local-storage=false
--expander=least-waste
--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/{{ .Values.aws.clusterName }}
--balance-similar-node-groups
--skip-nodes-with-system-pods=false
--ignore-daemonsets-utilization=true
--scale-down-delay-after-add=4h
--scale-down-utilization-threshold=0.01

Anything else we need to know?:

I am attaching an analysis I did, because this error only appears when the autoscaler is enabled. If I leave enough fixed nodes in the cluster for my workloads, eviction problems never occur, hence my concern and the reason for this ticket.

Example case: at 15:56 (UTC-3) the node has no pods with the annotation "safe-to-evict: false", because pod "5bc.." has finished its work (no eviction occurs at this point). Four minutes later the cluster schedules two pods with the annotation "safe-to-evict: false" on this node, but the node's scale-down-delay-after-add has already expired. The strange thing is that after that, the node is drained anyway.
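One rough way to cross-check the timeline above is to list recent pod events and filter for evictions (output and event wording can vary between clusters):

# list cluster-wide events sorted by time and keep the eviction-related ones
$ kubectl get events -A --sort-by=.lastTimestamp | grep -i evict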

Analysis timeline: (screenshot attached)

Drained pods: (screenshots attached)

Autoscaler logs: autoscaler-logs.txt (attached).
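For anyone reading those logs, grepping for the node name plus scale-down/drain keywords should narrow things down (the node name is a placeholder and the keywords are guesses, not exact log strings):

# search the attached autoscaler logs for scale-down / drain activity on the node
$ grep -i "<node-name>" autoscaler-logs.txt | grep -iE "scale.?down|drain|evict"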

Pod with the annotation "safe-to-evict: false": (screenshot attached)
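A quick way to confirm the annotation is really present on an affected pod (pod name is a placeholder):

# the annotation should show up under metadata.annotations
$ kubectl get pod <pod-name> -o yaml | grep safe-to-evict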

Hypothesis

The autoscaler does not take the annotation "safe-to-evict: false" into account once scale-down-delay-after-add has expired. I am also linking a ticket I found describing similar behavior caused by a problem with this annotation; they may be related: https://github.com/kubernetes/autoscaler/issues/7244

Thanks.

adrianmoisey commented 1 week ago

/area cluster-autoscaler