kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

scale down issue with scale-down-utilization-threshold at 0 #6791

Open ut0mt8 opened 5 months ago

ut0mt8 commented 5 months ago

Which component are you using?:

cluster-autoscaler

Component version:

v1.29.0

What k8s version are you using (kubectl version)?:

Server Version: version.Info{Major:"1", Minor:"26+", GitVersion:"v1.26.14-eks-b9c9ed7", GitCommit:"7c3f2be51edd9fa5727b6ecc2c3fc3c578aa02ca", GitTreeState:"clean", BuildDate:"2024-03-02T03:46:35Z", GoVersion:"go1.21.7", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:

In EKS/AWS, launched with args like this:

        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=kube-system
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/cluster
        - --balance-similar-node-groups=true
        - --expander=least-waste
        - --ignore-daemonsets-utilization=true
        - --logtostderr=true
        - --scale-down-unneeded-time=5m
        - --scale-down-unready-time=5m
        - --scale-down-utilization-threshold=0 <======
        - --skip-nodes-with-local-storage=false
        - --skip-nodes-with-system-pods=false
        - --stderrthreshold=info
        - --v=4

What did you expect to happen?:

When nodes are empty (meaning they run no pods from any deployment), scale-down should happen.

What happened instead?:

Something prevents the nodes from scaling down; see this spurious log:

unremovable: memory requested (0% of allocatable) is above the scale-down utilization threshold

on one of the candidate nodes.

How to reproduce it (as minimally and precisely as possible):

Nothing more to add. The config above should be sufficient.

Anything else we need to know?:

Putting 0.01 for scale-down-utilization-threshold seems to work, but it's a bit counter-intuitive. What we actually want is for cluster-autoscaler to not care about resource utilization at all and simply scale down empty nodes. I wonder why such a complex heuristic is needed?
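For reference, the workaround described above is just the same container args as in the report, with the threshold bumped to a small non-zero value (0.01 is an arbitrary choice that happened to work here, not a recommended setting):

        - --scale-down-utilization-threshold=0.01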

leoryu commented 5 months ago

I'm having the same issue as well. This is the related code: https://github.com/kubernetes/autoscaler/blob/3fd892a37b50a885eaceaa9619a1a3e153548dc9/cluster-autoscaler/core/scaledown/eligibility/eligibility.go#L187
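A minimal standalone sketch of how that eligibility check appears to behave, assuming (as the linked line suggests) that a node only counts as under-utilized when its utilization is strictly below the threshold. The function and variable names below are illustrative, not the autoscaler's own:

    package main

    import "fmt"

    // nodeBelowThreshold mirrors, in simplified form, the comparison in
    // eligibility.go: a node is a scale-down candidate only when its
    // utilization is strictly below the configured threshold.
    func nodeBelowThreshold(utilization, threshold float64) bool {
        return utilization < threshold
    }

    func main() {
        // With --scale-down-utilization-threshold=0, even a completely empty
        // node (0% utilization) is not strictly below the threshold, so it is
        // logged as "above the scale-down utilization threshold" and kept.
        fmt.Println(nodeBelowThreshold(0.0, 0.0)) // false -> unremovable

        // With the 0.01 workaround, an empty node passes the check.
        fmt.Println(nodeBelowThreshold(0.0, 0.01)) // true -> scale-down candidate
    }

Under that assumption, a threshold of exactly 0 can never be satisfied, which would explain why 0% utilization is still reported as "above" it.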

Shubham82 commented 4 months ago

/area provider/aws
/area cluster-autoscaler

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

    After 90d of inactivity, lifecycle/stale is applied
    After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
    After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

    Mark this issue as fresh with /remove-lifecycle stale
    Close this issue with /close
    Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Shubham82 commented 1 month ago

/remove-lifecycle stale