kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

Cluster Autoscaler not Respecting TopologySpread maxSkew=1 on Scale Down #6984

Open greatshane opened 3 days ago

greatshane commented 3 days ago

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Component version: amazonaws.com/cluster-autoscaler:v1.28.0

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Server Version: v1.28.9-eks-036c24b

What environment is this in?:

AWS EKS

What did you expect to happen?:

Expected the cluster-autoscaler to recognize that it could scale down a node and move that node's coredns pod to another node while still honoring maxSkew=1.

The current configuration has a coredns Deployment with 5 replicas and a topologySpreadConstraints entry where topologyKey = "kubernetes.io/hostname", labelSelector = {"matchLabels":{"k8s-app":"kube-dns"}}, whenUnsatisfiable = "DoNotSchedule", and maxSkew = 1 (full config at the bottom of this issue).

If we currently have 5 nodes and 5 coredns pods like:

1 1 1 1 1

Then, after some time, the cluster-autoscaler determines we no longer need 5 nodes for the workload and can scale down to 4 to save money. What I want to happen is something like:

1 1 1 1 1 -----> 2 1 1 1

This should still be a valid configuration for maxSkew=1: the most loaded node has 2 matching pods and the least loaded has 1, so the skew is 2 - 1 = 1.

What happened instead?:

During the cluster-autoscaler's scale-down simulation (in the exact scenario described above), the logs show the failure below. The autoscaler is unable to scale down the node even though maxSkew=1 would still be honored after the node is deleted. My guess is that the cluster-autoscaler includes the node it wants to remove when calculating skew: the global minimum becomes 0 (the node being deleted will have no coredns pods), so skew = 2 - 0 = 2 > 1 = maxSkew. It therefore claims it can't put the pod on any node, effectively treating the topologySpreadConstraint like a podAntiAffinity rule.

19:56:09.838749       1 cluster.go:155] ip-10-177-149-54.ec2.internal for removal
19:56:09.839185       1 klogx.go:87] failed to find place for kube-system/coredns-568: cannot put pod coredns-568 on any node
19:56:09.839209       1 cluster.go:175] node ip-10-177-149-54.ec2.internal is not suitable for removal: can reschedule only 0 out of 1 pods

When I increase maxSkew to 2, the cluster-autoscaler is able to scale down the unneeded nodes and still honors maxSkew=2; the issue only seems to occur when maxSkew = 1. In other circumstances, such as having no topologySpreadConstraints at all, the cluster-autoscaler is also able to move coredns pods to other nodes and scale down.
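
For reference, the workaround we are running now is the same constraint as the config at the bottom of this issue, with only the maxSkew value changed:

  topologySpreadConstraints:
    - maxSkew: 2
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          k8s-app: kube-dns

This unblocks scale-down, but it also permits a less even spread than we actually want, so it is only a workaround.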

How to reproduce it (as minimally and precisely as possible):

  1. Scale the cluster up to more nodes than normal (one possible way is sketched right after this list).
  2. Create a Deployment with replicas >= nodeCount and topologySpreadConstraints similar to the config below, making sure maxSkew=1.
  3. Remove whatever was used to scale up the nodes in step 1 and watch the cluster-autoscaler try, but fail, to scale down the unneeded nodes.
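
One way to do step 1 (purely illustrative; any temporary workload with resource requests the existing nodes cannot absorb will do, and the name below is made up) is a throwaway Deployment that forces the autoscaler to add nodes, and which can simply be deleted for step 3:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: scale-up-ballast          # hypothetical name, only used for this repro
  spec:
    replicas: 5
    selector:
      matchLabels:
        app: scale-up-ballast
    template:
      metadata:
        labels:
          app: scale-up-ballast
      spec:
        containers:
          - name: pause
            image: registry.k8s.io/pause:3.9
            resources:
              requests:
                cpu: "1"            # sized so each replica needs a node of its own in our cluster
                memory: 512Mi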

Anything else we need to know?:

Deployment config (the labelSelector is specific to coredns):


  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          k8s-app: kube-dns