kubernetes / autoscaler

Autoscaling components for Kubernetes

cloud provider clusterapi with cloud-provider-azure AzureMachinePools using orchestrationMode=Flexible does not scale down #6454

Open desek opened 9 months ago

desek commented 9 months ago

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Component version: 1.28.2

What k8s version are you using (kubectl version)?:

$ kubectl version
Client Version: v1.28.4
Server Version: v1.28.4

Also using CAPZ version: 1.10.8

What environment is this in?:

What did you expect to happen?:

After cluster-autoscaler taints nodes for deletion, it should delete them and scale down the MachinePool.

What happened instead?:

cluster-autoscaler cannot find the Machine resources for the tainted nodes, so the MachinePool is never scaled down.

How to reproduce it (as minimally and precisely as possible):

Assuming you have a running CAPZ cluster:

  1. Create an AzureMachinePool with spec.orchestrationMode set to Flexible.
  2. Scale out a deployment so that cluster-autoscaler increases the replica count of the MachinePool.
  3. CAPZ creates one AzureMachinePoolMachine resource per required node.
  4. Scale in the deployment so that cluster-autoscaler initiates the scale-down process.
  5. cluster-autoscaler fails to scale down because it cannot find the Machine resources.

Step 5 fails because VMSS Flex replicas are created as AzureMachinePoolMachine resources, not Machine resources.
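To illustrate the mismatch, here is a minimal, self-contained Go sketch. It is not the cluster-autoscaler code: the `Resource` type, the `findByProviderID` helper, and the provider ID strings are all hypothetical, and matching by provider ID is an assumption made for the example. It only shows why a lookup restricted to the `Machine` kind comes back empty when the replicas exist as `AzureMachinePoolMachine` objects.

```go
package main

import "fmt"

// Resource is a simplified, hypothetical stand-in for a Kubernetes object.
type Resource struct {
	Kind       string // e.g. "Machine" or "AzureMachinePoolMachine"
	Name       string
	ProviderID string // assumed to match the node's provider ID
}

// findByProviderID returns the first resource whose kind is in allowedKinds
// and whose ProviderID matches, or nil if there is none.
func findByProviderID(resources []Resource, providerID string, allowedKinds ...string) *Resource {
	for i := range resources {
		r := &resources[i]
		if r.ProviderID != providerID {
			continue
		}
		for _, k := range allowedKinds {
			if r.Kind == k {
				return r
			}
		}
	}
	return nil
}

func main() {
	// With VMSS Flex, the replicas exist as AzureMachinePoolMachine objects.
	// The provider ID below is a made-up placeholder.
	resources := []Resource{
		{Kind: "AzureMachinePoolMachine", Name: "pool-0", ProviderID: "azure:///example/vm-0"},
	}
	node := "azure:///example/vm-0"

	// Lookup restricted to Machine: nothing is found.
	fmt.Println(findByProviderID(resources, node, "Machine") != nil)

	// Lookup that also accepts AzureMachinePoolMachine finds the replica.
	fmt.Println(findByProviderID(resources, node, "Machine", "AzureMachinePoolMachine") != nil)
}
```

The first lookup mirrors the reported behaviour; the second shows the general idea of also considering the AzureMachinePoolMachine kind, which is roughly the direction the linked commit describes.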

Anything else we need to know?:

This commit adds the required resources, indexers and conditions in handlers to correctly remove unneeded AzureMachinePoolMachines: https://github.com/LiveArena/kubernetes-autoscaler/commit/b819ed9bf27722146805425ab82ea5f860c990b3

Shubham82 commented 9 months ago

/area provider/azure /area provider/cluster-api

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Shubham82 commented 4 months ago

/remove-lifecycle stale

k8s-triage-robot commented 1 month ago

/lifecycle stale

k8s-triage-robot commented 3 weeks ago

/lifecycle rotten

elmiko commented 2 weeks ago

cc @jackfrancis, you might be interested in this.

jackfrancis commented 2 weeks ago

@tallaxes @comtalyst is the AKS autoscaler currently delivering support for VMSS Flex in CA-enabled node pools?

cc @willie-yao @nojnhuh

Shubham82 commented 2 weeks ago

/remove-lifecycle rotten