cloud provider clusterapi with cloud-provider-azure AzureMachinePools using orchestrationMode=Flexible does not scale down

desek commented 9 months ago

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Component version: 1.28.2

What k8s version are you using (kubectl version)?:

$ kubectl version
Client Version: v1.28.4
Server Version: v1.28.4

Also using CAPZ version: 1.10.8

What environment is this in?:

The CAPI/CAPZ management cluster is running on Azure AKS.
The workload cluster is running in Azure.
The cluster-autoscaler is running on the management cluster.

What did you expect to happen?:

When cluster-autoscaler taints nodes for deletion, it should delete them and scale down the MachinePool.

What happened instead?:

cluster-autoscaler can't find the Machine resources for the tainted nodes, so the MachinePool is not scaled down.

How to reproduce it (as minimally and precisely as possible):

Assuming you have a running CAPZ cluster:

1. Create an AzureMachinePool with spec.orchestrationMode set to Flexible.
2. Scale out a deployment so that cluster-autoscaler increases the replica count of the MachinePool.
3. CAPZ creates one AzureMachinePoolMachine resource per required node.
4. Scale in the deployment so that cluster-autoscaler initiates the scale-down process.
5. cluster-autoscaler fails to scale down because it cannot find the Machine resources.

Step 5 fails because VMSS Flex replicas are created as AzureMachinePoolMachine resources, not as Machine resources.
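The failure mode can be illustrated with a small, stdlib-only Go model. This is not cluster-autoscaler code: the types `ownerKind` and `ownerRef`, the functions `buildMachineOnlyIndex` and `findOwner`, and the providerIDs are all hypothetical placeholders. The sketch assumes, as this report describes, that the node lookup only considers Machine objects, so a node backed by an AzureMachinePoolMachine resolves to nothing.

```go
package main

import "fmt"

// ownerKind names the management-cluster object that backs a node.
// Hypothetical model for illustration only; not cluster-autoscaler types.
type ownerKind string

const (
	kindMachine                 ownerKind = "Machine"
	kindAzureMachinePoolMachine ownerKind = "AzureMachinePoolMachine"
)

// ownerRef is a simplified stand-in for an object that records the
// providerID of the node it backs.
type ownerRef struct {
	Kind       ownerKind
	Name       string
	ProviderID string
}

// buildMachineOnlyIndex keys objects by providerID but, mirroring the
// behavior assumed from this report, only considers Machine objects.
func buildMachineOnlyIndex(objs []ownerRef) map[string]ownerRef {
	idx := make(map[string]ownerRef)
	for _, o := range objs {
		if o.Kind != kindMachine {
			continue // AzureMachinePoolMachines are silently skipped
		}
		idx[o.ProviderID] = o
	}
	return idx
}

// findOwner resolves a node's providerID, failing when nothing is indexed.
func findOwner(idx map[string]ownerRef, providerID string) (ownerRef, error) {
	o, ok := idx[providerID]
	if !ok {
		return ownerRef{}, fmt.Errorf("no Machine found for providerID %q", providerID)
	}
	return o, nil
}

func main() {
	// Placeholder providerIDs; real Azure providerIDs are much longer.
	idx := buildMachineOnlyIndex([]ownerRef{
		{Kind: kindMachine, Name: "md-0-abc", ProviderID: "azure:///example/vm-md-0-abc"},
		{Kind: kindAzureMachinePoolMachine, Name: "mp-0-flex-1", ProviderID: "azure:///example/vm-mp-0-flex-1"},
	})
	_, err := findOwner(idx, "azure:///example/vm-mp-0-flex-1")
	fmt.Println(err)
	// prints: no Machine found for providerID "azure:///example/vm-mp-0-flex-1"
}
```

In this model the Machine-backed node still resolves normally; only the VMSS Flex node falls through, which matches the symptom in step 5.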

Anything else we need to know?:

The following commit adds the resources, indexers, and handler conditions required to correctly remove unneeded AzureMachinePoolMachines: https://github.com/LiveArena/kubernetes-autoscaler/commit/b819ed9bf27722146805425ab82ea5f860c990b3

Shubham82 commented 9 months ago

/area provider/azure /area provider/cluster-api

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Shubham82 commented 4 months ago

/remove-lifecycle stale

k8s-triage-robot commented 1 month ago

/lifecycle stale

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

elmiko commented 2 weeks ago

cc @jackfrancis, you might be interested in this.

jackfrancis commented 2 weeks ago

@tallaxes @comtalyst is the AKS autoscaler currently delivering support for VMSS Flex in CA-enabled node pools?

cc @willie-yao @nojnhuh

Shubham82 commented 2 weeks ago

/remove-lifecycle rotten

kubernetes/autoscaler issue #6454