kubernetes-sigs / cluster-api-provider-azure

Cluster API implementation for Microsoft Azure
https://capz.sigs.k8s.io/
Apache License 2.0
287 stars 414 forks source link

handle non-native autoscalers for AzureASOManagedMachinePool #4929

Closed nojnhuh closed 1 week ago

nojnhuh commented 1 week ago

What type of PR is this? /kind bug

What this PR does / why we need it:

Currently, AzureASOManagedMachinePool only handles autoscaled machine pools when the autoscaler is AKS's built-in cluster-autoscaler exposed through the AKS API. This change fixes an issue where using a non-"aks" autoscaler defined in a MachinePool's cluster.x-k8s.io/replicas-managed-by annotation would cause CAPZ/ASO and the autoscaler to fight over the replica count.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #

Special notes for your reviewer:

TODOs:

Release note:

Fixed a bug causing ASO and non-AKS-native autoscalers to fight over the replica count of an AzureASOManagedMachinePool
nojnhuh commented 1 week ago

/cherry-pick release-1.15 /area managedclusters /assign @willie-yao

@willie-yao This is the diff to the templates I was using to test this:

diff --git a/templates/cluster-template-aks-aso.yaml b/templates/cluster-template-aks-aso.yaml
index 29f77594e..f8dab4ed2 100644
--- a/templates/cluster-template-aks-aso.yaml
+++ b/templates/cluster-template-aks-aso.yaml
@@ -99,6 +99,8 @@ spec:
 apiVersion: cluster.x-k8s.io/v1beta1
 kind: MachinePool
 metadata:
+  annotations:
+    cluster.x-k8s.io/replicas-managed-by: cluster-autoscaler
   name: ${CLUSTER_NAME}-pool1
   namespace: default
 spec:
diff --git a/templates/flavors/aks-aso/cluster-template.yaml b/templates/flavors/aks-aso/cluster-template.yaml
index f71a8258f..5909bfc91 100644
--- a/templates/flavors/aks-aso/cluster-template.yaml
+++ b/templates/flavors/aks-aso/cluster-template.yaml
@@ -97,6 +97,8 @@ apiVersion: cluster.x-k8s.io/v1beta1
 kind: MachinePool
 metadata:
   name: "${CLUSTER_NAME}-pool1"
+  annotations:
+    cluster.x-k8s.io/replicas-managed-by: cluster-autoscaler
 spec:
   clusterName: "${CLUSTER_NAME}"
   replicas: ${WORKER_MACHINE_COUNT:=2}
diff --git a/templates/test/ci/cluster-template-prow-aks-aso.yaml b/templates/test/ci/cluster-template-prow-aks-aso.yaml
index 5eb38cbaf..7be94e6d4 100644
--- a/templates/test/ci/cluster-template-prow-aks-aso.yaml
+++ b/templates/test/ci/cluster-template-prow-aks-aso.yaml
@@ -107,6 +107,8 @@ spec:
 apiVersion: cluster.x-k8s.io/v1beta1
 kind: MachinePool
 metadata:
+  annotations:
+    cluster.x-k8s.io/replicas-managed-by: cluster-autoscaler
   name: ${CLUSTER_NAME}-pool1
   namespace: default
 spec:

If you spin up an aks-aso cluster you should be able to change pool1's replica count from anywhere (except the MachinePool spec) and it should stick even after restarting ASO to trigger a reconciliation.

k8s-infra-cherrypick-robot commented 1 week ago

@nojnhuh: once the present PR merges, I will cherry-pick it on top of release-1.15 in a new PR and assign it to you.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/4929#issuecomment-2176464281): >/cherry-pick release-1.15 >/area managedclusters >/assign @willie-yao > >@willie-yao This is the diff to the templates I was using to test this: >```diff >diff --git a/templates/cluster-template-aks-aso.yaml b/templates/cluster-template-aks-aso.yaml >index 29f77594e..f8dab4ed2 100644 >--- a/templates/cluster-template-aks-aso.yaml >+++ b/templates/cluster-template-aks-aso.yaml >@@ -99,6 +99,8 @@ spec: > apiVersion: cluster.x-k8s.io/v1beta1 > kind: MachinePool > metadata: >+ annotations: >+ cluster.x-k8s.io/replicas-managed-by: cluster-autoscaler > name: ${CLUSTER_NAME}-pool1 > namespace: default > spec: >diff --git a/templates/flavors/aks-aso/cluster-template.yaml b/templates/flavors/aks-aso/cluster-template.yaml >index f71a8258f..5909bfc91 100644 >--- a/templates/flavors/aks-aso/cluster-template.yaml >+++ b/templates/flavors/aks-aso/cluster-template.yaml >@@ -97,6 +97,8 @@ apiVersion: cluster.x-k8s.io/v1beta1 > kind: MachinePool > metadata: > name: "${CLUSTER_NAME}-pool1" >+ annotations: >+ cluster.x-k8s.io/replicas-managed-by: cluster-autoscaler > spec: > clusterName: "${CLUSTER_NAME}" > replicas: ${WORKER_MACHINE_COUNT:=2} >diff --git a/templates/test/ci/cluster-template-prow-aks-aso.yaml b/templates/test/ci/cluster-template-prow-aks-aso.yaml >index 5eb38cbaf..7be94e6d4 100644 >--- a/templates/test/ci/cluster-template-prow-aks-aso.yaml >+++ b/templates/test/ci/cluster-template-prow-aks-aso.yaml >@@ -107,6 +107,8 @@ spec: > apiVersion: cluster.x-k8s.io/v1beta1 > kind: MachinePool > metadata: >+ annotations: >+ cluster.x-k8s.io/replicas-managed-by: cluster-autoscaler > name: ${CLUSTER_NAME}-pool1 > namespace: default > spec: >``` > >If you spin up an `aks-aso` cluster you should be able to change pool1's replica count from anywhere (except the MachinePool spec) and it should stick even after restarting ASO to trigger a reconciliation. Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
nojnhuh commented 1 week ago

/cherry-pick release-1.15

k8s-infra-cherrypick-robot commented 1 week ago

@nojnhuh: once the present PR merges, I will cherry-pick it on top of release-1.15 in a new PR and assign it to you.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/4929#issuecomment-2176465100): >/cherry-pick release-1.15 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
codecov[bot] commented 1 week ago

Codecov Report

Attention: Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 62.19%. Comparing base (772d354) to head (92e6ed0). Report is 5 commits behind head on main.

Files Patch % Lines
exp/mutators/azureasomanagedmachinepool.go 75.00% 1 Missing and 2 partials :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #4929 +/- ## ======================================= Coverage 62.19% 62.19% ======================================= Files 201 201 Lines 16878 16884 +6 ======================================= + Hits 10497 10501 +4 - Misses 5591 5592 +1 - Partials 790 791 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

k8s-ci-robot commented 1 week ago

LGTM label has been added.

Git tree hash: ad147b24c5edd1580093bad7e44ba3ca2ff17368

k8s-ci-robot commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mboersma

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/OWNERS)~~ [mboersma] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
k8s-infra-cherrypick-robot commented 1 week ago

@nojnhuh: new pull request created: #4935

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/4929#issuecomment-2176465100): >/cherry-pick release-1.15 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.