kubernetes / autoscaler

Autoscaling components for Kubernetes

[Magnum] Rapid scaling down of nodes in the same nodegroup fails #6213

Open pawcykca opened 11 months ago

pawcykca commented 11 months ago

Which component are you using?: cluster-autoscaler

What version of the component are you using?: cluster-autoscaler:v1.27.1

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.13", GitCommit:"49433308be5b958856b6949df02b716e0a7cf0a3", GitTreeState:"clean", BuildDate:"2023-04-12T12:08:36Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?: Openstack with Magnum

What did you expect to happen?:

Rapid scale-down of several nodes from the same nodegroup should complete successfully (a few solutions are possible).

What happened instead?:

When Cluster Autoscaler tries to scale down the same nodegroup too quickly, issuing a scale-down operation on nodes in that nodegroup every 5-15 seconds, the previous scale-down operation (an Openstack Heat stack update) is cancelled by Openstack Heat. This mainly affects scaling of the default-worker nodegroup, which shares the same Heat stack as the default-master nodegroup: an update of this shared stack (a scale operation) first checks all resources in the default-master nodegroup and only then updates the resources in the default-worker nodegroup, so each update takes long enough to be interrupted by the next one.
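
Not part of the original report, but one way to confirm the shared-stack layout described above (the openstack coe nodegroup commands come from python-magnumclient; treating stack_id as a filterable column is an assumption):

    # default-master and default-worker are expected to report the same Heat stack UUID,
    # while additional nodegroups (e.g. the ng-az1 one in the logs below) get their own stacks
    openstack coe nodegroup list fb9b32ba-37be-4292-b667-adef4b9c400f
    openstack coe nodegroup show fb9b32ba-37be-4292-b667-adef4b9c400f default-master -c stack_id
    openstack coe nodegroup show fb9b32ba-37be-4292-b667-adef4b9c400f default-worker -c stack_id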

How to reproduce it (as minimally and precisely as possible):

  1. Create a K8s cluster using Openstack Magnum with the Cluster Autoscaler component enabled
  2. Create the deployment below
    The CPU request/limit is set so that only one Pod fits on one node
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: autoscale-app
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/instance: autoscale-app
      replicas: 1
      template:
        metadata:
          labels:
            app.kubernetes.io/instance: autoscale-app
            app.kubernetes.io/name: autoscale-app
        spec:
          nodeSelector:
            magnum.openstack.org/nodegroup: "default-worker"
          containers:
          - name: autoscale-app
            image: registry.k8s.io/hpa-example
            ports:
            - containerPort: 80
            resources:
              limits:
                cpu: 1000m
              requests:
                cpu: 1000m
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app.kubernetes.io/name
                    operator: In
                    values:
                    - autoscale-app # must match the pod template's app.kubernetes.io/name label for the anti-affinity to take effect
                topologyKey: "kubernetes.io/hostname"
    
  3. Scale the deployment to 4 replicas, then wait for the default-worker nodegroup to scale up and for all Pods to reach Running status
  4. Scale the deployment back to 1 replica and wait for the default-worker nodegroup to scale down (see the command sketch after this list)
  5. Review Openstack Heat events for failed operations
    # get stack name
    openstack stack list
    # view last 20 events for stack
    openstack stack event list STACK_NAME | tail -n 20
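
A command sketch for steps 2-4, not taken from the original report (the manifest filename autoscale-app.yaml is an assumption):

    # step 2: apply the deployment manifest shown above
    kubectl apply -f autoscale-app.yaml
    # step 3: force a scale up of the default-worker nodegroup
    kubectl scale deployment autoscale-app --replicas=4
    kubectl get pods -l app.kubernetes.io/instance=autoscale-app -o wide --watch
    # step 4: trigger the scale down and watch the nodegroup shrink
    kubectl scale deployment autoscale-app --replicas=1
    kubectl get nodes -l magnum.openstack.org/nodegroup=default-worker --watch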

Anything else we need to know?:

Cluster Autoscaler configuration
        - ./cluster-autoscaler
        - --alsologtostderr
        - --cloud-provider=magnum
        - --cluster-name=fb9b32ba-37be-4292-b667-adef4b9c400f
        - --cloud-config=/config/cloud-config
        - --node-group-auto-discovery=magnum:role=worker,autoscale
        - --scan-interval=15s
        - --scale-down-unneeded-time=5m
        - --scale-down-delay-after-failure=3m
        - --scale-down-delay-after-add=5m
        - --unremovable-node-recheck-timeout=1m
        - --node-delete-delay-after-taint=15s
        - --balance-similar-node-groups=true
        - --emit-per-nodegroup-metrics=true
        - --balancing-ignore-label=magnum.openstack.org/nodegroup
        - --balancing-ignore-label=topology.cinder.csi.openstack.org/zone
        - --balancing-ignore-label=topology.manila.csi.openstack.org/zone
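
Not part of the original configuration: a possible mitigation sketch, assuming the standard scale-down flags in cluster-autoscaler v1.27, that serializes empty-node deletions so each Heat stack update can finish before the next resize request is sent (whether this fully avoids the cancellation is untested):

        - --max-empty-bulk-delete=1
        - --scale-down-delay-after-delete=2m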
Cluster Autoscaler logs
Two nodes from the same nodegroup are removed one by one rather than in a single batch operation:
I1020 07:18:46.158211       1 nodes.go:126] autoscaler-test1-380-ng-az1-d7pwln4ejeuz-node-2403 was unneeded for 5m19.923973297s
I1020 07:18:46.158233       1 nodes.go:126] autoscaler-test1-3809376-mgbnnje2xnn6-node-1417 was unneeded for 5m19.923973297s
I1020 07:18:46.158252       1 nodes.go:126] autoscaler-test1-3809376-mgbnnje2xnn6-node-1418 was unneeded for 5m4.853269839s 
I1020 07:18:46.190962       1 taints.go:162] Successfully added ToBeDeletedTaint on node autoscaler-test1-380-ng-az1-d7pwln4ejeuz-node-2403
I1020 07:18:46.214380       1 taints.go:162] Successfully added ToBeDeletedTaint on node autoscaler-test1-3809376-mgbnnje2xnn6-node-1417
I1020 07:18:46.276276       1 taints.go:162] Successfully added ToBeDeletedTaint on node autoscaler-test1-3809376-mgbnnje2xnn6-node-1418
I1020 07:18:46.276496       1 actuator.go:160] Scale-down: removing empty node "autoscaler-test1-380-ng-az1-d7pwln4ejeuz-node-2403"
I1020 07:18:46.276999       1 actuator.go:160] Scale-down: removing empty node "autoscaler-test1-3809376-mgbnnje2xnn6-node-1417"
I1020 07:18:46.277139       1 actuator.go:160] Scale-down: removing empty node "autoscaler-test1-3809376-mgbnnje2xnn6-node-1418"
I1020 07:18:46.277715       1 actuator.go:243] Scale-down: waiting 15s before trying to delete nodes
I1020 07:19:01.279341       1 magnum_nodegroup.go:102] Deleting nodes: [autoscaler-test1-3809376-mgbnnje2xnn6-node-1418]
I1020 07:19:01.279408       1 magnum_manager_impl.go:387] manager deleting node: autoscaler-test1-3809376-mgbnnje2xnn6-node-1418
I1020 07:19:01.279414       1 magnum_manager_impl.go:397] resizeOpts: node_count=2, remove=[c15c373a-1687-4b6f-bdd6-4f953789c398]
I1020 07:19:07.868626       1 magnum_nodegroup.go:102] Deleting nodes: [autoscaler-test1-380-ng-az1-d7pwln4ejeuz-node-2403]
I1020 07:19:07.868705       1 magnum_manager_impl.go:387] manager deleting node: autoscaler-test1-380-ng-az1-d7pwln4ejeuz-node-2403
I1020 07:19:07.868713       1 magnum_manager_impl.go:397] resizeOpts: node_count=1, remove=[9a855dfc-604e-4096-895f-0545c913aac0]
I1020 07:19:14.096369       1 magnum_nodegroup.go:102] Deleting nodes: [autoscaler-test1-3809376-mgbnnje2xnn6-node-1417]
I1020 07:19:14.096450       1 magnum_manager_impl.go:387] manager deleting node: autoscaler-test1-3809376-mgbnnje2xnn6-node-1417
I1020 07:19:14.096457       1 magnum_manager_impl.go:397] resizeOpts: node_count=1, remove=[7481489f-3381-4281-8ce9-4f29b5156cd7]
Openstack Heat events
2023-10-20 07:19:04Z [autoscaler-test1-3809376-mgbnnje2xnn6]: UPDATE_IN_PROGRESS  Stack UPDATE started
2023-10-20 07:19:09Z [autoscaler-test1-3809376-mgbnnje2xnn6.network]: UPDATE_IN_PROGRESS  state changed
2023-10-20 07:19:17Z [autoscaler-test1-3809376-mgbnnje2xnn6]: UPDATE_IN_PROGRESS  Stack UPDATE started
2023-10-20 07:19:19Z [autoscaler-test1-3809376-mgbnnje2xnn6.network]: UPDATE_COMPLETE  state changed
2023-10-20 07:19:19Z [autoscaler-test1-3809376-mgbnnje2xnn6.network]: UPDATE_IN_PROGRESS  state changed
2023-10-20 07:19:25Z [autoscaler-test1-3809376-mgbnnje2xnn6.network]: UPDATE_FAILED  resources.network: Stack UPDATE cancelled
2023-10-20 07:19:25Z [autoscaler-test1-3809376-mgbnnje2xnn6]: UPDATE_FAILED  Resource UPDATE failed: resources.network: Stack UPDATE cancelled
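
Not in the original report: commands that could be used to inspect the state left behind by the cancelled update (the column filters are an assumption):

    # Heat stack status and the reason for the failure
    openstack stack show autoscaler-test1-3809376-mgbnnje2xnn6 -c stack_status -c stack_status_reason
    # Magnum's view of the cluster after the failed resize
    openstack coe cluster show fb9b32ba-37be-4292-b667-adef4b9c400f -c status -c health_status
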
k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pawcykca commented 7 months ago

/remove-lifecycle stale

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pawcykca commented 3 months ago

/remove-lifecycle stale

k8s-triage-robot commented 3 days ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pawcykca commented 3 days ago

/remove-lifecycle stale