kubernetes / autoscaler

Autoscaling components for Kubernetes

[Magnum] Rapid scaling down of nodes in the same nodegroup fails #6213

Open pawcykca opened 11 months ago

pawcykca commented 11 months ago

Which component are you using?: cluster-autoscaler

What version of the component are you using?: cluster-autoscaler:v1.27.1

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.13", GitCommit:"49433308be5b958856b6949df02b716e0a7cf0a3", GitTreeState:"clean", BuildDate:"2023-04-12T12:08:36Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?: Openstack with Magnum

What did you expect to happen?:

Rapid scale-down of several nodes from the same nodegroup should complete successfully (a few solutions are possible).

What happened instead?:

When Cluster Autoscaler tries to scale down the same nodegroup too quickly, issuing a scale-down operation on nodes in that nodegroup every 5-15 seconds, the previous scale-down operation (an Openstack Heat stack update) is cancelled by Openstack Heat. This mainly affects scaling of the default-worker nodegroup, which shares the same Heat stack as the default-master nodegroup: an update of this shared stack (a scale operation) first checks all resources in the default-master nodegroup and only then updates the resources in the default-worker nodegroup, so each update takes long enough to be interrupted by the next one.
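
Not part of the original report, but one way to confirm the shared-stack layout described above (the openstack coe nodegroup commands come from python-magnumclient; treating stack_id as a filterable column is an assumption):

    # default-master and default-worker are expected to report the same Heat stack UUID,
    # while additional nodegroups (e.g. the ng-az1 one in the logs below) get their own stacks
    openstack coe nodegroup list fb9b32ba-37be-4292-b667-adef4b9c400f
    openstack coe nodegroup show fb9b32ba-37be-4292-b667-adef4b9c400f default-master -c stack_id
    openstack coe nodegroup show fb9b32ba-37be-4292-b667-adef4b9c400f default-worker -c stack_id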

How to reproduce it (as minimally and precisely as possible):

  1. Create a K8s cluster using Openstack Magnum with the Cluster Autoscaler component enabled
  2. Create the deployment below
    The CPU request/limit is set so that only one Pod fits on one node
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: autoscale-app
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/instance: autoscale-app
      replicas: 1
      template:
        metadata:
          labels:
            app.kubernetes.io/instance: autoscale-app
            app.kubernetes.io/name: autoscale-app
        spec:
          nodeSelector:
            magnum.openstack.org/nodegroup: "default-worker"
          containers:
          - name: autoscale-app
            image: registry.k8s.io/hpa-example
            ports:
            - containerPort: 80
            resources:
              limits:
                cpu: 1000m
              requests:
                cpu: 1000m
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app.kubernetes.io/name
                    operator: In
                    values:
                    - autoscale-app # must match the pod template's app.kubernetes.io/name label for the anti-affinity to take effect
                topologyKey: "kubernetes.io/hostname"
    
  3. Scale the deployment to 4 replicas, then wait for the default-worker nodegroup to scale up and for all Pods to reach Running status
  4. Scale the deployment back to 1 replica and wait for the default-worker nodegroup to scale down (see the command sketch after this list)
  5. Review Openstack Heat events for failed operations
    # get stack name
    openstack stack list
    # view last 20 events for stack
    openstack stack event list STACK_NAME | tail -n 20
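
A command sketch for steps 2-4, not taken from the original report (the manifest filename autoscale-app.yaml is an assumption):

    # step 2: apply the deployment manifest shown above
    kubectl apply -f autoscale-app.yaml
    # step 3: force a scale up of the default-worker nodegroup
    kubectl scale deployment autoscale-app --replicas=4
    kubectl get pods -l app.kubernetes.io/instance=autoscale-app -o wide --watch
    # step 4: trigger the scale down and watch the nodegroup shrink
    kubectl scale deployment autoscale-app --replicas=1
    kubectl get nodes -l magnum.openstack.org/nodegroup=default-worker --watch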

Anything else we need to know?:

Cluster Autoscaler configuration
        - ./cluster-autoscaler
        - --alsologtostderr
        - --cloud-provider=magnum
        - --cluster-name=fb9b32ba-37be-4292-b667-adef4b9c400f
        - --cloud-config=/config/cloud-config
        - --node-group-auto-discovery=magnum:role=worker,autoscale
        - --scan-interval=15s
        - --scale-down-unneeded-time=5m
        - --scale-down-delay-after-failure=3m
        - --scale-down-delay-after-add=5m
        - --unremovable-node-recheck-timeout=1m
        - --node-delete-delay-after-taint=15s
        - --balance-similar-node-groups=true
        - --emit-per-nodegroup-metrics=true
        - --balancing-ignore-label=magnum.openstack.org/nodegroup
        - --balancing-ignore-label=topology.cinder.csi.openstack.org/zone
        - --balancing-ignore-label=topology.manila.csi.openstack.org/zone
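
Not part of the original configuration: a possible mitigation sketch, assuming the standard scale-down flags in cluster-autoscaler v1.27, that serializes empty-node deletions so each Heat stack update can finish before the next resize request is sent (whether this fully avoids the cancellation is untested):

        - --max-empty-bulk-delete=1
        - --scale-down-delay-after-delete=2m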
Cluster Autoscaler logs
Two nodes from the same nodegroup are removed one by one rather than in a single batch operation:
I1020 07:18:46.158211       1 nodes.go:126] autoscaler-test1-380-ng-az1-d7pwln4ejeuz-node-2403 was unneeded for 5m19.923973297s
I1020 07:18:46.158233       1 nodes.go:126] autoscaler-test1-3809376-mgbnnje2xnn6-node-1417 was unneeded for 5m19.923973297s
I1020 07:18:46.158252       1 nodes.go:126] autoscaler-test1-3809376-mgbnnje2xnn6-node-1418 was unneeded for 5m4.853269839s 
I1020 07:18:46.190962       1 taints.go:162] Successfully added ToBeDeletedTaint on node autoscaler-test1-380-ng-az1-d7pwln4ejeuz-node-2403
I1020 07:18:46.214380       1 taints.go:162] Successfully added ToBeDeletedTaint on node autoscaler-test1-3809376-mgbnnje2xnn6-node-1417
I1020 07:18:46.276276       1 taints.go:162] Successfully added ToBeDeletedTaint on node autoscaler-test1-3809376-mgbnnje2xnn6-node-1418
I1020 07:18:46.276496       1 actuator.go:160] Scale-down: removing empty node "autoscaler-test1-380-ng-az1-d7pwln4ejeuz-node-2403"
I1020 07:18:46.276999       1 actuator.go:160] Scale-down: removing empty node "autoscaler-test1-3809376-mgbnnje2xnn6-node-1417"
I1020 07:18:46.277139       1 actuator.go:160] Scale-down: removing empty node "autoscaler-test1-3809376-mgbnnje2xnn6-node-1418"
I1020 07:18:46.277715       1 actuator.go:243] Scale-down: waiting 15s before trying to delete nodes
I1020 07:19:01.279341       1 magnum_nodegroup.go:102] Deleting nodes: [autoscaler-test1-3809376-mgbnnje2xnn6-node-1418]
I1020 07:19:01.279408       1 magnum_manager_impl.go:387] manager deleting node: autoscaler-test1-3809376-mgbnnje2xnn6-node-1418
I1020 07:19:01.279414       1 magnum_manager_impl.go:397] resizeOpts: node_count=2, remove=[c15c373a-1687-4b6f-bdd6-4f953789c398]
I1020 07:19:07.868626       1 magnum_nodegroup.go:102] Deleting nodes: [autoscaler-test1-380-ng-az1-d7pwln4ejeuz-node-2403]
I1020 07:19:07.868705       1 magnum_manager_impl.go:387] manager deleting node: autoscaler-test1-380-ng-az1-d7pwln4ejeuz-node-2403
I1020 07:19:07.868713       1 magnum_manager_impl.go:397] resizeOpts: node_count=1, remove=[9a855dfc-604e-4096-895f-0545c913aac0]
I1020 07:19:14.096369       1 magnum_nodegroup.go:102] Deleting nodes: [autoscaler-test1-3809376-mgbnnje2xnn6-node-1417]
I1020 07:19:14.096450       1 magnum_manager_impl.go:387] manager deleting node: autoscaler-test1-3809376-mgbnnje2xnn6-node-1417
I1020 07:19:14.096457       1 magnum_manager_impl.go:397] resizeOpts: node_count=1, remove=[7481489f-3381-4281-8ce9-4f29b5156cd7]
Openstack Heat events
2023-10-20 07:19:04Z [autoscaler-test1-3809376-mgbnnje2xnn6]: UPDATE_IN_PROGRESS  Stack UPDATE started
2023-10-20 07:19:09Z [autoscaler-test1-3809376-mgbnnje2xnn6.network]: UPDATE_IN_PROGRESS  state changed
2023-10-20 07:19:17Z [autoscaler-test1-3809376-mgbnnje2xnn6]: UPDATE_IN_PROGRESS  Stack UPDATE started
2023-10-20 07:19:19Z [autoscaler-test1-3809376-mgbnnje2xnn6.network]: UPDATE_COMPLETE  state changed
2023-10-20 07:19:19Z [autoscaler-test1-3809376-mgbnnje2xnn6.network]: UPDATE_IN_PROGRESS  state changed
2023-10-20 07:19:25Z [autoscaler-test1-3809376-mgbnnje2xnn6.network]: UPDATE_FAILED  resources.network: Stack UPDATE cancelled
2023-10-20 07:19:25Z [autoscaler-test1-3809376-mgbnnje2xnn6]: UPDATE_FAILED  Resource UPDATE failed: resources.network: Stack UPDATE cancelled
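
Not in the original report: commands that could be used to inspect the state left behind by the cancelled update (the column filters are an assumption):

    # Heat stack status and the reason for the failure
    openstack stack show autoscaler-test1-3809376-mgbnnje2xnn6 -c stack_status -c stack_status_reason
    # Magnum's view of the cluster after the failed resize
    openstack coe cluster show fb9b32ba-37be-4292-b667-adef4b9c400f -c status -c health_status
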
k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pawcykca commented 7 months ago

/remove-lifecycle stale

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pawcykca commented 3 months ago

/remove-lifecycle stale

k8s-triage-robot commented 3 days ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pawcykca commented 3 days ago

/remove-lifecycle stale