kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Setting maxUnavailable kills pods from different nodes at the same time #16417

Open math3vz opened 3 months ago

math3vz commented 3 months ago

/kind bug

1. What kops version are you running? The command kops version will display this information.

1.27.3

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

1.26.5

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

5. What happened after the commands executed?

Logs:

i-0c3b3448a4ae12e1c is node A
i-00a2a7024e57bb5f8 is node B
i-05ea3b6651f6467c0 is node C
default/test-57c5db579d-cz5c9 pod is running on node A
default/test-57c5db579d-sbx7g pod is running on node B

(base) ➜  ~ kops-1.27.3 rolling-update cluster --name k8s-test-02.my-tld --yes
Detected single-control-plane cluster; won't detach before draining
NAME            STATUS      NEEDUPDATE  READY   MIN TARGET  MAX NODES
master-us-east-1a-1 Ready       0       1   1   1   1   1
test            NeedsUpdate 2       0   2   2   2   2
I0322 11:13:39.217785   27816 instancegroups.go:501] Validating the cluster.
I0322 11:13:42.292254   27816 instancegroups.go:537] Cluster validated.
I0322 11:13:42.292469   27816 instancegroups.go:342] Tainting 2 nodes in "test" instancegroup.
I0322 11:13:42.601527   27816 instancegroups.go:602] Detaching instance "i-0c3b3448a4ae12e1c", node "i-0c3b3448a4ae12e1c", in group "test.k8s-test-02.my-tld".
I0322 11:13:43.394529   27816 instancegroups.go:203] waiting for 15s after detaching instance
I0322 11:13:58.396383   27816 instancegroups.go:501] Validating the cluster.
...
I0322 11:16:54.258561   27816 instancegroups.go:560] Cluster did not pass validation, will retry in "30s": machine "i-05ea3b6651f6467c0" has not yet joined cluster, system-node-critical pod "calico-node-qrd8s" is pending, system-node-critical pod "ebs-csi-node-bnpw8" is pending, system-node-critical pod "kube-proxy-i-05ea3b6651f6467c0" is pending, system-node-critical pod "node-problem-detector-dx8qx" is pending.
I0322 11:17:30.338670   27816 instancegroups.go:540] Cluster validated; revalidating in 10s to make sure it does not flap.
I0322 11:17:44.110497   27816 instancegroups.go:537] Cluster validated.
I0322 11:17:44.111576   27816 instancegroups.go:431] Draining the node: "i-00a2a7024e57bb5f8".
I0322 11:17:44.111584   27816 instancegroups.go:431] Draining the node: "i-0c3b3448a4ae12e1c".
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-s2xm4, kube-system/ebs-csi-node-nrhnc, kube-system/node-problem-detector-44ntz
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-hqrgz, kube-system/ebs-csi-node-wkhph, kube-system/node-problem-detector-fhrvm
evicting pod kube-system/calico-typha-7b67f47cf4-x4vhf
evicting pod kube-system/cluster-autoscaler-677b59697d-x78vr
evicting pod kube-system/metrics-server-7f46fdc79c-gph5v
evicting pod kube-system/pod-identity-webhook-8b88fdcd9-mwfcx
evicting pod default/test-57c5db579d-sbx7g
evicting pod kube-system/cluster-autoscaler-677b59697d-c7b5h
evicting pod kube-system/calico-typha-7b67f47cf4-cp44r
evicting pod default/test-57c5db579d-cz5c9
I0322 11:18:17.220535   27816 instancegroups.go:708] Waiting for 5s for pods to stabilize after draining.
I0322 11:18:17.300659   27816 instancegroups.go:708] Waiting for 5s for pods to stabilize after draining.
I0322 11:18:22.225207   27816 instancegroups.go:625] Stopping instance "i-00a2a7024e57bb5f8", node "i-00a2a7024e57bb5f8", in group "test.k8s-test-02.my-tld" (this may take a while).
I0322 11:18:22.301825   27816 instancegroups.go:625] Stopping instance "i-0c3b3448a4ae12e1c", node "i-0c3b3448a4ae12e1c", in group "test.k8s-test-02.my-tld" (this may take a while).
I0322 11:18:23.127212   27816 instancegroups.go:467] waiting for 15s after terminating instance
I0322 11:18:37.720373   27816 instancegroups.go:501] Validating the cluster.
I0322 11:18:41.649081   27816 instancegroups.go:560] Cluster did not pass validation, will retry in "30s": InstanceGroup "test" did not have enough nodes 1 vs 2, system-node-critical pod "calico-node-d8td6" is pending, system-node-critical pod "ebs-csi-node-22m6c" is pending, system-node-critical pod "kube-proxy-i-00a2a7024e57bb5f8" is not ready (kube-proxy), system-node-critical pod "kube-proxy-i-0c3b3448a4ae12e1c" is not ready (kube-proxy), system-node-critical pod "node-problem-detector-stkz8" is pending.

6. What did you expect to happen?

I expected kOps to evict pods only from the node that was detached (node A), not from nodes A and B at the same time.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: k8s-test-02.my-tld
  name: test
spec:
  rollingUpdate:
    maxUnavailable: 1
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/k8s-test-02.my-tld: ""
  machineType: m5.2xlarge
  maxSize: 2
  minSize: 2
  nodeLabels:
    kops.k8s.io/instancegroup: test
  role: Node
  rootVolumeSize: 201
  rootVolumeType: gp3
  rootVolumeEncryption: true
  subnets:
  - node-us-east-1b
  - node-us-east-1a

I have no rollingUpdate configuration in my cluster.yaml.
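For anyone hitting the same behavior: the instance group spec also accepts a maxSurge field alongside maxUnavailable. A sketch of the workaround the reporter describes in item 9 below (maxUnavailable: 0, so only the surged/detached node is drained at a time) would look like this; field support per the kops rolling-update docs, exact interaction with surge inferred from the logs above:

```yaml
spec:
  rollingUpdate:
    # Surge one replacement node per batch, and allow no additional
    # nodes to be taken unavailable beyond the detached one. With this,
    # only the node being replaced should be drained at any given time.
    maxSurge: 1
    maxUnavailable: 0
```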

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

I will upload later if needed.

9. Anything else we need to know?

If I set maxUnavailable: 0, only pods from detached nodes are evicted. Is this the expected behavior?
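Reading the logs above, the batch size seems to be the surged (detached) nodes plus maxUnavailable: with the default maxSurge of 1 and maxUnavailable: 1, nodes A and B were drained together; with maxUnavailable: 0, only the detached node was drained. A minimal sketch of that apparent arithmetic (an illustration inferred from the logs, not kops's actual code):

```python
def drained_per_batch(max_surge: int, max_unavailable: int) -> int:
    """Apparent number of nodes drained concurrently in one kops
    rolling-update batch: the detached (surged) nodes plus the nodes
    kops may additionally take down under maxUnavailable.

    Hypothetical model inferred from the logs above, not kops's
    actual implementation.
    """
    return max_surge + max_unavailable

# maxSurge: 1, maxUnavailable: 1 -> nodes A and B drained in one batch
print(drained_per_batch(1, 1))  # → 2

# maxSurge: 1, maxUnavailable: 0 -> only the detached node drained
print(drained_per_batch(1, 0))  # → 1
```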

k8s-triage-robot commented 2 weeks ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

math3vz commented 2 weeks ago

/remove-lifecycle stale