kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

aws-node-termination-handler pod is stuck in pending right after "kops rolling-update cluster --yes" #16870

Open stl-victor-sudakov opened 2 weeks ago

stl-victor-sudakov commented 2 weeks ago

/kind bug

1. What kops version are you running? The command kops version will display this information. 1.30.1

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag. v1.29.3

3. What cloud provider are you using? AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
kops upgrade cluster --name XXX --kubernetes-version 1.29.9 --yes
kops --name XXX update cluster --yes --admin
kops --name XXX rolling-update cluster --yes

5. What happened after the commands executed? Cluster did not pass validation at the very beginning of the upgrade procedure:

$ kops rolling-update cluster --yes --name XXX
Detected single-control-plane cluster; won't detach before draining
NAME                            STATUS          NEEDUPDATE      READY   MIN     TARGET  MAX     NODES
control-plane-us-west-2c        NeedsUpdate     1               0       1       1       1       1
nodes-us-west-2c                NeedsUpdate     4               0       4       4       4       4
I1002 15:03:05.336312   37988 instancegroups.go:507] Validating the cluster.
I1002 15:03:29.806323   37988 instancegroups.go:566] Cluster did not pass validation, will retry in "30s": system-cluster-critical pod "aws-node-termination-handler-577f866468-mmlx7" is pending.
I1002 15:04:22.511826   37988 instancegroups.go:566] Cluster did not pass validation, will retry in "30s": system-cluster-critical pod "aws-node-termination-handler-577f866468-mmlx7" is pending.
[...]

I1002 15:18:58.830547   37988 instancegroups.go:563] Cluster did not pass validation within deadline: system-cluster-critical pod "aws-node-termination-handler-577f866468-mmlx7" is pending.
E1002 15:18:58.830585   37988 instancegroups.go:512] Cluster did not validate within 15m0s
Error: control-plane node not healthy after update, stopping rolling-update: "error validating cluster: cluster did not validate within a duration of \"15m0s\""

When I looked up why the pod was pending, I found the following in "describe pod aws-node-termination-handler-577f866468-mmlx7":

0/5 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/5 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 4 Preemption is not helpful for scheduling.

There is another aws-node-termination-handler pod running at the moment (the old one):

$ kubectl -n kube-system get pods -l k8s-app=aws-node-termination-handler
NAME                                            READY   STATUS    RESTARTS          AGE
aws-node-termination-handler-577f866468-mmlx7   0/1     Pending   0                 69m
aws-node-termination-handler-6c9c8d7948-fxsrl   1/1     Running   1338 (4h1m ago)   133d
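
The "didn't have free ports" message suggests the replacement pod requests a host port that the old pod still holds on the only node matching its affinity (the single control-plane node). As a rough diagnostic sketch, assuming the addon runs as a Deployment pinned to control-plane nodes (the grep pattern below is only illustrative):

$ kubectl -n kube-system get deployment aws-node-termination-handler -o yaml \
    | grep -E 'hostNetwork|hostPort|node-role'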

6. What did you expect to happen?

I expected the cluster to be upgraded to Kubernetes 1.29.9.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-07-05T02:16:44Z"
  generation: 9
  name: YYYY
spec:
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://XXXX/YYYY
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-west-2c
      name: c
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-west-2c
      name: c
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - X.X.X.X/24
  kubernetesVersion: 1.29.9
  masterPublicName: api.YYYY
  networkCIDR: 172.22.0.0/16
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - X.X.X.X/24
  subnets:
  - cidr: 172.22.32.0/19
    name: us-west-2c
    type: Public
    zone: us-west-2c
  topology:
    dns:
      type: Public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-07-05T02:16:48Z"
  generation: 5
  labels:
    kops.k8s.io/cluster: YYYY
  name: control-plane-us-west-2c
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240607
  instanceMetadata:
    httpPutResponseHopLimit: 3
    httpTokens: required
  machineType: t3a.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - us-west-2c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-07-05T02:16:49Z"
  generation: 7
  labels:
    kops.k8s.io/cluster: YYYY
  name: nodes-us-west-2c
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240607
  instanceMetadata:
    httpPutResponseHopLimit: 1
    httpTokens: required
  machineType: t3a.xlarge
  maxSize: 4
  minSize: 4
  role: Node
  subnets:
  - us-west-2c

8. Please run the commands with the most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here. Please see the validation log above.

9. Anything else we need to know? Now I would like to know how to recover from this situation and how to get rid of the aws-node-termination-handler-577f866468-mmlx7 pod, which is now stuck in Pending state.
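
For anyone triaging a similar state, the owning ReplicaSet of the pending pod can be identified directly (a diagnostic sketch only, using the pod name from this report):

$ kubectl -n kube-system get pod aws-node-termination-handler-577f866468-mmlx7 \
    -o jsonpath='{.metadata.ownerReferences[0].name}'
$ kubectl -n kube-system get rs -l k8s-app=aws-node-termination-handler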

stl-victor-sudakov commented 2 weeks ago

I have tried killing the running pod, and now I again have one pod running and one pending:

$ kubectl -n kube-system get pods -l k8s-app=aws-node-termination-handler
NAME                                            READY   STATUS    RESTARTS   AGE
aws-node-termination-handler-577f866468-bj4gd   0/1     Pending   0          41h
aws-node-termination-handler-6c9c8d7948-vt7hh   1/1     Running   0          3m30s

nuved commented 2 weeks ago

Hi @stl-victor-sudakov
You should find out the reason for the Pending state by running something like kubectl describe pod aws-node-termination-handler-577f866468-bj4gd -n kube-system. That can happen when the Kubernetes scheduler cannot find anywhere to place the second pod.

Most of the time, that means the new control-plane nodes have not joined the cluster properly, which is why the scheduler cannot place the pod on the target nodes.

stl-victor-sudakov commented 2 weeks ago

@nuved I think I have already posted the error message above, but I don't mind repeating it; the relevant part of "kubectl -n kube-system describe pod aws-node-termination-handler-577f866468-bj4gd" is:

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  13m (x6180 over 2d3h)  default-scheduler  0/5 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/5 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 4 Preemption is not helpful for scheduling.

There is actually only one control node in the cluster. Is there any additional information I could provide?

Update: the complete "describe pod" output can be seen here: https://termbin.com/0sy6 (so as not to clutter the conversation with excessive output).

nuved commented 2 weeks ago

Well, that means there are not enough nodes. You should make sure all nodes are up and ready: kubectl get nodes -o wide. I guess one of the controllers has an issue.

stl-victor-sudakov commented 2 weeks ago

$ kubectl get nodes -o wide
NAME                  STATUS   ROLES           AGE    VERSION   INTERNAL-IP     EXTERNAL-IP      OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
i-01dbd1dccc0e30845   Ready    node            91d    v1.29.3   172.22.43.73    35.90.140.78     Ubuntu 22.04.4 LTS   6.5.0-1018-aws   containerd://1.7.16
i-02cf4b0fed779eb54   Ready    control-plane   135d   v1.29.3   172.22.48.131   34.222.92.123    Ubuntu 22.04.4 LTS   6.5.0-1018-aws   containerd://1.7.16
i-05569e161b2556a75   Ready    node            91d    v1.29.3   172.22.35.18    34.213.33.180    Ubuntu 22.04.4 LTS   6.5.0-1018-aws   containerd://1.7.16
i-06c219f4c3404e207   Ready    node            91d    v1.29.3   172.22.56.240   54.203.143.227   Ubuntu 22.04.4 LTS   6.5.0-1018-aws   containerd://1.7.16
i-0d1c604064d671d98   Ready    node            91d    v1.29.3   172.22.61.60    18.237.56.79     Ubuntu 22.04.4 LTS   6.5.0-1018-aws   containerd://1.7.16
$

It is a single-control-plane cluster. Also:

$ kops get instances
Using cluster from kubectl context: devXXXXXXX

ID                      NODE-NAME               STATUS          ROLES                           STATE   INTERNAL-IP     EXTERNAL-IP     INSTANCE-GROUP         MACHINE-TYPE
i-01dbd1dccc0e30845     i-01dbd1dccc0e30845     NeedsUpdate     node                                    172.22.43.73                    nodes-us-west-2c.YYYY                       t3a.xlarge
i-02cf4b0fed779eb54     i-02cf4b0fed779eb54     NeedsUpdate     control-plane, control-plane            172.22.48.131                   control-plane-us-west-2c.masters.YYYY       t3a.medium
i-05569e161b2556a75     i-05569e161b2556a75     NeedsUpdate     node                                    172.22.35.18                    nodes-us-west-2c.YYYY                       t3a.xlarge
i-06c219f4c3404e207     i-06c219f4c3404e207     NeedsUpdate     node                                    172.22.56.240                   nodes-us-west-2c.YYYY                       t3a.xlarge
i-0d1c604064d671d98     i-0d1c604064d671d98     NeedsUpdate     node                                    172.22.61.60                    nodes-us-west-2c.YYYY                       t3a.xlarge
$

stl-victor-sudakov commented 2 weeks ago

There is exactly one instance, i-02cf4b0fed779eb54, in the control-plane-us-west-2c.masters.dev2XXXXX AWS autoscaling group, and it is healthy according to AWS.

nuved commented 1 week ago

You probably just need to adjust the replica count manually and set it to 1: kubectl edit deployment aws-node-termination-handler -n kube-system

I'm not sure how you can change the replica count via kops, but it should work.
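
For concreteness, the suggestion above would look roughly like this (a sketch only; a kops-managed addon manifest may later be re-applied over manual edits, so treat this as a temporary measure):

$ kubectl -n kube-system edit deployment aws-node-termination-handler
$ kubectl -n kube-system scale deployment aws-node-termination-handler --replicas=1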

stl-victor-sudakov commented 1 week ago

Manually deleting the ReplicaSet which had contained the old aws-node-termination-handler pod did the trick (the old pod was finally replaced), but this should happen automatically and should not prevent the "kops rolling-update cluster" command from running smoothly.
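
For anyone hitting the same deadlock, the workaround amounts to something like the following (the ReplicaSet name is inferred from the old pod names earlier in this thread; substitute your own pod-template hash):

$ kubectl -n kube-system get rs -l k8s-app=aws-node-termination-handler
$ kubectl -n kube-system delete rs aws-node-termination-handler-6c9c8d7948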