k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Pods are going into pending state after upgrading from v1.26.12-k3s1 to v1.27.11-k3s1 and v1.28.5-k3s1 (Issue is quite random) #10043

Closed · ujala-singh closed this issue 2 weeks ago

ujala-singh commented 2 weeks ago

Environmental Info: K3s Version: v1.27.11-k3s1 and v1.28.5-k3s1 (tried on both)

Node(s) CPU architecture, OS, and Version: Linux

Cluster Configuration: I am running k3s servers on Kubernetes host clusters in a multi-tenant setup.

Describe the bug: We had been running v1.26.12-k3s1 for quite some time, and we recently upgraded our host clusters to v1.27.9 and k3s to v1.27.11-k3s1. Since then, pods have started getting stuck in Pending state. I checked the k3s logs: the controller creates the pods successfully, but they stay Pending.

Controller nginx-deployment-5bc8fcb6c7 created pod nginx-deployment-5bc8fcb6c7-tjxxr
Event occurred  {"component": "k3s", "location": "event.go:307", "object": "default/nginx-deployment-5bc8fcb6c7", "fieldPath": "", "kind": "ReplicaSet", "apiVersion": "apps/v1", "type": "Normal", "reason": "SuccessfulCreate", "message": "Created pod: nginx-deployment-5bc8fcb6c7-tjxxr"}
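To see why a new pod never gets scheduled, the pod's own events are usually the quickest signal. A minimal check, assuming kubectl is pointed at the tenant k3s cluster (pod name taken from the log line above):

$ kubectl describe pod nginx-deployment-5bc8fcb6c7-tjxxr    # look for FailedScheduling events, or no events at all
$ kubectl get events -n default --field-selector involvedObject.name=nginx-deployment-5bc8fcb6c7-tjxxr

If the pod shows no scheduling events at all and spec.nodeName stays empty, that points at the scheduler side rather than at node capacity.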

There is one scenario that is quite weird. Say I am running 3 replicas of nginx-deployment; when the issue occurs and I scale the replicas to 5, the 2 new pods go into Pending state. At the same time I deleted one of the older nginx pods and described the Service, and its endpoints still showed the IPs of the original 3 pods.

$ kubectl get pods -owide | grep -i 'nginx'
nginx-deployment-5bc8fcb6c7-2l4l4                                    1/1     Running            0               5h21m   10.231.108.23    aks-azpmt2sp-83987392-vmss00001o   <none>           <none>
nginx-deployment-5bc8fcb6c7-m557n                                    1/1     Running            0               5h21m   10.231.108.180   aks-azpmt2sp-83987392-vmss00001o   <none>           <none>
nginx-deployment-5bc8fcb6c7-tjxxr                                    1/1     Running            0               5h21m   10.231.108.240   aks-azpmt2sp-83987392-vmss00001o   <none>           <none>
nginx-deployment-5bc8fcb6c7-85n56                                    0/1     Pending            0               26s     <none>      <none>      <none>  <none>
nginx-deployment-5bc8fcb6c7-k86h8                                    0/1     Pending            0               26s     <none>      <none>      <none>    <none>
$ kubectl describe svc nginx-service
Name:              nginx-service
Namespace:         default
Labels:            <none>
Annotations:       <none>
Selector:          app=nginx
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.0.184.124
IPs:               10.0.184.124
Port:              <unset>  80/TCP
TargetPort:        80/TCP
Endpoints:         10.231.108.23:80,10.231.108.180:80,10.231.108.240:80
Session Affinity:  None
Events: <none>
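Since the Service still lists the old pod IPs after the deletion, it may be worth confirming whether the endpoints controller is reconciling at all. A couple of hedged checks (the kubernetes.io/service-name label below is the standard EndpointSlice label, nothing specific to this setup):

$ kubectl get endpoints nginx-service -o yaml    # a stale object would still list the deleted pod's IP
$ kubectl get endpointslices -l kubernetes.io/service-name=nginx-service
$ kubectl get events -n default --field-selector involvedObject.name=nginx-service

If neither the Endpoints object nor the EndpointSlices change after a pod is deleted, the controller-manager side of the tenant cluster is likely affected along with the scheduler.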

Node Conditions

Conditions:
  Type                          Status  LastHeartbeatTime                 LastTransitionTime                Reason                          Message
  ----                          ------  -----------------                 ------------------                ------                          -------
  FrequentUnregisterNetDevice   False   Sat, 27 Apr 2024 23:23:20 +0530   Sat, 27 Apr 2024 23:22:44 +0530   NoFrequentUnregisterNetDevice   node is functioning properly
  FrequentKubeletRestart        False   Sat, 27 Apr 2024 23:23:20 +0530   Sat, 27 Apr 2024 23:22:44 +0530   NoFrequentKubeletRestart        kubelet is functioning properly
  ReadonlyFilesystem            False   Sat, 27 Apr 2024 23:23:20 +0530   Sat, 27 Apr 2024 23:22:44 +0530   FilesystemIsNotReadOnly         Filesystem is not read-only
  VMEventScheduled              False   Sat, 27 Apr 2024 23:23:20 +0530   Sat, 27 Apr 2024 23:23:19 +0530   NoVMEventScheduled              VM has no scheduled event
  ContainerRuntimeProblem       False   Sat, 27 Apr 2024 23:23:20 +0530   Sat, 27 Apr 2024 23:22:44 +0530   ContainerRuntimeIsUp            container runtime service is up
  FrequentDockerRestart         False   Sat, 27 Apr 2024 23:23:20 +0530   Sat, 27 Apr 2024 23:22:44 +0530   NoFrequentDockerRestart         docker is functioning properly
  FrequentContainerdRestart     False   Sat, 27 Apr 2024 23:23:20 +0530   Sat, 27 Apr 2024 23:22:44 +0530   NoFrequentContainerdRestart     containerd is functioning properly
  KernelDeadlock                False   Sat, 27 Apr 2024 23:23:20 +0530   Sat, 27 Apr 2024 23:22:44 +0530   KernelHasNoDeadlock             kernel has no deadlock
  KubeletProblem                False   Sat, 27 Apr 2024 23:23:20 +0530   Sat, 27 Apr 2024 23:22:44 +0530   KubeletIsUp                     kubelet service is up
  FilesystemCorruptionProblem   False   Sat, 27 Apr 2024 23:23:20 +0530   Sat, 27 Apr 2024 23:22:44 +0530   FilesystemIsOK                  Filesystem is healthy
  MemoryPressure                False   Sat, 27 Apr 2024 23:23:01 +0530   Sat, 27 Apr 2024 23:22:31 +0530   KubeletHasSufficientMemory      kubelet has sufficient memory available
  DiskPressure                  False   Sat, 27 Apr 2024 23:23:01 +0530   Sat, 27 Apr 2024 23:22:31 +0530   KubeletHasNoDiskPressure        kubelet has no disk pressure
  PIDPressure                   False   Sat, 27 Apr 2024 23:23:01 +0530   Sat, 27 Apr 2024 23:22:31 +0530   KubeletHasSufficientPID         kubelet has sufficient PID available
  Ready                         True    Sat, 27 Apr 2024 23:23:01 +0530   Sat, 27 Apr 2024 23:22:31 +0530   KubeletReady                    kubelet is posting ready status. AppArmor enabled

Steps To Reproduce: It happens randomly; I am not sure whether it is reliably reproducible.

Expected behavior: Pod scheduling should happen without any issues, i.e. pods should not get stuck in Pending state.

Actual behavior: Pods get stuck in Pending state, and it happens randomly.
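When the issue occurs, a few things could be captured to narrow down whether the embedded scheduler and controller-manager are still healthy. This is only a sketch; the lease names below are the upstream defaults and are an assumption about this k3s setup:

$ kubectl get --raw='/readyz?verbose'    # apiserver health checks
$ kubectl -n kube-system get leases kube-scheduler kube-controller-manager    # renewTime should keep advancing
$ kubectl get events -A --field-selector reason=FailedScheduling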