openebs-archive / cstor-operators

Collection of OpenEBS cStor Data Engine Operators
https://openebs.io
Apache License 2.0

Rescheduled application pod remains in ContainerCreating state forever after powering off the node #451

Closed jianghushinian closed 1 year ago

jianghushinian commented 1 year ago

This is the same issue as https://github.com/openebs/cstor-operators/issues/238, but I can't solve it.

Initially, the pod runs normally on the worker1 node. After I shut down worker1, the pod is rescheduled to worker2, where it stays in ContainerCreating indefinitely.

$ kubectl describe pod xxx
...
Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    41m                  default-scheduler  Successfully assigned default/busybox-2 to worker2
  Warning  FailedMount  6m41s (x2 over 11m)  kubelet            Unable to attach or mount volumes: unmounted volumes=[demo-vol], unattached volumes=[kube-api-access-97tfz demo-vol]: timed out waiting for the condition
  Warning  FailedMount  2m11s (x8 over 22m)  kubelet            Unable to attach or mount volumes: unmounted volumes=[demo-vol], unattached volumes=[demo-vol kube-api-access-97tfz]: timed out waiting for the condition
  Warning  FailedMount  62s (x10 over 21m)   kubelet            MountVolume.MountDevice failed for volume "pvc-86721747-6a91-4357-99f2-5186743500f7" : rpc error: code = Internal desc = Volume pvc-86721747-6a91-4357-99f2-5186743500f7 is not ready: Replicas yet to connect to controller
ubuntu@ubuntu-virtual-machine:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:25:17Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/arm64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/arm64"}

jianghushinian commented 1 year ago

ubuntu@ubuntu-virtual-machine:~$ kubectl get node
NAME                     STATUS     ROLES                  AGE     VERSION
ubuntu-virtual-machine   Ready      control-plane,master   8h      v1.23.3
worker1                  NotReady   <none>                 7h33m   v1.23.3
worker2                  Ready      <none>                 4h15m   v1.23.3

ubuntu@ubuntu-virtual-machine:~$ kubectl get pod -n openebs -o wide
NAME                                                              READY   STATUS        RESTARTS            AGE     IP              NODE      NOMINATED NODE   READINESS GATES
cspc-operator-5897d854c8-54vh6                                    1/1     Terminating   2 (<invalid> ago)   8h      10.10.1.29      worker1   <none>           <none>
cspc-operator-5897d854c8-pkrqf                                    1/1     Running       0                   77m     10.10.189.77    worker2   <none>           <none>
cstor-storage-hr6w-d8bf7d8fd-k6brm                                3/3     Running       0                   90m     10.10.189.126   worker2   <none>           <none>
cstor-storage-md9q-67bbbdbf7c-4xjgv                               3/3     Terminating   0                   5h43m   10.10.1.40      worker1   <none>           <none>
cstor-storage-md9q-67bbbdbf7c-j279j                               0/3     Pending       0                   77m     <none>          <none>    <none>           <none>
cvc-operator-57597fc854-d4nvr                                     1/1     Terminating   2 (<invalid> ago)   8h      10.10.1.32      worker1   <none>           <none>
cvc-operator-57597fc854-wvf7d                                     1/1     Running       0                   77m     10.10.189.71    worker2   <none>           <none>
openebs-cstor-admission-server-7499f86dbf-569rv                   2/2     Running       0                   77m     10.10.189.73    worker2   <none>           <none>
openebs-cstor-admission-server-7499f86dbf-758mj                   2/2     Terminating   5 (<invalid> ago)   8h      10.10.1.34      worker1   <none>           <none>
openebs-cstor-csi-controller-0                                    6/6     Running       0                   82m     10.10.189.65    worker2   <none>           <none>
openebs-cstor-csi-node-7slzs                                      2/2     Running       4 (<invalid> ago)   4h13m   172.16.31.134   worker2   <none>           <none>
openebs-cstor-csi-node-lfg4s                                      2/2     Running       4 (<invalid> ago)   6h36m   172.16.31.133   worker1   <none>           <none>
openebs-cstor-cspc-operator-55db679b8f-2wzzz                      1/1     Running       0                   77m     10.10.189.76    worker2   <none>           <none>
openebs-cstor-cspc-operator-55db679b8f-bvj55                      1/1     Terminating   0                   175m    10.10.1.49      worker1   <none>           <none>
openebs-cstor-cvc-operator-79ccf94c48-6mslq                       1/1     Terminating   2 (<invalid> ago)   8h      10.10.1.26      worker1   <none>           <none>
openebs-cstor-cvc-operator-79ccf94c48-wg9ck                       1/1     Running       0                   77m     10.10.189.121   worker2   <none>           <none>
openebs-localpv-provisioner-7c59ff46-2ckmh                        1/1     Running       0                   77m     10.10.189.74    worker2   <none>           <none>
openebs-localpv-provisioner-7c59ff46-vrg5b                        1/1     Terminating   0                   90m     10.10.235.131   worker1   <none>           <none>
openebs-ndm-4b775                                                 1/1     Running       3 (<invalid> ago)   6h36m   172.16.31.133   worker1   <none>           <none>
openebs-ndm-cluster-exporter-69d646cc99-gmqrl                     1/1     Running       0                   77m     10.10.189.78    worker2   <none>           <none>
openebs-ndm-cluster-exporter-69d646cc99-jcbht                     1/1     Terminating   2 (<invalid> ago)   8h      10.10.1.35      worker1   <none>           <none>
openebs-ndm-lhfk6                                                 1/1     Running       0                   65m     172.16.31.134   worker2   <none>           <none>
openebs-ndm-node-exporter-mdwng                                   1/1     Running       2 (<invalid> ago)   6h36m   10.10.1.33      worker1   <none>           <none>
openebs-ndm-node-exporter-tgv6h                                   1/1     Running       1 (<invalid> ago)   120m    10.10.189.68    worker2   <none>           <none>
openebs-ndm-operator-f667d76d6-lztg5                              0/1     Terminating   2 (<invalid> ago)   8h      10.10.1.28      worker1   <none>           <none>
openebs-ndm-operator-f667d76d6-vbjbz                              1/1     Running       0                   77m     10.10.189.80    worker2   <none>           <none>
pvc-86721747-6a91-4357-99f2-5186743500f7-target-5fbcb55f74mkfct   3/3     Running       0                   82m     10.10.189.69    worker2   <none>           <none>
pvc-c393f9e7-ad3f-4894-8bf6-cebcd908ac91-target-84bcf4f8f5f8bcj   3/3     Terminating   0                   4h39m   10.10.1.46      worker1   <none>           <none>
pvc-c393f9e7-ad3f-4894-8bf6-cebcd908ac91-target-84bcf4f8f5mc29b   3/3     Running       0                   82m     10.10.189.67    worker2   <none>           <none>
pvc-e51c486e-489f-4101-9499-59adb577b6a3-target-78d48768d-mhqj6   3/3     Running       1 (82m ago)         82m     10.10.189.70    worker2   <none>           <none>
pvc-e51c486e-489f-4101-9499-59adb577b6a3-target-78d48768d-w26bq   3/3     Terminating   0                   4h48m   10.10.1.44      worker1   <none>           <none>

ubuntu@ubuntu-virtual-machine:~$ kubectl get bd -n openebs
NAME                                           NODENAME   SIZE         CLAIMSTATE   STATUS   AGE
blockdevice-44fdf1a064f5066311ff46ca47ba1e80   worker1    6442450944   Claimed      Active   6h2m
blockdevice-b4597532715b40b7ed220a98b337b0a2   worker2    5368709120   Claimed      Active   4h9m

ubuntu@ubuntu-virtual-machine:~$ kubectl get cspi -n openebs
NAME                 HOSTNAME   FREE    CAPACITY   READONLY   PROVISIONEDREPLICAS   HEALTHYREPLICAS   STATUS   AGE
cstor-storage-hr6w   worker2    4810M   4810183k   false      1                     0                 ONLINE   4h1m
cstor-storage-md9q   worker1    5780M   5780372k   false      3                     3                 ONLINE   5h44m
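
The two listings above can be cross-checked mechanically. The following is a minimal Python sketch, not part of any OpenEBS tooling: `parse_table` and `suspect_pools` are hypothetical helper names, and the input strings are copied from the `kubectl get node` and `kubectl get cspi` output in this comment. It flags pools whose node is not Ready (their reported status may be stale) and pools that report fewer healthy than provisioned replicas.

```python
# Output copied from the issue; column spacing reduced for readability.
NODES = """
NAME                     STATUS     ROLES                  AGE     VERSION
ubuntu-virtual-machine   Ready      control-plane,master   8h      v1.23.3
worker1                  NotReady   <none>                 7h33m   v1.23.3
worker2                  Ready      <none>                 4h15m   v1.23.3
"""

CSPI = """
NAME                 HOSTNAME   FREE    CAPACITY   READONLY   PROVISIONEDREPLICAS   HEALTHYREPLICAS   STATUS   AGE
cstor-storage-hr6w   worker2    4810M   4810183k   false      1                     0                 ONLINE   4h1m
cstor-storage-md9q   worker1    5780M   5780372k   false      3                     3                 ONLINE   5h44m
"""


def parse_table(text):
    """Parse whitespace-aligned kubectl output into dicts keyed by column header.

    Assumes no cell is empty, which holds for the two tables above.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def suspect_pools(nodes_text, cspi_text):
    """Return (pool name, reason) pairs for pools that look unable to serve replicas."""
    ready = {n["NAME"] for n in parse_table(nodes_text) if n["STATUS"] == "Ready"}
    findings = []
    for pool in parse_table(cspi_text):
        if pool["HOSTNAME"] not in ready:
            findings.append((pool["NAME"], "node not Ready; reported status may be stale"))
        elif int(pool["HEALTHYREPLICAS"]) < int(pool["PROVISIONEDREPLICAS"]):
            findings.append((pool["NAME"], "fewer healthy than provisioned replicas"))
    return findings


if __name__ == "__main__":
    for name, reason in suspect_pools(NODES, CSPI):
        print(f"{name}: {reason}")
```

On this data both pools are flagged: cstor-storage-hr6w on worker2 has one provisioned replica and none healthy, and cstor-storage-md9q sits on the powered-off worker1, where its replacement pod is Pending in the pod listing above. That would be consistent with the "Replicas yet to connect to controller" mount error, though the output here does not show which pool holds the replicas of the affected volume.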

jianghushinian commented 1 year ago

Why is the CV's status Offline?

$ kubectl get cv  -n openebs
NAME                                       CAPACITY   STATUS    AGE
pvc-677a4bed-6f54-4136-92cd-9eb5a87f1a85   1Gi        Offline   3h56m

The error is reported by this code:

https://github.com/openebs/cstor-csi/blob/ed7121554bd27f09989e59d8495c4cc50751c1cf/pkg/utils/utils.go#L422-L437
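
To illustrate how an Offline CV can lead to the FailedMount event, here is a rough Python model of a readiness gate. It is not the linked Go code: the function name, the retry parameters, and the set of phases treated as mountable (Healthy and Degraded) are assumptions made for this sketch. Only the Offline phase and the error text come from the output in this issue.

```python
import time

# Assumption: phases in which a mount is allowed to proceed.
# Not copied from the linked utils.go.
READY_PHASES = {"Healthy", "Degraded"}


class VolumeNotReady(Exception):
    """Raised when the volume never reaches a mountable phase."""


def wait_for_volume_ready(volume_id, get_phase, retries=3, delay=0.0):
    """Poll get_phase(volume_id) until it returns a ready phase or retries run out.

    get_phase is a caller-supplied callable (hypothetical) that returns the
    current CV phase as a string, for example "Offline".
    """
    for _ in range(retries):
        if get_phase(volume_id) in READY_PHASES:
            return
        time.sleep(delay)
    raise VolumeNotReady(
        f"Volume {volume_id} is not ready: Replicas yet to connect to controller"
    )


if __name__ == "__main__":
    try:
        # A CV that stays Offline never passes the gate.
        wait_for_volume_ready("pvc-example", lambda _id: "Offline")
    except VolumeNotReady as err:
        print(err)
```

Under these assumptions, a CV that stays Offline exhausts the retries every time, so the kubelet keeps repeating the mount attempt and the pod stays in ContainerCreating until enough replicas reconnect to the target.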
