hpe-storage / python-hpedockerplugin

HPE Native Docker Plugin
Apache License 2.0

3.3 StatefulSet: After node reboot, 1 pod still remains in ContainerCreating state. #746

Closed sandesh-desai closed 4 years ago

sandesh-desai commented 4 years ago

Testbed details:

- Host: OpenShift single-master setup
- Host OS: Red Hat Enterprise Linux Server 7.6
- OC version: oc v3.11.117

Steps to reproduce:

1) Cordon master and worker-1:

```
[root@cssosbe04-b01 StatefulSet1]# oc get nodes
NAME            STATUS                     ROLES           AGE   VERSION
cssosbe04-b01   Ready,SchedulingDisabled   master          79d   v1.11.0+d4cacc0
cssosbe04-b02   Ready,SchedulingDisabled   compute         79d   v1.11.0+d4cacc0
cssosbe04-b03   Ready                      compute,infra   79d   v1.11.0+d4cacc0
```

2) oc create -f sc.yml
3) oc create -f pvc.yml
4) oc create -f configmap.yaml
5) oc create -f statefulset.yaml (10 replicas created successfully on worker-2)
6) Reboot worker-2
7) After the reboot, 9 pods are in Running state and one pod remains in ContainerCreating state.
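For reference, a minimal shell sketch of the reproduction sequence above, using the node names from the oc get nodes output; the reboot step is shown as an ssh reboot for illustration, and the YAML files are the attachments below:

```sh
# Cordon the master and worker-1 so all replicas land on worker-2
kubectl cordon cssosbe04-b01
kubectl cordon cssosbe04-b02

# Create the StorageClass, PVC, ConfigMap and StatefulSet
oc create -f sc.yml
oc create -f pvc.yml
oc create -f configmap.yaml
oc create -f statefulset.yaml

# Once all 10 replicas are Running on worker-2, reboot it
oc get pods -o wide
ssh root@cssosbe04-b03 reboot

# After the node is back, look for pods stuck in ContainerCreating
oc get pods -o wide
```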

Attaching the logs and yml:

3parlog.txt
10 POD creation result.txt
dory log.txt
sc.yml.txt
pvc.yml.txt
configmap.yaml.txt
statefulset.yaml.txt

=============================================================

In Kubernetes multi-master:

1) Scale up to 15: 2 out of 15 pods are in ContainerCreating state.
2) Scale down to 10: 3 out of 15 pods are in Terminating state.

Steps Followed:

1. Create SC, PVC, ConfigMap, POD with replicas=5 on one worker.
2. Scale up to 15 pods.
   Expected result: all 15 pods should be in Running state.
   Actual result: 2 out of 15 pods are in ContainerCreating state.
3. Scale down to 10 pods.
   Expected result: the 15 pods should reduce to 10 and come into Running state.
   Actual result: 3 out of 15 pods are in Terminating state.
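A sketch of the scale operations above; the StatefulSet name mysset is hypothetical (take the real name from the attached statefulset.yaml):

```sh
# Scale the StatefulSet up to 15 replicas and watch pod status
oc scale statefulset mysset --replicas=15
oc get pods -w

# Scale back down to 10 and confirm the surplus pods terminate
oc scale statefulset mysset --replicas=10
oc get pods -w
```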

Attaching the logs:

BUG_DETAILS.txt

LOG_3pardcv.txt

log_dory11.txt

sandesh-desai commented 4 years ago

In Kubernetes single-master:

1. Create SC, PVC, ConfigMap, POD with replicas=10 on one worker.
2. Scale down to 5 pods.
3. Scale up to 12 pods.
4. Reboot worker-1.

Expected result: all 12 pods should be in Running state.
Actual result: all 12 pods are in ContainerCreating state.
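When pods hang like this, the pod events usually show the failing volume attach or mount; a generic way to inspect them (the pod name mysset-0 is hypothetical):

```sh
# List any pods that are not Running
oc get pods --no-headers | awk '$3 != "Running"'

# Check the events of one stuck pod for mount/attach errors
oc describe pod mysset-0 | tail -n 20
```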

Attaching the logs:

dcv log.txt
statefulset.txt
worker1 dory log.txt

wdurairaj commented 4 years ago

@sandesh-desai, please follow the steps in https://github.com/hpe-storage/python-hpedockerplugin/blob/master/docs/troubleshooting.md#debugging-issue-with-statefulset-pod-stuck-in-containercreating-state-after-a-node-reboot and let me know if this is a viable workaround for the issue.

The root cause of this issue is the absence of /dev/mapper entries after the atomic-openshift-node.service starts (after dockerd starts). The device-mapper entries are not created even after the host is rescanned for new devices and the mounts are issued.

The situation can only be recovered by restarting multipathd with systemctl restart multipathd.
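A sketch of that manual recovery on the affected node; the iscsiadm rescan at the end assumes an iSCSI-backed setup:

```sh
# Check whether device-mapper entries exist for the multipath devices
ls -l /dev/mapper
multipath -ll

# Restart multipathd so the missing /dev/mapper entries get recreated
systemctl restart multipathd

# For iSCSI setups, rescan the sessions for newly visible devices
iscsiadm -m session --rescan
```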

sandesh-desai commented 4 years ago

Follow the below steps: kubectl cordon the node before it is shut down (the node which has the StatefulSet pods mounted), and kubectl uncordon it after the node reboots and the kubelet (or atomic-openshift-node.service) starts properly.
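A sketch of this workaround sequence, using worker-1 as the node name:

```sh
# Before shutting down the node that hosts the StatefulSet pods
kubectl cordon worker-1

# Reboot the node, then (on the node) wait until the node service is up
systemctl status atomic-openshift-node.service

# Once the kubelet is healthy again, re-enable scheduling
kubectl uncordon worker-1
```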

Verified and working fine.

Attaching the logs:

3PAR dcv.txt
BUG 746.txt
DORY_746.txt

Also ran the below steps:

1) Cordon master and worker-1
2) oc create -f sc.yml
3) oc create -f pvc.yml
4) oc create -f configmap.yaml
5) oc create -f statefulset.yaml
6) 10 replicas created successfully on worker-2
7) Reboot worker-2

After the reboot, all pods are in Running state.
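A quick post-reboot check; a count equal to the replica count (10) confirms full recovery:

```sh
# Count pods in Running phase; should match the replica count (10)
oc get pods --field-selector=status.phase=Running --no-headers | wc -l
```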

Verified and working fine.

Attaching the logs:

dcvlog_746.txt
DORY_746.txt
bug_746.txt

Closing the Bug.