hpe-storage / python-hpedockerplugin

HPE Native Docker Plugin
Apache License 2.0

Automated Installer: doryd issue on reinstalling plugin on multi-master #740

Closed sonawane-shashikant closed 4 years ago

sonawane-shashikant commented 4 years ago

The automated installer fails when the plugin is installed on a multi-master setup without first running the uninstall script. The problem is with the doryd container: the installer fails on one of the nodes, reporting that doryd already exists. This is likely to occur during an upgrade scenario.

Expected result - doryd creation should be skipped if the installer finds that doryd already exists.
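For reference, a minimal sketch of the kind of guard this describes, assuming an Ansible shell-based check (the task names and exact check command are illustrative, not the installer's actual tasks; the deployment name and YAML path are taken from the error output below):

- name: Check whether the doryd deployment already exists
  shell: kubectl get deployment kube-storage-controller-doryd --namespace kube-system
  register: doryd_check
  failed_when: false      # a non-zero rc just means doryd is not deployed yet
  changed_when: false

- name: Deploy doryd only when it is not already present
  shell: kubectl create -f provisioner/k8s/dep-kube-storage-controller-k8s113.yaml
  when: doryd_check.rc != 0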

Actual result - the error below, with one failed node.

TASK [Deployment on Kubernetes cluster when version is 1.13] ***
task path: /root/installer_new/python-hpedockerplugin/ansible_3par_docker_plugin/tasks/configure_doryd_service.yml:43
fatal: [15.212.196.133]: FAILED! => {"changed": true, "cmd": "cd /root/installer_new/python-hpedockerplugin/ansible_3par_docker_plugin\n cd ..\n /usr/local/bin/kubectl create -f provisioner/k8s/dep-kube-storage-controller-k8s113.yaml", "delta": "0:00:00.515371", "end": "2019-09-26 19:40:52.666848", "msg": "non-zero return code", "rc": 1, "start": "2019-09-26 19:40:52.151477", "stderr": "Error from server (AlreadyExists): error when creating \"provisioner/k8s/dep-kube-storage-controller-k8s113.yaml\": clusterroles.rbac.authorization.k8s.io \"doryd\" already exists\nError from server (AlreadyExists): error when creating \"provisioner/k8s/dep-kube-storage-controller-k8s113.yaml\": clusterrolebindings.rbac.authorization.k8s.io \"doryd\" already exists\nError from server (AlreadyExists): error when creating \"provisioner/k8s/dep-kube-storage-controller-k8s113.yaml\": serviceaccounts \"doryd\" already exists\nError from server (AlreadyExists): error when creating \"provisioner/k8s/dep-kube-storage-controller-k8s113.yaml\": deployments.extensions \"kube-storage-controller-doryd\" already exists", "stderr_lines": ["Error from server (AlreadyExists): error when creating \"provisioner/k8s/dep-kube-storage-controller-k8s113.yaml\": clusterroles.rbac.authorization.k8s.io \"doryd\" already exists", "Error from server (AlreadyExists): error when creating \"provisioner/k8s/dep-kube-storage-controller-k8s113.yaml\": clusterrolebindings.rbac.authorization.k8s.io \"doryd\" already exists", "Error from server (AlreadyExists): error when creating \"provisioner/k8s/dep-kube-storage-controller-k8s113.yaml\": serviceaccounts \"doryd\" already exists", "Error from server (AlreadyExists): error when creating \"provisioner/k8s/dep-kube-storage-controller-k8s113.yaml\": deployments.extensions \"kube-storage-controller-doryd\" already exists"], "stdout": "", "stdout_lines": []}
to retry, use: --limit @/root/installer_new/python-hpedockerplugin/ansible_3par_docker_plugin/install_hpe_3par_volume_driver.retry

PLAY RECAP *****
15.212.196.133 : ok=71 changed=30 unreachable=0 failed=1
15.212.196.134 : ok=50 changed=23 unreachable=0 failed=0
15.212.196.135 : ok=50 changed=23 unreachable=0 failed=0
15.212.196.136 : ok=41 changed=20 unreachable=0 failed=0
15.212.196.137 : ok=41 changed=20 unreachable=0 failed=0
15.212.196.138 : ok=41 changed=20 unreachable=0 failed=0
localhost : ok=2 changed=0 unreachable=0 failed=0

[root@cssos196133 ansible_3par_docker_plugin]# docker ps -a | grep plugin
dc527d0476ff hpestorage/legacyvolumeplugin:3.2 "/bin/sh -c ./plugin…" About a minute ago Up About a minute plugin_container
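For context, the AlreadyExists errors above come from kubectl create, which refuses to re-create objects that already exist. One common way to make such a deployment step idempotent (a sketch, not what the installer currently does) is kubectl apply, which creates the objects on the first master and leaves or updates them on subsequent ones:

- name: Deploy doryd (idempotent)
  shell: kubectl apply -f provisioner/k8s/dep-kube-storage-controller-k8s113.yaml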

bhagyashree-sarawate commented 4 years ago

Raised PR#745

sneharai4 commented 4 years ago

@sonawane-shashikant Fix merged https://github.com/hpe-storage/python-hpedockerplugin/pull/745. Please verify.

c-raghav commented 4 years ago

Still failing for OpenShift:

PLAY [Install HPE 3PAR Volume Driver for Kubernetes/OpenShift] *****

TASK [Gathering Facts] ***** ok: [10.50.0.156]

TASK [Initialize multimaster, k8s_1_13, os_3_11 flags as false by default] ***** ok: [10.50.0.156]

TASK [Set multimaster flag as true when we have more than one master in hosts file] **** ok: [10.50.0.156]

TASK [Execute oc version and check for openshift save output] ** changed: [10.50.0.156]

TASK [Set flag os_3_11 to true if openshift version is 3.11] *** ok: [10.50.0.156]

TASK [Execute kubernetes version when oc version command was not found] **** skipping: [10.50.0.156]

TASK [Set flag k8s_1_13 to true if Kubernetes 1.13 version is found] *** skipping: [10.50.0.156]

TASK [Check whether doryd as a container exists for multimaster Kubernetes/Openshift setup] **** fatal: [10.50.0.156]: FAILED! => {"changed": true, "cmd": "kubectl get pods --namespace kube-system -o wide | grep doryd", "delta": "0:00:00.249869", "end": "2019-10-09 01:44:36.618093", "msg": "non-zero return code", "rc": 1, "start": "2019-10-09 01:44:36.368224", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} ...ignoring

TASK [Set flag doryd_exists as true if alreay running] *** skipping: [10.50.0.156]

TASK [Deployment on Openshift cluster when version is 3.11] ** changed: [10.50.0.156]

TASK [Deployment on Kubernetes cluster when version is 1.13] ***** fatal: [10.50.0.156]: FAILED! => {"msg": "The conditional check 'multimaster == true and k8s_1_13 == true and and doryd_exists == false' failed. The error was: template error while templating st ', got 'doryd_exists'. String: {% if multimaster == true and k8s_1_13 == true and and doryd_exists == false %} True {% else %} False {% endif %}\n\nThe error appears to have been in '/root/pytho /tasks/configure_doryd_service.yml': line 58, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: Deployment on ^ here\n"} to retry, use: --limit @/root/python-hpedockerplugin/ansible_3par_docker_plugin/install_hpe_3par_volume_driver.retry

PLAY RECAP ***
10.50.0.156 : ok=72 changed=31 unreachable=0 failed=1
10.50.0.157 : ok=65 changed=26 unreachable=0 failed=0
10.50.0.162 : ok=65 changed=26 unreachable=0 failed=0
10.50.0.163 : ok=56 changed=25 unreachable=0 failed=0
10.50.0.164 : ok=56 changed=25 unreachable=0 failed=0
localhost : ok=2 changed=0 unreachable=0 failed=0
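The immediate failure here is the duplicated "and" in the when: expression (multimaster == true and k8s_1_13 == true and and doryd_exists == false), which Jinja2 cannot parse. A corrected form of that conditional (a sketch only, assuming multimaster, k8s_1_13 and doryd_exists are all defined earlier in the playbook) would look like:

- name: Deployment on Kubernetes cluster when version is 1.13
  shell: kubectl create -f provisioner/k8s/dep-kube-storage-controller-k8s113.yaml
  when: multimaster == true and k8s_1_13 == true and doryd_exists == false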

c-raghav commented 4 years ago

There seems to be a lag here; please add a timeout around this check:

TASK [Check whether doryd as a container exists for multimaster Kubernetes/Openshift setup] **** fatal: [10.50.0.156]: FAILED! => {"changed": true, "cmd": "kubectl get pods --namespace kube-system -o wide | grep doryd", "delta": "0:00:00.231068", "end": "2019-10-09 02:53:03.210332", "msg": "non-zero return code", "rc": 1, "start": "2019-10-09 02:53:02.979264", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} ...ignoring
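One way to absorb that lag (a sketch under the assumption that the check stays shell/grep based; the retry count and delay are illustrative, and this is not necessarily what the merged fix does) is to retry the check before concluding doryd is absent, and to treat a final non-zero grep exit as "not found" rather than a hard failure:

- name: Check whether doryd as a container exists (retry to absorb startup lag)
  shell: kubectl get pods --namespace kube-system -o wide | grep doryd
  register: doryd_check
  retries: 5             # illustrative values
  delay: 10              # seconds between attempts
  until: doryd_check.rc == 0
  ignore_errors: true    # rc=1 after all retries just means doryd is not running

- name: Set flag doryd_exists as true if already running
  set_fact:
    doryd_exists: true
  when: doryd_check.rc == 0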

sneharai4 commented 4 years ago

@c-raghav @sonawane-shashikant Please verify. Fix merged https://github.com/hpe-storage/python-hpedockerplugin/pull/749

sonawane-shashikant commented 4 years ago

This bug is verified as fixed. Tested on a Kubernetes multi-master setup. Uninstalled the existing plugin using the automated installer. The installer did not fail on any of the nodes saying the doryd container already exists.

Below is the output captured during verification.

740_fixed.txt