openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com

openshift-ansible fails to setup cns cluster during advanced ocp install #5452

Closed: ekuric closed 7 years ago

ekuric commented 7 years ago

Description

openshift-ansible fails to set up the heketi/CNS cluster.

The playbook inventory has

```
[OSEv3:children]
masters
nodes
etcd
glusterfs

[glusterfs]
glusternode1 glusterfs_devices='[ "/dev/xvdf" ]'
glusternode2 glusterfs_devices='[ "/dev/xvdf" ]'
glusternode3 glusterfs_devices='[ "/dev/xvdf" ]'
```

and during install it fails to set up the CNS cluster:

```
# oc get pods -o wide
NAME                      READY     STATUS             RESTARTS   AGE       IP              NODE
glusterfs-storage-5z612   0/1       CrashLoopBackOff   6          7m        172.31.15.3     ip-172-31-15-3.us-west-2.compute.internal
glusterfs-storage-jt4jh   0/1       CrashLoopBackOff   6          7m        172.31.46.48    ip-172-31-46-48.us-west-2.compute.internal
glusterfs-storage-z4bcz   0/1       CrashLoopBackOff   6          7m        172.31.25.175   ip-172-31-25-175.us-west-2.compute.internal
```

also

```
oc describe pod glusterfs-storage-5z612
Name:           glusterfs-storage-5z612
Namespace:      cnscluster
Node:           ip-172-31-15-3.us-west-2.compute.internal/172.31.15.3
Start Time:     Tue, 19 Sep 2017 09:39:19 +0000
Labels:         controller-revision-hash=3397209398
                glusterfs=storage-pod
                glusterfs-node=pod
                pod-template-generation=1
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"cnscluster","name":"glusterfs-storage","uid":"690f12de-9d1e-11e7-82d4-02476de8e110...
                openshift.io/scc=privileged
Status:         Running
IP:             172.31.15.3
Created By:     DaemonSet/glusterfs-storage
Controlled By:  DaemonSet/glusterfs-storage
Containers:
  glusterfs:
    Container ID:   docker://9b43907c6919adcea8220de0a2d25ee88b22e49f39b06c0ce0dce093ff55076d
    Image:          rhgs3/rhgs-server-rhel7:3.3.0-23
    Image ID:       docker://sha256:d0293d2f727c8fcdea138285bff14b7f6b4073f525db61fa5281f20875db7b1e
    Port:
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 19 Sep 2017 09:45:06 +0000
      Finished:     Tue, 19 Sep 2017 09:45:06 +0000
    Ready:          False
    Restart Count:  6
    Liveness:       exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Readiness:      exec [/bin/bash -c systemctl status glusterd.service] delay=40s timeout=3s period=25s #success=1 #failure=15
    Environment:
    Mounts:
      /dev from glusterfs-dev (rw)
      /etc/glusterfs from glusterfs-etc (rw)
      /etc/ssl from glusterfs-ssl (ro)
      /run from glusterfs-run (rw)
      /run/lvm from glusterfs-lvm (rw)
      /sys/fs/cgroup from glusterfs-cgroup (ro)
      /var/lib/glusterd from glusterfs-config (rw)
      /var/lib/heketi from glusterfs-heketi (rw)
      /var/lib/misc/glusterfsd from glusterfs-misc (rw)
      /var/log/glusterfs from glusterfs-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lsqk6 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  glusterfs-heketi:
    Type:  HostPath (bare host directory volume)
    Path:  /var/lib/heketi
  glusterfs-run:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  glusterfs-lvm:
    Type:  HostPath (bare host directory volume)
    Path:  /run/lvm
  glusterfs-etc:
    Type:  HostPath (bare host directory volume)
    Path:  /etc/glusterfs
  glusterfs-logs:
    Type:  HostPath (bare host directory volume)
    Path:  /var/log/glusterfs
  glusterfs-config:
    Type:  HostPath (bare host directory volume)
    Path:  /var/lib/glusterd
  glusterfs-dev:
    Type:  HostPath (bare host directory volume)
    Path:  /dev
  glusterfs-misc:
    Type:  HostPath (bare host directory volume)
    Path:  /var/lib/misc/glusterfsd
  glusterfs-cgroup:
    Type:  HostPath (bare host directory volume)
    Path:  /sys/fs/cgroup
  glusterfs-ssl:
    Type:  HostPath (bare host directory volume)
    Path:  /etc/ssl
  default-token-lsqk6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lsqk6
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  glusterfs=storage-host
                 region=primary
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute
                 node.alpha.kubernetes.io/unreachable:NoExecute
Events:
  FirstSeen  LastSeen  Count  From                                                 SubObjectPath               Type     Reason                 Message
  ---------  --------  -----  ----                                                 -------------               ----     ------                 -------
  8m         8m        2      kubelet, ip-172-31-15-3.us-west-2.compute.internal                               Normal   SuccessfulMountVolume  (combined from similar events): MountVolume.SetUp succeeded for volume "default-token-lsqk6"
  8m         8m        1      kubelet, ip-172-31-15-3.us-west-2.compute.internal                               Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "glusterfs-lvm"
  8m         8m        1      kubelet, ip-172-31-15-3.us-west-2.compute.internal                               Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "glusterfs-logs"
  8m         8m        1      kubelet, ip-172-31-15-3.us-west-2.compute.internal                               Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "glusterfs-config"
  8m         8m        1      kubelet, ip-172-31-15-3.us-west-2.compute.internal                               Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "glusterfs-dev"
  8m         8m        1      kubelet, ip-172-31-15-3.us-west-2.compute.internal                               Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "glusterfs-misc"
  8m         8m        1      kubelet, ip-172-31-15-3.us-west-2.compute.internal                               Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "glusterfs-heketi"
  8m         8m        1      kubelet, ip-172-31-15-3.us-west-2.compute.internal                               Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "glusterfs-run"
  8m         8m        1      kubelet, ip-172-31-15-3.us-west-2.compute.internal                               Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "glusterfs-ssl"
  8m         8m        1      kubelet, ip-172-31-15-3.us-west-2.compute.internal                               Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "glusterfs-cgroup"
  8m         8m        3      kubelet, ip-172-31-15-3.us-west-2.compute.internal   spec.containers{glusterfs}  Normal   Pulled                 Container image "rhgs3/rhgs-server-rhel7:3.3.0-23" already present on machine
  8m         8m        3      kubelet, ip-172-31-15-3.us-west-2.compute.internal   spec.containers{glusterfs}  Normal   Created                Created container
  8m         8m        3      kubelet, ip-172-31-15-3.us-west-2.compute.internal   spec.containers{glusterfs}  Normal   Started                Started container
  8m         8m        3      kubelet, ip-172-31-15-3.us-west-2.compute.internal   spec.containers{glusterfs}  Warning  BackOff                Back-off restarting failed container
  8m         3m        27     kubelet, ip-172-31-15-3.us-west-2.compute.internal                               Warning  FailedSync             Error syncing pod
```

##### Version

```
# ansible --version
ansible 2.3.2.0
  config file = /root/openshift-ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

root@ip-172-31-47-188: ~/openshift-ansible # git describe
openshift-ansible-3.7.0-0.126.0-80-gb124c67
```

##### Steps To Reproduce

1. Try to set up a CNS cluster as part of the advanced install.

##### Expected Results

The CNS cluster is up and running.

##### Observed Results

The GlusterFS pods go into CrashLoopBackOff, as shown in the `oc get pods` and `oc describe pod` output above. Full logs: https://gist.github.com/ekuric/c6c46252219d3e9f1ddafca43e1ece4d

##### Additional Information

RHEL 7.5, OCP 3.7. The `manual` approach

```
# cns-deploy -n -g topology.json
```

where topology.json is as in https://gist.github.com/ekuric/b47b268897d869adddb5be423946b44b, works fine! Also, the images are present on all OCP nodes in advance, before the ansible playbook is started.
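
For anyone reproducing this, the log of the crashed container is usually more telling than the pod events above. A minimal sketch, reusing the pod name and `cnscluster` namespace from this report:

```
# Log of the current attempt (if glusterd got far enough to write anything)
oc logs glusterfs-storage-5z612 -n cnscluster

# Log of the previous, already-crashed attempt
oc logs --previous glusterfs-storage-5z612 -n cnscluster
```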
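
For comparison with the manual path, a cns-deploy/heketi topology.json generally has the shape sketched below. The hostname, storage IP, and zone here are placeholders rather than the contents of the gist above, and the node entry would be repeated for each of the three nodes:

```
# Skeleton of a one-node topology file (placeholder values only)
cat > topology.json <<'EOF'
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [ "glusternode1" ],
              "storage": [ "192.0.2.11" ]
            },
            "zone": 1
          },
          "devices": [ "/dev/xvdf" ]
        }
      ]
    }
  ]
}
EOF
```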
ekuric commented 7 years ago

@jarrpa @humblec @jeremyeder This is with OCP 3.7 and the latest branch of openshift-ansible.

ekuric commented 7 years ago

Packages:

heketi-client-5.0.0-11.el7rhgs.x86_64
python-heketi-5.0.0-11.el7rhgs.x86_64
heketi-5.0.0-11.el7rhgs.x86_64
cns-deploy-5.0.0-41.el7rhgs.x86_64
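
For anyone comparing environments, a list like the one above can be regenerated with a query along these lines (assuming an RPM-based install):

```
rpm -qa | grep -Ei 'heketi|cns-deploy'
```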
sdodson commented 7 years ago

/assign jarrpa


jarrpa commented 7 years ago

@ekuric The latest downstream images do not work with openshift-ansible. Please use only upstream images with openshift-ansible for now. We'll be working to fix this as the upstream images are updated.
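
For reference, image selection is driven by inventory variables; a sketch assuming the 3.7-era `openshift_storage_glusterfs` role variable names and the usual upstream image names (check `roles/openshift_storage_glusterfs/README.md` in your checkout for the authoritative list and defaults):

```
[OSEv3:vars]
# Hypothetical example: point the role at upstream images instead of the
# downstream rhgs3/rhgs-server-rhel7 build shown in the report above
openshift_storage_glusterfs_image=gluster/gluster-centos
openshift_storage_glusterfs_version=latest
openshift_storage_glusterfs_heketi_image=heketi/heketi
openshift_storage_glusterfs_heketi_version=latest
```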

ekuric commented 7 years ago

@jarrpa ack, thank you!

jarrpa commented 7 years ago

@ekuric Newer builds of the downstream containers fixed the issue, and openshift-ansible can once again successfully install the latest downstream builds. Closing this issue; feel free to reopen if the problem persists.
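
For completeness, a quick way to confirm which image build the pods actually ended up running (namespace as in the report above):

```
oc get pods -n cnscluster -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```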