openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com

OCP 3.7, GlusterFS, wiping fails during "Unlabel any existing GlusterFS nodes" #6661

Closed · smossber closed this issue 6 years ago

smossber commented 6 years ago

Description

After a failed GlusterFS deployment I wanted to wipe everything, since I understand the config playbook is not idempotent.

But when running /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-glusterfs/config.yml with openshift_storage_glusterfs_wipe set to true, I get an error similar to https://github.com/openshift/openshift-ansible/issues/5548 on the task "Unlabel any existing GlusterFS nodes": it fails with "'dict object' has no attribute 'openshift'".
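(For illustration only, not taken from the role itself: the task apparently dereferences hostvars[<host>].openshift for the hosts it loops over, so any inventory host without gathered OpenShift facts, the [lb] host being a likely candidate here, makes that lookup undefined. A minimal diagnostic sketch, assuming a masters[0] host pattern and a loop over groups['all']:)

```yaml
# Hypothetical diagnostic playbook (not part of openshift-ansible): list the
# inventory hosts that have no "openshift" fact, since a hostvars[item].openshift
# lookup on any of them would raise exactly the error quoted below.
- hosts: masters[0]
  gather_facts: false
  tasks:
    - name: List inventory hosts that currently have no "openshift" fact
      debug:
        msg: "{{ item }} has no 'openshift' fact; a hostvars lookup on it would fail"
      when: hostvars[item].openshift is not defined
      with_items: "{{ groups['all'] }}"
```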

Version


# ansible --version
ansible 2.4.1.0

# rpm -q atomic-openshift-utils openshift-ansible
atomic-openshift-utils-3.7.14-1.git.0.4b35b2d.el7.noarch
openshift-ansible-3.7.14-1.git.0.4b35b2d.el7.noarch
Steps To Reproduce
  1. ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-glusterfs/config.yml -e openshift_storage_glusterfs_wipe=true

Expected Results

The playbook should complete the wipe, including the "Unlabel any existing GlusterFS nodes" step, without errors.

Observed Results


TASK [openshift_storage_glusterfs : Delete pre-existing heketi resources] *****************************************************************************************************************************************
ok: [master1.openshift.mitzicom.int.m0sslab.org] => (item={u'kind': u'template,route,service,dc,jobs,secret', u'selector': u'deploy-heketi'})
changed: [master1.openshift.mitzicom.int.m0sslab.org] => (item={u'kind': u'svc', u'name': u'heketi-storage-endpoints'})
changed: [master1.openshift.mitzicom.int.m0sslab.org] => (item={u'kind': u'secret', u'name': u'heketi-storage-topology-secret'})
changed: [master1.openshift.mitzicom.int.m0sslab.org] => (item={u'kind': u'secret', u'name': u'heketi-storage-config-secret'})
ok: [master1.openshift.mitzicom.int.m0sslab.org] => (item={u'kind': u'template,route,service,dc', u'name': u'heketi-storage'})
changed: [master1.openshift.mitzicom.int.m0sslab.org] => (item={u'kind': u'svc', u'name': u'heketi-db-storage-endpoints'})
changed: [master1.openshift.mitzicom.int.m0sslab.org] => (item={u'kind': u'sa', u'name': u'heketi-storage-service-account'})
changed: [master1.openshift.mitzicom.int.m0sslab.org] => (item={u'kind': u'secret', u'name': u'heketi-storage-admin-secret'})

TASK [openshift_storage_glusterfs : Wait for deploy-heketi pods to terminate] *************************************************************************************************************************************
ok: [master1.openshift.mitzicom.int.m0sslab.org]

TASK [openshift_storage_glusterfs : Wait for heketi pods to terminate] ********************************************************************************************************************************************
ok: [master1.openshift.mitzicom.int.m0sslab.org]

TASK [openshift_storage_glusterfs : assert] ***********************************************************************************************************************************************************************
ok: [master1.openshift.mitzicom.int.m0sslab.org] => {
    "changed": false, 
    "failed": false, 
    "msg": "All assertions passed"
}

TASK [openshift_storage_glusterfs : Delete pre-existing GlusterFS resources] **************************************************************************************************************************************
changed: [master1.openshift.mitzicom.int.m0sslab.org] => (item={u'kind': u'template', u'name': u'glusterfs'})
changed: [master1.openshift.mitzicom.int.m0sslab.org] => (item={u'kind': u'daemonset', u'name': u'glusterfs-storage'})

TASK [openshift_storage_glusterfs : Unlabel any existing GlusterFS nodes] *****************************************************************************************************************************************
fatal: [master1.openshift.mitzicom.int.m0sslab.org]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'openshift'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_storage_glusterfs/tasks/glusterfs_deploy.yml': line 19, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Unlabel any existing GlusterFS nodes\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'openshift'"}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-glusterfs/config.retry

PLAY RECAP ********************************************************************************************************************************************************************************************************
infra1.openshift.mitzicom.int.m0sslab.org : ok=50   changed=2    unreachable=0    failed=0
infra2.openshift.mitzicom.int.m0sslab.org : ok=50   changed=2    unreachable=0    failed=0
localhost                  : ok=12   changed=0    unreachable=0    failed=0
master1.openshift.mitzicom.int.m0sslab.org : ok=57   changed=4    unreachable=0    failed=1   
master2.openshift.mitzicom.int.m0sslab.org : ok=46   changed=2    unreachable=0    failed=0
master3.openshift.mitzicom.int.m0sslab.org : ok=46   changed=2    unreachable=0    failed=0
node1.openshift.mitzicom.int.m0sslab.org : ok=50   changed=2    unreachable=0    failed=0
node2.openshift.mitzicom.int.m0sslab.org : ok=45   changed=2    unreachable=0    failed=0

INSTALLER STATUS **************************************************************************************************************************************************************************************************
Initialization             : Complete
GlusterFS Install          : In Progress
        This phase can be restarted by running: playbooks/byo/openshift-glusterfs/config.yml


Additional Information


openshift_deployment_type=openshift-enterprise
openshift_release=v3.7

openshift_master_dynamic_provisioning_enabled=true
...

# host group for masters
[masters]
master1.openshift.mitzicom.int.m0sslab.org
master2.openshift.mitzicom.int.m0sslab.org
master3.openshift.mitzicom.int.m0sslab.org

# host group for etcd
[etcd]
master1.openshift.mitzicom.int.m0sslab.org
master2.openshift.mitzicom.int.m0sslab.org
master3.openshift.mitzicom.int.m0sslab.org

[lb]
lb.openshift.mitzicom.int.m0sslab.org

# host group for nodes, includes region info
[nodes]
master[1:3].openshift.mitzicom.int.m0sslab.org openshift_node_labels="{'region': 'primary', 'zone': 'management'}" openshift_schedulable=false
infra1.openshift.mitzicom.int.m0sslab.org openshift_node_labels="{'region':'infra', 'zone': 'management'}"
infra2.openshift.mitzicom.int.m0sslab.org openshift_node_labels="{'region':'infra', 'zone': 'management'}"
node1.openshift.mitzicom.int.m0sslab.org openshift_node_labels="{'region': 'primary', 'zone': 'app'}"
node2.openshift.mitzicom.int.m0sslab.org openshift_node_labels="{'region': 'primary', 'zone': 'app'}"

[glusterfs]
infra2.openshift.mitzicom.int.m0sslab.org glusterfs_devices="[ '/dev/vdc' ]"
infra1.openshift.mitzicom.int.m0sslab.org glusterfs_devices="[ '/dev/vdc' ]"
node1.openshift.mitzicom.int.m0sslab.org glusterfs_devices="[ '/dev/vdc' ]"



Rerunning the playbook results in the same error.
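(A possible manual workaround, purely as a hedged sketch and not something from this thread: strip the GlusterFS node label by hand so the wipe can continue. The label key "glusterfs" is an assumption based on the role's usual default nodeselector; verify the real key with `oc get nodes --show-labels` before running anything like this.)

```yaml
# Hypothetical cleanup playbook (an assumption, not from the issue): remove the
# "glusterfs" label from the hosts in the [glusterfs] group. A trailing "-" on
# "oc label" deletes the named label; adjust the key if your nodes use another one.
- hosts: masters[0]
  gather_facts: false
  tasks:
    - name: Remove the glusterfs label from former GlusterFS nodes
      command: oc label node {{ item }} glusterfs-
      with_items: "{{ groups['glusterfs'] }}"
```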
DanyC97 commented 6 years ago

@smossber can you try with a newer openshift-ansible tag version?

I tried with the latest openshift-ansible-3.7.29-1 and no longer hit that error; however, I now get a new one:

2018-02-11 11:14:57,659 p=29824 u=root |  TASK [openshift_storage_glusterfs : Load heketi topology] ************************************************************************************************************************************************
2018-02-11 11:15:02,789 p=29824 u=root |  fatal: [370-master1]: FAILED! => {"changed": true, "cmd": ["oc", "rsh", "--namespace=glusterfs", "deploy-heketi-storage-1-2pq9l", "heketi-cli", "-s", "http://localhost:8080", "--user", "admin", "--secret", "hGg3nkBKHQAURiCnOlto2VjZBun/lSYKixb+TT5LEoE=", "topology", "load", "--json=/tmp/openshift-glusterfs-ansible-ls3tKn/topology.json", "2>&1"], "delta": "0:00:04.809425", "end": "2018-02-11 11:14:32.737780", "failed_when_result": true, "rc": 0, "start": "2018-02-11 11:14:27.928355", "stderr": "", "stderr_lines": [], "stdout": "Creating cluster ... ID: f197eab8f8508e7bd392398674229539\n\tCreating node 370-gluster1 ... ID: a540ed1d70f0a92991edea422007f1a5\n\t\tAdding device /dev/sdd ... OK\n\tCreating node 370-gluster2 ... Unable to create node: Unable to execute command on glusterfs-storage-chwv4:\n\tCreating node 370-gluster3 ... Unable to create node: Unable to execute command on glusterfs-storage-chwv4:", "stdout_lines": ["Creating cluster ... ID: f197eab8f8508e7bd392398674229539", "\tCreating node 370-gluster1 ... ID: a540ed1d70f0a92991edea422007f1a5", "\t\tAdding device /dev/sdd ... OK", "\tCreating node 370-gluster2 ... Unable to create node: Unable to execute command on glusterfs-storage-chwv4:", "\tCreating node 370-gluster3 ... Unable to create node: Unable to execute command on glusterfs-storage-chwv4:"]}

I'll open a new issue for that.
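(A hedged debugging idea, not from the thread: "Unable to execute command on glusterfs-storage-chwv4" usually means heketi could not run gluster commands inside that pod, so checking the pod and glusterd health before re-running the topology load may narrow it down. The namespace and pod name below are taken from the error output above; the checks themselves are an assumption.)

```yaml
# Hypothetical ad-hoc checks, assuming the "glusterfs" namespace and the pod
# name glusterfs-storage-chwv4 reported in the failed "Load heketi topology" task.
- hosts: masters[0]
  gather_facts: false
  tasks:
    - name: Show GlusterFS pod status and placement
      command: oc get pods --namespace=glusterfs -o wide
    - name: Check glusterd peer status inside the pod named in the heketi error
      command: oc rsh --namespace=glusterfs glusterfs-storage-chwv4 gluster peer status
```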

DanyC97 commented 6 years ago

@smossber I assume this is all fixed now; if so, could you please close the issue?