openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

GlusterFS task fails. Issue with CNI config. #7282

Closed: sbadakhc closed this issue 6 years ago

sbadakhc commented 6 years ago

Environment:

# Ansible
ansible 2.4.2.0
  config file = /home/sbadakhc/openshift-ansible/ansible.cfg
  configured module search path = [u'/home/sbadakhc/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Aug  4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]

# OpenShift
openshift-ansible-3.9.0-0.52.0

# Operating System
CentOS Linux release 7.4.1708 (Core) 
Linux mgt01ewd01 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Error:

TASK [openshift_storage_glusterfs : Label GlusterFS nodes] ********************************************************************************************************************************************************
failed: [mst01ewd01] (item=stg01ewd01) => {"changed": false, "item": "stg01ewd01", "msg": {"cmd": "/bin/oc label node stg01ewd01 glusterfs=storage-host --overwrite", "results": {}, "returncode": 1, "stderr": "Error from server (NotFound): nodes \"stg01ewd01\" not found\n", "stdout": ""}}
failed: [mst01ewd01] (item=stg02ewd01) => {"changed": false, "item": "stg02ewd01", "msg": {"cmd": "/bin/oc label node stg02ewd01 glusterfs=storage-host --overwrite", "results": {}, "returncode": 1, "stderr": "Error from server (NotFound): nodes \"stg02ewd01\" not found\n", "stdout": ""}}
failed: [mst01ewd01] (item=stg03ewd01) => {"changed": false, "item": "stg03ewd01", "msg": {"cmd": "/bin/oc label node stg03ewd01 glusterfs=storage-host --overwrite", "results": {}, "returncode": 1, "stderr": "Error from server (NotFound): nodes \"stg03ewd01\" not found\n", "stdout": ""}}
    to retry, use: --limit @/home/sbadakhc/openshift-ansible/playbooks/deploy_cluster.retry

PLAY RECAP ********************************************************************************************************************************************************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0   
mst01ewd01                 : ok=424  changed=61   unreachable=0    failed=1   
mst02ewd01                 : ok=307  changed=43   unreachable=0    failed=0   
nde01ewd01                 : ok=116  changed=12   unreachable=0    failed=1   
nde02ewd01                 : ok=108  changed=13   unreachable=0    failed=1   
stg01ewd01                 : ok=21   changed=0    unreachable=0    failed=0   
stg02ewd01                 : ok=21   changed=0    unreachable=0    failed=0   
stg03ewd01                 : ok=21   changed=0    unreachable=0    failed=0   

INSTALLER STATUS **************************************************************************************************************************************************************************************************
Initialization             : Complete (0:01:02)
Health Check               : Complete (0:00:29)
etcd Install               : Complete (0:00:51)
Master Install             : Complete (0:03:49)
Master Additional Install  : Complete (0:00:33)
Node Install               : Complete (0:12:49)
GlusterFS Install          : In Progress (0:00:25)
    This phase can be restarted by running: playbooks/openshift-glusterfs/config.yml

Failure summary:

  1. Hosts:    nde02ewd01
     Play:     Configure nodes
     Task:     restart node
     Message:  Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.

  2. Hosts:    nde01ewd01
     Play:     Additional node config
     Task:     Wait for Node Registration
     Message:  Failed without returning a message.

  3. Hosts:    mst01ewd01
     Play:     Configure GlusterFS
     Task:     Label GlusterFS nodes
     Message:  All items completed

Logs:


[root@nde01ewd01 ~]# journalctl -xe
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616462   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/quobyte"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616469   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/cephfs"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616478   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/downward-api"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616486   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/fc"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616495   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/flocker"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616504   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/azure-file"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616513   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/configmap"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616523   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/vsphere-volume"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616531   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/azure-disk"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616540   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/photon-pd"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616548   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/projected"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616555   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/portworx-volume"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616564   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/scaleio"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616572   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/local-volume"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.616580   30461 plugins.go:378] Loaded volume plugin "kubernetes.io/storageos"
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.622044   30461 server.go:869] Started kubelet v1.7.6+a08f5eeb62
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.622084   30461 server.go:132] Starting to listen on 0.0.0.0:10250
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.622995   30461 server.go:314] Adding debug handlers to kubelet server.
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: E0225 23:58:54.626093   30461 kubelet.go:1191] Image garbage collection failed once. Stats initialization may not have completed yet: unable to find data for contai
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.626392   30461 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/detach
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: E0225 23:58:54.632161   30461 kubelet.go:1705] Failed to check if disk space is available for the runtime: failed to get fs info for "runtime": unable to find data 
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: E0225 23:58:54.632183   30461 kubelet.go:1713] Failed to check if disk space is available on the root partition: failed to get fs info for "root": unable to find da
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.632189   30461 kubelet_node_status.go:433] Recording NodeHasSufficientDisk event message for node nde01ewd01
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.632213   30461 kubelet_node_status.go:433] Recording NodeHasSufficientMemory event message for node nde01ewd01
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.632221   30461 kubelet_node_status.go:433] Recording NodeHasNoDiskPressure event message for node nde01ewd01
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.633052   30461 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.633072   30461 status_manager.go:141] Starting to sync pod status with apiserver
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.633093   30461 kubelet.go:1785] Starting kubelet main sync loop.
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.633109   30461 kubelet.go:1796] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m1
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.633471   30461 container_manager_linux.go:398] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: E0225 23:58:54.633566   30461 container_manager_linux.go:543] [ContainerManager]: Fail to get rootfs information unable to find data for container /
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.633586   30461 volume_manager.go:243] The desired_state_of_world populator starts
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.633589   30461 volume_manager.go:245] Starting Kubelet Volume Manager
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: W0225 23:58:54.655553   30461 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: E0225 23:58:54.657865   30461 kubelet.go:2112] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin i
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: E0225 23:58:54.663808   30461 factory.go:336] devicemapper filesystem stats will not be reported: usage of thin_ls is disabled to preserve iops
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.664475   30461 factory.go:351] Registering Docker factory
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: W0225 23:58:54.664492   30461 manager.go:260] Registration of the rkt container factory failed: unable to communicate with Rkt api service: rkt: cannot tcp Dial rkt
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: W0225 23:58:54.664625   30461 manager.go:271] Registration of the crio container factory failed: Get http://%2Fvar%2Frun%2Fcrio.sock/info: dial unix /var/run/crio.s
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.664636   30461 factory.go:54] Registering systemd factory
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.664813   30461 factory.go:86] Registering Raw factory
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.664923   30461 manager.go:1139] Started watching for new ooms in manager
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.667745   30461 oomparser.go:185] oomparser using systemd
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.668855   30461 manager.go:306] Starting recovery of all containers
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.763800   30461 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/detach
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.770148   30461 manager.go:311] Recovery completed
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: E0225 23:58:54.878725   30461 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'nde01ewd01' not found
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.882290   30461 kubelet_node_status.go:433] Recording NodeHasSufficientDisk event message for node nde01ewd01
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.882312   30461 kubelet_node_status.go:433] Recording NodeHasSufficientMemory event message for node nde01ewd01
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.882319   30461 kubelet_node_status.go:433] Recording NodeHasNoDiskPressure event message for node nde01ewd01
Feb 25 23:58:54 nde01ewd01 origin-node[30461]: I0225 23:58:54.882337   30461 kubelet_node_status.go:82] Attempting to register node nde01ewd01

Extra Information:

● origin-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/origin-node.service.d
           └─openshift-sdn-ovs.conf
   Active: activating (start) since Sun 2018-02-25 23:59:29 UTC; 39s ago
     Docs: https://github.com/openshift/origin
  Process: 30557 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS)
  Process: 30555 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS)
  Process: 30560 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS)
  Process: 30559 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS)
 Main PID: 30562 (openshift)
   Memory: 39.8M
   CGroup: /system.slice/origin-node.service
           ├─30562 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
           └─30605 journalctl -k -f

Feb 26 00:00:00 nde01ewd01 origin-node[30562]: I0226 00:00:00.057366   30562 manager.go:306] Starting recovery of all containers
Feb 26 00:00:00 nde01ewd01 origin-node[30562]: I0226 00:00:00.138475   30562 manager.go:311] Recovery completed
Feb 26 00:00:00 nde01ewd01 origin-node[30562]: I0226 00:00:00.192546   30562 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/detach
Feb 26 00:00:00 nde01ewd01 origin-node[30562]: E0226 00:00:00.270264   30562 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'nde01ewd01' not found
Feb 26 00:00:00 nde01ewd01 origin-node[30562]: I0226 00:00:00.272819   30562 kubelet_node_status.go:433] Recording NodeHasSufficientDisk event message for node nde01ewd01
Feb 26 00:00:00 nde01ewd01 origin-node[30562]: I0226 00:00:00.272840   30562 kubelet_node_status.go:433] Recording NodeHasSufficientMemory event message for node nde01ewd01
Feb 26 00:00:00 nde01ewd01 origin-node[30562]: I0226 00:00:00.272848   30562 kubelet_node_status.go:433] Recording NodeHasNoDiskPressure event message for node nde01ewd01
Feb 26 00:00:00 nde01ewd01 origin-node[30562]: I0226 00:00:00.272864   30562 kubelet_node_status.go:82] Attempting to register node nde01ewd01
Feb 26 00:00:05 nde01ewd01 origin-node[30562]: W0226 00:00:05.262654   30562 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 26 00:00:05 nde01ewd01 origin-node[30562]: E0226 00:00:05.262794   30562 kubelet.go:2112] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
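
The two messages that matter in the log above are the CNI warning ("No networks found in /etc/cni/net.d") and the eviction manager's "node 'nde01ewd01' not found". With openshift-sdn, the CNI config file is normally written only after the node process has synced with the master, so both messages usually clear once node registration succeeds. A minimal way to see the current state from both sides (a sketch, run as root; host names taken from the play recap above, so adjust as needed):

# On a master: which nodes has the API server actually registered?
oc get nodes -o wide

# On the failing node: service state, and whether the SDN has written a CNI config yet
systemctl status origin-node.service
ls -l /etc/cni/net.d
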
DanyC97 commented 6 years ago

Can you please share your inventory file?

sbadakhc commented 6 years ago

Sure:

[OSEv3:children]
masters
nodes
etcd
glusterfs

[OSEv3:vars]
ansible_ssh_user=sbadakhc
ansible_become=yes
debug_level=2
openshift_deployment_type=origin
openshift_release=v3.7
openshift_image_tag=v3.7.0
openshift_docker_blocked_registries=public
openshift_docker_disable_push_dockerhub=False
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_users={'admin': 'XXXXXXXX'}
openshift_management_install_management=true
osm_project_request_template='default/project-request'
openshift_master_cluster_method=native
openshift_master_cluster_hostname=ocp.example.com
openshift_master_cluster_public_hostname=ocp.example.com
openshift_master_default_subdomain=apps.example.com
osm_default_node_selector='region=primary'
openshift_hosted_router_selector='region=infra'
openshift_hosted_router_force_subdomain='${name}-${namespace}.apps.example.com'
openshift_hosted_router_certificate={"certfile": "/root/router.crt", "keyfile": "/root/router.key", "cafile": "/root/ca.crt"}
openshift_hosted_registry_selector='region=infra'
openshift_metrics_install_metrics=true
openshift_metrics_image_prefix=registry.example.com:5000/openshift/origin-
openshift_metrics_image_version=v3.7
openshift_logging_install_logging=true
openshift_logging_kibana_proxy_memory_limit=256Mi
openshift_logging_kibana_proxy_memory_limit=256Mi
openshift_logging_elasticsearch_instance_ram=512Mi
openshift_logging_image_prefix=registry.example.com:5000/openshift/origin-
openshift_logging_image_version=v3.7
openshift_logging_master_public_url=https://ocp.example.com:8443
openshift_logging_elasticsearch_proxy_image_prefix=registry.example.com:5000/openshift/
openshift_logging_elasticsearch_proxy_image_version=v1.0.0
openshift_hosted_prometheus_deploy=true
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
openshift_master_api_port=443
openshift_master_console_port=443
openshift_master_overwrite_named_certificates=true
openshift_master_audit_config={"enabled": true, "auditFilePath": "/var/log/openpaas-oscp-audit/openpaas-oscp-audit.log", "maximumFileRetentionDays": 7, "maximumFileSizeMegabytes": 500, "maximumRetainedFiles": 5}
openshift_enable_origin_repo=false
openshift_management_app_template=miq-template
openshift_management_storage_class=preconfigured
openshift_disable_check=disk_availability,memory_availability
openshift_storage_glusterfs_wipe=False
openshift_storage_glusterfs_heketi_wipe=False
openshift_storage_glusterfs_heketi_version=dev

[masters]
mst01ewd01
mst02ewd01

[nodes]
mst01ewd01 openshift_schedulable=false openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
mst02ewd01 openshift_schedulable=false openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
nde01ewd01 openshift_schedulable=true openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
nde02ewd01 openshift_schedulable=true openshift_node_labels="{'region': 'primary', 'zone': 'default'}"

[etcd]
mst01ewd01
mst02ewd01

[glusterfs]
stg01ewd01 glusterfs_devices='[ "/dev/sdb" ]'
stg02ewd01 glusterfs_devices='[ "/dev/sdb" ]'
stg03ewd01 glusterfs_devices='[ "/dev/sdb" ]'
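
One thing that stands out in the inventory: the stg hosts appear only in the [glusterfs] group. For a native (containerized) GlusterFS deployment, openshift-ansible also expects those hosts to be listed in the [nodes] group so they register as OpenShift nodes; if they never register, the "Label GlusterFS nodes" task has nothing to label, which matches the NotFound errors above. A rough sketch of how to confirm this and move forward (the variable name and playbook path come from the role documentation and the installer output above; treat this as an assumption to verify, not a confirmed fix):

# From a master: are the storage hosts registered as nodes at all?
oc get nodes

# If they are meant to run GlusterFS in pods (native), add stg01ewd01, stg02ewd01 and
# stg03ewd01 to the [nodes] group in the inventory, then re-run only the failed phase
# (substitute the real inventory path for <inventory>):
ansible-playbook -i <inventory> playbooks/openshift-glusterfs/config.yml

# If GlusterFS is meant to run outside the cluster instead, setting
# openshift_storage_glusterfs_is_native=false in [OSEv3:vars] should skip the
# native deploy (and its node labeling) on the next run.
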
sbadakhc commented 6 years ago

Hi,

I pulled the latest source and ran the playbook again. It seems as though the node services are not starting, and I saw this issue, which may be related.

https://github.com/openshift/openshift-ansible/issues/3408
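
If the node services are the sticking point, it may help to restart one of them while tailing its journal, so the first error after startup is visible rather than the truncated lines above (a sketch, assuming root access on the affected node):

# e.g. on nde01ewd01
systemctl restart origin-node.service
journalctl -u origin-node.service -f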

sbadakhc commented 6 years ago

Hi,

It looks like the connection from the master to the gluster nodes is failing. I'm going to check the SSH keys and connectivity between those hosts:

<mst01ewd01> (0, '/home/sbadakhc\n', '')
<mst01ewd01> ESTABLISH SSH CONNECTION FOR USER: sbadakhc
<mst01ewd01> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=sbadakhc -o ConnectTimeout=10 -o ControlPath=/home/sbadakhc/.ansible/cp/a6da0567bb mst01ewd01 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576 `" && echo ansible-tmp-1519648649.96-112630137777576="` echo /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576 `" ) && sleep 0'"'"''
<mst01ewd01> (0, 'ansible-tmp-1519648649.96-112630137777576=/home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576\n', '')
<mst01ewd01> PUT /tmp/tmphrzbva TO /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/oc_label.py
<mst01ewd01> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=sbadakhc -o ConnectTimeout=10 -o ControlPath=/home/sbadakhc/.ansible/cp/a6da0567bb '[mst01ewd01]'
<mst01ewd01> (0, 'sftp> put /tmp/tmphrzbva /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/oc_label.py\n', '')
<mst01ewd01> ESTABLISH SSH CONNECTION FOR USER: sbadakhc
<mst01ewd01> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=sbadakhc -o ConnectTimeout=10 -o ControlPath=/home/sbadakhc/.ansible/cp/a6da0567bb mst01ewd01 '/bin/sh -c '"'"'chmod u+x /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/ /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/oc_label.py && sleep 0'"'"''
<mst01ewd01> (0, '', '')
<mst01ewd01> ESTABLISH SSH CONNECTION FOR USER: sbadakhc
<mst01ewd01> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=sbadakhc -o ConnectTimeout=10 -o ControlPath=/home/sbadakhc/.ansible/cp/a6da0567bb -tt mst01ewd01 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-xtelylebeiczsazhsgzszzbwtqvfwfik; /usr/bin/python /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/oc_label.py; rm -rf "/home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/" > /dev/null 2>&1'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<mst01ewd01> (0, '\r\n\r\n{"msg": {"returncode": 1, "cmd": "/bin/oc label node stg03ewd01 glusterfs=storage-host --overwrite", "results": {}, "stderr": "Error from server (NotFound): nodes \\"stg03ewd01\\" not found\\n", "stdout": ""}, "failed": true, "exception": "  File \\"/tmp/ansible_TNIopd/ansible_module_oc_label.py\\", line 46, in <module>\\n    import ruamel.yaml as yaml\\n", "invocation": {"module_args": {"kind": "node", "name": "stg03ewd01", "labels": [{"value": "storage-host", "key": "glusterfs"}], "namespace": null, "kubeconfig": "/etc/origin/master/admin.kubeconfig", "state": "add", "debug": false, "selector": null}}}\r\n', 'Shared connection to mst01ewd01 closed.\r\n')
The full traceback is:
  File "/tmp/ansible_TNIopd/ansible_module_oc_label.py", line 46, in <module>
    import ruamel.yaml as yaml

failed: [mst01ewd01] (item=stg03ewd01) => {
    "changed": false, 
    "invocation": {
        "module_args": {
            "debug": false, 
            "kind": "node", 
            "kubeconfig": "/etc/origin/master/admin.kubeconfig", 
            "labels": [
                {
                    "key": "glusterfs", 
                    "value": "storage-host"
                }
            ], 
            "name": "stg03ewd01", 
            "namespace": null, 
            "selector": null, 
            "state": "add"
        }
    }, 
    "item": "stg03ewd01", 
    "msg": {
        "cmd": "/bin/oc label node stg03ewd01 glusterfs=storage-host --overwrite", 
        "results": {}, 
        "returncode": 1, 
        "stderr": "Error from server (NotFound): nodes \"stg03ewd01\" not found\n", 
        "stdout": ""
    }
}
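
For what it's worth, the verbose output above shows the SSH connection, the module upload, and the sudo escalation all returning exit status 0, and the ruamel.yaml fragment in the exception field appears to come from the module's import section rather than being the failure itself. The real error is the oc label call on the master returning NotFound, i.e. the API server has no node object named stg03ewd01. As a sketch, it should be reproducible by hand on mst01ewd01 with the same kubeconfig the module uses:

oc --config=/etc/origin/master/admin.kubeconfig get nodes
oc --config=/etc/origin/master/admin.kubeconfig label node stg03ewd01 glusterfs=storage-host --overwrite
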
sbadakhc commented 6 years ago

Some more information:

https://bugzilla.redhat.com/show_bug.cgi?id=1449357

sbadakhc commented 6 years ago

OK, so I managed to make some progress with this, but I'm still seeing errors:

TASK [openshift_storage_glusterfs : Label GlusterFS nodes] ********************************************************************************************************************************************************
failed: [mst01ewd01] (item=stg01ewd01) => {"changed": false, "item": "stg01ewd01", "msg": {"cmd": "/bin/oc label node stg01ewd01 glusterfs=storage-host --overwrite", "results": {}, "returncode": 1, "stderr": "Error from server (NotFound): nodes \"stg01ewd01\" not found\n", "stdout": ""}}
failed: [mst01ewd01] (item=stg02ewd01) => {"changed": false, "item": "stg02ewd01", "msg": {"cmd": "/bin/oc label node stg02ewd01 glusterfs=storage-host --overwrite", "results": {}, "returncode": 1, "stderr": "Error from server (NotFound): nodes \"stg02ewd01\" not found\n", "stdout": ""}}
failed: [mst01ewd01] (item=stg03ewd01) => {"changed": false, "item": "stg03ewd01", "msg": {"cmd": "/bin/oc label node stg03ewd01 glusterfs=storage-host --overwrite", "results": {}, "returncode": 1, "stderr": "Error from server (NotFound): nodes \"stg03ewd01\" not found\n", "stdout": ""}}
    to retry, use: --limit @/home/cloud-user/openshift-ansible/playbooks/deploy_cluster.retry
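
Since the installer status earlier already marked everything before GlusterFS as complete, only the failed phase needs to be re-run once the storage hosts show up in oc get nodes; a full deploy_cluster run is not required each time. A short loop along these lines (substitute the real inventory path for <inventory>):

# Re-run only the GlusterFS phase, as suggested by the installer status output
ansible-playbook -i <inventory> playbooks/openshift-glusterfs/config.yml
# Then confirm the labels were applied
oc get nodes --show-labels | grep glusterfs=storage-host
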
sbadakhc commented 6 years ago

Closing this and creating a new issue for the failing GlusterFS Labels.