Can you please share your inventory file?
Sure:
[OSEv3:children]
masters
nodes
etcd
glusterfs
[OSEv3:vars]
ansible_ssh_user=sbadakhc
ansible_become=yes
debug_level=2
openshift_deployment_type=origin
openshift_release=v3.7
openshift_image_tag=v3.7.0
openshift_docker_blocked_registries=public
openshift_docker_disable_push_dockerhub=False
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_users={'admin': 'XXXXXXXX'}
openshift_management_install_management=true
osm_project_request_template='default/project-request'
openshift_master_cluster_method=native
openshift_master_cluster_hostname=ocp.example.com
openshift_master_cluster_public_hostname=ocp.example.com
openshift_master_default_subdomain=apps.example.com
osm_default_node_selector='region=primary'
openshift_hosted_router_selector='region=infra'
openshift_hosted_router_force_subdomain='${name}-${namespace}.apps.example.com'
openshift_hosted_router_certificate={"certfile": "/root/router.crt", "keyfile": "/root/router.key", "cafile": "/root/ca.crt"}
openshift_hosted_registry_selector='region=infra'
openshift_metrics_install_metrics=true
openshift_metrics_image_prefix=registry.example.com:5000/openshift/origin-
openshift_metrics_image_version=v3.7
openshift_logging_install_logging=true
openshift_logging_kibana_proxy_memory_limit=256Mi
openshift_logging_elasticsearch_instance_ram=512Mi
openshift_logging_image_prefix=registry.example.com:5000/openshift/origin-
openshift_logging_image_version=v3.7
openshift_logging_master_public_url=https://ocp.example.com:8443
openshift_logging_elasticsearch_proxy_image_prefix=registry.example.com:5000/openshift/
openshift_logging_elasticsearch_proxy_image_version=v1.0.0
openshift_hosted_prometheus_deploy=true
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
openshift_master_api_port=443
openshift_master_console_port=443
openshift_master_overwrite_named_certificates=true
openshift_master_audit_config={"enabled": true, "auditFilePath": "/var/log/openpaas-oscp-audit/openpaas-oscp-audit.log", "maximumFileRetentionDays": 7, "maximumFileSizeMegabytes": 500, "maximumRetainedFiles": 5}
openshift_enable_origin_repo=false
openshift_management_app_template=miq-template
openshift_management_storage_class=preconfigured
openshift_disable_check=disk_availability,memory_availability
openshift_storage_glusterfs_wipe=False
openshift_storage_glusterfs_heketi_wipe=False
openshift_storage_glusterfs_heketi_version=dev
[masters]
mst01ewd01
mst02ewd01
[nodes]
mst01ewd01 openshift_schedulable=false openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
mst02ewd01 openshift_schedulable=false openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
nde01ewd01 openshift_schedulable=true openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
nde02ewd01 openshift_schedulable=true openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
[etcd]
mst01ewd01
mst02ewd01
[glusterfs]
stg01ewd01 glusterfs_devices='[ "/dev/sdb" ]'
stg02ewd01 glusterfs_devices='[ "/dev/sdb" ]'
stg03ewd01 glusterfs_devices='[ "/dev/sdb" ]'
Hi,
I pulled the latest source and ran the playbook again. It seems as though the node services are not starting, and I saw this issue which may be related.
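For context, the run was the standard deploy playbook from the openshift-ansible checkout; a sketch of the command, assuming the inventory lives in a file named hosts (the playbook path matches the retry hint further down):

ansible-playbook -i hosts playbooks/deploy_cluster.yml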
Hi,
It looks like the connection is failing from the master to the gluster node. I'm going to check the SSH keys and connectivity between those hosts (see the connectivity-check sketch after the log below):
<mst01ewd01> (0, '/home/sbadakhc\n', '')
<mst01ewd01> ESTABLISH SSH CONNECTION FOR USER: sbadakhc
<mst01ewd01> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=sbadakhc -o ConnectTimeout=10 -o ControlPath=/home/sbadakhc/.ansible/cp/a6da0567bb mst01ewd01 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576 `" && echo ansible-tmp-1519648649.96-112630137777576="` echo /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576 `" ) && sleep 0'"'"''
<mst01ewd01> (0, 'ansible-tmp-1519648649.96-112630137777576=/home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576\n', '')
<mst01ewd01> PUT /tmp/tmphrzbva TO /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/oc_label.py
<mst01ewd01> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=sbadakhc -o ConnectTimeout=10 -o ControlPath=/home/sbadakhc/.ansible/cp/a6da0567bb '[mst01ewd01]'
<mst01ewd01> (0, 'sftp> put /tmp/tmphrzbva /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/oc_label.py\n', '')
<mst01ewd01> ESTABLISH SSH CONNECTION FOR USER: sbadakhc
<mst01ewd01> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=sbadakhc -o ConnectTimeout=10 -o ControlPath=/home/sbadakhc/.ansible/cp/a6da0567bb mst01ewd01 '/bin/sh -c '"'"'chmod u+x /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/ /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/oc_label.py && sleep 0'"'"''
<mst01ewd01> (0, '', '')
<mst01ewd01> ESTABLISH SSH CONNECTION FOR USER: sbadakhc
<mst01ewd01> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=sbadakhc -o ConnectTimeout=10 -o ControlPath=/home/sbadakhc/.ansible/cp/a6da0567bb -tt mst01ewd01 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-xtelylebeiczsazhsgzszzbwtqvfwfik; /usr/bin/python /home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/oc_label.py; rm -rf "/home/sbadakhc/.ansible/tmp/ansible-tmp-1519648649.96-112630137777576/" > /dev/null 2>&1'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<mst01ewd01> (0, '\r\n\r\n{"msg": {"returncode": 1, "cmd": "/bin/oc label node stg03ewd01 glusterfs=storage-host --overwrite", "results": {}, "stderr": "Error from server (NotFound): nodes \\"stg03ewd01\\" not found\\n", "stdout": ""}, "failed": true, "exception": " File \\"/tmp/ansible_TNIopd/ansible_module_oc_label.py\\", line 46, in <module>\\n import ruamel.yaml as yaml\\n", "invocation": {"module_args": {"kind": "node", "name": "stg03ewd01", "labels": [{"value": "storage-host", "key": "glusterfs"}], "namespace": null, "kubeconfig": "/etc/origin/master/admin.kubeconfig", "state": "add", "debug": false, "selector": null}}}\r\n', 'Shared connection to mst01ewd01 closed.\r\n')
The full traceback is:
File "/tmp/ansible_TNIopd/ansible_module_oc_label.py", line 46, in <module>
import ruamel.yaml as yaml
failed: [mst01ewd01] (item=stg03ewd01) => {
"changed": false,
"invocation": {
"module_args": {
"debug": false,
"kind": "node",
"kubeconfig": "/etc/origin/master/admin.kubeconfig",
"labels": [
{
"key": "glusterfs",
"value": "storage-host"
}
],
"name": "stg03ewd01",
"namespace": null,
"selector": null,
"state": "add"
}
},
"item": "stg03ewd01",
"msg": {
"cmd": "/bin/oc label node stg03ewd01 glusterfs=storage-host --overwrite",
"results": {},
"returncode": 1,
"stderr": "Error from server (NotFound): nodes \"stg03ewd01\" not found\n",
"stdout": ""
}
}
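A minimal way to verify the SSH side of things, assuming the same inventory file (here called hosts) and remote user as above:

ansible -i hosts glusterfs -m ping -u sbadakhc -b

The failure itself is reported by the API server ("Error from server (NotFound)") rather than by SSH, so it is also worth listing which nodes are actually registered in the cluster:

oc get nodes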
Some more information.
OK, so I managed to make some progress with this, but I'm still seeing errors:
TASK [openshift_storage_glusterfs : Label GlusterFS nodes] ********************************************************************************************************************************************************
failed: [mst01ewd01] (item=stg01ewd01) => {"changed": false, "item": "stg01ewd01", "msg": {"cmd": "/bin/oc label node stg01ewd01 glusterfs=storage-host --overwrite", "results": {}, "returncode": 1, "stderr": "Error from server (NotFound): nodes \"stg01ewd01\" not found\n", "stdout": ""}}
failed: [mst01ewd01] (item=stg02ewd01) => {"changed": false, "item": "stg02ewd01", "msg": {"cmd": "/bin/oc label node stg02ewd01 glusterfs=storage-host --overwrite", "results": {}, "returncode": 1, "stderr": "Error from server (NotFound): nodes \"stg02ewd01\" not found\n", "stdout": ""}}
failed: [mst01ewd01] (item=stg03ewd01) => {"changed": false, "item": "stg03ewd01", "msg": {"cmd": "/bin/oc label node stg03ewd01 glusterfs=storage-host --overwrite", "results": {}, "returncode": 1, "stderr": "Error from server (NotFound): nodes \"stg03ewd01\" not found\n", "stdout": ""}}
to retry, use: --limit @/home/cloud-user/openshift-ansible/playbooks/deploy_cluster.retry
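For reference, the NotFound errors mean the stg hosts are not registered as nodes in the cluster, so the labeling task has nothing to label. A minimal inventory sketch, assuming the GlusterFS hosts are meant to join the cluster as nodes (the labels and schedulability shown are illustrative, not a confirmed fix):

[nodes]
(existing master/node entries unchanged)
stg01ewd01 openshift_schedulable=true openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
stg02ewd01 openshift_schedulable=true openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
stg03ewd01 openshift_schedulable=true openshift_node_labels="{'region': 'primary', 'zone': 'default'}"

The existing [glusterfs] group entries would stay as they are.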
Closing this and creating a new issue for the failing GlusterFS Labels.