openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

OpenShift (Origin) 3.11 deploy failed: Unknown Host #10972

Closed hpiard closed 5 years ago

hpiard commented 5 years ago

Description

Cannot deploy OpenShift 3.11 (Origin). The prerequisites playbook executes fine, but the deployment fails with:

failed: [masternode-1.dev.edpc] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd", "msg": {"cmd": "/bin/oc get pod master-etcd-masternode-1.dev.edpc -o json -n kube-system", "results": [{}], "returncode": 1, "stderr": "Unable to connect to the server: Unknown Host\n", "stdout": ""}}
failed: [masternode-2.dev.edpc] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd", "msg": {"cmd": "/bin/oc get pod master-etcd-masternode-2.dev.edpc -o json -n kube-system", "results": [{}], "returncode": 1, "stderr": "Unable to connect to the server: Unknown Host\n", "stdout": ""}}
failed: [masternode-0.dev.edpc] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd", "msg": {"cmd": "/bin/oc get pod master-etcd-masternode-0.dev.edpc -o json -n kube-system", "results": [{}], "returncode": 1, "stderr": "Unable to connect to the server: Unknown Host\n", "stdout": ""}}

Version

(okd-design-eng) [centos@deployment-server openshift-ansible]$ ansible --version
ansible 2.6.5
  config file = /home/centos/okd-design-eng/okd-v3.11/openshift-ansible/ansible.cfg
  configured module search path = [u'/home/centos/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /home/centos/okd-design-eng/lib/python2.7/site-packages/ansible
  executable location = /home/centos/okd-design-eng/bin/ansible
  python version = 2.7.5 (default, Oct 30 2018, 23:45:53) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

If you're operating from a git clone:

(okd-design-eng) [centos@deployment-server openshift-ansible]$ git describe
openshift-ansible-4.0.0-0.122.0

TASK [openshift_control_plane : Report control plane errors] **********************************************************************
Tuesday 08 January 2019  21:46:10 +0000 (0:00:00.496)       0:25:51.966 *******
fatal: [masternode-0.dev.edpc]: FAILED! => {"changed": false, "msg": "Control plane pods didn't come up"}
fatal: [masternode-1.dev.edpc]: FAILED! => {"changed": false, "msg": "Control plane pods didn't come up"}
fatal: [masternode-2.dev.edpc]: FAILED! => {"changed": false, "msg": "Control plane pods didn't come up"}

NO MORE HOSTS LEFT ****************************************************************************************************************

PLAY RECAP ************************************************************************************************************************
appnode-0.dev.edpc         : ok=96   changed=19   unreachable=0    failed=0
appnode-1.dev.edpc         : ok=96   changed=19   unreachable=0    failed=0
haproxynode-0.dev.edpc     : ok=27   changed=3    unreachable=0    failed=0
infranode-0.dev.edpc       : ok=96   changed=19   unreachable=0    failed=0
infranode-1.dev.edpc       : ok=96   changed=19   unreachable=0    failed=0
infranode-2.dev.edpc       : ok=96   changed=19   unreachable=0    failed=0
localhost                  : ok=12   changed=0    unreachable=0    failed=0
masternode-0.dev.edpc      : ok=280  changed=52   unreachable=0    failed=1
masternode-1.dev.edpc      : ok=224  changed=47   unreachable=0    failed=1
masternode-2.dev.edpc      : ok=224  changed=47   unreachable=0    failed=1

INSTALLER STATUS ******************************************************************************************************************
Initialization              : Complete (0:01:25)
Health Check                : Complete (0:00:47)
Node Bootstrap Preparation  : Complete (0:03:02)
etcd Install                : Complete (0:00:55)
Load Balancer Install       : Complete (0:00:13)
Master Install              : In Progress (0:19:27)
        This phase can be restarted by running: playbooks/openshift-master/config.yml
Tuesday 08 January 2019  21:46:10 +0000 (0:00:00.316)       0:25:52.283 *******
===============================================================================
openshift_control_plane : Wait for control plane pods to appear --------------------------------------------------------- 1027.13s
Run health checks (install) - EL ------------------------------------------------------------------------------------------ 46.59s
Run variable sanity checks ------------------------------------------------------------------------------------------------ 39.08s
openshift_node : Install node, clients, and conntrack packages ------------------------------------------------------------ 18.86s
openshift_excluder : Install docker excluder - yum ------------------------------------------------------------------------ 11.56s
openshift_cli : Install clients -------------------------------------------------------------------------------------------- 8.80s
openshift_excluder : Install openshift excluder - yum ---------------------------------------------------------------------- 7.93s
openshift_node : install needed rpm(s) ------------------------------------------------------------------------------------- 7.72s
openshift_ca : Install the base package for admin tooling ------------------------------------------------------------------ 7.44s
openshift_node : Update journald setup ------------------------------------------------------------------------------------- 4.21s
tuned : Ensure files are populated from templates -------------------------------------------------------------------------- 3.56s
openshift_master_certificates : Check status of master certificates -------------------------------------------------------- 3.23s
openshift_control_plane : Copy static master scripts ----------------------------------------------------------------------- 3.04s
tuned : Restart tuned service ---------------------------------------------------------------------------------------------- 2.85s
tuned : Ensure files are populated from templates -------------------------------------------------------------------------- 2.84s
Set fact of no_proxy_internal_hostnames ------------------------------------------------------------------------------------ 2.84s
openshift_control_plane : Prepare master static pods ----------------------------------------------------------------------- 2.75s
openshift_node : Add iptables allow rules ---------------------------------------------------------------------------------- 2.59s
openshift_control_plane : Set fact of all etcd host IPs -------------------------------------------------------------------- 2.54s
Gathering Facts ------------------------------------------------------------------------------------------------------------ 2.18s

Failure summary:

  1. Hosts:    masternode-0.dev.edpc, masternode-1.dev.edpc, masternode-2.dev.edpc
     Play:     Configure masters
     Task:     Report control plane errors
     Message:  Control plane pods didn't come up

real    26m2.020s
user    9m51.712s
sys     3m5.222s

Steps To Reproduce

Step 1: Ansible inventory (hosts) file:

[OSEv3:children]
masters
nodes
etcd
lb

[OSEv3:vars]
openshift_deployment_type=origin
openshift_image_tag=v3.11
openshift_release=v3.11
openshift_master_default_subdomain=app.okd-lab.domain.com
ansible_ssh_user=s.cfg.dev
ansible_ssh_private_key_file=~/.ssh/id_rsa
ansible_become=true
osm_use_cockpit=true
debug_level=4
openshift_http_proxy=http://192.168.1.2:8080
openshift_https_proxy=http://192./168.1.2:8080
openshift_no_proxy='localhost,127.0.0.1,.domain.com,.dev.edpc,.novalocal,192.168.7.0/24'
openshift_additional_repos=[{'id': 'centos-okd-ci', 'name': 'centos-okd-ci', 'baseurl' :'http://buildlogs.centos.org/centos/7/paas/x86_64/openshift-origin311/', 'gpgcheck' :'0', 'enabled' :'1'}]
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_htpasswd_users={'admin': 'blablabla'}

openshift_master_cluster_method=native
openshift_master_cluster_hostname=openshift-internal.dev.edpc
openshift_master_cluster_public_hostname=openshift-cluster.dev.edpc

os_sdn_network_plugin_name=redhat/openshift-ovs-multitenant

[masters]
masternode-0.dev.edpc
masternode-1.dev.edpc
masternode-2.dev.edpc

[etcd]
masternode-0.dev.edpc
masternode-1.dev.edpc
masternode-2.dev.edpc

[lb]
haproxynode-0.dev.edpc

[nodes]
masternode-[0:2].dev.edpc openshift_node_group_name='node-config-master'
appnode-0.dev.edpc openshift_node_group_name='node-config-compute'
appnode-1.dev.edpc openshift_node_group_name='node-config-compute'
infranode-0.dev.edpc openshift_node_group_name='node-config-infra'
infranode-1.dev.edpc openshift_node_group_name='node-config-infra'
infranode-2.dev.edpc openshift_node_group_name='node-config-infra'

Step 2: Run the playbooks:

ansible-playbook -v -i inventory/hosts.maas-lab-okd playbooks/prerequisites.yml
ansible-playbook -v -i inventory/hosts.maas-lab-okd playbooks/deploy_cluster.yml
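Editorial aside: the openshift_https_proxy value in the Step 1 inventory contains a stray slash (http://192./168.1.2:8080). That may only be a transcription slip in the report, and it is not the cause identified later in the thread. A minimal sketch of how proxy URLs could be sanity-checked before a run follows. The function name proxy_url_problems is hypothetical and not part of openshift-ansible; it uses only Python's standard urllib.parse.

```python
from urllib.parse import urlsplit


def proxy_url_problems(url):
    """Return a list of human-readable problems found in a proxy URL.

    Hypothetical helper for illustration only; the checks are deliberately
    simple and do not prove that the proxy is reachable.
    """
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("unexpected scheme: %r" % parts.scheme)
    if not parts.hostname:
        problems.append("missing host")
    # A proxy URL is normally just scheme://host:port, so a non-trivial
    # path usually means a typo such as a stray slash inside the address.
    if parts.path not in ("", "/"):
        problems.append("unexpected path: %r" % parts.path)
    return problems


if __name__ == "__main__":
    for url in ("http://192.168.1.2:8080", "http://192./168.1.2:8080"):
        print(url, proxy_url_problems(url) or "looks fine")
```

Running this against the two proxy values from the inventory flags only the second one, because everything after the stray slash is parsed as a path.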

Expected Results

I expected the deployment to succeed, since the prerequisites playbook completed without errors.

Observed Results

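The deployment fails with the errors shown above: the control plane pods never come up, and every /bin/oc call on the masters returns "Unable to connect to the server: Unknown Host".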

Additional Information

All nodes (including the Ansible node) are running:
cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

hpiard commented 5 years ago

I found the issue: /etc/environment had http_proxy and https_proxy settings. I added a no_proxy entry for my domain to it, and /bin/oc get pod master-etcd-masternode-1.dev.edpc now succeeds. This can be closed.
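For anyone hitting the same symptom, here is a rough sketch of a check for whether a given hostname would be exempted by the no_proxy value in the current environment. The matching rule is a simplifying assumption: an entry matches the host itself or any subdomain, a leading dot is ignored, and CIDR entries are skipped. Real clients, including oc, may apply different rules. The function name bypasses_proxy is hypothetical.

```python
import os


def bypasses_proxy(host, no_proxy):
    """Return True if host matches an entry in a comma-separated no_proxy string.

    Simplified rule (an assumption, not the exact behavior of any client):
    an entry matches the host itself or any subdomain of it, and a leading
    dot on the entry is ignored. CIDR ranges are not interpreted.
    """
    host = host.lower().rstrip(".")
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if entry == "*" or host == entry or host.endswith("." + entry):
            return True
    return False


if __name__ == "__main__":
    # Read whichever spelling of the variable is set in this shell.
    no_proxy = os.environ.get("no_proxy") or os.environ.get("NO_PROXY") or ""
    for host in ("masternode-0.dev.edpc", "openshift-internal.dev.edpc"):
        verdict = "bypasses proxy" if bypasses_proxy(host, no_proxy) else "goes through proxy"
        print(host, "->", verdict)
```

Run on a master node in the same shell environment the installer uses, this shows whether the cluster hostnames are covered. With http_proxy and https_proxy set but no matching no_proxy entry, the internal hostnames are reported as going through the proxy, which matches the failure described in this issue.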