rhc-ose-ansible installation failure

JaredBurck commented 8 years ago

I have tried three times today to provision a new os1 environment using the rhc-ose-ansible scripts. It failed all three times with the same error:

TASK [cockpit-ui : Deploy registry-console] ************************************
skipping: [master1.rh-ocp.example.com]

PLAY RECAP *********************************************************************
localhost                  : ok=13   changed=7    unreachable=0    failed=0
master1.rh-ocp.example.com : ok=432  changed=94   unreachable=0    failed=0
nfs1.rh-ocp.example.com    : ok=70   changed=13   unreachable=0    failed=0
node1.rh-ocp.example.com   : ok=148  changed=39   unreachable=0    failed=0
node2.rh-ocp.example.com   : ok=148  changed=39   unreachable=0    failed=0
node3.rh-ocp.example.com   : ok=148  changed=39   unreachable=0    failed=0

ERROR! the role 'openshift_common' was not found in /root/repository/rhc-ose-ansible/playbooks/openshift/roles:/root/repository/rhc-ose-ansible/playbooks/openshift:/etc/ansible/roles:/root/repository/rhc-ose-ansible/roles

The error appears to have been in '/root/repository/rhc-ose-ansible/roles/secure-registry/meta/main.yaml': line 3, column 6, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

 dependencies:
   - { role: openshift_common }
     ^ here

ERROR: Post Install failed to run with: ansible-playbook -i /root/repository/rhc-ose-ansible/inventory_rh-ocp /root/repository/rhc-ose-ansible/playbooks/openshift/post-install.yaml

sabre1041 commented 8 years ago

@JaredBurck can you provide the command you used to run the installation?

@oybed is this related to variable dependency on openshift-ansible that we were looking to remove?

oybed commented 8 years ago

@sabre1041 yes. However, discussing with @etsauer, we decided to not remove the dependency, but rather document how to set up the environment (as other dependencies will be used in the future). We need to get the readme files updated ASAP.

@JaredBurck please ensure that the execution environment has the following settings in the [defaults] section of one of the ansible.cfg files, e.g. ~/.ansible.cfg:

roles_path = <path to the rhc-ose dir>/rhc-ose-ansible/roles:<path to the openshift-ansible repo>/roles
filter_plugins = /usr/share/ansible_plugins/filter_plugins:<path to the openshift-ansible repo>/filter_plugins

For example, if your rhc-ose and openshift-ansible repos both exist in /root/repository, the values would be:

roles_path = /root/repository/rhc-ose/rhc-ose-ansible/roles:/root/repository/openshift-ansible/roles
filter_plugins = /usr/share/ansible_plugins/filter_plugins:/root/repository/openshift-ansible/filter_plugins
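
To confirm that a given roles_path value can satisfy the openshift_common dependency before re-running the installer, each entry can be checked for a directory named after the role. This is a minimal sketch, not how Ansible itself resolves roles: `find_role` is a hypothetical helper, and it assumes a role is simply a directory with the role's name directly under one of the path entries.

```python
import os


def find_role(role, roles_path):
    """Return the roles_path entries that contain a directory named `role`.

    `roles_path` is a colon-separated string, as written in ansible.cfg.
    """
    matches = []
    for entry in roles_path.split(":"):
        entry = os.path.expanduser(entry.strip())
        if entry and os.path.isdir(os.path.join(entry, role)):
            matches.append(entry)
    return matches


if __name__ == "__main__":
    # Example values from this thread; adjust to your own checkout locations.
    roles_path = (
        "/root/repository/rhc-ose/rhc-ose-ansible/roles:"
        "/root/repository/openshift-ansible/roles"
    )
    found = find_role("openshift_common", roles_path)
    print(found or "openshift_common not found in any roles_path entry")
```

An empty result means the post-install playbook would most likely hit the same "role was not found" error.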

JaredBurck commented 8 years ago

@sabre1041 the following command was used for install:

# ./provision.sh -i=inventory/jb-ose-provision -p=/root/openshift-ansible/

@oybed the execution environment used was the openstack-docker-client from the rhc-ose repo. This repository has an ansible.cfg file in the /root/repository/rhc-ose-ansible/ directory. I added the provided configuration to that file and was able to successfully install openshift.

JaredBurck commented 8 years ago

After installation, I successfully created a jboss-eap quickstarts app based on the jboss-eap-openshift64:1.4 image. However, the route was not reachable from a browser or with curl from the master.

Commands and output from the discussion in the Slack channel with @etsauer and @sabre1041:

[root@master1 ~]# oc get route -n my-project
NAME        HOST/PORT                                      PATH      SERVICE              TERMINATION   LABELS
jboss-eap   jboss-eap-my-project.apps.rh-ocp.example.com             jboss-eap:8080-tcp                 app=jboss-eap,jboss=quickstarts

[root@master1 ~]# curl http://jboss-eap-my-project.apps.rh-ocp.example.com
curl: (6) Could not resolve host: jboss-eap-my-project.apps.rh-ocp.example.com; Name or service not known

[root@master1 ~]# nslookup jboss-eap-my-project.apps.rh-ocp.example.com
Server:     172.16.166.48
Address:        172.16.166.48#53

** server can't find jboss-eap-my-project.apps.rh-ocp.example.com: NXDOMAIN

[root@master1 ~]# dig jboss-eap-my-project.apps.rh-ocp.example.com

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> jboss-eap-my-project.apps.rh-ocp.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 47390
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;jboss-eap-my-project.apps.rh-ocp.example.com. IN A

;; Query time: 1 msec
;; SERVER: 172.16.166.48#53(172.16.166.48)
;; WHEN: Fri Sep 02 17:04:36 EDT 2016
;; MSG SIZE  rcvd: 62

[root@master1 ~]# grep -r IP4_NAMESERVERS= /etc/*
/etc/sysconfig/network:IP4_NAMESERVERS=172.16.166.65

[root@master1 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
search os1.phx2.redhat.com rh-ocp.example.com
nameserver 172.16.166.48
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh

Here master1 = 172.16.166.48 and dns1 = 172.16.166.65. All hosts point to themselves as nameserver, except the NFS server:

[root@05cf3468f5fb rhc-ose-ansible]# ansible all -i inventory_rh-ocp -m shell -a 'cat /etc/resolv.conf'
master1.rh-ocp.example.com | SUCCESS | rc=0 >>
# Generated by NetworkManager
search os1.phx2.redhat.com rh-ocp.example.com
nameserver 172.16.166.48
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh

node2.rh-ocp.example.com | SUCCESS | rc=0 >>
# Generated by NetworkManager
search os1.phx2.redhat.com rh-ocp.example.com
nameserver 172.16.166.52
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh

node3.rh-ocp.example.com | SUCCESS | rc=0 >>
# Generated by NetworkManager
search os1.phx2.redhat.com rh-ocp.example.com
nameserver 172.16.166.55
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh

node1.rh-ocp.example.com | SUCCESS | rc=0 >>
# Generated by NetworkManager
search os1.phx2.redhat.com rh-ocp.example.com
nameserver 172.16.166.49
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh

nfs1.rh-ocp.example.com | SUCCESS | rc=0 >>
# Generated by NetworkManager
search os1.phx2.redhat.com rh-ocp.example.com
nameserver  172.16.166.65
nameserver 172.16.166.40

oybed commented 8 years ago

@JaredBurck What you are seeing is correct: as of 3.2, masters and nodes point to themselves because dnsmasq runs locally. The IP4_NAMESERVERS value that @etsauer mentioned drives part of the dnsmasq configuration, not the /etc/resolv.conf settings.
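
To compare which resolver each host actually uses, the nameserver entries can be pulled out of the resolv.conf output pasted earlier in this thread. A small sketch: `nameservers` is a hypothetical helper that only handles plain `nameserver <ip>` lines.

```python
def nameservers(resolv_conf_text):
    """Return the IPs on `nameserver` lines, skipping comments and blanks."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # A commented-out line starts with "#", so fields[0] is not "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# /etc/resolv.conf from master1, as posted above.
master1 = """\
# Generated by NetworkManager
search os1.phx2.redhat.com rh-ocp.example.com
nameserver 172.16.166.48
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
"""
print(nameservers(master1))  # ['172.16.166.48']
```

On a 3.2 master or node, the single entry is expected to be the host's own IP, which is what the output from master1 shows.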

Anyway, what was public_dns_forwarder set to during installation? That may be part of the problem. https://github.com/rhtconsulting/rhc-ose/blob/openshift-enterprise-3/rhc-ose-ansible/inventory/ose-provision#L51 Also, I know you said that you're running in os1, so this shouldn't be an issue, but remember to allow port 8053/udp from the nodes to the masters (also new as of 3.2). That is, on OpenStack with Neutron, this port needs to be added to the master security groups.

oybed commented 8 years ago

@JaredBurck if you can add my public key to the master and one or more of the nodes, I can take a closer look to see what's going on.

oybed commented 8 years ago

@JaredBurck @etsauer @sabre1041 thinking about this some more, I don't think this has ever worked since the move to the newly implemented Ansible roles/playbooks: nothing adds the wildcard DNS record to the local DNS server that gets installed as part of the provisioning. That needs to be fixed :-)
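
Once a wildcard record is in place, one way to verify it is to resolve a made-up hostname under the apps subdomain, since a random name should only resolve if a wildcard exists. This is a sketch under that assumption: `has_wildcard_record` is a hypothetical helper, it uses whatever resolver the local host is configured with, and the `resolve` parameter exists only so the check can be exercised without real DNS.

```python
import socket
import uuid


def has_wildcard_record(domain, resolve=socket.gethostbyname):
    """Return True if a random hostname under `domain` resolves."""
    # A random label avoids matching any explicitly defined record.
    probe = "wildcard-probe-{}.{}".format(uuid.uuid4().hex[:8], domain)
    try:
        resolve(probe)
    except socket.gaierror:
        # Resolution failed (e.g. NXDOMAIN): no wildcard covers this name.
        return False
    return True
```

Given the NXDOMAIN responses shown earlier, calling `has_wildcard_record("apps.rh-ocp.example.com")` on master1 would be expected to return False until the record is added.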

rhtconsulting / rhc-ose

rhc-ose-ansible installation failure #229