miktaylor3 closed this issue 3 years ago
Please hold. I'm doing some more testing; this may be related to the external DNS we were using. The install worked when using 1.1.1.1 as the external DNS.
I've done several installs over the last couple of days and reproduced this in the Reston lab. Whenever the system points to an external DNS and is forced to resolve addresses on the private network (192.168.8.xx), it seems to have problems connecting to the cluster address, and the install fails.
A fresh install on the same system with the same config.sh, changing only the DNS to a local server on the external network with the suggested DNS entries, works fine.
I've also tried changing the DNS after an install without re-installing. In general, once the install fails, the system tends to stay on the private network and does not work, even after I modify the DNS field to point to something external. I generally need to re-install the whole thing (RHEL plus OCS) to reset it.
The fix for this issue has been pushed and will be published in the next version (soon).
In the meantime, run the following command, which will load the correct DNS settings:
sudo nmcli con up faroswan
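To confirm that the workaround took effect, it may help to look at which nameservers the host resolver is actually using afterwards. The following is a minimal sketch, assuming the host resolver reads a standard resolv.conf-style file; the helper name list_nameservers is hypothetical and not part of the installer.

```shell
#!/bin/sh
# Hypothetical helper: print the nameserver addresses from a
# resolv.conf-style file (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

After running the nmcli command above, calling list_nameservers with no arguments should list the DNS servers you expect. If it still shows only the old entries, the connection settings were probably not reloaded.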
Summary of Problem: Installation fails without a local DNS server
This is related to the factory install failing because it cannot reach the API at https://api.edge.rdc100.lan:6443.
I was able to reproduce this in the lab in Reston by turning off the local DNS server and doing an install with only external DNS servers configured (208.67.222.222 and 208.67.220.220). The installation failed at the same point as in the factory.
Also note that the IP address 10.0.2.3:53 in the error below is not something I have in my lab or anything I configured; I'm not sure where it is coming from.
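One way to see the difference between the DNS servers before starting an install is to ask each one for the cluster API name directly. This is a rough sketch, assuming dig is installed on the host; the function name check_resolvers and the DIG override variable are hypothetical, added only so that the lookup command can be swapped out.

```shell
#!/bin/sh
# Lookup command to use; override DIG to substitute another tool or a stub.
DIG="${DIG:-dig}"

# Hypothetical helper: query each given DNS server for a name and report
# whether it returned an answer.
# Usage: check_resolvers NAME SERVER [SERVER...]
check_resolvers() {
    name="$1"
    shift
    for server in "$@"; do
        # +short prints only the answer; lines starting with ';' are dig
        # diagnostics (for example timeouts), so they are filtered out.
        answer=$("$DIG" +short "$name" "@$server" 2>/dev/null | grep -v '^;' | head -n 1)
        if [ -n "$answer" ]; then
            echo "$server: $name -> $answer"
        else
            echo "$server: $name does not resolve"
        fi
    done
}
```

For example, check_resolvers api.edge.rdc100.lan 208.67.222.222 208.67.220.220 would show whether the external servers know the cluster name. A public resolver is not expected to answer for a private .lan name unless the entries have been added to it.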
STEP 8: WAIT FOR THE OPENSHIFT INSTALLATION TO COMPLETE
write upgrade status to tmp file
localhost>localhost
wait for bootstrap to complete
failed: localhost: non-zero return code
there was an error during install
failed: localhost:
level=info msg=Waiting up to 20m0s for the Kubernetes API at https://api.edge.rdc100.lan:6443...
level=error msg=Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get "https://api.edge.rdc100.lan:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp: lookup api.edge.rdc100.lan on 10.0.2.3:53: no such host
level=info msg=Use the following commands to gather logs from the cluster
level=info msg=openshift-install gather bootstrap --help
level=fatal msg=failed waiting for Kubernetes API: Get "https://api.edge.rdc100.lan:6443/version?timeout=32s": dial tcp: lookup api.edge.rdc100.lan on 10.0.2.3:53: no such host
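The "lookup HOST on IP:PORT: no such host" part of the error names the resolver that the installer process actually queried, which may differ from the DNS servers configured on the host. 10.0.2.3 is the default built-in DNS address in QEMU user-mode (SLIRP) networking and in slirp4netns, so one possibility, not confirmed in this thread, is that the installer runs inside such a virtual network and its lookups are forwarded from there. The sketch below pulls the host and resolver pairs out of a saved log so they can be compared with the host's own DNS settings; the function name extract_lookup_failures is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print unique "HOST RESOLVER" pairs from errors of the
# form "lookup HOST on IP:PORT: no such host". Reads the named files, or
# stdin if none are given. If a single line contains several such errors,
# only the last one on that line is reported.
extract_lookup_failures() {
    sed -n 's/.*lookup \([^ ]*\) on \([0-9.]*:[0-9]*\): no such host.*/\1 \2/p' "$@" | sort -u
}
```

Running it over the log above should report a single pair, api.edge.rdc100.lan and 10.0.2.3:53. If that resolver is not one of the servers configured on the host, the failing lookup is going through an intermediate DNS forwarder rather than straight to the configured servers.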