okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

[OKD 4.12] IPI installation on Openstack Fails #1490

Open m0ter opened 1 year ago

m0ter commented 1 year ago

IPI installation on OpenStack fails when using the installer.

Bootstrap node logs from bootkube are below:

Feb 03 16:05:01 okd12-4b62m-bootstrap bootkube.sh[1293]: Restoring CVO overrides
Feb 03 16:05:01 okd12-4b62m-bootstrap bootkube.sh[6673]: Unable to connect to the server: dial tcp: lookup api-int.okd12.driften.ml: no such host
Feb 03 16:05:11 okd12-4b62m-bootstrap bootkube.sh[1293]: Trying again to restore CVO overrides
Feb 03 16:05:11 okd12-4b62m-bootstrap bootkube.sh[6709]: Unable to connect to the server: dial tcp: lookup api-int.okd12.driften.ml: no such host
Feb 03 16:05:21 okd12-4b62m-bootstrap bootkube.sh[1293]: Trying again to restore CVO overrides
Feb 03 16:05:21 okd12-4b62m-bootstrap bootkube.sh[6756]: Unable to connect to the server: dial tcp: lookup api-int.okd12.driften.ml: no such host

This never succeeds. As a workaround, I added 10.1.0.5 api-int.okd12.driften.ml to /etc/hosts.

[core@okd12-4b62m-bootstrap ~]$ host api-int.okd12.driften.ml
Host api-int.okd12.driften.ml not found: 3(NXDOMAIN)
[core@okd12-4b62m-bootstrap ~]$ dig +short api-int.okd12.driften.ml
10.1.0.5
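A minimal sketch for checking the /etc/hosts workaround, in Python. It is not part of the original report. `host` and `dig` query DNS servers directly and do not consult /etc/hosts, so they cannot confirm that the workaround is in effect. The helper names `hosts_entry` and `system_resolve` are hypothetical; the second one uses `socket.getaddrinfo`, which on a typical Linux setup consults /etc/hosts before DNS (an assumption about the node's resolver configuration).

```python
import socket


def hosts_entry(hosts_text, hostname):
    """Return the first IP mapped to hostname in /etc/hosts-style text, else None."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into "IP name [alias ...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return fields[0]
    return None


def system_resolve(hostname):
    """Resolve via the system resolver; return sorted unique addresses, [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

On the bootstrap node, one could pass the contents of /etc/hosts and `api-int.okd12.driften.ml` to `hosts_entry`, then compare the result with `system_resolve` for the same name.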

After that, the bootstrap succeeds, but the installation fails later with:

time="2023-02-03T16:01:07+01:00" level=info msg="Waiting up to 40m0s (until 4:41PM) for the cluster at https://api.okd12.driften.ml:6443 to initialize..."
time="2023-02-03T16:01:07+01:00" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operators authentication, image-registry, ingress, insights, kube-apiserver, kube-controller-manager, kube-scheduler, machine-api, monitoring, openshift-apiserver, openshift-controller-manager, openshift-samples, operator-lifecycle-manager-packageserver, storage are not available\n* Could not update imagestream \"openshift/driver-toolkit\" (571 of 838): the server is down or not responding\n* Could not update oauthclient \"console\" (515 of 838): the server does not recognize this resource, check extension API servers\n* Could not update role \"openshift-console-operator/prometheus-k8s\" (755 of 838): resource may have been deleted\n* Could not update role \"openshift-console/prometheus-k8s\" (758 of 838): resource may have been deleted"
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator baremetal Disabled is False with : "
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected"
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected"
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected"
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected"
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator cloud-controller-manager CloudControllerOwner is True with AsExpected: Cluster Cloud Controller Manager Operator owns cloud controllers at 4.12.0-0.okd-2023-01-21-055900"
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required"
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator ingress EvaluationConditionsDetected is False with AsExpected: "
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator insights ClusterTransferAvailable is False with Disconnected: failed to pull cluster transfer: cluster authorization token is not configured"
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator insights Disabled is True with NoToken: Health reporting is disabled"
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator insights SCAAvailable is Unknown with : "
time="2023-02-03T16:41:07+01:00" level=info msg="Cluster operator network ManagementStateDegraded is False with : "
time="2023-02-03T16:41:07+01:00" level=error msg="Cluster initialization failed because one or more operators are not functioning properly.\nThe cluster should be accessible for troubleshooting as detailed in the documentation linked below,\nhttps://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html\nThe 'wait-for install-complete' subcommand can then be used to continue the installation"
time="2023-02-03T16:41:07+01:00" level=error msg="failed to initialize the cluster: Multiple errors are preventing progress:\n* Cluster operators authentication, image-registry, ingress, insights, kube-apiserver, kube-controller-manager, kube-scheduler, machine-api, monitoring, openshift-apiserver, openshift-controller-manager, openshift-samples, operator-lifecycle-manager-packageserver, storage are not available\n* Could not update imagestream \"openshift/driver-toolkit\" (571 of 838): the server is down or not responding\n* Could not update oauthclient \"console\" (515 of 838): the server does not recognize this resource, check extension API servers\n* Could not update role \"openshift-console-operator/prometheus-k8s\" (755 of 838): resource may have been deleted\n* Could not update role \"openshift-console/prometheus-k8s\" (758 of 838): resource may have been deleted"
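To make the long error message above easier to read, here is a small Python sketch that pulls the list of unavailable cluster operators out of such a message. The function name `unavailable_operators` is hypothetical, and the pattern is based only on the wording seen in this log; the singular form ("Cluster operator X is not available") is an assumption.

```python
import re


def unavailable_operators(message):
    """Extract operator names from '... Cluster operators a, b, c are not available'."""
    match = re.search(r"Cluster operators? (.+?) (?:is|are) not available", message)
    if not match:
        return []
    # Names are comma-separated in the installer message.
    return [name.strip() for name in match.group(1).split(",") if name.strip()]
```

Applied to the final error line, this would return the 14 operators from `authentication` through `storage`, which can then be compared against `oc get clusteroperators` once the API is reachable.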

Version

./openshift-install version
./openshift-install 4.12.0-0.okd-2023-01-21-055900
built from commit e33d9bda73e58d6584f922d68260931c02992a41
release image quay.io/openshift/okd@sha256:8c5e4d3a76aba995c005fa7f732d68658cc67d6f11da853360871012160b2ebf
release architecture amd64

How reproducible: 100%

Log bundle: log-bundle-20230203172242.tar.gz

Loumy commented 4 months ago

Hello, this could be the classic case where OVN is handling the DNS request. Check the output of ovn-sbctl list DNS on your OVN controller to see whether there is an entry for api-int.okd12.driften.ml.
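The check suggested above could be scripted roughly as follows. This is a sketch only: the helper names `mentions_host` and `ovn_dns_has` are hypothetical, it assumes `ovn-sbctl` is on the PATH and can reach the southbound database, and it does a plain substring search rather than parsing the command's output.

```python
import subprocess


def mentions_host(ovn_output, hostname):
    """Case-insensitive check for hostname anywhere in the given text."""
    return hostname.lower() in ovn_output.lower()


def ovn_dns_has(hostname):
    """Run 'ovn-sbctl list DNS' and report whether its output mentions hostname."""
    result = subprocess.run(
        ["ovn-sbctl", "list", "DNS"],
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return mentions_host(result.stdout, hostname)
```

Calling `ovn_dns_has("api-int.okd12.driften.ml")` on the OVN controller would return `False` if no DNS row mentions the name. Because it matches substrings, a longer name that contains the hostname would also count as a hit.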