okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

Bootstrapper fails to start container for installation #1857

Closed: oe3gwu closed this 10 months ago

oe3gwu commented 10 months ago

Describe the bug: I am using the Red Hat OCP4 Helper Node and am also trying to bootstrap a vSphere cluster via the Ansible scripts. That part works. But when the bootstrapper boots up, the installation hangs at this screen:

[Screenshot]

The bootstrap VM appears to boot without errors:

[Screenshot]

But the control-plane VMs cannot install.

[Screenshot]

DNS entries are correct.

[Screenshot]

[Screenshot]

I found that the podman service (bootkube) tries to start the init containers and then simply dies. However, I can pull the container manually as root and start it, so the bootstrapper's DNS entries must also be correct.

[Screenshot]

Pulling logs out of the VM is a bit of a challenge, so I haven't attached text files.

Version: I tried 4.10 and 4.13.

How reproducible: 100% reproducible by starting the installation via the Ansible scripts on my system.

Log bundle: I can't create oc logs because the system isn't up yet.

vrutkovs commented 10 months ago

Please attach the log bundle.

oe3gwu commented 10 months ago

Attached. Thanks! log-bundle-20240113165011.tar.gz

melledouwsma commented 10 months ago

From release-image.log:

Jan 13 15:41:26 bootstrap.okd.okd.local systemd[1]: Starting release-image.service - Download the OpenShift Release Image...
Jan 13 15:41:32 bootstrap.okd.okd.local podman[1956]: 2024-01-13 15:41:32.784600221 +0000 UTC m=+4.847285019 system refresh
Jan 13 15:41:32 bootstrap.okd.okd.local release-image-download.sh[1863]: Pulling quay.io/openshift-release-dev/ocp-release@sha256:05ba8e63f8a76e568afe87f182334504a01d47342b6ad5b4c3ff83a2463018bd...
Jan 13 15:42:29 bootstrap.okd.okd.local release-image-download.sh[2160]: d3e740b44b787a88c9ab2755c60ce03d7412e591ac71864a7ce31a90ee206df9
Jan 13 15:42:29 bootstrap.okd.okd.local podman[2160]: 2024-01-13 15:42:29.265415652 +0000 UTC m=+56.178203945 image pull d3e740b44b787a88c9ab2755c60ce03d7412e591ac71864a7ce31a90ee206df9 quay.io/openshift-release-dev/ocp-release@sha256:05ba8e63f8a76e568afe87f182334504a01d47342b6ad5b4c3ff83a2463018bd
Jan 13 15:42:29 bootstrap.okd.okd.local systemd[1]: Finished release-image.service - Download the OpenShift Release Image.

Are you sure you're using the client tools (oc and installer) from https://github.com/okd-project/okd/releases? This looks like an OCP installation.
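The check made above can be sketched as a small script that scans release-image.log for the image that release-image-download.sh pulled and reports which repository it came from. This is a hypothetical helper, not part of OKD or the installer; the function names are illustrative, and it assumes that OCP release images come from quay.io/openshift-release-dev/ocp-release (as in the log above) while OKD release images come from quay.io/openshift/okd. Mirrored registries are reported as "unknown".

```python
import re

# Hypothetical helper, not an OKD/installer tool.
# Matches lines like "release-image-download.sh[1863]: Pulling <image-ref>..."
PULL_RE = re.compile(r"release-image-download\.sh\[\d+\]: Pulling (\S+?)(?:\.\.\.)?$")

# Assumed repository-to-distribution mapping; mirrors will not match.
KNOWN_REPOSITORIES = {
    "quay.io/openshift-release-dev/ocp-release": "OCP",
    "quay.io/openshift/okd": "OKD",
}

SAMPLE_LINE = (
    "Jan 13 15:41:32 bootstrap.okd.okd.local release-image-download.sh[1863]: "
    "Pulling quay.io/openshift-release-dev/ocp-release@sha256:"
    "05ba8e63f8a76e568afe87f182334504a01d47342b6ad5b4c3ff83a2463018bd..."
)


def pulled_release_images(log_text: str) -> list[str]:
    """Return every image reference that the download script reports pulling."""
    images = []
    for line in log_text.splitlines():
        match = PULL_RE.search(line.rstrip())
        if match:
            images.append(match.group(1))
    return images


def repository(image_ref: str) -> str:
    """Strip the digest (@sha256:...) or tag (:4.14...) from an image reference."""
    name = image_ref.split("@", 1)[0]
    last = name.rsplit("/", 1)[-1]
    # Only treat a colon in the final path segment as a tag separator,
    # so a registry port such as host:5000 is left intact.
    if ":" in last:
        name = name[: len(name) - len(last)] + last.split(":", 1)[0]
    return name


def classify(image_ref: str) -> str:
    """Return "OCP", "OKD", or "unknown" for a release image reference."""
    return KNOWN_REPOSITORIES.get(repository(image_ref), "unknown")


if __name__ == "__main__":
    for ref in pulled_release_images(SAMPLE_LINE):
        print(classify(ref), ref)
```

Run against the release-image.log line quoted above, this reports "OCP", which is the mismatch being pointed out.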

oe3gwu commented 10 months ago

I'll have to check. I do use the OCP4 Helper Node, but otherwise I am using an Ansible script from a friend, so there may be a bug in it. I'll check on Tuesday; I'm not at home at the moment.

horvaro commented 10 months ago

I'm having the same issue with OKD 4.14.0.

This keeps happening. The DNS entries exist and are correct, and networking is wired up correctly.

oe3gwu commented 10 months ago

In my case, I use the OCP4 Helper Node and a modified OCP vSphere Ansible script. It seems there are still some OCP elements in there. Even so, as far as I can tell it should still work, because the bootstrapper's podman dies when automatically starting a pod that I can start manually.