okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

Bootstrap Machine would not get ready (restarting pod by bootkube.service) #1861

Closed horvaro closed 5 months ago

horvaro commented 5 months ago

Describe the bug

After preparing the network, virtual machines, manifest files, and ignition files, the bootstrap machine does not become ready. Nothing was served at http://BOOTSTRAP-IP:22623, so the master machines could not load their config. Looking into the logs, I found that a pod keeps restarting inside the bootstrap machine:

Jan 18 12:19:10 okd4-bootstrap.domain.com systemd[1]: Started bootkube.service - Bootstrap a Kubernetes cluster.
Jan 18 12:19:10 okd4-bootstrap.domain.com podman[4068]: 2024-01-18 12:19:10.556149694 +0000 UTC m=+0.089378130 container create df4a020d084815080874a376fedc671d535796b08548b67cd9ce2d156cdbecac (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=relaxed_hellman, io.openshift.release=4.14.0-0.okd-2024-01-06-084517, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72)
Jan 18 12:19:10 okd4-bootstrap.domain.com podman[4068]: 2024-01-18 12:19:10.500692868 +0000 UTC m=+0.033921234 image pull 777cbb5cae4f3fa2c14c73ce71f67cfdd8092b6c11f316463f8030b699d39e76 quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227
Jan 18 12:19:10 okd4-bootstrap.domain.com podman[4068]: 2024-01-18 12:19:10.924146473 +0000 UTC m=+0.457374912 container init df4a020d084815080874a376fedc671d535796b08548b67cd9ce2d156cdbecac (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=relaxed_hellman, io.openshift.release=4.14.0-0.okd-2024-01-06-084517, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72)
Jan 18 12:19:10 okd4-bootstrap.domain.com podman[4068]: 2024-01-18 12:19:10.931190338 +0000 UTC m=+0.464418774 container start df4a020d084815080874a376fedc671d535796b08548b67cd9ce2d156cdbecac (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=relaxed_hellman, io.openshift.release=4.14.0-0.okd-2024-01-06-084517, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72)
Jan 18 12:19:10 okd4-bootstrap.domain.com podman[4068]: 2024-01-18 12:19:10.932853844 +0000 UTC m=+0.466082281 container attach df4a020d084815080874a376fedc671d535796b08548b67cd9ce2d156cdbecac (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=relaxed_hellman, io.openshift.release=4.14.0-0.okd-2024-01-06-084517, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72)
Jan 18 12:19:10 okd4-bootstrap.domain.com relaxed_hellman[4079]: quay.io/openshift/okd-content@sha256:7df1a8d75db145a9f761e1de429d209dc73b21291d791082fa9fbb37231f0dcf
Jan 18 12:19:10 okd4-bootstrap.domain.com podman[4068]: 2024-01-18 12:19:10.965335319 +0000 UTC m=+0.498563739 container died df4a020d084815080874a376fedc671d535796b08548b67cd9ce2d156cdbecac (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=relaxed_hellman, io.openshift.release=4.14.0-0.okd-2024-01-06-084517, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72)
Jan 18 12:19:11 okd4-bootstrap.domain.com podman[4092]: 2024-01-18 12:19:11.44741158 +0000 UTC m=+0.469266109 container remove df4a020d084815080874a376fedc671d535796b08548b67cd9ce2d156cdbecac (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=relaxed_hellman, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72, io.openshift.release=4.14.0-0.okd-2024-01-06-084517)
Jan 18 12:19:11 okd4-bootstrap.domain.com podman[4102]: 2024-01-18 12:19:11.549671491 +0000 UTC m=+0.083378604 container create 8eb7e0181ed0ca060312ddeead75fac3fe506e9ad6b9c6f3f8eed8f241c15034 (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=elated_rosalind, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72, io.openshift.release=4.14.0-0.okd-2024-01-06-084517)
Jan 18 12:19:11 okd4-bootstrap.domain.com podman[4102]: 2024-01-18 12:19:11.492417681 +0000 UTC m=+0.026124775 image pull 777cbb5cae4f3fa2c14c73ce71f67cfdd8092b6c11f316463f8030b699d39e76 quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227
Jan 18 12:19:12 okd4-bootstrap.domain.com podman[4102]: 2024-01-18 12:19:12.055922652 +0000 UTC m=+0.589629803 container init 8eb7e0181ed0ca060312ddeead75fac3fe506e9ad6b9c6f3f8eed8f241c15034 (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=elated_rosalind, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72, io.openshift.release=4.14.0-0.okd-2024-01-06-084517)
Jan 18 12:19:12 okd4-bootstrap.domain.com podman[4102]: 2024-01-18 12:19:12.062182691 +0000 UTC m=+0.595889804 container start 8eb7e0181ed0ca060312ddeead75fac3fe506e9ad6b9c6f3f8eed8f241c15034 (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=elated_rosalind, io.openshift.release=4.14.0-0.okd-2024-01-06-084517, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72)
Jan 18 12:19:12 okd4-bootstrap.domain.com elated_rosalind[4113]: quay.io/openshift/okd-content@sha256:3840d9c2574e8790377190a106c952d06e80b90a4546f410aad751274703f659
Jan 18 12:19:12 okd4-bootstrap.domain.com podman[4102]: 2024-01-18 12:19:12.435160643 +0000 UTC m=+0.968867752 container attach 8eb7e0181ed0ca060312ddeead75fac3fe506e9ad6b9c6f3f8eed8f241c15034 (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=elated_rosalind, io.openshift.release=4.14.0-0.okd-2024-01-06-084517, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72)
Jan 18 12:19:12 okd4-bootstrap.domain.com podman[4102]: 2024-01-18 12:19:12.43544868 +0000 UTC m=+0.969155804 container died 8eb7e0181ed0ca060312ddeead75fac3fe506e9ad6b9c6f3f8eed8f241c15034 (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=elated_rosalind, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72, io.openshift.release=4.14.0-0.okd-2024-01-06-084517)
Jan 18 12:19:13 okd4-bootstrap.domain.com podman[4123]: 2024-01-18 12:19:13.434123647 +0000 UTC m=+1.324601656 container remove 8eb7e0181ed0ca060312ddeead75fac3fe506e9ad6b9c6f3f8eed8f241c15034 (image=quay.io/openshift/okd@sha256:c4a6b6850701202f629c0e451de784b02f0de079650a1b9ccbf610448ebc9227, name=elated_rosalind, io.openshift.release=4.14.0-0.okd-2024-01-06-084517, io.openshift.release.base-image-digest=sha256:ac5b45ba281c3fe044e6f104689faa7204a5730a3cd9682fe9d6a48ddeaebf72)

After some retries, the bootkube.service process exits with an error and starts all over again:

Jan 18 12:19:04 okd4-bootstrap.domain.com bootkube.sh[3969]: error: 1 error occurred:
Jan 18 12:19:04 okd4-bootstrap.domain.com bootkube.sh[3969]:         * illegal base64 data at input byte 0
Jan 18 12:19:04 okd4-bootstrap.domain.com systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Jan 18 12:19:04 okd4-bootstrap.domain.com systemd[1]: bootkube.service: Failed with result 'exit-code'.
Jan 18 12:19:04 okd4-bootstrap.domain.com systemd[1]: bootkube.service: Consumed 4.537s CPU time.
Jan 18 12:19:10 okd4-bootstrap.domain.com systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 4.
Jan 18 12:19:10 okd4-bootstrap.domain.com systemd[1]: Stopped bootkube.service - Bootstrap a Kubernetes cluster.

Version

How reproducible

100% (Retried setup of the OKD Cluster 3 times)

Log bundle

openshift-install gather bootstrap --dir ./install_dir/ --bootstrap 192.168.20.239

log-bundle-20240118133303

vrutkovs commented 5 months ago

illegal base64 data at input byte

See https://github.com/okd-project/okd/discussions/1784#discussioncomment-7577095 - you're using an empty pull secret, but it needs to have a valid base64 value
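To check a pull secret for this kind of problem before generating ignition files, something like the following Python sketch may help. It assumes the pull secret uses the common registry auth layout, `{"auths": {"<registry>": {"auth": "<base64 of user:password>"}}}`. The function names, the registry name `fake`, and the credentials `id` / `pass` are hypothetical placeholders and not part of any OKD tooling. Whether the installer accepts a placeholder like this should be confirmed against the linked discussion.

```python
import base64
import json


def make_pull_secret(registry: str, user: str, password: str) -> str:
    """Build a pull-secret JSON string with a base64-encoded 'user:password' auth value.

    The registry name and credentials are placeholders supplied by the caller.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return json.dumps({"auths": {registry: {"auth": token}}})


def invalid_auth_entries(pull_secret: str) -> list[str]:
    """Return the registries whose 'auth' value is missing, empty, or not valid base64."""
    bad = []
    for registry, entry in json.loads(pull_secret)["auths"].items():
        auth = entry.get("auth", "")
        try:
            # validate=True rejects characters outside the base64 alphabet
            decoded = base64.b64decode(auth, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            bad.append(registry)
            continue
        if not decoded:
            # An empty string decodes without error but carries no credentials
            bad.append(registry)
    return bad


if __name__ == "__main__":
    secret = make_pull_secret("fake", "id", "pass")
    print(secret)
    print(invalid_auth_entries(secret))
    print(invalid_auth_entries('{"auths": {"fake": {"auth": ""}}}'))
```

The string printed first would go into the `pullSecret` field of `install-config.yaml`. The check only verifies that each `auth` value decodes as base64. It does not verify that the credentials work against a registry.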