danielchristianschroeter closed this issue 3 years ago.
I finally reinstalled the OKD cluster with release 4.6.0-0.okd-2021-01-17-185703. The most important part of a bare metal installation is that you start your bootstrap and master machines at more or less the same time. If you type the coreos-installer command with --append-karg manually (because you cannot copy and paste it), the installation process sometimes takes too long...
Describe the bug
Some revision-pruner pods in the openshift-etcd, openshift-kube-apiserver, openshift-kube-scheduler and openshift-kube-controller-manager namespaces are stuck in the ContainerCreating state after a clean bare metal installation with 4.6.0-0.okd-2020-12-21-142926 and FCOS 33.20201214.2.0.
I see these error events in the affected pods:
I also tried restarting master-2, but this did not change anything. All cluster operators show Available=True in the output of oc get co. For now I have deleted the stuck pods. This works as a workaround, but I don't think deleting them is a real solution...
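The manual cleanup described above could be scripted roughly as follows. This is a sketch, not a fix: it assumes the stuck pods are named revision-pruner-* and that STATUS is the third column of oc get pods --no-headers output. The function names list_stuck_pruners and delete_stuck_pruners are hypothetical, and the delete loop should be reviewed before running it against a cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `oc get pods --no-headers` output on stdin and
# print the names of revision-pruner pods whose STATUS column is ContainerCreating.
list_stuck_pruners() {
  awk '$1 ~ /^revision-pruner-/ && $3 == "ContainerCreating" { print $1 }'
}

# Delete the stuck revision-pruner pods in the namespaces named in the report.
# This automates the same workaround as deleting them by hand.
delete_stuck_pruners() {
  local ns pod
  for ns in openshift-etcd openshift-kube-apiserver \
            openshift-kube-scheduler openshift-kube-controller-manager; do
    oc get pods -n "$ns" --no-headers | list_stuck_pruners | while read -r pod; do
      oc delete pod -n "$ns" "$pod"
    done
  done
}
```

After logging in with oc, running delete_stuck_pruners removes only pruner pods in ContainerCreating and leaves completed or running pods alone.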
Version
4.6.0-0.okd-2020-12-21-142926 with FCOS 33.20201214.2.0 (bare metal installation with VMs on VMware ESXi 6.7)
How reproducible
oc adm release extract --tools registry.svc.ci.openshift.org/origin/release@sha256:068a04c84d0ef8d6325a37497da3d69152104ea357db29498378ad44760042f5
Create install-config.yaml in install_dir
Create the Ignition files and then upload them to an HTTP server:
./openshift-install create manifests --dir=install_dir/
./openshift-install create ignition-configs --dir=install_dir/
Create new VMs (1x bootstrap, 3x master) and boot them from the .iso: https://builds.coreos.fedoraproject.org/prod/streams/testing/builds/33.20201214.2.0/x86_64/fedora-coreos-33.20201214.2.0-live.x86_64.iso
Verify that all DNS records exist and that the required IPs of the bootstrap and master machines are added to the related load balancer pools for the required ports (bootstrap and master for ports 22623 and 6443; master for ports 443 and 80):
api-int.okd.basedomain.com > LB-IP
api.okd.basedomain.com > CNAME to api-int.okd.basedomain.com (LB)
*.apps.okd.basedomain.com > CNAME to api-int.okd.basedomain.com (LB)
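The port requirements listed above can be checked per backend with a small bash sketch like the one below, run from a host on the same network. It assumes bash (for the /dev/tcp pseudo-device) and the coreutils timeout command; lb_ports_for_role, check_port and check_backend are hypothetical names. A backend only answers once the corresponding service is running on it, so a FAIL before or early in the installation is expected.

```bash
#!/usr/bin/env bash
# Ports each machine role must be registered for in the load balancer pools,
# as listed in the report (bootstrap and master: 6443, 22623; master also 443, 80).
lb_ports_for_role() {
  case "$1" in
    bootstrap) echo "6443 22623" ;;
    master)    echo "6443 22623 443 80" ;;
    *)         return 1 ;;
  esac
}

# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# Uses bash's /dev/tcp pseudo-device, so this must run under bash.
check_port() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Check every required port for one backend, printing OK or FAIL per port.
check_backend() {
  local role=$1 host=$2 port ports
  ports=$(lb_ports_for_role "$role") || { echo "unknown role: $role" >&2; return 1; }
  for port in $ports; do
    if check_port "$host" "$port"; then
      echo "$host:$port OK"
    else
      echo "$host:$port FAIL"
    fi
  done
}
```

For example, check_backend master 10.1.232.191 checks the first master on all four ports.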
Start coreos-installer with the following parameters (I added --append-karg to work around issue #394):
sudo coreos-installer install /dev/sda --insecure-ignition --copy-network --ignition-url http://httpserverdomain.com/bootstrap.ign --append-karg="ip=10.1.232.57::10.1.232.1:255.255.255.0:k8s-bootstrap-1-01.okd.basedomain.com:ens160:none:10.1.231.85:10.1.231.5"
sudo coreos-installer install /dev/sda --insecure-ignition --copy-network --ignition-url http://httpserverdomain.com/master.ign --append-karg="ip=10.1.232.191::10.1.232.1:255.255.255.0:k8s-master-1-01.okd.basedomain.com:ens160:none:10.1.231.85:10.1.231.5"
sudo coreos-installer install /dev/sda --insecure-ignition --copy-network --ignition-url http://httpserverdomain.com/master.ign --append-karg="ip=10.1.232.13::10.1.232.1:255.255.255.0:k8s-master-2-01.okd.basedomain.com:ens160:none:10.1.231.85:10.1.231.5"
sudo coreos-installer install /dev/sda --insecure-ignition --copy-network --ignition-url http://httpserverdomain.com/master.ign --append-karg="ip=10.1.232.158::10.1.232.1:255.255.255.0:k8s-master-3-01.okd.basedomain.com:ens160:none:10.1.231.85:10.1.231.5"
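The four commands differ only in the Ignition URL, IP address and hostname, so they can be generated instead of assembled by hand, which makes typos in the long ip= argument less likely. The sketch below is a hypothetical bash helper (build_ip_karg and print_install_cmd are illustrative names, not part of coreos-installer). It hardcodes the gateway, netmask, interface and DNS servers from the commands above and follows the dracut ip= field order client-IP:peer:gateway:netmask:hostname:interface:autoconf:dns1:dns2, with the peer field left empty.

```bash
#!/usr/bin/env bash
# Build the dracut-style static "ip=" kernel argument used in the report:
# ip=<client-ip>::<gateway>:<netmask>:<hostname>:<interface>:none:<dns1>:<dns2>
# (the empty field between the two colons is the unused peer address).
build_ip_karg() {
  local ip=$1 gw=$2 mask=$3 host=$4 iface=$5 dns1=$6 dns2=$7
  printf 'ip=%s::%s:%s:%s:%s:none:%s:%s\n' \
    "$ip" "$gw" "$mask" "$host" "$iface" "$dns1" "$dns2"
}

# Print (not run) the coreos-installer command for one machine.
# Gateway, netmask, interface and DNS servers are the values from the report.
print_install_cmd() {
  local ign_url=$1 ip=$2 host=$3
  printf 'sudo coreos-installer install /dev/sda --insecure-ignition --copy-network --ignition-url %s --append-karg="%s"\n' \
    "$ign_url" \
    "$(build_ip_karg "$ip" 10.1.232.1 255.255.255.0 "$host" ens160 10.1.231.85 10.1.231.5)"
}
```

For example, print_install_cmd http://httpserverdomain.com/master.ign 10.1.232.191 k8s-master-1-01.okd.basedomain.com prints the command for the first master exactly as listed above.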
Wait several hours after ./openshift-install --dir=install_dir/ wait-for bootstrap-complete --log-level=info succeeds.
Log bundle
The log bundle and must-gather output can be downloaded here: https://drive.google.com/file/d/1jU3XRf3Si-4Ro5i3Nqi6CiBfkBHkFSw4/view?usp=sharing