Closed DanielFroehlich closed 1 year ago
Heads up @cluster/ocp3-admin - the "cluster/ocp3" label was applied to this issue.
Restarted ocp3 today after the long infra downtime. Needed to approve some CSRs, then the above-mentioned problems appeared again. @ortwinschneider, would you mind taking a look, or asking @rbo for help? I am still guessing a cert issue is the root cause.
Does the problem still exist?
yes
All nodes are NotReady.
[root@ocp3support ~]# export KUBECONFIG=/root/ocp4install/auth/kubeconfig
[root@ocp3support ~]# oc get csr | awk '/Pending/ {print $1}' | xargs oc adm certificate approve
[root@ocp3support ~]# oc get nodes
NAME                                           STATUS     ROLES           AGE      VERSION
compute-0.ocp3.stormshift.coe.muc.redhat.com   Ready      worker          598d     v1.21.8+ee73ea2
compute-1.ocp3.stormshift.coe.muc.redhat.com   Ready      worker          598d     v1.21.8+ee73ea2
compute-2.ocp3.stormshift.coe.muc.redhat.com   Ready      worker          598d     v1.21.8+ee73ea2
control-0.ocp3.stormshift.coe.muc.redhat.com   NotReady   master,worker   2y222d   v1.21.8+ee73ea2
control-1.ocp3.stormshift.coe.muc.redhat.com   NotReady   master,worker   2y222d   v1.21.8+ee73ea2
control-2.ocp3.stormshift.coe.muc.redhat.com   Ready      master,worker   2y222d   v1.21.8+ee73ea2
[root@ocp3support ~]#
Better but not perfect.
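A single approval pass like the one above can miss requests: once a kubelet's client CSR is approved, the node usually submits a serving CSR shortly afterwards, which then sits in Pending as well. A rough sketch of a repeated approval loop follows. The function names `pending_csrs` and `approve_pending_csrs`, the round count and the sleep interval are made up for illustration, and it assumes `KUBECONFIG` is exported as shown earlier.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, round count and sleep interval are assumptions.

# Read `oc get csr --no-headers` output on stdin and print the names of
# requests whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
  awk '$NF == "Pending" {print $1}'
}

# Approve pending CSRs in several rounds, because approving a kubelet
# client CSR typically triggers a follow-up serving CSR from the same node.
approve_pending_csrs() {
  local rounds="${1:-5}" names
  for _ in $(seq "$rounds"); do
    names="$(oc get csr --no-headers | pending_csrs)"
    # Stop as soon as nothing is left to approve.
    [ -z "$names" ] && break
    echo "$names" | xargs oc adm certificate approve
    sleep 10
  done
}
```

Calling `approve_pending_csrs 5` and then re-running `oc get nodes` should show whether the remaining nodes come back. Matching on the last column avoids approving lines that merely contain the word "Pending" elsewhere.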
control-0, fixed with https://github.com/stormshift/support/issues/72#issuecomment-1067121213
control-1 not available via ssh
control-1:
Let's force reboot.
ah now it reboots... without any activity
Stuck at the same point; looks like a reboot loop. Let's switch it off and on again in RHEV.
Strange, I don't know. I suggest reinstalling control-1 and following: https://docs.openshift.com/container-platform/4.8/backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.html
Who would like to do this job? :-)
ocp3 has been decommissioned. Rest in peace!
control-0 and control-1 do not become Ready, probably due to certificate issues after the long downtime.
The cert recovery procedure probably needs to be applied. Please investigate.
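As a possible first step for the investigation, the affected nodes can be pulled out of the `oc get nodes` listing so it is clear where to look at kubelet logs and certificates. This is only a sketch: `not_ready_nodes` is a hypothetical helper name, and it simply treats every status that does not start with `Ready` as a problem.

```shell
#!/usr/bin/env bash
# Sketch only: the helper name is an assumption, not an oc feature.

# Read `oc get nodes --no-headers` output on stdin and print the names of
# nodes whose STATUS column does not start with "Ready". A cordoned but
# healthy node ("Ready,SchedulingDisabled") is therefore not reported.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ {print $1}'
}
```

Typical use would be `oc get nodes --no-headers | not_ready_nodes`, which for the listing in this thread would print control-0 and control-1.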