nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

nerc-ocp-prod: node wrk-88 is in NotReady state #641

Closed jtriley closed 2 months ago

jtriley commented 2 months ago

Just noticed that the wrk-88 host is in the NotReady state on the nerc-ocp-prod cluster:

$ oc get nodes/wrk-88
NAME     STATUS     ROLES    AGE    VERSION
wrk-88   NotReady   worker   294d   v1.26.7+c7ee51f

This is currently causing the cluster operators to report as degraded. The worker machine config pool also claims to be updating, but that might just be a side effect of having a node in the NotReady state.
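For anyone triaging something similar, here is a minimal sketch of how the affected nodes could be picked out of the `oc get nodes` output shown above. The helper name `not_ready_nodes` is hypothetical (it is not part of `oc`), and it assumes the default column layout with STATUS in the second column.

```shell
# Hypothetical helper: reads `oc get nodes` output on stdin and prints the
# names of nodes whose STATUS does not start with "Ready".
# Assumes the default column layout (NAME STATUS ROLES AGE VERSION).
# A cordoned but healthy node ("Ready,SchedulingDisabled") is not reported.
not_ready_nodes() {
  # NR > 1 skips the header row; $2 is the STATUS column, $1 the node name.
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}
```

Usage would look like `oc get nodes | not_ready_nodes`. The operator and pool state mentioned above can be inspected with `oc get clusteroperators` and `oc get machineconfigpool worker`.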

jtriley commented 2 months ago

Looking at the console, the host is at a login prompt and claims to have the correct IP. I'm unable to ping it from the VPN or from another host in the prod cluster. Trying a reboot to see if the node comes back.

jtriley commented 2 months ago

The host rebooted successfully but was unable to obtain an IP address. Looking at the network device via OBM, it appears that none of the network ports currently have link. This will likely require a data center visit to fix.

jtriley commented 2 months ago

It turns out @hakasapl was able to reseat the links at the switch, which fixed the link-down issue. Rebooting the host after that got it to return to the cluster. Thanks @hakasapl!