stormshift / support

This repo should serve as a central source for reporting issues with stormshift
GNU General Public License v3.0

OCP4 Compute 0 stuck with 0% CPU #145

Javatar81 closed 7 months ago

Javatar81 commented 8 months ago

Processes cannot be started; they wait forever. Shutting down and rebooting does not help. The VM runs on storm3.

Javatar81 commented 8 months ago

[core@compute-0 ~]$ sudo systemctl status kubelet
○ kubelet.service - Kubernetes Kubelet
     Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─01-kubens.conf, 10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf
     Active: inactive (dead)

journalctl -b -u kubelet.service -u crio.service
-- No entries --

sudo systemctl start kubelet
--- WAITING FOREVER ---
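When `systemctl start` blocks like this, one generic way to narrow it down is to look at systemd's job queue and to enqueue the start without waiting for it. The following is only a sketch with standard systemd tooling, not something that was run in this issue; the `run` and `diagnose_unit` helpers and the `DRY_RUN` switch are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not from this issue: gather hints about a unit whose
# start request never returns. With DRY_RUN set, commands are only printed.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

diagnose_unit() {
  unit="$1"
  # Jobs that are queued or running; a start request may be waiting behind one.
  run sudo systemctl list-jobs
  # Enqueue the start and return immediately instead of blocking the shell.
  run sudo systemctl start --no-block "$unit"
  # Log entries for the unit from the current boot.
  run sudo journalctl -b -u "$unit" --no-pager
}
```

Calling `DRY_RUN=1 diagnose_unit kubelet.service` prints the three commands without touching the node, which is a reasonable first step on a machine that is already misbehaving.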

Javatar81 commented 8 months ago

Oct 27 09:09:25 compute-0.ocp4.stormshift.coe.muc.redhat.com bash[21212]: Error: readlink /var/lib/containers/storage/overlay: invalid argument
Oct 27 09:09:25 compute-0.ocp4.stormshift.coe.muc.redhat.com systemd[1]: var-lib-containers-storage-overlay.mount: Deactivated successfully.

rbo commented 7 months ago

Just deleted the NotReady node:

$ oc delete node compute-0.ocp4.stormshift.coe.muc.redhat.com
node "compute-0.ocp4.stormshift.coe.muc.redhat.com" deleted

This is to get the MCO ready again.

rbo commented 7 months ago

Hard reboot of node compute-0.ocp4.stormshift.coe.muc.redhat.com

Javatar81 commented 7 months ago

Scheduling is disabled for compute-0. Can I schedule or are you still working on this?
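For reference, a cordoned node shows `SchedulingDisabled` in the STATUS column of `oc get nodes`, and `oc adm uncordon <node>` makes it schedulable again. A minimal sketch for spotting such nodes follows; the `nodes_with_status` function is a hypothetical helper, and it assumes the default column layout of `oc get nodes --no-headers` (name first, comma-separated status second).

```shell
#!/bin/sh
# Hypothetical filter: print the names of nodes whose STATUS column contains
# the given word. Expects `oc get nodes --no-headers` output on stdin.
nodes_with_status() {
  awk -v want="$1" '{
    n = split($2, parts, ",")          # STATUS can be e.g. "Ready,SchedulingDisabled"
    for (i = 1; i <= n; i++)
      if (parts[i] == want) { print $1; break }
  }'
}

# Example (needs a logged-in oc session, so it is left commented out):
#   oc get nodes --no-headers | nodes_with_status SchedulingDisabled
#   oc adm uncordon compute-0.ocp4.stormshift.coe.muc.redhat.com
```

Whether to uncordon is a separate question from how to do it; here the node was still being worked on.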

rbo commented 7 months ago

I'm working on it

github-actions[bot] commented 7 months ago

Heads up @cluster/ocp4-admin - the "cluster/ocp4" label was applied to this issue.

rbo commented 7 months ago

Compute-0 is joining again; CSR approved.

compute-0.ocp4.stormshift.coe.muc.redhat.com   NotReady                   worker          1s       v1.26.9+c7606e7
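As background for the CSR step: a node that rejoins the cluster submits certificate signing requests, and it stays NotReady until they are approved. The sketch below shows one way to pick out pending requests; it is not the exact command used here. The `pending_csrs` function is a hypothetical helper, and it assumes that CONDITION is the last column of `oc get csr --no-headers`.

```shell
#!/bin/sh
# Hypothetical filter: print the names of CSRs whose CONDITION is exactly
# "Pending". Expects `oc get csr --no-headers` output on stdin and assumes
# CONDITION is the last column.
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Example (needs a logged-in oc session, so it is left commented out):
#   oc get csr --no-headers | pending_csrs
#   oc adm certificate approve <csr-name>
```

A rejoining node may submit more than one request, so it can be worth listing again after the first approval.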

rbo commented 7 months ago

The solution was to clean up all images: sudo podman rmi --all
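Because `podman rmi --all` removes every image in the local store, it may be worth wrapping it so it only runs on purpose. The following is a sketch of that idea; the `cleanup_images` function and its `--yes` argument are hypothetical, the `podman images` listing is an added convenience, and only `sudo podman rmi --all` comes from this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: print the cleanup command by default and only run it
# when called with --yes. Removed images have to be pulled again afterwards.
cleanup_images() {
  if [ "${1:-}" != "--yes" ]; then
    echo "would run: sudo podman rmi --all"
    return 0
  fi
  # Show what is in the local store before removing it.
  sudo podman images
  # The command reported as the fix in this issue.
  sudo podman rmi --all
}
```

Running `cleanup_images` alone just prints the command; `cleanup_images --yes` performs the removal on the node.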

rbo commented 7 months ago

Looks like the cluster is working again.