okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

MachineWithNoRunningPhase and MachineWithoutValidNode critical errors just after installation #234

Closed: viniciusferrao closed this issue 3 years ago

viniciusferrao commented 4 years ago

Describe the bug
I'm receiving multiple errors just after the OKD installation. They appear three times, once for each worker: MachineWithNoRunningPhase and MachineWithoutValidNode.

After a while, the following errors appear as well: ClusterOperatorDown and ClusterOperatorDegraded.


Version okd:4.5.0-0.okd-2020-06-29-110348-beta6

How reproducible

100%

Log bundle

I tried to get the output from the command, but something seems wrong with the certificates (I think):

[ferrao@fedora-test okd]$ oc --kubeconfig=auth/kubeconfig adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/okd-content@sha256:ef4afb2bc8d802294401b3a112b49d4f5b5727498e2a75f2181f612168ba8472
[must-gather      ] OUT namespace/openshift-must-gather-75nx8 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-w95dd created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/okd-content@sha256:ef4afb2bc8d802294401b3a112b49d4f5b5727498e2a75f2181f612168ba8472 created
[must-gather-mj4v5] OUT gather logs unavailable: Get https://146.164.29.248:10250/containerLogs/openshift-must-gather-75nx8/must-gather-mj4v5/gather?follow=true: remote error: tls: internal error
[must-gather-mj4v5] OUT waiting for gather to complete
[must-gather-mj4v5] OUT gather never finished: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-w95dd deleted
[must-gather      ] OUT namespace/openshift-must-gather-75nx8 deleted
error: gather never finished for pod must-gather-mj4v5: timed out waiting for the condition

vrutkovs commented 4 years ago

Interesting, could you try must-gather a few times? It might be a transient error.

It appears there are two problems here: the machines never reach the Running phase, and must-gather fails with a TLS error when collecting node logs.

rgolangh commented 4 years ago

https://bugzilla.redhat.com/show_bug.cgi?id=1817853

Any more info here would help.

rgolangh commented 4 years ago

The bug I mentioned is not necessarily related. We are seeing some CSRs not getting approved. @vrutkovs, have you seen this problem before?
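
For reference, a quick way to check for unapproved requests at this point (a generic oc invocation, not taken from the original report):

oc get csr | grep Pending

Kubelet client and serving certificate requests that were never approved would show up here with a Pending condition.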

cgruver commented 4 years ago

Is this an IPI install on oVirt?

viniciusferrao commented 4 years ago

@vrutkovs I've run oc adm must-gather three times and got the same results:

[ferrao@fedora-test okd]$ oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/okd-content@sha256:ef4afb2bc8d802294401b3a112b49d4f5b5727498e2a75f2181f612168ba8472
[must-gather      ] OUT namespace/openshift-must-gather-bp6r5 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-j6snv created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/okd-content@sha256:ef4afb2bc8d802294401b3a112b49d4f5b5727498e2a75f2181f612168ba8472 created
[must-gather-mzklx] OUT gather logs unavailable: Get https://146.164.29.248:10250/containerLogs/openshift-must-gather-bp6r5/must-gather-mzklx/gather?follow=true: remote error: tls: internal error
[must-gather-mzklx] OUT waiting for gather to complete
[must-gather-mzklx] OUT gather never finished: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-j6snv deleted
[must-gather      ] OUT namespace/openshift-must-gather-bp6r5 deleted
error: gather never finished for pod must-gather-mzklx: timed out waiting for the condition

[ferrao@fedora-test okd]$ oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/okd-content@sha256:ef4afb2bc8d802294401b3a112b49d4f5b5727498e2a75f2181f612168ba8472
[must-gather      ] OUT namespace/openshift-must-gather-qnntk created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-27v9m created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/okd-content@sha256:ef4afb2bc8d802294401b3a112b49d4f5b5727498e2a75f2181f612168ba8472 created
[must-gather-ggkn7] OUT gather logs unavailable: Get https://146.164.29.248:10250/containerLogs/openshift-must-gather-qnntk/must-gather-ggkn7/gather?follow=true: remote error: tls: internal error
[must-gather-ggkn7] OUT waiting for gather to complete
[must-gather-ggkn7] OUT gather never finished: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-27v9m deleted
error: gather never finished for pod must-gather-ggkn7: timed out waiting for the condition

[ferrao@fedora-test okd]$ oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/okd-content@sha256:ef4afb2bc8d802294401b3a112b49d4f5b5727498e2a75f2181f612168ba8472
[must-gather      ] OUT namespace/openshift-must-gather-6lbc5 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-2q7sj created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/okd-content@sha256:ef4afb2bc8d802294401b3a112b49d4f5b5727498e2a75f2181f612168ba8472 created
[must-gather-g8hdm] OUT gather logs unavailable: Get https://146.164.29.248:10250/containerLogs/openshift-must-gather-6lbc5/must-gather-g8hdm/gather?follow=true: remote error: tls: internal error
[must-gather-g8hdm] OUT waiting for gather to complete
[must-gather-g8hdm] OUT gather never finished: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-2q7sj deleted
[must-gather      ] OUT namespace/openshift-must-gather-6lbc5 deleted
error: gather never finished for pod must-gather-g8hdm: timed out waiting for the condition

@rgolangh I tried some commands from that Bugzilla, and the worker nodes are in fact stuck in the Provisioned phase:

[ferrao@fedora-test okd]$ oc get nodes
NAME                       STATUS   ROLES    AGE   VERSION
okd-kw67j-master-0         Ready    master   20h   v1.18.3
okd-kw67j-master-1         Ready    master   20h   v1.18.3
okd-kw67j-master-2         Ready    master   20h   v1.18.3
okd-kw67j-worker-0-6mdsj   Ready    worker   20h   v1.18.3
okd-kw67j-worker-0-pr4nd   Ready    worker   20h   v1.18.3
okd-kw67j-worker-0-qnp79   Ready    worker   20h   v1.18.3

[ferrao@fedora-test okd]$ oc -n openshift-machine-api get machine
NAME                       PHASE         TYPE   REGION   ZONE   AGE
okd-kw67j-master-0         Running                              20h
okd-kw67j-master-1         Running                              20h
okd-kw67j-master-2         Running                              20h
okd-kw67j-worker-0-6mdsj   Provisioned                          20h
okd-kw67j-worker-0-pr4nd   Provisioned                          20h
okd-kw67j-worker-0-qnp79   Provisioned                          20h

[ferrao@fedora-test okd]$ oc get machineset
No resources found in default namespace.

[ferrao@fedora-test okd]$ oc get machine okd-kw67j-worker-0-6mdsj -o yaml
Error from server (NotFound): machines.machine.openshift.io "okd-kw67j-worker-0-6mdsj" not found

[ferrao@fedora-test okd]$ oc get machine okd-kw67j-worker-0-pr4nd -o yaml
Error from server (NotFound): machines.machine.openshift.io "okd-kw67j-worker-0-pr4nd" not found

[ferrao@fedora-test okd]$ oc get machine okd-kw67j-worker-0-qnp79 -o yaml
Error from server (NotFound): machines.machine.openshift.io "okd-kw67j-worker-0-qnp79" not found

@cgruver I'm not sure what IPI is, but this cluster was installed using the oVirt workflow. The target platform is RHV 4.3.10, which, as you know, is essentially the same as oVirt, but just to make things clear.

viniciusferrao commented 4 years ago

Regarding the CSR comment, I have a lot of pending CSRs:

[ferrao@fedora-test okd]$ oc get csr NAME AGE SIGNERNAME REQUESTOR CONDITION csr-22bjd 9h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-2c4xz 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-2t52f 18h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-2t7fm 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-2wfg5 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-2wjqw 43m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-45dgd 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-45pkg 10h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-46tgm 17h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-46zqg 10h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-4lnzn 5h19m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-4qx5w 3h48m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-4r8r4 18h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-4xgsk 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-558ln 17h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-5dsmb 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-5htgb 10h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-5lkhw 74m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-5ns2m 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-5phfn 5h50m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-5qwqm 7h40m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-5wm5v 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-66w75 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-686qn 104m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-688w6 21h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-6968h 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-6cmwd 4h35m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-6dn7n 121m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-6jkfb 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-6jt9p 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-6kbk2 10h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-6nzdt 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-6wcjn 19h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-77c7m 10m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-795kd 7h10m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-7gzwl 12h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-7hc2v 18h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-7kqfj 4h51m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-7l6t5 13m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd 
Pending csr-7ll2r 18h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-7mhr6 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-7ml2q 3h47m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-7qfh5 5h37m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-7qjs5 8h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-7v2d6 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-85v94 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-8crls 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-8j45c 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-8k9rg 105m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-8rh58 4h19m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-8sxxn 10h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-8tzbb 9h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-8wr5v 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-8wvkb 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-94479 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-9bms8 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-9k7bz 5h52m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-9kwdd 3h3m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-9q56z 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-9rqxl 6h7m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-b2jc2 12h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-b64b6 6h24m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-bbxg7 27m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-bc2m8 8h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-bh2tb 3h31m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-bmmjr 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-bpglh 149m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-bz5bx 7h57m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-c97g2 12h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-c9drt 136m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-c9tlm 3h16m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-chnvn 21h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-chq6k 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-cpgdq 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-cq8rs 6h21m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-ctxcf 19h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-d5bk8 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-d7h2c 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-d8gwz 19h kubernetes.io/kubelet-serving 
system:node:okd-kw67j-worker-0-qnp79 Pending csr-df7xj 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-dh99j 56m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-dhgs8 58m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-dn6nl 7h26m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-dqb6g 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-dqcsz 10h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-dtprr 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-dxpwq 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-f529t 8h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-f6t2s 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-fddsw 8h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-fddsz 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-ffmz2 17h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-fht54 17h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-fjgwm 3h18m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-fjvsp 6h37m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-fwmtt 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-fzf7p 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-g29lb 9h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-g2t5w 4h50m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-g65lr 151m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-gg45s 18h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-gkc7c 21h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-gz55b 6h52m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-h55dr 72m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-h5spj 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-h7mch 87m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-h879c 7h9m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-hcnv5 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-hd8hv 4h4m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-hhl9p 12h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-htdq4 9h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-hwmfp 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-hxw69 166m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-j2hw8 4h48m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-j2zlq 19h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-j82p8 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-jbq6h 21h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-jcwll 8h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-jf6jg 8h 
kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-jf9tg 28m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-jfb29 8h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-jm9d5 7h25m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-jpwdp 21h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-jszcf 17h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-jv2dm 10h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-k4hr9 18h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-k4j8h 10h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-k7glr 3h33m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-kfmnp 9h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-kj7xp 19h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-knnx9 8h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-kp9bw 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-kpkkr 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-kpprs 12h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-kqhjb 10h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-ktwhl 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-kvwh2 7h41m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-l8q9k 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-lc6qt 6h38m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-lg2q9 6h55m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-ljbwg 19h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-lll4n 89m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-lmx8k 17h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-lnx4k 4h17m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-lp4gr 59m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-lpqxw 7h8m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-lrjpp 9h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-lwr9r 25m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-lwrxl 18h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-m5mq9 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-mjv8q 19h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-mk586 10h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-mpwsh 12h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-mq5wq 165m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-mt9zl 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-mxjpb 17h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-mzl6m 3h34m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-ndtwb 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 
Pending csr-ng5b2 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-nk9lz 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-nm65f 135m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-nqk5f 18h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-nxf98 167m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-p42nd 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-p55vc 5h6m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-p5d7s 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-p6whg 65m kubernetes.io/kube-apiserver-client-kubelet system:node:okd-kw67j-master-2 Approved,Issued csr-p9fcr 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-p9xrb 18h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-pd4l5 5h6m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-pgjjk 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-pgzlq 12h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-pjwdn 5h36m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-pltcl 103m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-pm4cr 21h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-pmws9 12h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-psflj 32m kubernetes.io/kube-apiserver-client-kubelet system:node:okd-kw67j-worker-0-pr4nd Approved,Issued csr-q5jlj 17h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-q5xwl 12h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-q7hl4 12h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-qdxnd 9h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-qgq64 17h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-qnld5 7h56m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-qnq4v 9h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-qqd9c 8h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-qtr8f 19h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-qvrmk 6h8m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-qvrwp 7h54m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-r45zb 4h33m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-r9f86 12m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-rbfsn 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-rd475 17h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-rh7vv 78m kubernetes.io/kubelet-serving system:node:okd-kw67j-master-0 Approved,Issued csr-rjbhg 19h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-rlh7h 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-rmrt4 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-rp7fd 4h35m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-rptz8 8h 
kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-rrpfw 18h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-rrqhp 41m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-rtl96 134m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-s4k8h 6h54m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-s6kvv 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-s9q9m 9h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-sbp6g 3h49m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-slg5w 12h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-sn86l 21h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-sqsnr 90m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-stw62 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-svcrg 9h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-t6bww 120m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-tfghk 7h39m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-tl442 17h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-tnbx5 21h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-tngnt 5h4m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-tsnt9 6h23m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-tv4rj 118m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-v4swr 6h39m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-v5lwg 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-v6h4f 3h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-v7dhq 13h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-v945c 5h22m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-vc2tt 152m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-vj7x8 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-vk872 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-vn77q 43m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-vqk87 6h6m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-vs5vh 7h23m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-vtxqz 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-vv2q6 18h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-w2c6s 20h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-w2qqt 8h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-w5r5d 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-w99br 4h20m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-w99qv 4h5m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-wcfxs 3h2m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-wjcmv 12h kubernetes.io/kubelet-serving 
system:node:okd-kw67j-worker-0-qnp79 Pending csr-wjknr 21h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-wjzw5 9h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-wnzdk 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-wqrbk 5h53m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-wqtp8 16h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-wvld5 19h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-wzzb5 4h2m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-x7psd 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-x8m8j 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-xcd7z 14h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-xhnd7 5h35m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-xkl6h 11h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-xslnh 74m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-xzzkh 15h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-qnp79 Pending csr-z2scs 8h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-zmhlc 19h kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-pr4nd Pending csr-znbsm 3h17m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending csr-zpwcm 5h21m kubernetes.io/kubelet-serving system:node:okd-kw67j-worker-0-6mdsj Pending

Only a few of them have the Approved,Issued condition.

cgruver commented 4 years ago

If you run the following:

oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve

Do your worker nodes successfully join the cluster?

You need to have the jq utility installed in addition to oc.
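
If new Pending requests keep appearing (approving the kubelet client CSRs typically triggers a second round of kubelet-serving CSRs), a rough sketch of repeating the approval until nothing is left Pending could be:

# Hedged convenience loop, not from the thread: approve whatever is Pending,
# then re-check every 30 seconds until no Pending CSRs remain.
while oc get csr | grep -q Pending; do
  oc get csr | awk '/Pending/ {print $1}' | xargs oc adm certificate approve
  sleep 30
done

This is just a wrapper around the same oc adm certificate approve call; the jq variant above works equally well when re-run.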

viniciusferrao commented 4 years ago

@cgruver done, but without jq:

for x in `oc get csr | grep Pending | cut -f 1 -d " "` ; do oc adm certificate approve $x ; done

Haha. Should I restart something now? Things went strange after approving everything. For instance, oc adm must-gather returned two different outputs now:

[ferrao@fedora-test okd]$ oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/okd-content@sha256:ef4afb2bc8d802294401b3a112b49d4f5b5727498e2a75f2181f612168ba8472
[must-gather      ] OUT namespace/openshift-must-gather-rvxgf created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-zlqfv created
Delete https://api.okd.iq.ufrj.br:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/must-gather-zlqfv: read tcp 146.164.29.89:56324->146.164.29.21:6443: read: connection reset by peer
[must-gather      ] OUT namespace/openshift-must-gather-rvxgf deleted
Error from server (Forbidden): pods "must-gather-" is forbidden: error looking up service account openshift-must-gather-rvxgf/default: serviceaccount "default" not found

and:

[ferrao@fedora-test okd]$ oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/okd-content@sha256:ef4afb2bc8d802294401b3a112b49d4f5b5727498e2a75f2181f612168ba8472
[must-gather      ] OUT namespace/openshift-must-gather-qhnmr created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-bzv49 created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/okd-content@sha256:ef4afb2bc8d802294401b3a112b49d4f5b5727498e2a75f2181f612168ba8472 created
[must-gather-hpcx6] POD Wrote inspect data to must-gather.
[must-gather-hpcx6] POD Gathering data for ns/openshift-cluster-version...
[must-gather-hpcx6] OUT gather logs unavailable: read tcp 146.164.29.89:56330->146.164.29.21:6443: read: connection reset by peer
[must-gather-hpcx6] OUT waiting for gather to complete
[must-gather-hpcx6] OUT gather never finished: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-bzv49 deleted
[must-gather      ] OUT namespace/openshift-must-gather-qhnmr deleted
error: gather never finished for pod must-gather-hpcx6: timed out waiting for the condition

Regarding the image registry, it's still degraded:

[ferrao@fedora-test okd]$ oc get clusteroperator | grep image-registry
image-registry   4.5.0-0.okd-2020-06-29-110348-beta6   True   False   True   22h

The worker nodes are still in the Provisioned phase:

[ferrao@fedora-test okd]$ oc -n openshift-machine-api get machine
NAME                       PHASE         TYPE   REGION   ZONE   AGE
okd-kw67j-master-0         Running                              22h
okd-kw67j-master-1         Running                              22h
okd-kw67j-master-2         Running                              22h
okd-kw67j-worker-0-6mdsj   Provisioned                          22h
okd-kw67j-worker-0-pr4nd   Provisioned                          22h
okd-kw67j-worker-0-qnp79   Provisioned                          22h

So should I restart something?

cgruver commented 4 years ago

What is the output of oc get nodes?

viniciusferrao commented 4 years ago

They appear to be flapping. Earlier it was:

[ferrao@fedora-test okd]$ oc get nodes
NAME                       STATUS   ROLES    AGE   VERSION
okd-kw67j-master-0         Ready    master   22h   v1.18.3
okd-kw67j-master-1         Ready    master   22h   v1.18.3
okd-kw67j-master-2         Ready    master   22h   v1.18.3
okd-kw67j-worker-0-6mdsj   Ready    worker   22h   v1.18.3
okd-kw67j-worker-0-pr4nd   Ready    worker   22h   v1.18.3
okd-kw67j-worker-0-qnp79   Ready    worker   22h   v1.18.3

But now they are:

[ferrao@fedora-test okd]$ oc get nodes
NAME                       STATUS     ROLES    AGE   VERSION
okd-kw67j-master-0         Ready      master   23h   v1.18.3
okd-kw67j-master-1         NotReady   master   23h   v1.18.3
okd-kw67j-master-2         Ready      master   23h   v1.18.3
okd-kw67j-worker-0-6mdsj   Ready      worker   22h   v1.18.3
okd-kw67j-worker-0-pr4nd   NotReady   worker   22h   v1.18.3
okd-kw67j-worker-0-qnp79   Ready      worker   22h   v1.18.3

Another issue: things have started to move from OK to degraded after the mass CSR approval.

[ferrao@fedora-test okd]$ oc get clusteroperator
NAME                                       VERSION                               AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      22h
cloud-credential                           4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
cluster-autoscaler                         4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
config-operator                            4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
console                                    4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      41m
csi-snapshot-controller                    4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      42m
dns                                        4.5.0-0.okd-2020-06-29-110348-beta6   True        True          True       23h
etcd                                       4.5.0-0.okd-2020-06-29-110348-beta6   True        False         True       23h
image-registry                             4.5.0-0.okd-2020-06-29-110348-beta6   True        False         True       23h
ingress                                    4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
insights                                   4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
kube-apiserver                             4.5.0-0.okd-2020-06-29-110348-beta6   True        False         True       23h
kube-controller-manager                    4.5.0-0.okd-2020-06-29-110348-beta6   True        False         True       23h
kube-scheduler                             4.5.0-0.okd-2020-06-29-110348-beta6   True        False         True       23h
kube-storage-version-migrator              4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      39m
machine-api                                4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
machine-approver                           4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
machine-config                             4.5.0-0.okd-2020-06-29-110348-beta6   False       False         True       44m
marketplace                                4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
monitoring                                 4.5.0-0.okd-2020-06-29-110348-beta6   False       True          True       37m
network                                    4.5.0-0.okd-2020-06-29-110348-beta6   True        True          True       23h
node-tuning                                4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
openshift-apiserver                        4.5.0-0.okd-2020-06-29-110348-beta6   True        False         True       44m
openshift-controller-manager               4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
openshift-samples                          4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
operator-lifecycle-manager                 4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
operator-lifecycle-manager-catalog         4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
operator-lifecycle-manager-packageserver   4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      44m
service-ca                                 4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h
storage                                    4.5.0-0.okd-2020-06-29-110348-beta6   True        False         False      23h

viniciusferrao commented 4 years ago

Another thing: oc adm must-gather now works. It generated a 778 MB folder: https://pessoas.iq.ufrj.br/~ferrao/okd/must-gather-okd-4.5.0-beta6-20200630.tar.gz

EDIT: I think there's something wrong with the monitoring operator (is "operator" the right nomenclature?):

[ferrao@fedora-test okd]$ oc describe co/monitoring
Name:         monitoring
Namespace:
Labels:
Annotations:
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-06-29T22:57:38Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
      f:status:
        .:
        f:extension:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2020-06-29T22:57:38Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:relatedObjects:
        f:versions:
    Manager:         operator
    Operation:       Update
    Time:            2020-07-01T01:21:01Z
  Resource Version:  501392
  Self Link:         /apis/config.openshift.io/v1/clusteroperators/monitoring
  UID:               837de426-f4c9-4354-ba84-10cf6ca1507e
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-06-30T21:45:47Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2020-07-01T01:21:01Z
    Message:               Rolling out the stack.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2020-06-30T21:45:47Z
    Message:               Failed to rollout the stack. Error: running task Updating node-exporter failed: reconciling node-exporter DaemonSet failed: updating DaemonSet object failed: waiting for DaemonSetRollout of node-exporter: daemonset node-exporter is not ready. status: (desired: 6, updated: 6, ready: 4, unavailable: 2)
    Reason:                UpdatingnodeExporterFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-07-01T01:21:01Z
    Message:               Rollout of the monitoring stack is in progress. Please wait until it finishes.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Upgradeable
  Extension:
  Related Objects:
    Group:
    Name:      openshift-monitoring
    Resource:  namespaces
    Group:     monitoring.coreos.com
    Name:
    Resource:  servicemonitors
    Group:     monitoring.coreos.com
    Name:
    Resource:  prometheusrules
    Group:     monitoring.coreos.com
    Name:
    Resource:  alertmanagers
    Group:     monitoring.coreos.com
    Name:
    Resource:  prometheuses
  Versions:
    Name:     operator
    Version:  4.5.0-0.okd-2020-06-29-110348-beta6
Events:

vrutkovs commented 4 years ago

okd-kw67j-master-1 NotReady master 23h v1.18.3

This is pretty bad. Its status is "Kubelet stopped posting node status.", so we can't fetch logs from it.
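
For reference, that node condition can be inspected with something like the following (generic commands, not quoted from this thread):

oc describe node okd-kw67j-master-1 | grep -A 10 'Conditions:'
oc get node okd-kw67j-master-1 -o jsonpath='{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'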

Could you SSH into that machine (use the core user and the key from install-config) and check the kubelet service: is it running? What are the latest logs there?

viniciusferrao commented 4 years ago

Hi @vrutkovs, sorry for the delay, we're in different timezones!

The service is running:

[core@okd-kw67j-master-1 ~]$ systemctl status kubelet
● kubelet.service - MCO environment configuration
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-mco-default-env.conf
   Active: active (running) since Mon 2020-06-29 23:00:16 UTC; 1 day 18h ago
  Process: 1287 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
  Process: 1288 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
 Main PID: 1289 (kubelet)
    Tasks: 25 (limit: 19063)
   Memory: 279.3M
      CPU: 9h 56min 35.787s
   CGroup: /system.slice/kubelet.service
           └─1289 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/syste>

Regarding the logs, is there anything specific I should look for? From systemctl I got this:

Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.225509 1289 secret.go:183] Setting up volume multus-token-qd5ql for pod f96a717a-ef33-4cc2-ac7a-d8d89b0452f5 at /var/lib/kubelet/pods/f96a717a-ef33-4cc2-ac7a-d8d89b0452f5/volumes/kubernetes.io~secret/multus-token-qd5ql
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.225998 1289 secret.go:207] Received secret openshift-multus/multus-token-qd5ql containing (4) pieces of data, 16397 total bytes
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.226191 1289 atomic_writer.go:157] pod openshift-ovn-kubernetes/ovnkube-master-fjv8v volume ovn-cert: no update required for target directory /var/lib/kubelet/pods/6ca5c065-4244-4d89-b542-b3e877ae6119/volumes/kubernetes.io~secret/ovn-cert
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.226232 1289 operation_generator.go:657] MountVolume.SetUp succeeded for volume "ovn-cert" (UniqueName: "kubernetes.io/secret/6ca5c065-4244-4d89-b542-b3e877ae6119-ovn-cert") pod "ovnkube-master-fjv8v" (UID: "6ca5c065-4244-4d89-b542-b3e877ae6119")
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.226601 1289 atomic_writer.go:157] pod openshift-multus/multus-k2kvq volume multus-token-qd5ql: no update required for target directory /var/lib/kubelet/pods/f96a717a-ef33-4cc2-ac7a-d8d89b0452f5/volumes/kubernetes.io~secret/multus-token-qd5ql
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.226636 1289 operation_generator.go:657] MountVolume.SetUp succeeded for volume "multus-token-qd5ql" (UniqueName: "kubernetes.io/secret/f96a717a-ef33-4cc2-ac7a-d8d89b0452f5-multus-token-qd5ql") pod "multus-k2kvq" (UID: "f96a717a-ef33-4cc2-ac7a-d8d89b0452f5")
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.227327 1289 atomic_writer.go:157] pod openshift-ovn-kubernetes/ovnkube-master-fjv8v volume ovn-kubernetes-controller-token-td2tr: no update required for target directory /var/lib/kubelet/pods/6ca5c065-4244-4d89-b542-b3e877ae6119/volumes/kubernetes.io~secret/ovn-kubernetes-controller-token-td2tr
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.227364 1289 operation_generator.go:657] MountVolume.SetUp succeeded for volume "ovn-kubernetes-controller-token-td2tr" (UniqueName: "kubernetes.io/secret/6ca5c065-4244-4d89-b542-b3e877ae6119-ovn-kubernetes-controller-token-td2tr") pod "ovnkube-master-fjv8v" (UID: "6ca5c065-4244-4d89-b542-b3e877ae6119")
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.231940 1289 request.go:907] Got a Retry-After 1s response for attempt 9 to https://api-int.okd.iq.ufrj.br:6443/api/v1/namespaces/openshift-monitoring/secrets?fieldSelector=metadata.name%3Dnode-exporter-dockercfg-k7x2w&resourceVersion=430234
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.279494 1289 request.go:907] Got a Retry-After 1s response for attempt 7 to https://api-int.okd.iq.ufrj.br:6443/api/v1/namespaces/openshift-dns/configmaps?fieldSelector=metadata.name%3Ddns-default&resourceVersion=430602
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.360431 1289 reflector.go:211] Listing and watching *v1.Secret from object-"openshift-image-registry"/"node-ca-token-822cl"
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.361453 1289 request.go:907] Got a Retry-After 1s response for attempt 1 to https://api-int.okd.iq.ufrj.br:6443/api/v1/namespaces/openshift-image-registry/secrets?fieldSelector=metadata.name%3Dnode-ca-token-822cl&resourceVersion=430234
Jul 01 17:56:19 okd-kw67j-master-1 hyperkube[1289]: I0701 17:56:19.368403 1289 request.go:907] Got a Retry-After 1s response for attempt 6 to https://api-int.okd.iq.ufrj.br:6443/api/v1/namespaces/openshift-dns/secrets?fieldSelector=metadata.name%3Ddns-token-wls97&resourceVersion=430234

May I ask another question? I'm not sure whether it's related: masters and workers get IP addresses from DHCP. Since there's no DHCP reservation during the openshift-install phase, in my infrastructure the DHCP server does not register the names on the DNS server. Is DNS registration of the master and worker nodes necessary? Could it cause random issues like this one? If yes, is there a way to change the install process so I can control the MAC addresses, do proper reservations (static leases) on DHCP, and register the names in DNS?

Not sure if it helps, but the URLs in the kubelet logs are publicly accessible.

Thanks,

vrutkovs commented 4 years ago

So, it's running but won't join the cluster.

Regarding the logs, is there anything specific I should look for?

I don't know; could you dump the last couple of hours of logs and attach them here?
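
For example, something like this on the affected node should capture the relevant window (a generic journalctl invocation; the output file name is arbitrary):

sudo journalctl -u kubelet --since "2 hours ago" > kubelet-okd-kw67j-master-1.log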

Masters and workers get IP addresses from DHCP. Since there's no DHCP reservation during the openshift-install phase, in my infrastructure the DHCP server does not register the names on the DNS server. Is DNS registration of the master and worker nodes necessary? Could it cause random issues like this one?

I think that is possible. Some cluster features assume nodes can be resolved by short names, but it's not strictly necessary. Stale DHCP leases, though, may cause an issue.

If yes, is there a way to change the install process so I can control the MAC addresses, do proper reservations (static leases) on DHCP, and register the names in DNS?

Not that I know of; maybe @rgolangh has ideas.
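
For reference, the reservation itself is independent of openshift-install: once the VMs exist, their MAC addresses can be read from oVirt and pinned on the DHCP/DNS side. A generic sketch for ISC dhcpd (the MAC and IP below are placeholders, not values from this cluster):

# /etc/dhcp/dhcpd.conf - hypothetical static lease for one worker VM
host okd-kw67j-worker-0-6mdsj {
  hardware ethernet 52:54:00:aa:bb:01;   # MAC copied from the oVirt VM's NIC after provisioning
  fixed-address 192.0.2.10;              # placeholder address inside the machine network
  option host-name "okd-kw67j-worker-0-6mdsj";
}

A matching A record in the DNS zone would then make the node name resolvable. This does not change the installer workflow; it only pins the lease after the machines are created.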

vrutkovs commented 4 years ago

Is this still an issue in the GA release?

viniciusferrao commented 4 years ago

Yes, it appears to be the same issue on GA. I destroyed the beta cluster yesterday and went to bed while installing a new one with the GA release.

Regarding the "couple of hours" dump, from where do you want the dump?

MJularic commented 4 years ago

Hi guys,

the MachineWithoutValidNode and MachineWithNoRunningPhase errors are occurring on the latest release of OKD.

Here is the setup that is used:

Provider: Red Hat OpenStack 16.1 (without Kuryr)

Installation procedure: Installer-Provisioned-Infrastructure (IPI)

OKD installer version: 4.5.0-0.okd-2020-08-12-020541

Should I open another issue or should I post the must-gather outputs here?

viniciusferrao commented 4 years ago

@vrutkovs The issue appears to be solved with the latest OKD release: 4.5.0-0.okd-2020-09-18-202631

The cluster is finally up and running without issues, so this ticket can be closed.

Thanks,

PS: Is there a quickstart guide for OKD/OpenShift on how to configure things, like persistent storage, networking, and of course the first steps on the platform?

vrutkovs commented 3 years ago

PS: Is there a quickstart guide for OKD/OpenShift on how to configure things, like persistent storage, networking, and of course the first steps on the platform?

https://openshift.tips and learn.openshift.com should be helpful