**Describe the bug**
Both masters and workers move to the "NotReady" state after ~20 hours of running a cluster (Virt).

**To Reproduce**
Install a cluster (Virt) with 3 masters and 2 workers, 1 extra disk per machine, then wait for ~20 hours.
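Since the failure takes most of a day to appear, it helps to record node status continuously while waiting. A minimal sketch (the poll interval and log file name are arbitrary, not part of the original report):

```sh
# Poll node readiness once a minute so the exact moment of the
# Ready -> NotReady flip can be correlated with pod logs later.
while true; do
  echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" >> nodes-status.log
  oc get nodes >> nodes-status.log
  sleep 60
done
```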
**Expected/observed behavior**
Logs from one of the masters include:
```
# tail /var/log/pods/f6d7dfce-6bef-11e9-962d-8e7be8ca41be/service-serving-cert-signer-controller/0.log
2019-05-01T09:03:44.402467180+00:00 stderr F I0501 09:03:44.402387 1 option.go:47] ServiceServingCertController: handling add openshift-kube-controller-manager/kube-controller-manager
2019-05-01T09:03:44.403563838+00:00 stderr F I0501 09:03:44.403437 1 secret_creating_controller.go:124] generating new cert for openshift-kube-controller-manager/kube-controller-manager
2019-05-01T09:03:44.786892411+00:00 stderr F I0501 09:03:44.786808 1 option.go:47] ServiceServingCertUpdateController: handling add openshift-kube-controller-manager/serving-cert
2019-05-01T09:03:44.798663197+00:00 stderr F I0501 09:03:44.797637 1 option.go:55] ServiceServingCertController: handling update openshift-kube-controller-manager/kube-controller-manager
2019-05-01T09:03:47.671165392+00:00 stderr F I0501 09:03:47.671072 1 option.go:47] ServiceServingCertUpdateController: handling add openshift-kube-scheduler/serving-cert-2
2019-05-01T09:03:56.669922353+00:00 stderr F I0501 09:03:56.669147 1 option.go:47] ServiceServingCertUpdateController: handling add openshift-kube-scheduler/serving-cert-3
2019-05-01T09:04:02.789258907+00:00 stderr F I0501 09:04:02.788728 1 option.go:47] ServiceServingCertUpdateController: handling add openshift-kube-controller-manager/serving-cert-1
2019-05-01T09:04:04.273433651+00:00 stderr F I0501 09:04:04.273353 1 option.go:47] ServiceServingCertUpdateController: handling add openshift-kube-scheduler/serving-cert-4
2019-05-01T09:04:49.073812935+00:00 stderr F I0501 09:04:49.071769 1 option.go:47] ServiceServingCertController: handling add openshift-operator-lifecycle-manager/olm-operators
2019-05-01T09:05:14.988553805+00:00 stderr F I0501 09:05:14.987335 1 option.go:47] ServiceServingCertController: handling add openshift-operator-lifecycle-manager/v1-packages-operators-coreos-com
2019-05-01T09:05:15.406175430+00:00 stderr F I0501 09:05:15.406129 1 option.go:47] ServiceServingCertController: handling add openshift-operator-lifecycle-manager/v1-packages-operators-coreos-com
2019-05-01T09:05:46.093478154+00:00 stderr F I0501 09:05:46.092299 1 leaderelection.go:249] failed to renew lease openshift-service-ca/openshift-service-serving-cert-signer-serving-ca-lock: failed to tryAcquireOrRenew context deadline exceeded
2019-05-01T09:05:46.093617452+00:00 stderr F F0501 09:05:46.093468 1 leaderelection.go:65] leaderelection lost
```
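The last two lines show the controller failing to renew its leader-election lease within the deadline (the API call timed out) and then exiting, since a lost lease is logged at klog fatal level (the `F0501` line). For a cluster in this state, checks along these lines may help; the namespace and lock name come from the log above, while the lock being stored as a ConfigMap is an assumption (the usual client-go leader-election storage at the time):

```sh
# Restart counts show how often the signer controller has lost its lease.
oc -n openshift-service-ca get pods

# Assuming a ConfigMap-based lock: its leader-election annotation records
# the current holder identity and the last renew time.
oc -n openshift-service-ca get configmap \
    openshift-service-serving-cert-signer-serving-ca-lock -o yaml

# Sustained API/etcd pressure on the masters is a common cause of renew
# timeouts; node-level load is a quick first check.
oc adm top nodes
```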
This is my cluster status after running 3 days:

```
[chwen@dell-pem630-02 dev-scripts]$ oc get nodes
NAME       STATUS   ROLES    AGE    VERSION
master-0   Ready    master   3d8h   v1.13.4+d4ce02c1d
master-1   Ready    master   3d8h   v1.13.4+d4ce02c1d
master-2   Ready    master   3d8h   v1.13.4+d4ce02c1d
worker-0   Ready    worker   3d7h   v1.13.4+d4ce02c1d
[chwen@dell-pem630-02 dev-scripts]$ oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.ci-2019-04-17-133604   True        False         3d7h    Cluster version is 4.0.0-0.ci-2019-04-17-133604
[chwen@dell-pem630-02 dev-scripts]$ git rev-parse HEAD
42c5d2c5ab59b224284591fedef4d767e66f7b11
```

We’ve now rebased to RCs of 4.1.0. Please update if you still see this.
I wasn't able to reproduce this issue more than once. I'll reopen it if it appears again.