Open Smityz opened 10 months ago
As https://github.com/openshift/generic-admission-server/issues/33#issuecomment-620513624 said, in k8s 1.18, k8s.io/apiserver supports reloading the serving certs.
TiDB Operator v1.4.4 has been using v1.19 of K8s (https://github.com/pingcap/tidb-operator/blob/v1.4.4/go.mod#L65), and this version of generic-admission-server also uses k8s v1.19 (https://github.com/openshift/generic-admission-server/blob/da96454c926de350e52f6c7a6ee86af49ee96b00/go.mod), so it should reload the certs.
Did your cert just expire, or was it renewed after it had expired?
It's not the certs of tidb-webhook that expired, but the CA of "kubernetes.default.svc" in the k8s apiserver.
That's because the call flow of TiDB CRD admission is
k8s apiserver -> apiservice (kubernetes.default.svc) -> tidb webhook pod
i.e.
k8s apiserver -> k8s apiserver (kubernetes.default.svc) -> tidb webhook pod
When a k8s apiserver runs for more than one year without restarting, the CA of kubernetes.default.svc held in the k8s apiserver's memory expires. As a result, requests from the k8s apiserver to itself start failing after a year.
By default, the CA of kubernetes.default.svc in k8s apiserver memory is self-signed with a one-year validity when the k8s apiserver starts.
@Smityz is this caused as iPenx said? Have you resolved it?
Yes, we are on the same team. We ended up disabling the webhook, but I think it's a common problem and it needs to be solved.
Bug Report
What version of Kubernetes are you using?
v1.22
What version of TiDB Operator are you using?
v1.4.4
What did you do? After running stably for several months, the operator suddenly started reporting errors continuously and could not complete sync. After we disabled the webhook, the operator returned to normal. Related error log:
We suspect this is related to the self-signed certificate mechanism of the api-server, because the certificate's expiration time is exactly one year after the api server started. We also found a related bug here: https://github.com/openshift/generic-admission-server/issues/33