risingwavelabs / risingwave-docs

The official user documentation of RisingWave
https://docs.risingwave.com
Apache License 2.0

Bug: Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook #791

Closed: mneedham closed this issue 1 year ago

mneedham commented 1 year ago

Where's the bug?

https://www.risingwave.dev/docs/current/risingwave-kubernetes

Describe the bug

I'm not sure if this is a bug, but I'm up to the 'Deploy the Operator' step, and so far I have run this:

$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.yaml
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
configmap/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created

Based on this output, cert-manager seems to have been installed.

But when I then tried to install the operator, I got this output:

$ kubectl apply --server-side -f https://github.com/risingwavelabs/risingwave-operator/releases/latest/download/risingwave-operator.yaml
namespace/risingwave-operator-system serverside-applied
customresourcedefinition.apiextensions.k8s.io/risingwavepodtemplates.risingwave.risingwavelabs.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/risingwaves.risingwave.risingwavelabs.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/risingwavescaleviews.risingwave.risingwavelabs.com serverside-applied
serviceaccount/risingwave-operator-controller-manager serverside-applied
role.rbac.authorization.k8s.io/risingwave-operator-leader-election-role serverside-applied
clusterrole.rbac.authorization.k8s.io/risingwave-operator-manager-role serverside-applied
clusterrole.rbac.authorization.k8s.io/risingwave-operator-metrics-reader serverside-applied
clusterrole.rbac.authorization.k8s.io/risingwave-operator-proxy-role serverside-applied
rolebinding.rbac.authorization.k8s.io/risingwave-operator-leader-election-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/risingwave-operator-manager-rolebinding serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/risingwave-operator-proxy-rolebinding serverside-applied
configmap/risingwave-operator-controller-manager-config serverside-applied
configmap/risingwave-operator-manager-config serverside-applied
service/risingwave-operator-controller-manager-metrics-manager serverside-applied
service/risingwave-operator-metrics serverside-applied
service/risingwave-operator-webhook-service serverside-applied
deployment.apps/risingwave-operator-controller-manager serverside-applied
mutatingwebhookconfiguration.admissionregistration.k8s.io/risingwave-operator-mutating-webhook-configuration serverside-applied
validatingwebhookconfiguration.admissionregistration.k8s.io/risingwave-operator-validating-webhook-configuration serverside-applied
Error from server (InternalError): Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.252.113:443: connect: connection refused
Error from server (InternalError): Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.252.113:443: connect: connection refused
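A side note on the error itself: a `connection refused` from a Service's ClusterIP (here `10.96.252.113:443`) typically means the Service had no ready endpoints yet, i.e. no `cert-manager-webhook` Pod had passed its readiness probe. The minimal sketch below checks for that, assuming cert-manager is installed in the default `cert-manager` namespace. The function name `webhook_has_endpoints` is hypothetical and not part of cert-manager or kubectl.

```shell
# Hypothetical helper (not part of cert-manager or kubectl).
# Succeeds only if the cert-manager-webhook Service has at least one ready
# endpoint address. Ready addresses are listed under .subsets[].addresses;
# Pods that have not passed their readiness probe appear under
# .subsets[].notReadyAddresses instead, so they are not counted here.
# Assumes cert-manager runs in the default "cert-manager" namespace.
webhook_has_endpoints() {
  local addrs
  addrs=$(kubectl --namespace cert-manager get endpoints cert-manager-webhook \
    -o jsonpath='{.subsets[*].addresses[*].ip}') || return 1
  # An empty string means no ready webhook Pod is behind the Service yet.
  [ -n "$addrs" ]
}
```

Running `webhook_has_endpoints && echo ready` before the operator step would show whether the webhook can be reached at that moment.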

I think the cert-manager webhook container wasn't ready when we tried to call it, because when I applied the operator again 10 minutes later, I didn't see any errors, and all my containers are running:

$ kubectl -n cert-manager get pods
kubectl -n risingwave-operator-system get pods
NAME                                      READY   STATUS    RESTARTS   AGE
cert-manager-6ffb79dfdb-2qwp4             1/1     Running   0          10m
cert-manager-cainjector-5fcd49c96-fkvqb   1/1     Running   0          10m
cert-manager-webhook-796ff7697b-tx96c     1/1     Running   0          10m
NAME                                                      READY   STATUS    RESTARTS   AGE
risingwave-operator-controller-manager-78bbdb786c-2nv9c   2/2     Running   0          10m

I'm not knowledgeable enough about K8s operators to know whether this is a real issue, but could there be some retry logic when the call fails? If not, could we wait until the webhook is ready before calling it?
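The retry logic suggested above can be approximated on the client side with a small shell loop. This is a sketch, not something kubectl or the operator manifest provides; `retry_cmd` and its arguments are hypothetical.

```shell
# Hypothetical helper: run a command up to <max_attempts> times, sleeping
# <delay_seconds> between failed attempts.
# Usage: retry_cmd <max_attempts> <delay_seconds> <command> [args...]
retry_cmd() {
  local max_attempts=$1
  local delay=$2
  local attempt=1
  shift 2
  while true; do
    # Run the command in the current shell; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry_cmd: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "retry_cmd: attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_cmd 5 15 kubectl apply --server-side -f https://github.com/risingwavelabs/risingwave-operator/releases/latest/download/risingwave-operator.yaml` would retry the operator install for roughly a minute. Re-running the apply is safe here because applying the same manifest again converges on the same state.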

arkbriar commented 1 year ago

Hi @mneedham, thanks for the feedback. The problem was caused by cert-manager still initializing: even though its Pods were running, it can take a while before they are ready. Waiting another minute and then retrying the apply should solve it. Would you mind trying it again?

$ kubectl apply --server-side -f https://github.com/risingwavelabs/risingwave-operator/releases/latest/download/risingwave-operator.yaml
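Instead of waiting a fixed minute, the install could poll for readiness. This is a minimal sketch, assuming cert-manager is installed in the default `cert-manager` namespace. `kubectl wait` blocks until every Deployment there reports the `Available` condition, and only then is the operator applied. The function name `install_risingwave_operator` is hypothetical. `Available` only means the Pods passed their readiness probes. It may not guarantee that cert-manager's CA injector has finished patching the webhook configurations, so combining this with a retry may still be prudent.

```shell
# Hypothetical helper: wait for cert-manager, then install the operator.
# Assumes cert-manager was installed into the default "cert-manager" namespace.
install_risingwave_operator() {
  # Block (up to 180s) until every cert-manager Deployment reports Available;
  # bail out without applying anything if that never happens.
  kubectl wait --namespace cert-manager \
    --for=condition=Available deployment --all \
    --timeout=180s || return 1
  # Same command as in the original instructions.
  kubectl apply --server-side \
    -f https://github.com/risingwavelabs/risingwave-operator/releases/latest/download/risingwave-operator.yaml
}
```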

hengm3467 commented 1 year ago

We need to add a note to the K8s page about this behavior (there might be a delay before cert-manager is ready).

hengm3467 commented 1 year ago

The steps were updated to tell users to wait one minute for cert-manager to initialize. Details in #895.