kubernetes-sigs / cluster-api-provider-nested

Cluster API Provider for Nested Clusters
Apache License 2.0

🐛 Webhook caBundle issues for Virtual Cluster #125

Closed: vincent-pli closed this issue 3 years ago

vincent-pli commented 3 years ago

I haven't dug too deeply into the code, but in my environment the webhook for VirtualCluster does not work. I get this error when I try to create a VirtualCluster:

Error from server (InternalError): error when creating "virtualcluster_1_nodeport.yaml": Internal error occurred: failed calling webhook "virtualcluster.validating.webhook": Post "https://virtualcluster-webhook-service.vc-manager.svc:9443/validate-tenancy-x-k8s-io-v1alpha1-virtualcluster?timeout=30s": x509: certificate signed by unknown authority

Then I checked the ValidatingWebhookConfiguration, and there is no caBundle at all in virtualcluster-validating-webhook-configuration.

After I modified virtualcluster-validating-webhook-configuration and set the caBundle to the cluster's CA, everything worked as expected.
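A minimal sketch of that manual workaround in Go, assuming the configuration holds a single webhook at index 0 (the error above names only virtualcluster.validating.webhook) and that the cluster CA PEM is at hand; for a kubeadm-based cluster such as kind it is typically /etc/kubernetes/pki/ca.crt on the control-plane node. caBundlePatch is a hypothetical helper, not project code. It emits an RFC 6902 JSON patch; because caBundle is a byte field, its wire value is the base64 encoding of the PEM.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// jsonPatchOp is one RFC 6902 operation, the format accepted by
// `kubectl patch --type=json`.
type jsonPatchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value"`
}

// caBundlePatch builds a JSON patch that sets clientConfig.caBundle on the
// webhook at index i. The API field is []byte, so its JSON value is the
// base64 encoding of the PEM-encoded CA certificate(s).
func caBundlePatch(caPEM []byte, i int) ([]byte, error) {
	ops := []jsonPatchOp{{
		// "add" on an existing object member replaces its value, so this
		// works whether or not caBundle is already present.
		Op:    "add",
		Path:  fmt.Sprintf("/webhooks/%d/clientConfig/caBundle", i),
		Value: base64.StdEncoding.EncodeToString(caPEM),
	}}
	return json.Marshal(ops)
}

func main() {
	// Placeholder bytes; substitute the real cluster CA PEM here.
	caPEM := []byte("-----BEGIN CERTIFICATE-----\n(placeholder)\n-----END CERTIFICATE-----\n")
	patch, err := caBundlePatch(caPEM, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

The printed patch could then be applied with kubectl patch validatingwebhookconfiguration virtualcluster-validating-webhook-configuration --type=json -p '<patch>'.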

It seems we never set the caBundle. Am I missing something?

christopherhein commented 3 years ago

/kind bug

christopherhein commented 3 years ago

/assign @charleszheng44

charleszheng44 commented 3 years ago

@vincent-pli Are you using OpenShift or native Kubernetes?

vincent-pli commented 3 years ago

@charleszheng44 Native Kubernetes; kind, to be precise.

christopherhein commented 3 years ago

/retitle 🐛 Webhook caBundle issues for Virtual Cluster

charleszheng44 commented 3 years ago

@vincent-pli May I know which version of kind you are using?

vincent-pli commented 3 years ago

Sure. Are you able to reproduce the problem I hit? @charleszheng44

root@rentz1:~# kind version
kind v0.10.0 go1.15.2 linux/amd64

charleszheng44 commented 3 years ago

@vincent-pli Sorry for the late reply. I ran into the same issue when trying to set up VC on a kind cluster; it looks like the certificate assigned to the webhook does not work properly.

However, I can successfully set up the VC framework and create a VC on Minikube, so this may be a kind-specific issue. I will try to find the cause. In the meantime, could you try Minikube or another testing environment?

The Minikube version I used is 1.20.0, which uses the same Kubernetes version (v1.20) as kind v0.10.0.

vincent-pli commented 3 years ago

That's weird. I checked the code here: https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/2e2add9bba1ec0c5104df0f64ce3c560f625bef8/virtualcluster/pkg/webhook/virtualcluster/virtualcluster_webhook.go#L155-L162

We do not set caBundle when creating the configuration, nor do we update it after creation, so nothing ever adds a caBundle to the ValidatingWebhookConfiguration.
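One hypothetical way to close both gaps is a small reconcile step that runs on the configuration before it is created and again whenever it already exists. The types below are minimal stand-ins for the admissionregistration/v1 ones (real code would use k8s.io/api/admissionregistration/v1 and a client Update call); ensureCABundle and the type mirrors are illustrative names, not code from the repository.

```go
package main

import (
	"bytes"
	"fmt"
)

// Minimal, hypothetical mirrors of the admissionregistration/v1 fields that
// matter here.
type WebhookClientConfig struct {
	CABundle []byte
}

type ValidatingWebhook struct {
	Name         string
	ClientConfig WebhookClientConfig
}

type ValidatingWebhookConfiguration struct {
	Name     string
	Webhooks []ValidatingWebhook
}

// ensureCABundle sets caBundle on every webhook and reports whether anything
// changed, so the caller knows whether an Update call is needed. Calling it
// before Create and on existing objects covers both the create and the
// update path.
func ensureCABundle(cfg *ValidatingWebhookConfiguration, caPEM []byte) bool {
	changed := false
	for i := range cfg.Webhooks {
		cc := &cfg.Webhooks[i].ClientConfig
		if !bytes.Equal(cc.CABundle, caPEM) {
			cc.CABundle = append([]byte(nil), caPEM...) // copy, don't alias
			changed = true
		}
	}
	return changed
}

func main() {
	cfg := &ValidatingWebhookConfiguration{
		Name:     "virtualcluster-validating-webhook-configuration",
		Webhooks: []ValidatingWebhook{{Name: "virtualcluster.validating.webhook"}},
	}
	ca := []byte("example CA PEM")
	fmt.Println(ensureCABundle(cfg, ca)) // true: caBundle was empty
	fmt.Println(ensureCABundle(cfg, ca)) // false: already up to date
}
```

Returning a "changed" flag keeps the update path idempotent: the controller only writes to the API server when the stored caBundle actually differs.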

As for Minikube, I will give it a try, but I guess there must be some certificate-injection mechanism there, such as cert-manager or something similar, that injects the caBundle. @charleszheng44

charleszheng44 commented 3 years ago

@vincent-pli This is intentional. If the caBundle is not specified, the system-trusted CAs will be used. The details can be found in the definition of WebhookClientConfig. I guess the system-trusted CAs on the kind cluster are somehow different from those on a Kubernetes cluster with physical nodes.

vincent-pli commented 3 years ago

If unspecified, system trust roots on the apiserver are used.

This means that if caBundle is unspecified, the system trust roots will be used to validate the webhook's certificate.

But our webhook's certificate is signed through the CSR API, and the CA that signs CSR API certificates is not among the system trust roots.

Please help to check if the caBundle field is injected in minikube env, thanks @charleszheng44

charleszheng44 commented 3 years ago

Please correct me if I am wrong. My understanding is that the APIServer uses the WebhookClientConfig here to set up a connection between itself and the webhook. When talking to the webhook, the APIServer acts as a client, and the caBundle is used to authenticate the webhook server, that is, to verify the certificate it presents. The system trust roots on the APIServer are the CAs it loads at startup, and the CA used to sign the CSR is one of them.
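The client side of that description can be sketched as follows. webhookTLSConfig is a hypothetical helper, not an apiserver function; it maps a clientConfig's caBundle and service reference onto Go TLS settings following the documented rules: an empty caBundle leaves RootCAs nil, so the system trust store is used, and the serving certificate must be valid for the name <service>.<namespace>.svc.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// webhookTLSConfig illustrates how a client calling a webhook Service could
// derive its TLS settings from clientConfig. It is a sketch of the
// documented behaviour, not the apiserver's implementation.
func webhookTLSConfig(caBundle []byte, svc, ns string) (*tls.Config, error) {
	cfg := &tls.Config{
		// The serving certificate is checked against <service>.<namespace>.svc.
		ServerName: fmt.Sprintf("%s.%s.svc", svc, ns),
	}
	if len(caBundle) == 0 {
		// No caBundle: leave RootCAs nil so Go falls back to the host's
		// system trust store.
		return cfg, nil
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caBundle) {
		return nil, errors.New("caBundle contains no valid PEM certificates")
	}
	cfg.RootCAs = pool
	return cfg, nil
}

func main() {
	cfg, err := webhookTLSConfig(nil, "virtualcluster-webhook-service", "vc-manager")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.ServerName, cfg.RootCAs == nil)
	// → virtualcluster-webhook-service.vc-manager.svc true

	_, err = webhookTLSConfig([]byte("not a certificate"), "virtualcluster-webhook-service", "vc-manager")
	fmt.Println(err != nil) // true
}
```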

Please help to check if the caBundle field is injected in minikube env, thanks

Do you mean the caBundle is injected into the APIServer pod, or into the node running the APIServer?

vincent-pli commented 3 years ago

Thanks @charleszheng44. I think everything you described is correct except one thing:

the CA used to sign the CSR is one of them

I'm not an expert in this area, but I guess "system trust" means the operating system's trust store. For example, on Ubuntu these CAs live in /usr/local/share/ca-certificates/, and I notice the kube-apiserver pod mounts that path from the node as a hostPath volume.

To say it again, I'm not an expert, but I'm happy to help figure it out. Thanks @charleszheng44

I also found an OpenShift issue that seems to discuss the same thing; please take a look: https://bugzilla.redhat.com/show_bug.cgi?id=1960936

charleszheng44 commented 3 years ago

@vincent-pli Thanks for pointing me to the OpenShift issue. Looks like the CA used to sign the CSR is not one of the system trust roots (my fault 😅).

There are two options to resolve this issue: we can either leverage an external component such as cert-manager, or run the webhook server pod with an init container that generates a self-signed certificate and later stores the CA in the caBundle of the WebhookConfiguration.

cert-manager itself is a large application that includes many CRDs. In our case there is only one webhook, so I will probably go with the second option. I will try to implement it next week. Meanwhile, could you temporarily use Minikube for testing, or hack the code by adding the service account CA to the caBundle? (I tried this before on kind and it worked.)

vincent-pli commented 3 years ago

Thanks @charleszheng44, looking forward to your implementation :)