kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

Improve webhooks deployment model for supporting multi-tenancy #2275

Closed fabriziopandini closed 4 years ago

fabriziopandini commented 4 years ago

What steps did you take and what happened:

- Created a workload cluster and got:

dockercluster.infrastructure.cluster.x-k8s.io/test created
dockermachine.infrastructure.cluster.x-k8s.io/test-controlplane-0 created
kubeadmconfig.bootstrap.cluster.x-k8s.io/test-controlplane-0 created
dockermachinetemplate.infrastructure.cluster.x-k8s.io/test-worker created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/test-worker created
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "default.cluster.cluster.x-k8s.io": Post https://capi-webhook-service.capi-system.svc:443/mutate-cluster-x-k8s-io-v1alpha3-cluster?timeout=30s: service "capi-webhook-service" not found
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "default.machine.cluster.x-k8s.io": Post https://capi-webhook-service.capi-system.svc:443/mutate-cluster-x-k8s-io-v1alpha3-machine?timeout=30s: service "capi-webhook-service" not found
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "default.machinedeployment.cluster.x-k8s.io": Post https://capi-webhook-service.capi-system.svc:443/mutate-cluster-x-k8s-io-v1alpha3-machinedeployment?timeout=30s: service "capi-webhook-service" not found

What did you expect to happen: No webhook errors

Anything else you would like to add: This is a clusterctl problem; when fixing namespaces in a component YAML, `MutatingWebhookConfiguration.webhooks[].clientConfig.service.namespace` should also be fixed
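
For reference, the field in question lives in the webhook's client config. A minimal sketch of the relevant part of a generated `MutatingWebhookConfiguration` (the API version matches what was current at the time; object names are illustrative, not the exact generated ones):

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: capi-mutating-webhook-configuration
webhooks:
  - name: default.cluster.cluster.x-k8s.io
    clientConfig:
      service:
        name: capi-webhook-service
        # The field clusterctl must rewrite when it moves the
        # components into a different target namespace:
        namespace: capi-system
        path: /mutate-cluster-x-k8s-io-v1alpha3-cluster
```

If clusterctl rewrites only the namespaces of the Deployment/Service objects but not this field, the API server keeps calling the webhook at the old `capi-system` address, which produces the `service "capi-webhook-service" not found` errors above.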

However, I have a doubt @vincepri @ncdc: how should conversion webhooks be configured in a multi-tenant environment? Can the different MutatingWebhookConfiguration objects, one for each instance of the provider, conflict with one another?

Environment:

/kind bug

fabriziopandini commented 4 years ago

/area clusterctl

ncdc commented 4 years ago

@fabriziopandini as far as I know, you can only have a single thing handle mutating/validating/conversion webhooks. I think this is a major step backward from the multi-tenancy model that v1alpha2 allows. We need to discuss options.

One possibility is we say that within a single management cluster, you can only install a single copy of Cluster API and each provider.

Another option, which is pretty ugly, is to deploy pods for a "tenant apiserver" (apiserver, etcd, controller-manager), then deploy the Cluster API pods to the normal management cluster. Configure the Cluster API pods to talk to the "tenant apiserver". The problem with this is that you need to give clients access to the "tenant apiserver", which means you need to solve ingress + security + authentication...

detiber commented 4 years ago

One option we may have here, in a worst-case scenario, is to add the ability to deploy a controller-manager with only the webhooks enabled (or possibly to build separate binaries for this purpose)

fabriziopandini commented 4 years ago

@ncdc what about treating webhooks as a global resource, like CRDs?

Not sure what will happen when a new CRD version gets into the picture

ncdc commented 4 years ago

> deploy a controller-manager with only the webhooks enabled

This is my favorite option so far. It would be nice to avoid creating multiple binaries (and container images) if possible.

detiber commented 4 years ago

> This is my favorite option so far. It would be nice to avoid creating multiple binaries (and container images) if possible.

I'd kind of like to avoid multiple binaries as well. However, from a practical standpoint, generating different deployment YAML with different flags was a bit of a pain previously; separate directory structures, separate binaries, and separately generated content helped keep things a bit cleaner.

ncdc commented 4 years ago

I have some ideas on how to organize the kustomize bits to do that, or we could consider using a ConfigMap to configure the managers instead.

fabriziopandini commented 4 years ago

How will this impact the clusterctl init UX? It seems to me that we are leaning towards splitting the lifecycle of conversion webhooks from the provider lifecycle

ncdc commented 4 years ago

If we proceed with this approach, I think it means that clusterctl has to keep track of a separate set of things for the webhooks (all webhooks, not just conversion). And this would be per provider (core, CAPA, CAPV, etc).

We should probably start brainstorming now what the upgrade process looks like (#1550).

alexeldeib commented 4 years ago

Cecile shared this issue; could someone help me understand the scope and impact? If I'm reading correctly, this would affect e.g. installing multiple instances of any provider. From @fabriziopandini's comment, it sounds like the issue is matching up the expected service namespace for the webhook configuration.

> as far as I know, you can only have a single thing handle mutating/validating/conversion webhooks

@ncdc I'm not sure I understand this; you can certainly have multiple webhooks acting on one object? That's well documented and even listed as a "gotcha" for mutating webhooks, since you can't guarantee the object you return is the one persisted to storage.

If that's not the issue, where did this discussion diverge from the title? It seems that generating proper namespaces and such, while tricky, would solve this quite cleanly. I feel like I must be missing something.

ncdc commented 4 years ago

@alexeldeib my comment above about a single webhook was partially correct and partially incorrect. You can have multiple validating and mutating webhooks per resource, and you can limit the scope of a validating or mutating webhook to specific namespaces and/or specific resources. But you can only have a single conversion webhook per custom resource. That's the real problem.
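
To make the conversion constraint concrete: the conversion webhook is configured on the CustomResourceDefinition itself, not in a Mutating/ValidatingWebhookConfiguration, and a CRD is a cluster-scoped singleton, so there can be exactly one conversion endpoint per resource type cluster-wide. A hedged sketch of the relevant CRD fields (apiextensions `v1beta1`, current at the time; names illustrative):

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: clusters.cluster.x-k8s.io
spec:
  # ... group, names, versions elided ...
  conversion:
    strategy: Webhook
    webhookClientConfig:
      service:
        # Only one service can be named here; a second provider
        # instance in another namespace cannot register its own
        # conversion webhook for the same CRD.
        name: capi-webhook-service
        namespace: capi-system
        path: /convert
```

By contrast, mutating and validating webhook configurations are separate objects that can each carry a `namespaceSelector`, which is why multiple admission webhooks per resource are fine but conversion is the real multi-tenancy blocker.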

ncdc commented 4 years ago

See #2279 for a path forward

vincepri commented 4 years ago

/close

k8s-ci-robot commented 4 years ago

@vincepri: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/2275#issuecomment-586357801):

> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

vincepri commented 4 years ago

Fixed by https://github.com/kubernetes-sigs/cluster-api/pull/2279

binchenX commented 3 years ago

I'm still seeing this error:

➜  cluster-api git:(doc) ✗ clusterctl version
clusterctl version: &version.Info{Major:"0", Minor:"3", GitVersion:"v0.3.14", GitCommit:"5a09f69fa8c4892eb45a61d8d701140eeeaa5ba8", GitTreeState:"clean", BuildDate:"2021-02-06T03:08:03Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

➜  cluster-api git:(doc) ✗ cat init.sh
clusterctl init \
   --core cluster-api:v0.4.0 \
   --bootstrap kubeadm:v0.4.0 \
   --control-plane kubeadm:v0.4.0 \
   --infrastructure docker:v0.4.0 \
   --config ~/.cluster-api/dev-repository/config.yaml \
   --v 5

➜  cluster-api git:(doc) ✗ cat config.sh
clusterctl config cluster work-cluster --kubernetes-version 1.17.0 \
    --config ~/.cluster-api/dev-repository/config.yaml \
    --flavor development

➜  cluster-api git:(doc) ✗ ./config.sh > work-cluster-docker.yaml
➜  cluster-api git:(doc) ✗ k apply -f work-cluster-docker.yaml
dockercluster.infrastructure.cluster.x-k8s.io/work-cluster created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/work-cluster-md-0 created
Error from server (InternalError): error when creating "work-cluster-docker.yaml": Internal error occurred: failed calling webhook "default.cluster.cluster.x-k8s.io": Post https://capi-webhook-service.capi-system.svc:443/mutate-cluster-x-k8s-io-v1alpha4-cluster?timeout=10s: dial tcp 10.96.19.122:443: connect: connection refused
Error from server (InternalError): error when creating "work-cluster-docker.yaml": Internal error occurred: failed calling webhook "validation.dockermachinetemplate.infrastructure.cluster.x-k8s.io": Post https://capd-webhook-service.capd-system.svc:443/validate-infrastructure-cluster-x-k8s-io-v1alpha4-dockermachinetemplate?timeout=10s: dial tcp 10.96.66.204:443: connect: connection refused
Error from server (InternalError): error when creating "work-cluster-docker.yaml": Internal error occurred: failed calling webhook "default.kubeadmcontrolplane.controlplane.cluster.x-k8s.io": Post https://capi-kubeadm-control-plane-webhook-service.capi-kubeadm-control-plane-system.svc:443/mutate-controlplane-cluster-x-k8s-io-v1alpha4-kubeadmcontrolplane?timeout=10s: dial tcp 10.96.11.187:443: connect: connection refused
Error from server (InternalError): error when creating "work-cluster-docker.yaml": Internal error occurred: failed calling webhook "validation.dockermachinetemplate.infrastructure.cluster.x-k8s.io": Post https://capd-webhook-service.capd-system.svc:443/validate-infrastructure-cluster-x-k8s-io-v1alpha4-dockermachinetemplate?timeout=10s: dial tcp 10.96.66.204:443: connect: connection refused
Error from server (InternalError): error when creating "work-cluster-docker.yaml": Internal error occurred: failed calling webhook "default.machinedeployment.cluster.x-k8s.io": Post https://capi-webhook-service.capi-system.svc:443/mutate-cluster-x-k8s-io-v1alpha4-machinedeployment?timeout=10s: dial tcp 10.96.19.122:443: connect: connection refused