Closed fabriziopandini closed 4 years ago
/area clusterctl
@fabriziopandini as far as I know, you can only have a single thing handle mutating/validating/conversion webhooks. I think this is a major step backward from our multi-tenancy model that v1alpha2 allows. We need to discuss options.
One possibility is we say that within a single management cluster, you can only install a single copy of Cluster API and each provider.
Another option, which is pretty ugly, is to deploy pods for a "tenant apiserver" (apiserver, etcd, controller-manager), then deploy the Cluster API pods to the normal management cluster. Configure the Cluster API pods to talk to the "tenant apiserver". The problem with this is that you need to give clients access to the "tenant apiserver", which means you need to solve ingress + security + authentication...
One option we may have, in a worst-case scenario, is to add the ability to deploy a controller-manager with only the webhooks enabled (or possibly to create separate binaries for this purpose).
@ncdc what about treating webhooks as a global resource, like CRDs?
Not sure what will happen when a new CRD version gets into the picture
deploy a controller-manager with only the webhooks enabled
This is my favorite option so far. It would be nice to avoid creating multiple binaries (and container images) if possible.
I'd kind of like to avoid multiple binaries as well. However, from a practical standpoint, generating different deployment YAML with different flags was a bit of a pain previously, whereas separate binaries let us leverage a separate directory structure and separately generated content to keep things a bit cleaner.
I have some ideas on how to organize the kustomize bits to do that, or we could consider using a ConfigMap to configure the managers instead.
How will this impact the clusterctl init UX?
It seems to me that we are leaning towards splitting the lifecycle for conversion webhooks from the provider lifecycle
If we proceed with this approach, I think it means that clusterctl has to keep track of a separate set of things for the webhooks (all webhooks, not just conversion). And this would be per provider (core, CAPA, CAPV, etc).
We should probably start brainstorming now what the upgrade process looks like (#1550).
Cecile shared this issue; could someone help me understand the scope and impact? If I'm reading correctly, this would affect e.g. installing multiple instances of any provider. From @fabriziopandini's comment, it sounds like the issue is matching up the expected service namespace for the webhook configuration.
as far as I know, you can only have a single thing handle mutating/validating/conversion webhooks
@andy I'm not sure I understand this, you could certainly have multiple webhooks acting on one object? It's well-doc'd and even listed as a "gotcha" for mutating webhooks, since you can't guarantee the object you return is the one persisted to storage.
If that's not the issue, where did this issue diverge from the title -- it seems generating proper namespaces and such, while tricky, would solve this quite cleanly? I feel like I must be missing something.
@alexeldeib my comment above about a single webhook was partially correct and partially incorrect. You can have multiple validating and mutating webhooks per resource, and you can limit the scope of a validating or mutating webhook to specific namespaces and/or specific resources. But you can only have a single conversion webhook per custom resource. That's the real problem.
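To illustrate why conversion is the sticking point: the conversion webhook is configured on the CRD object itself, which is cluster-scoped, so there is exactly one slot for it per custom resource. A trimmed sketch of the relevant field (service name and namespace are illustrative, not taken from an actual provider manifest):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: clusters.cluster.x-k8s.io
spec:
  # group/names/versions elided
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1", "v1beta1"]
      clientConfig:
        service:
          # Only ONE service can be referenced here. A second instance of
          # the provider in another namespace cannot register its own
          # conversion webhook for the same CRD.
          name: capi-webhook-service
          namespace: capi-system
          path: /convert
```

By contrast, MutatingWebhookConfiguration and ValidatingWebhookConfiguration are separate objects, so each provider instance can create its own (scoped by namespace selectors).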
See #2279 for a path forward
/close
@vincepri: Closing this issue.
I'm still seeing errors:
➜ cluster-api git:(doc) ✗ clusterctl version
clusterctl version: &version.Info{Major:"0", Minor:"3", GitVersion:"v0.3.14", GitCommit:"5a09f69fa8c4892eb45a61d8d701140eeeaa5ba8", GitTreeState:"clean", BuildDate:"2021-02-06T03:08:03Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
➜ cluster-api git:(doc) ✗ cat init.sh
clusterctl init \
--core cluster-api:v0.4.0 \
--bootstrap kubeadm:v0.4.0 \
--control-plane kubeadm:v0.4.0 \
--infrastructure docker:v0.4.0 \
--config ~/.cluster-api/dev-repository/config.yaml \
--v 5
➜ cluster-api git:(doc) ✗ cat config.sh
clusterctl config cluster work-cluster --kubernetes-version 1.17.0 \
--config ~/.cluster-api/dev-repository/config.yaml \
--flavor development
➜ cluster-api git:(doc) ✗ ./config.sh > work-cluster-docker.yaml
➜ cluster-api git:(doc) ✗ k apply -f work-cluster-docker.yaml
dockercluster.infrastructure.cluster.x-k8s.io/work-cluster created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/work-cluster-md-0 created
Error from server (InternalError): error when creating "work-cluster-docker.yaml": Internal error occurred: failed calling webhook "default.cluster.cluster.x-k8s.io": Post https://capi-webhook-service.capi-system.svc:443/mutate-cluster-x-k8s-io-v1alpha4-cluster?timeout=10s: dial tcp 10.96.19.122:443: connect: connection refused
Error from server (InternalError): error when creating "work-cluster-docker.yaml": Internal error occurred: failed calling webhook "validation.dockermachinetemplate.infrastructure.cluster.x-k8s.io": Post https://capd-webhook-service.capd-system.svc:443/validate-infrastructure-cluster-x-k8s-io-v1alpha4-dockermachinetemplate?timeout=10s: dial tcp 10.96.66.204:443: connect: connection refused
Error from server (InternalError): error when creating "work-cluster-docker.yaml": Internal error occurred: failed calling webhook "default.kubeadmcontrolplane.controlplane.cluster.x-k8s.io": Post https://capi-kubeadm-control-plane-webhook-service.capi-kubeadm-control-plane-system.svc:443/mutate-controlplane-cluster-x-k8s-io-v1alpha4-kubeadmcontrolplane?timeout=10s: dial tcp 10.96.11.187:443: connect: connection refused
Error from server (InternalError): error when creating "work-cluster-docker.yaml": Internal error occurred: failed calling webhook "validation.dockermachinetemplate.infrastructure.cluster.x-k8s.io": Post https://capd-webhook-service.capd-system.svc:443/validate-infrastructure-cluster-x-k8s-io-v1alpha4-dockermachinetemplate?timeout=10s: dial tcp 10.96.66.204:443: connect: connection refused
Error from server (InternalError): error when creating "work-cluster-docker.yaml": Internal error occurred: failed calling webhook "default.machinedeployment.cluster.x-k8s.io": Post https://capi-webhook-service.capi-system.svc:443/mutate-cluster-x-k8s-io-v1alpha4-machinedeployment?timeout=10s: dial tcp 10.96.19.122:443: connect: connection refused
What steps did you take and what happened:
– Created a workload cluster and got the webhook errors above.
What did you expect to happen: No webhook errors
Anything else you would like to add: This is a clusterctl problem: when fixing namespaces in a component YAML, it should also fix
MutatingWebhookConfiguration.webhooks[].clientConfig.service.namespace
However, I have a doubt @vincepri @ncdc: how should conversion webhooks be configured in a multi-tenant environment? Can the different MutatingWebhookConfiguration objects, one for each instance of the provider, conflict among themselves?
Environment:
/kind bug
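For reference, the field clusterctl would need to rewrite lives here (a trimmed sketch; the names mirror the error messages above, other fields are elided):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: capi-mutating-webhook-configuration
webhooks:
  - name: default.cluster.cluster.x-k8s.io
    clientConfig:
      service:
        name: capi-webhook-service
        # When clusterctl moves the provider to a different namespace,
        # this value must be rewritten to match, or the API server will
        # dial a service that does not exist (connection refused).
        namespace: capi-system
        path: /mutate-cluster-x-k8s-io-v1alpha4-cluster
    # rules, admissionReviewVersions, sideEffects, etc. elided
```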