kubernetes-sigs / cluster-api-provider-azure

Cluster API implementation for Microsoft Azure
https://capz.sigs.k8s.io/
Apache License 2.0
295 stars 425 forks source link

[managedcluster] defaulting webhooks #615

Closed alexeldeib closed 4 years ago

alexeldeib commented 4 years ago

/kind feature

Describe the solution you'd like There are probably a fair number of properties we could default earlier in the pipeline. CNI, Load balancer SKU, networking fields, node SKU/default pool. Add a defaulting webhook and begin defaulting some of these fields.

CecileRobertMichon commented 4 years ago

/help

k8s-ci-robot commented 4 years ago

@CecileRobertMichon: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/615): >/help Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
muang0 commented 4 years ago

Hi Ace! I'm still in the process of getting a better understanding (of managedcluster, capi & capiz, and webhooks) before beginning managedcluster work

I've got a couple questions to help validate my understanding if you don't mind :smile: Would the flow of creating a managedcluster look like the following?

  1. API request sent to kubernetes management cluster api server to create CRD machine on azure provider
  2. CAPI and CAPZ controllers pick up the changes from the api server
  3. management cluster CAPI and CAPZ controllers reconcile current state to create cluster in azure

And the webhook would default values as part of step 1? As opposed to if the logic were baked into the CAPZ controller?

Thus the webhook would apply defaults in the 'Mutating admission' step from the below screenshot? (taken from guide to admissions controllers) image

Thanks for the guidance! :smile:

muang0 commented 4 years ago

So currently we are defaulting fields for managedCluster in the 'Process Item' step of the custom controller? And thus moving to webhook would default the fields even before the controller picks up the changes from the api server? (snip taken from here, learning about custom controllers) image

alexeldeib commented 4 years ago

And the webhook would default values as part of step 1? As opposed to if the logic were baked into the CAPZ controller?

Correct.

So currently we are defaulting fields for managedCluster in the 'Process Item' step of the custom controller? And thus moving to webhook would default the fields even before the controller picks up the changes from the api server?

Also correct.

There are a few cases here that are poorly modeled even though they are technically optional fields. A good example is networkPlugin. I don't think AKS API requires this field to be set, but the API will reject attempts to update this field. We need to model that in our CRD. On the validation side, we would want to prevent the ability to persist invalid configurations (like changing the networkPlugin on update).

There are a fairly limited number of cases i'd enable defaulting for. We should probably default networkPlugin to Azure (I think having no network policy is valid).

A good case for defaulting webhooks is the service CIDR and the DNS service IP for AKS. You have a good opportunity to remove this hack by taking the service CIDR and defaulting the DNS IP, + adding a validating webhook to enforce the correctness if the user provides both: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/aa979f27a021e4d87e2bf2926344833042f400c3/cloud/services/managedclusters/managedclusters.go#L146-L152

muang0 commented 4 years ago

Thanks for the response! This gives me a great direction to go :smile: I will provide an update once I have made some progress

muang0 commented 4 years ago

Hi Ace, Hope you're having a good weekend! I've made a bit of progress and am hoping to get your thoughts on an approach I'm trying:

I've updated the CRD for ManagedCluster to include DNSServiceIP in the spec. Inside the ManagedCluster webhooks, I will obtain the serviceCidr from the associated CAPI cluster spec (still working through the specifics of how this would look). Once I have access to both specs in the webhooks I can perform the defaulting/validating logic.

Would love to hear your thoughts on including DNSServiceIP in the ManagedCluster CRD and pulling the serviceCidr from the webhooks! :smile:

muang0 commented 4 years ago

Or should we model the serviceCidr under the ManagedCluster Spec instead of pulling from CAPI Cluster Spec? I'm not sure if referencing a separate object in a webhook is possible (Pulling serviceCidr from CAPI Cluster CR in the CAPZ ManagedCluster webhooks)

alexeldeib commented 4 years ago

I think we would probably need DNSServiceIP in AzureManagedControlPlane spec and validate against the CAPI Cluster. We should probably avoid duplicating the service CIDR so we maintain a single source of truth.

still working through the specifics of how this would look

If we do cross-object validation inside webhooks, we probably need something like this: https://github.com/alexeldeib/hook-test/blob/ca37300b9e10548a8cfbf24ba6f782469d92ec6c/webhook.go#L15 https://github.com/alexeldeib/hook-test/blob/ca37300b9e10548a8cfbf24ba6f782469d92ec6c/main.go#L68-L73

@CecileRobertMichon I recall we talked about cross-object validation in webhooks like this, do you think this use case is okay? I don't remember the takeaway.

CecileRobertMichon commented 4 years ago

I think cross-object validation is fine, it's cross-object defaulting that can be dangerous (ref: https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/623)

alexeldeib commented 4 years ago

Got it. So maybe we can’t actually get rid of the IP CIDR hack in the defaulted case, but we can at least handle validation in the webhook if the DNS IP is provided at create time.

We’d also want to validate against changing these fields on updates

CecileRobertMichon commented 4 years ago

If the field in the other object that the default is based on is immutable (and there's validation in place on that object's webhook to make sure it fails update if it is changed), then I think it's okay to default too. I'm more concerned about a case where we default something based on another object and that other object is subsequently modified (ie. the default is no longer accurate).

muang0 commented 4 years ago

@alexeldeib Hi Ace! qq- I am wondering which design approach is preferred for determining which cluster is associated with the amcp object during webhook runtime. I like the first option since it doesn't make any assumptions, but I could be missing something in that assessment. what are your thoughts?

  1. Add clusterRef field to the amcp spec (similar to controlPlaneRef in CAPI ClusterSpec)
  2. If there is only one cluster in the same namespace as amcp take that one, if multiple skip defaulting dnsServiceIp
  3. Infer the cluster name from the amcp name

Thanks for the help! really enjoying this one :smile:

alexeldeib commented 4 years ago

Sorry, missed replying to this --

Maybe the DNS IP should stay inside reconcile. The canonical way to get the corresponding cluster would be the owner reference., but that can't be set until after webhook runtime. We can validate against updates in the validating webhook after setting it during reconcile.

nader-ziada commented 4 years ago

already fixed in PR #862

/close

k8s-ci-robot commented 4 years ago

@nader-ziada: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/615#issuecomment-686568724): >already fixed in PR #862 > >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.