kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0
3.45k stars 1.27k forks source link

ClusterResourceSet reconciler should use server side apply to apply resources on workload clusters #10734

Open govinda-attal opened 1 month ago

govinda-attal commented 1 month ago

What steps did you take and what happened?

We use capi to provision workload clusters with minimal resources. Once the clusters are ready we use different approach/pipeline to manage other applications on them.

We are using cluster resource sets to create minimal resources on workload clusters - few namespaces with limited resources (such as config-maps and secrets).

On first pass during early stages of building workload clusters this works as expected and cluster resource sets are reconciled.

Once workload clusters are ready, other pipelines trigger and continuously maintain state of the workload cluster. In these pipelines we install controllers such as hierarchical-namespaces; which applies certain labels <namespace-name>.tree.hnc.x-k8s.io/depth=0 on namespace and govern other resources on workload clusters as per policies (captured using hnc-validating-webhook-configuration, etc).

With hnc in place validating webhook will validate operations being applied on resources.

On next run when capi tries to reconcile resource cluster set by applying resources as per definition it holds (which doesn't have the labels specific to hnc).

Here hnc's validatingwebhookconfiguration kick in and it rejects this apply operation for capi.

This results in ClusterResourceSet reconciliation failure with status as

message: 'patching object /v1, Kind=Namespace <namespace-name>: admission
        webhook "namespaces.hnc.x-k8s.io" denied the request: namespaces "<namespace-name>"
        is forbidden: cannot remove tree label "<namespace-name>.tree.hnc.x-k8s.io/depth"
        in namespace "<namespace-name>"; these can only be managed by HNC'

Referring to apply functions relevant to cluster resource set

It seems it is doing basic patch here. It will be helpful to have server side apply.

server-side-apply/#merge-strategy

Server-Side Apply tries to merge fields based on the actor who manages them instead of overruling based on values. This way multiple actors can update the same object without causing unexpected interference.

What did you expect to happen?

capi cluster-resource-set controller and other controllers such as hnc are able to live in perfect harmony.

Cluster API version

main branch

Kubernetes version

1.28

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

k8s-ci-robot commented 1 month ago

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
sbueringer commented 4 weeks ago

Hm not sure what is going on here. We use the regular patch in a lot of our controllers, and if I remember correctly they never try to unset labels/annotations set by other users/controllers.

(so wondering what we do differently here)

fabriziopandini commented 5 days ago

looping in @g-gaston who implemented the reconcile mode for CRS.

Personally, I stick to the idea that implementing a sophisticated addon management solution is out of scope of Cluster API. CRS was always intented as simple, stop gap solution, and IMO the way forward is not to make CRS more complex but to bridge towards what we describe in https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220712-cluster-api-addon-orchestration.md

g-gaston commented 18 hours ago

For context: when we first implemented this we decided to not use server side apply given we still supported clusters < 1.22. Now that the min version is 1.24, we could revisit this decision.

That said, I'm with @fabriziopandini on this. Given the complexity of the usecase (capi manages creation and update of resources but with other entities partially managing those resources as well), I'm not sure if CSR is the best tool for the job. It was never designed to deal with the complexity of multiple owners and specially not with dynamic admission controllers.

We already have an approved addon proposal (which we didn't have when we implemented updated for CSRs) and if I'm not mistaken we also have a first implementation using helm. I would suggest looking into that as an alternative solution.

Another suggestion, if you already have "secondary" pipelines that maintain the state of the resources in the workload cluster post-creation, why don't you set the CSR in the "create" only mode and let your pipelines manage any future updates of those resources?