kubernetes-sigs / kubebuilder-declarative-pattern

A toolkit for building declarative operators with kubebuilder
Apache License 2.0

Prune logic for ApplySetApplier #325

Closed yuwenma closed 1 year ago

yuwenma commented 1 year ago

This PR adds the server-side pruning logic for ApplySetApplier.

How to use

    applier := applier.NewApplySetApplier(metav1.PatchOptions{}, metav1.DeleteOptions{}, applier.ApplysetOptions{}) 
    r.Reconciler.Init(mgr, &addonsv1alpha1.YourCR{},
        declarative.WithApplier(applier),
        declarative.WithApplyPrune())

Use Cases

First Time Reconcile

# subject
apiVersion: addons.configdelivery.anthos.io/v1alpha1
kind: MyCR
metadata:
  annotations:
    applyset.kubernetes.io/additional-namespaces: mycr
    applyset.kubernetes.io/contains-group-resources: clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,configmaps,customresourcedefinitions.apiextensions.k8s.io,deployments.apps,namespaces,networkpolicies.networking.k8s.io,rolebindings.rbac.authorization.k8s.io,roles.rbac.authorization.k8s.io,secrets,serviceaccounts,services,statefulsets.apps
    applyset.kubernetes.io/tooling: MyCR/
  labels:
    applyset.kubernetes.io/id: applyset-yHSTf9d_-Ozn5cvv-JpOaK-5-W79ncyZpBVXt1_nZzc-v1
  name: mycr-sample
spec:
  version: 2.5.11

# a deployment in the manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    addons.configdelivery.anthos.io/argocd: mycr-sample
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: mycr-controller
    app.kubernetes.io/part-of: mycr
    applyset.kubernetes.io/part-of: applyset-yHSTf9d_-Ozn5cvv-JpOaK-5-W79ncyZpBVXt1_nZzc-v1
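
To make the parent annotation above easier to read, here is a minimal sketch of how the applyset.kubernetes.io/contains-group-resources value splits into group and resource pairs. It is not the code in this PR: `groupResource` and `parseGroupResources` are hypothetical names, and the local struct is only a dependency-free stand-in for schema.GroupResource from k8s.io/apimachinery.

```go
package main

import (
	"fmt"
	"strings"
)

// groupResource is a hypothetical stand-in for schema.GroupResource,
// kept local so the sketch has no dependencies.
type groupResource struct {
	Group    string
	Resource string
}

// parseGroupResources splits the comma-separated annotation value.
// Each entry is "<resource>" (core group) or "<resource>.<group>".
func parseGroupResources(annotation string) []groupResource {
	var out []groupResource
	for _, entry := range strings.Split(annotation, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		// Only the first dot separates resource from group; the group
		// itself may contain dots (e.g. rbac.authorization.k8s.io).
		resource, group, _ := strings.Cut(entry, ".")
		out = append(out, groupResource{Group: group, Resource: resource})
	}
	return out
}

func main() {
	value := "clusterroles.rbac.authorization.k8s.io,configmaps,deployments.apps"
	for _, gr := range parseGroupResources(value) {
		fmt.Printf("group=%q resource=%q\n", gr.Group, gr.Resource)
	}
}
```

Entries without a dot, such as configmaps, end up with an empty group, which is how the core API group is represented.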

New Manifest GVR

If you upgrade the declarative CR to a version whose manifests contain a new GroupVersionResource, you can expect:

Manifest no longer needed

If you upgrade the declarative CR to a version that no longer contains certain manifests, you can expect:

You can find logs like the following:

I0329 20:53:39.968892   57451 applyset.go:206] prune is enabled
I0329 20:53:40.282866   57451 applyset.go:330] pruned resource Deployment.apps <NS>/<DEPLOYMENT NAME>
I0329 20:53:40.282953   57451 applyset.go:225] prune succeed
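
Conceptually, the prune step deletes objects that still carry the applyset part-of label but were not in the latest apply. The following is only a rough sketch of that set difference, not the implementation in applyset.go; `objectRef`, `pruneCandidates`, and the object names are hypothetical.

```go
package main

import "fmt"

// objectRef is a hypothetical identifier for an object in the cluster.
type objectRef struct {
	GroupResource string
	Namespace     string
	Name          string
}

// pruneCandidates returns the labelled objects that were not part of the
// latest apply, preserving the order of the labelled slice.
func pruneCandidates(labelled, applied []objectRef) []objectRef {
	keep := make(map[objectRef]struct{}, len(applied))
	for _, ref := range applied {
		keep[ref] = struct{}{}
	}
	var prune []objectRef
	for _, ref := range labelled {
		if _, ok := keep[ref]; !ok {
			prune = append(prune, ref)
		}
	}
	return prune
}

func main() {
	// Objects found in the cluster with the part-of label (made-up names).
	labelled := []objectRef{
		{GroupResource: "deployments.apps", Namespace: "mycr", Name: "mycr-controller"},
		{GroupResource: "configmaps", Namespace: "mycr", Name: "mycr-config"},
	}
	// Objects in the manifests of the new version.
	applied := []objectRef{
		{GroupResource: "configmaps", Namespace: "mycr", Name: "mycr-config"},
	}
	for _, ref := range pruneCandidates(labelled, applied) {
		fmt.Printf("would prune %s %s/%s\n", ref.GroupResource, ref.Namespace, ref.Name)
	}
}
```
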

k8s-ci-robot commented 1 year ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

yuwenma commented 1 year ago

/assign @justinsb

yuwenma commented 1 year ago

@justinsb Regarding our discussions about RESTMappings, I think it involves two problems (see the third commit) and it's kind of complicated in the code, so I'd like to explain a little bit more:

Problem 1

A CR object is applied before its CRD, which results in a missing RESTMapping.

Solution

We use the k-d-p/ControllerRESTMapper to keep track of and cache every RESTMapping seen so far (ControllerRESTMapper is initialized at the k-d-p/Reconciler level).

During the applyset Apply, if applying one manifest fails, we continue with the next manifest. The failed apply triggers a re-deploy, so the CR object can be applied successfully on the next run, and both the CRD and CR RESTMappings end up cached in the ControllerRESTMapper.

This guarantees that if the CR object is ordered before the CRD, both can still be deployed successfully within two Apply runs, and both RESTMappings are cached.
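
To illustrate the two-run behaviour, here is a small self-contained simulation. It is a sketch under simplifying assumptions, not the k-d-p code: `manifest`, `fakeCluster`, and `applyOnce` are hypothetical, and the "cluster" only tracks which kinds it can map.

```go
package main

import "fmt"

// manifest is a hypothetical, simplified manifest. Defines is the kind a
// CRD introduces (empty for everything else).
type manifest struct {
	Name    string
	Kind    string
	Defines string
}

// fakeCluster stands in for the API server plus the cached RESTMapper:
// it can only apply kinds that have been registered.
type fakeCluster struct {
	knownKinds map[string]bool
	applied    map[string]bool
}

func newFakeCluster() *fakeCluster {
	return &fakeCluster{
		knownKinds: map[string]bool{"CustomResourceDefinition": true},
		applied:    map[string]bool{},
	}
}

func (c *fakeCluster) apply(m manifest) error {
	if !c.knownKinds[m.Kind] {
		return fmt.Errorf("no RESTMapping for kind %q", m.Kind)
	}
	c.applied[m.Name] = true
	if m.Defines != "" {
		// Applying a CRD makes its kind resolvable from now on.
		c.knownKinds[m.Defines] = true
	}
	return nil
}

// applyOnce applies every manifest, continues past failures, and returns
// the names that failed so the caller can trigger another run.
func applyOnce(c *fakeCluster, manifests []manifest) []string {
	var failed []string
	for _, m := range manifests {
		if err := c.apply(m); err != nil {
			failed = append(failed, m.Name)
			continue
		}
	}
	return failed
}

func main() {
	c := newFakeCluster()
	// The CR is deliberately ordered before its CRD.
	manifests := []manifest{
		{Name: "mycr-sample", Kind: "MyCR"},
		{Name: "mycrs-crd", Kind: "CustomResourceDefinition", Defines: "MyCR"},
	}
	fmt.Println("run 1 failed:", applyOnce(c, manifests))
	fmt.Println("run 2 failed:", applyOnce(c, manifests))
}
```

The first run fails only on the CR and still applies the CRD; the second run succeeds for both because the kind is known by then.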

Problem 2

The k-d-p applyset needs to give kubectl.ApplySet the RESTMappings so that it can efficiently find the right objects to prune.

This actually involves two different kinds of RESTMappings:

  1. The RESTMappings from the last deployed manifests. These RESTMappings are used to parse the parent's applyset.kubernetes.io/contains-group-resources annotation to get the GVRs of the previously applied manifests.
  2. The RESTMappings of the current manifests, which determine the up-to-date GVRs.

Solution

Besides caching all the RESTMappings as an attribute of ApplySetApplier, we initialize the kubectl ApplySet in each Apply run and pass it the caching-all RESTMappings and the current-only RESTMappings separately:

  1. The kubectl.ApplySet will use the caching-all RESTMapper to parse the parent annotation to get the previously applied GVRs.
  2. k-d-p ApplySet will register each current manifest's RESTMapping with the kubectlapply.ApplySet after the manifest is applied successfully (this is held in a different variable from the caching-all RESTMappings).
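
The split between the two sets of mappings can be sketched roughly as follows. This is an illustration only, with made-up names (`mapper`, `pruneScope`, `nextAnnotation`) and a map in place of a real RESTMapper; it also silently drops annotation entries that cannot be resolved, which is a simplification rather than a claim about what kubectl does.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mapper is a hypothetical stand-in for a RESTMapper. It only records
// which "<resource>.<group>" strings can be resolved.
type mapper map[string]bool

func sortedKeys(set map[string]bool) []string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// pruneScope returns the group-resources to look at when pruning: the
// previous ones, parsed from the parent annotation and resolved with the
// caching-all mapper, plus the current ones.
func pruneScope(parentAnnotation string, cachingAll, currentOnly mapper) []string {
	scope := map[string]bool{}
	for _, gr := range strings.Split(parentAnnotation, ",") {
		if cachingAll[gr] {
			scope[gr] = true
		}
	}
	for gr := range currentOnly {
		scope[gr] = true
	}
	return sortedKeys(scope)
}

// nextAnnotation builds the new annotation value from the current-only
// mappings, so removed group-resources drop out after pruning.
func nextAnnotation(currentOnly mapper) string {
	return strings.Join(sortedKeys(currentOnly), ",")
}

func main() {
	previous := "configmaps,deployments.apps,statefulsets.apps"
	cachingAll := mapper{"configmaps": true, "deployments.apps": true, "statefulsets.apps": true, "services": true}
	currentOnly := mapper{"configmaps": true, "services": true}

	fmt.Println("scope:", pruneScope(previous, cachingAll, currentOnly))
	fmt.Println("next annotation:", nextAnnotation(currentOnly))
}
```

The current-only mapper alone could not resolve deployments.apps or statefulsets.apps here, which is why the previous annotation is parsed with the caching-all one.
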

yuwenma commented 1 year ago

@justinsb Besides the RESTMapping, another problem is about the tradeoff between the reliability and efficiency of updating the parent object in the cluster. I'd like to know if the following change makes sense to you:

The previous workflow is

  1. ApplySetApplier caches the current manifests' RESTMappings.
  2. ApplyOnce updates the parent's labels and annotations with the RESTMappings info (first call, Patch). This requires both the current and the caching-all RESTMappings, since it is a "superset". This is the BeforeApply method.
  3. After the apply, it updates the parent's labels and annotations with the RESTMappings info again (second call, Patch). This requires only the current RESTMappings; it is the "last" set.
  4. In the main Reconcile, the parent status is updated (third call, Status.Patch).

Problems

The problem with the previous workflow is that we cannot efficiently get the current manifests' RESTMappings before the apply (I think you pointed this out here). We cannot return errors either, because that would turn ApplyOnce into a no-op.

Trade off

I removed BeforeApply completely. The downside is that if the operator crashes while applying manifests, the parent may not track the current manifests' GVRs, which can leave some objects unpruned.

The new workflow is

  1. ApplyOnce applies each manifest and, if the apply succeeds, caches that manifest's RESTMapping for kubectl.ApplySet.
  2. The parent is passed to ApplySet as a runtime.Object, and its labels and annotations are updated at the end of pruning.
  3. The parent is updated in the main Reconcile after the apply succeeds (first call, Patch).
  4. The parent status is updated last (second call, Status.Patch).
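
The ordering of the new workflow can be sketched with a fake API that only records calls. Everything here (`fakeAPI`, `parent`, `reconcile`) is hypothetical and much simpler than the real Reconciler. The sketch also assumes that pruning and the parent updates are skipped when any manifest fails to apply, which is an assumption for illustration rather than something stated above.

```go
package main

import (
	"fmt"
	"strings"
)

const containsGR = "applyset.kubernetes.io/contains-group-resources"

// parent is a hypothetical in-memory copy of the parent CR.
type parent struct {
	Annotations map[string]string
}

// fakeAPI records the calls that would go to the API server.
// Group-resources listed in broken fail to apply.
type fakeAPI struct {
	calls  []string
	broken map[string]bool
}

func (a *fakeAPI) apply(gr string) error {
	a.calls = append(a.calls, "apply "+gr)
	if a.broken[gr] {
		return fmt.Errorf("apply %s failed", gr)
	}
	return nil
}

func (a *fakeAPI) call(name string) { a.calls = append(a.calls, name) }

// reconcile follows the four steps of the new workflow.
func reconcile(api *fakeAPI, p *parent, manifests []string) error {
	// Step 1: apply, remembering the group-resource of each success.
	var current []string
	failed := 0
	for _, gr := range manifests {
		if err := api.apply(gr); err != nil {
			failed++
			continue
		}
		current = append(current, gr)
	}
	if failed > 0 {
		// Assumption: wait for a clean apply before pruning.
		return fmt.Errorf("%d manifest(s) failed to apply", failed)
	}
	// Step 2: prune, then update the parent in memory only.
	api.call("prune")
	p.Annotations[containsGR] = strings.Join(current, ",")
	// Step 3: first API call against the parent.
	api.call("patch parent")
	// Step 4: second API call against the parent.
	api.call("patch parent status")
	return nil
}

func main() {
	api := &fakeAPI{}
	p := &parent{Annotations: map[string]string{}}
	if err := reconcile(api, p, []string{"configmaps", "deployments.apps"}); err != nil {
		fmt.Println("reconcile error:", err)
	}
	fmt.Println(strings.Join(api.calls, " -> "))
	fmt.Println(p.Annotations[containsGR])
}
```

Compared with the previous workflow, the parent is written twice instead of three times, and no write happens before the manifests are applied.
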

justinsb commented 1 year ago

Thanks - this looks great now, and appears to be additive, so I propose we merge and iterate on anything we find.

/approve /lgtm

k8s-ci-robot commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb, yuwenma
