kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0
3.45k stars 1.27k forks source link

Adding Patch Option to the patch helper to ignore patching the status #10786

Open mcbenjemaa opened 2 weeks ago

mcbenjemaa commented 2 weeks ago

What would you like to be added (User Story)?

As a developer, I want to have the option to ignore patching the status when patching CAPI resources.

Detailed Description

In certain cases, we would like to patch the resources without patching the status.

For example, In the deletion reconciliation loop, we have to make sure all cloud resources gets deleted, and then we delete the finalizer. But If we patch the resource with a diff in the status and in the metadata (finalizer). The metadata will be patched first, so the garbage collector will clean up the resource.

Therefore, the patch helper will return an error, resource not found when trying to patch the status

https://github.com/kubernetes-sigs/cluster-api/blob/54c8e46eeb4861746d6fadd9847bdea913797231/util/patch/patch.go#L148-L150

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: FooMachine
metadata:
  finalizers:
  - foomachine.infrastructure.cluster.x-k8s.io

Anything else you would like to add?

It would be great to have a Patch Option to tell the helper to ignore the status sub-resource.

 patchHelper.Patch(ctx, fooMachine, patch.WithIgnoreStatus())

Label(s) to be applied

/kind feature /area util

k8s-ci-robot commented 2 weeks ago

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
troy0820 commented 2 weeks ago

Would not using the patch helper but using the reconciler Patch method be an adequate solution? The reconciler makes the patch helper with the cache client it has so the patch helper can do both.

I am trying to understand the functionality you wish to achieve where removing the status patch from the helper (not really removing but an option to not patch the status) is different from using the reconciler Patch method.

mcbenjemaa commented 2 weeks ago

Its obvious that I need a way of ignoring the patches of the status, for some reasons.

In CAPI providers, we follow the structure and mechanics of CAPI, and utils are very helpful.

If we are going to use the controller runtime patch, by server apply or merge, That means we will need to change the logic of the scope we have.

So we better prefer to follow guidelines of CAPI.

troy0820 commented 2 weeks ago

Fair, and just trying to understand the scope of what you wish to accomplish because of what already exist.

The PR linked may get us closer if the triage/accepted is put on this issue. As this is optional, I don't see this being a problem with the implementation.

vincepri commented 2 weeks ago

Reading the use case a bit more sounds like that the finalizer is removed as well as patching the status. I'm wondering if this is more a code error on the user side rather than the patch helper.

The patch helper only patches status if there is a status field, and if there is a diff in status. Ideally, if it's important for the patch to go through without removing a finalizer, the finalizer should be removed only after all conditions are met.

@mboersma looking at the example:

For example, In the deletion reconciliation loop, we have to make sure all cloud resources gets deleted, and then we delete the finalizer. But If we patch the resource with a diff in the status and in the metadata (finalizer). The metadata will be patched first, so the garbage collector will clean up the resource. Therefore, the patch helper will return an error, resource not found when trying to patch the status

In this case, we could maybe reorder the patching to patch status before the spec/metadata fields; although would like to see what @sbueringer thinks about this. In general the patches are eventually consistent, so it'd be beneficial to remove the finalizer once controllers are sure all the operations are done.

Said another way too, why is the status changing if we know we're done with the resource?

sbueringer commented 2 weeks ago

My understanding of the use case is the following:

Based on this, it's entirely realistic that one reconcile updates the status and removes the finalizer.

In my opinion the solution is not to add an option to the patch helper to not update the status. How would you know when you create the patch helper if the entire reconcileDelete goes through so that you would remove the finalizer at the end? Also I think even when the finalizer is removed the status should still be updated correctly, just because the current controller is removing its finalizer it doesn't mean that there are no other finalizers on that object. In that case I would expect the status to be up-to-date.

I would recommend the following instead:

mcbenjemaa commented 2 weeks ago

The problem we have is within our Cloud Provider APIs, as you said, in reconcileDelete, we are deleting resources and, in the end, removing the finalizer. In my opinion, we only want to patch metadata and spec. Because the resource is in a deletion state. Our reconciliation loop will continually update the status, and don't fail if there's no status. That's why I wanted to have this Option.

now, we are solving this by patching the resource prior to removing the finalizer, but I guess that's only a workaround and is not the design of CAPI provider logic

https://github.com/ionos-cloud/cluster-api-provider-ionoscloud/pull/163/files#diff-7fc4e199578bdc7049586527678423ce02aecfc04dc4bd75e8db782b90d011a2R291

mcbenjemaa commented 2 weeks ago

It would also be very helpful, if we add a condition when patching the status and avoid returning ErrNotFound but I guess that requires adding an option as well.

troy0820 commented 2 weeks ago

Not advocating for either solution, but would the option provide an opportunity to utilize the utils with the understanding of the flow of the patch? I've seen controllers that use the patch helper get bit by the patch helper adding and removing the annotation thus creating a hot loop and not finish reconciling the object. (That's more metadata/spec than anything else) Since the adoption of the patch helper and the defer patch, some controllers logic being heavily dependent on this, make assumptions around what should take place.

troy0820 commented 2 weeks ago

Not advocating for either solution, but would the option provide an opportunity to utilize the utils with the understanding of the flow of the patch? I've seen controllers that use the patch helper get bit by the patch helper adding and removing the annotation thus creating a hot loop and not finish reconciling the object. (That's more metadata/spec than anything else) Since the adoption of the patch helper and the defer patch, some controllers logic being heavily dependent on this, make assumptions around what should take place.

While making this PR I was going to ask would it make sense to provide the option to not patch the spec/metadata? However, I didn't want to add that to address an issue that wasn't discussed here.

fabriziopandini commented 2 weeks ago

I think I kind of agree with @sbueringer proposed solution:

  • still try the patch status calls
  • if there is a deletionTimestamp & no finalizers on the object => ignore NotFound errors
    • I think we should check deletionTimestamp & finalizers on the object that we get back from patching the spec & metadata

This will avoid exposing patch helper internals, and make it tolerant to the use case described in the issue (object gone right after patching spec & metadata, before patching status)

troy0820 commented 2 weeks ago

I would recommend the following instead: still try the patch status calls if there is a deletionTimestamp & no finalizers on the object => ignore NotFound errors I think we should check deletionTimestamp & finalizers on the object that we get back from patching the spec & metadata

If this is the recommended solution, I can change my PR with this adjustment within the patch helper method.

fabriziopandini commented 5 days ago

@mcbenjemaa is that a good way forward for you?