karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

RB may not be created when deleting and creating the same object immediately #2090

Closed Garrybest closed 1 year ago

Garrybest commented 1 year ago

What happened:

  1. Create a deployment and wait until it is running.

     # kubectl create -f samples/nginx/
     # kubectl get deployments.apps
     NAME    READY   UP-TO-DATE   AVAILABLE   AGE
     nginx   2/2     2            2           3m4s

  2. Delete this deployment and create it again immediately; we can use kubectl replace.

     # kubectl replace --force -f samples/nginx/deployment.yaml
     deployment.apps "nginx" deleted
     deployment.apps/nginx replaced

  3. The RB is not created.

     # kubectl get rb
     NAME            SCHEDULED   FULLYAPPLIED   AGE

What you expected to happen: The RB should be recreated.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Procedure:

  1. We delete the deployment.
  2. The RB is pending garbage collection but not yet deleted.
  3. We create the deployment again; the controller just updates the former RB.
  4. The RB is deleted by GC.
  5. The controller cannot recreate the RB because no more events are received.

I think we could watch the deletion event for the RB and enqueue the referenced object into the detector, as sketched below.
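
A minimal sketch of that idea, assuming a client-go informer on ResourceBindings and a detector-style work queue; the function and field names here are illustrative, not karmada's actual detector API:

package detectorsketch

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// enqueueOnBindingDeletion is a hypothetical helper: when a ResourceBinding is
// deleted, push the key of the resource it referenced back onto the detector's
// queue so ApplyPolicy runs again and can recreate the binding.
func enqueueOnBindingDeletion(bindingInformer cache.SharedIndexInformer, queue workqueue.RateLimitingInterface) {
	bindingInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			// The informer may deliver a tombstone if the final state is unknown.
			if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
				obj = tombstone.Obj
			}
			rb, ok := obj.(*unstructured.Unstructured)
			if !ok {
				return
			}
			// spec.resource records the object this binding was derived from.
			namespace, _, _ := unstructured.NestedString(rb.Object, "spec", "resource", "namespace")
			name, _, _ := unstructured.NestedString(rb.Object, "spec", "resource", "name")
			if name == "" {
				return
			}
			queue.Add(namespace + "/" + name)
		},
	})
}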

/cc @XiShanYongYe-Chang @pigletfly @RainbowMango

Environment:

RainbowMango commented 1 year ago

Interesting.

dddddai commented 1 year ago

How about adding the UID to the binding name, so that we are able to distinguish the bindings for each "object"? But it could break users who rely on the name-kind naming rule: https://github.com/karmada-io/karmada/blob/cee1a2f4a16b5292b61007ce3f87e887c1e8acc2/pkg/detector/detector.go#L665-L666

Just an option

BTW, I think name-kind is technically not enough to identify an object, especially across different groups, for example:

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway

and

apiVersion: networking.istio.io/v1beta1
kind: Gateway
Garrybest commented 1 year ago

It makes sense. But I'm afraid it's not friendly for users when the binding name is too long, or when the name loses its implicit connection with the original object.

Maybe we could hash the uid to make it shorter?
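
As a rough illustration, a minimal sketch of hashing the UID into a short suffix on top of the current <name>-<kind> rule; generateBindingName here is a made-up helper, not karmada's actual naming function:

package naming

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// generateBindingName appends a short FNV-1a hash of the object's UID to the
// existing <name>-<kind> pattern, so a deleted and recreated object (which
// gets a new UID) maps to a different binding name while the readable prefix
// is kept.
func generateBindingName(kind, name, uid string) string {
	h := fnv.New32a()
	h.Write([]byte(uid))
	return fmt.Sprintf("%s-%s-%08x", name, strings.ToLower(kind), h.Sum32())
}

For a Deployment named nginx this would yield something like nginx-deployment-xxxxxxxx: still readable, but distinct for each recreation of the object.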

Poor12 commented 1 year ago

I tried several times. Sometimes the RB is updated normally, and sometimes this phenomenon does happen. Let's talk about the normal situation first. When we delete the deployment, the RB is pending garbage collection but not yet deleted, basically because the RB is owner-referenced by the deployment. When we create the deployment again, the controller just updates the former RB. If the update happens before GC, the ownerReference in the RB has already been changed to the newly created deployment, so GC will no longer treat this updated RB as needing to be removed. The RB is updated normally and will not be deleted. I don't think we must recreate it in this situation, but we must guarantee eventual consistency.

But sometimes GC happens before the RB is updated, so we need to recreate the RB. From the log, I guess the controller does attempt to create an RB, but it does not succeed.

E0630 03:48:43.910451       1 controller.go:317] "controller/resourcebinding: Reconciler error" err="Operation cannot be fulfilled on resourcebindings.work.karmada.io \"nginx-deployment\": the object has been modified; please apply your changes to the latest version and try again" reconciler group="work.karmada.io" reconciler kind="ResourceBinding" name="nginx-deployment" namespace="default"
Garrybest commented 1 year ago

If the update happens before GC, the ownerReference in the RB has already been changed to the newly created deployment, so GC will no longer treat this updated RB as needing to be removed. The RB is updated normally and will not be deleted. I don't think we must recreate it in this situation, but we must guarantee eventual consistency.

I think we should recreate the RB instead of updating its ownerReference. If we only update it, the recreation has no real effect from the user's point of view. Users want to delete their former deployment and create a new one: imagine we delete the deployment and create a new one, all Pods will be recreated. In terms of this intention, I prefer to delete the obsolete RB and create a new one.

XiShanYongYe-Chang commented 1 year ago

I was wondering why it wasn't created and what the root cause was. I haven't figured it out yet.

Garrybest commented 1 year ago
  1. Delete the deployment.
  2. The RB is waiting for GC.
  3. Create this deployment again immediately.
  4. The controller watches the creation event and tries to create or update the RB; since the RB has not been deleted by the garbage collector yet, the controller only updates it.
  5. The RB is deleted by the garbage collector.
  6. Now the controller will not receive any more events for the deployment, so the RB will not be created until the controller is restarted.
Garrybest commented 1 year ago

You can try it with kubectl replace --force -f samples/nginx/deployment.yaml after the deployment is running. And if we restart karmada-controller-manager, the RB gets created.

XiShanYongYe-Chang commented 1 year ago

@Garrybest Thanks, I figured it out. I was wondering if we could change the logic for updating the resourceBinding: stop updating the old resourceBinding when it has been marked for deletion and return an error instead.

Garrybest commented 1 year ago

I think it could work. The controller will retry, but we should be aware that the error would trigger an ItemExponentialFailure requeue.
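
For context on that backoff, a minimal sketch of client-go's ItemExponentialFailureRateLimiter; the base and max delays below are illustrative, not karmada's actual settings:

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Each consecutive failure of the same item doubles its requeue delay,
	// starting at the base delay and capped at the max delay.
	limiter := workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second)

	item := "default/nginx"
	for i := 0; i < 5; i++ {
		fmt.Printf("retry %d backs off for %v\n", i+1, limiter.When(item))
	}
	// Forget resets the failure counter once the item is handled successfully.
	limiter.Forget(item)
}

So repeated failures for the same key are retried with increasing delays rather than immediately.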

But I also think https://github.com/karmada-io/karmada/issues/2090#issuecomment-1170660576 is another solution that addresses all the related issues. The root cause is that we conflate two different objects under the same ObjectReference.

dddddai commented 1 year ago

But I also think #2090 (comment) is another solution that addresses all the related issues. The root cause is that we conflate two different objects under the same ObjectReference.

Yeah I believe we should change the naming rule, I will do this if you all agree :)

XiShanYongYe-Chang commented 1 year ago

Yeah I believe we should change the naming rule, I will do this if you all agree :)

Sure. #2099 records this problem. We may also need to consider upgrades when looking for solutions.

Garrybest commented 1 year ago

Yeah I believe we should change the naming rule, I will do this if you all agree :)

Hi @dddddai, would you like to deal with this issue?

RainbowMango commented 1 year ago

Fix this issue or update the naming rule? As we discussed at the last meeting, if we are going to change the naming rule, we need a compatible solution.

dddddai commented 1 year ago

Hi @dddddai, would you like to deal with this issue?

Yes but it's tricky to handle the compatibility

Fix this issue or update the naming rule? As we discussed at the last meeting, if we are going to change the naming rule, we need a compatible solution.

Sorry I didn't attend the meeting, what's your solution?

I'm thinking we can add some temporary logic for upgrading; here's the general idea:

[image: diagram of the proposed upgrade logic]

wuyingjun-lucky commented 1 year ago

Can we just requeue the key if we find a resourceBinding that matches the pp label and whose deletionTimestamp is not zero?

wuyingjun-lucky commented 1 year ago

Our environment encountered this bug (using client-go to delete and then create a Job), and I temporarily fixed it by requeuing the event if an RB exists and its deletionTimestamp is not zero. @Garrybest @dddddai @RainbowMango Can we modify it like this?
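
Roughly, the workaround looks like the sketch below, assuming a rate-limited work queue in the detector; the helper name is made up for illustration and this is not the exact patch:

package detectorsketch

import (
	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
	"k8s.io/client-go/util/workqueue"
)

// requeueIfBindingTerminating is a hypothetical helper: when the matching RB
// still exists but is already being deleted, put the resource key back onto
// the queue instead of updating the doomed binding, so a later reconcile can
// recreate it after GC has finished.
func requeueIfBindingTerminating(rb *workv1alpha2.ResourceBinding, key string, queue workqueue.RateLimitingInterface) bool {
	if rb != nil && !rb.DeletionTimestamp.IsZero() {
		queue.AddRateLimited(key)
		return true
	}
	return false
}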

Garrybest commented 1 year ago

Hi @wuyingjun-lucky, I think we should requeue the event after the RB is deleted. So do you requeue the event before deletion? I ask because you say it requeues if an RB exists and its deletionTimestamp is not zero.

wuyingjun-lucky commented 1 year ago

I think it could work. The controller will retry, but we should be aware that the error would trigger an ItemExponentialFailure requeue.

But I also think #2090 (comment) is another solution that addresses all the related issues. The root cause is that we conflate two different objects under the same ObjectReference.

Yes, the controller will retry, which delays the update of the RB. I just fixed it this way temporarily because our environment ran into this bug.

Garrybest commented 1 year ago

It seems like a temporary solution rather than the best one, because when we requeue the object it goes through an ItemExponentialFailure requeue.

RainbowMango commented 1 year ago

I think we should recreate the RB instead of updating its ownerReference. If we only update it, the recreation has no real effect from the user's point of view. Users want to delete their former deployment and create a new one: imagine we delete the deployment and create a new one, all Pods will be recreated. In terms of this intention, I prefer to delete the obsolete RB and create a new one.

Why do people delete and create the same object immediately? What behavior do they expect? Do you mean they expect the Pods to be recreated? If there are no changes to the Pod specification, why expect to recreate them? If the specification changes, then no matter whether we re-create the binding or update it, the Pods will be re-created eventually, right?

Garrybest commented 1 year ago

Let me give an example.

  1. A user creates a deployment.
  2. He finds something is wrong, e.g., the image tag is wrong.
  3. He does not want to do a rolling update of the deployment, so he modifies the deployment YAML and recreates it with kubectl replace --force -f.
  4. Now he has encountered this issue.
RainbowMango commented 1 year ago

Thanks for the clarification, I get it.

diff --git a/pkg/detector/detector.go b/pkg/detector/detector.go                                                          
index be0dc958..7a17e6b7 100644                                                                                           
--- a/pkg/detector/detector.go                                                                                            
+++ b/pkg/detector/detector.go                                                                                            
@@ -451,6 +451,9 @@ func (d *ResourceDetector) ApplyPolicy(object *unstructured.Unstructured, object                      
        var operationResult controllerutil.OperationResult                                                                
        err = retry.RetryOnConflict(retry.DefaultRetry, func() (err error) {                                              
                operationResult, err = controllerutil.CreateOrUpdate(context.TODO(), d.Client, bindingCopy, func() error {
+                       if bindingCopy.DeletionTimestamp != nil {                                                         
+                               return fmt.Errorf("do not update during deleting/gc")
+                       }
                        // Just update necessary fields, especially avoid modifying Spec.Clusters which is scheduling result, if already exists.
                        bindingCopy.Labels = util.DedupeAndMergeLabels(bindingCopy.Labels, binding.Labels)

Is it a workable solution? (A little bit ugly though)

Maybe we could hash the uid to make it shorter?

Actually, I like this approach, but it's pretty hard to keep it compatible. I will look at the solution @dddddai mentioned above: https://github.com/karmada-io/karmada/issues/2090#issuecomment-1181791827.

Garrybest commented 1 year ago

Right, it could work temporarily, as https://github.com/karmada-io/karmada/issues/2090#issuecomment-1193306423 said. We could make a quick fix here and then wait for the long-term solution.

RainbowMango commented 1 year ago

+1 @dddddai What do you say? Your idea (as mentioned at https://github.com/karmada-io/karmada/issues/2090#issuecomment-1181791827) is probably the long-term solution.

dddddai commented 1 year ago

I'm ok with the quick fix

So for the long-term solution, we are going to change the naming rule to name-kind-hash, right? If so, I will try implementing my idea and see if it works.

RainbowMango commented 1 year ago

So for the long-term solution, we are going to change the naming rule to name-kind-hash, right?

How about we have a talk at the meeting next week?

dddddai commented 1 year ago

Sure

jwcesign commented 1 year ago

anything new about this?

dddddai commented 1 year ago

If we are gonna change the naming rule and the idea makes sense, I will work on it. @RainbowMango WDYT