karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0
4.12k stars 807 forks source link

[BUG] `generation` and `observedGeneration` of CloneSet in karmada is not aligned. #4866

Open veophi opened 1 week ago

veophi commented 1 week ago

What happened: status.bservedGeneration of federated resource (CloneSet) does not align with its metadata.generation.

What's the meaning of status.bservedGeneration?

image

However, Karmada uses status.bservedGeneration of member resource as its status.bservedGeneration in federated resource, it's incorrect.

What you expected to happen: status.bservedGeneration of federated resource (CloneSet) aligns with its metadata.generation if its all member resources have status.observedGeneration >= metadata.generation.

How to reproduce it (as minimally and precisely as possible): Update any federated Deployment spec and watch its generation&observedGeneration.

Anything else we need to know?:

Environment:

XiShanYongYe-Chang commented 1 week ago

I've tested it and it doesn't seem to reproduce the problem.

image

veophi commented 1 week ago

reproduce

@XiShanYongYe-Chang

Sorry, Deployment generation has another bug. But, this bug is easily reproduced with CloneSet when the generation in member is inconsistent with in karmada.

➜  ~ use_karmada
➜  ~ k get clone -oyaml | grep -i generation
    generation: 2
    observedGeneration: 4
    generation: 5
    observedGeneration: 4
➜  ~
XiShanYongYe-Chang commented 1 week ago

Sorry, Deployment generation is another bug.

If there is another bug, we can track it with a new issue.

But, this bug is easily reproduced with CloneSet when the generation in member is inconsistent with in karmada.

I got it. This is because we did not implement a state collection resource interpreter for the cloneset resource.

https://github.com/karmada-io/karmada/blob/6e5a6029922091e22bdae470323c2483334273ff/pkg/resourceinterpreter/interpreter.go#L62

https://github.com/karmada-io/karmada/blob/6e5a6029922091e22bdae470323c2483334273ff/pkg/resourceinterpreter/interpreter.go#L56

veophi commented 1 week ago

Sorry, Deployment generation is another bug.

If there is another bug, we can track it with a new issue.

But, this bug is easily reproduced with CloneSet when the generation in member is inconsistent with in karmada.

I got it. This is because we did not implement a state collection resource interpreter for the cloneset resource.

https://github.com/karmada-io/karmada/blob/6e5a6029922091e22bdae470323c2483334273ff/pkg/resourceinterpreter/interpreter.go#L62

https://github.com/karmada-io/karmada/blob/6e5a6029922091e22bdae470323c2483334273ff/pkg/resourceinterpreter/interpreter.go#L56

@XiShanYongYe-Chang here's bug https://github.com/karmada-io/karmada/blob/6e5a6029922091e22bdae470323c2483334273ff/pkg/resourceinterpreter/default/thirdparty/resourcecustomizations/apps.kruise.io/v1alpha1/CloneSet/customizations.yaml#L72

XiShanYongYe-Chang commented 1 week ago

Oh... Here it is, sorry I forgot.

Let me invite the owner of the feature to take a look at this issue. Hi @yike21, can you help take a look? /cc @yike21

By the way, do you know how to fix it?

veophi commented 1 week ago

Oh... Here it is, sorry I forgot.

Let me invite the owner of the feature to take a look at this issue. Hi @yike21, can you help take a look? /cc @yike21

By the way, do you know how to fix it?

@XiShanYongYe-Chang yes, I know.

BTW, Deployment generation bug and its fix is here: https://github.com/karmada-io/karmada/pull/4867

XiShanYongYe-Chang commented 1 week ago

BTW, Deployment generation bug and its fix is here: #4867

Thanks a lot~

yike21 commented 1 week ago

Let me invite the owner of the feature to take a look at this issue. Hi @yike21, can you help take a look? /cc @yike21

By the way, do you know how to fix it?

I will do it asap. Thanks a lot! @veophi

XiShanYongYe-Chang commented 1 week ago

/assign @veophi In favor of #4867

veophi commented 1 week ago

//TODO: CloneSet generation logic will be fixed later. @XiShanYongYe-Chang @yike21

XiShanYongYe-Chang commented 1 week ago

Hi @veophi I've had a look at PR #4867, but I don't understand what the problem was with the previous Deployment status observedGeneration. I understand that the previous logic was to set observedGeneration directly to deployment generation when the status was aggregated, as described in the comments. The current behavior may be more rigorous. The waits for the Deployment state in the member cluster to be ready before proceeding with the collection.

veophi commented 1 week ago

Hi @veophi I've had a look at PR #4867, but I don't understand what the problem was with the previous Deployment status observedGeneration. I understand that the previous logic was to set observedGeneration directly to deployment generation when the status was aggregated, as described in the comments. The current behavior may be more rigorous. The waits for the Deployment state in the member cluster to be ready before proceeding with the collection.

Do you have wechat?

veophi commented 1 week ago

@XiShanYongYe-Chang observedGeneration == generation means the Deployment controller has observed the spec(generation) changes and updated the status based on the changes.

If you always set observedGeneration = deploy.Generation and ignore whether the status corresponds to the generation, the status you aggregated may be untrustworthy.

XiShanYongYe-Chang commented 1 week ago

Thanks, I get it.

This is a common problem with workload resources. Can you help list all the resources that need to be handled and then we can complete them one by one. How about creating an umbrella issue?

yike21 commented 1 week ago

This is a common problem with workload resources. Can you help list all the resources that need to be handled and then we can complete them one by one. How about creating an umbrella issue?

thirdparty resource

There are a number of third-party resources that need to be modified to aggregate .status.observedGeneration. They can be modified in the same way as CloneSet.

XiShanYongYe-Chang commented 1 week ago

Thanks~ @yike21

Can you help create an umbrella issue to track the task? Each resource to be processed can be processed as a subtask.

yike21 commented 1 week ago

Can you help create an umbrella issue to track the task? Each resource to be processed can be processed as a subtask.

OK, I will do it soon.

XiShanYongYe-Chang commented 1 week ago

As an evolution of the previous resource template status.observedGeneration solution, I think that we can continue to promote the current issue as a feature.

We can use #4870 to trace subsequent tasks. Thanks again @veophi @yike21

/kind feature