karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

In pull mode, after a member cluster's karmada-agent goes down and comes back up, the resources are recreated. #5406

Open Patrick0308 opened 1 month ago

Patrick0308 commented 1 month ago

What happened: In pull mode, a member cluster's karmada-agent went down, and the cluster's ready status became Unknown. The Work resources for this member cluster could not be deleted because of the karmada.io/execution-controller finalizer. When the karmada-agent came back up, it deleted and then recreated all resources in the member cluster.
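For context, the stuck Work objects and their finalizers can be inspected from the karmada-apiserver context like this (a sketch; it assumes Karmada's default karmada-es-&lt;cluster&gt; execution namespace for member3):

## List member3's Work objects together with their finalizers
kubectl --context karmada-apiserver get work -n karmada-es-member3 \
  -o custom-columns=NAME:.metadata.name,FINALIZERS:.metadata.finalizers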

What you expected to happen: After a member cluster's karmada-agent goes down and then comes back up, the resources should not be deleted and recreated.

How to reproduce it (as minimally and precisely as possible): config.yaml:

apiVersion: v1
data:
  test: 'test'
kind: ConfigMap
metadata:
  name: test
  namespace: default
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: test-config
spec:
  resourceSelectors:
    - apiVersion: v1
      kind: ConfigMap
      name: test
  placement:
    clusterAffinity:
      clusterNames:
        - member3

command:

hack/local-up-karmada.sh
export KUBECONFIG=/Users/patrick/.kube/karmada.config
kubectl config use-context karmada-apiserver
kubectl apply -f config.yaml
export KUBECONFIG=/Users/patrick/.kube/members.config
kubectl config use-context member3
kubectl scale deploy karmada-agent --replicas=0 -n karmada-system
## wait 15 minutes
kubectl scale deploy karmada-agent --replicas=1 -n karmada-system
## wait a few seconds, then check that the ConfigMap has been recreated
kubectl get configmap
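As a side note, the cluster's status transition can be watched from the karmada-apiserver context while the agent is scaled down (a sketch; the column layout may differ between versions):

export KUBECONFIG=/Users/patrick/.kube/karmada.config
kubectl config use-context karmada-apiserver
## member3's READY column should show Unknown while the agent is down
kubectl get clusters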

Anything else we need to know?:

Environment:

Patrick0308 commented 4 weeks ago

Maybe when a member cluster becomes Unknown, we should delete the karmada.io/execution-controller finalizer on its Work resources?
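For illustration, a manual version of that workaround would look roughly like this (a sketch; the Work name is hypothetical, and it assumes the execution-controller finalizer is the only entry in the finalizer list):

## Remove the first finalizer from a stuck Work object (name is illustrative)
kubectl --context karmada-apiserver patch work test-configmap \
  -n karmada-es-member3 --type=json \
  -p='[{"op": "remove", "path": "/metadata/finalizers/0"}]'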

XiShanYongYe-Chang commented 4 weeks ago

Does a failover occur after the pull-mode cluster goes down?

Patrick0308 commented 4 weeks ago

@XiShanYongYe-Chang Not only the Deployment is deleted, but also Istio's VirtualService (VS), DestinationRule (DR), ConfigMap, Service, and other resources.

Patrick0308 commented 4 weeks ago

@XiShanYongYe-Chang After deleting a resource, the agent recreates it immediately.

XiShanYongYe-Chang commented 4 weeks ago

That would be a cluster failover: all resources on the failed cluster are deleted, and when the cluster recovers they are propagated again, which produces the behavior you describe.

Check whether the Failover feature gate is enabled; it is enabled by default in the current version.
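One way to check is to inspect the controller's command line; a sketch, assuming the local-up layout where karmada-controller-manager runs in the karmada-host cluster's karmada-system namespace (the --feature-gates flag may be absent when defaults are used):

## Print the karmada-controller-manager command line and look for --feature-gates
kubectl --context karmada-host -n karmada-system get deploy karmada-controller-manager \
  -o jsonpath='{.spec.template.spec.containers[0].command}'
## Disabling failover would mean adding, e.g.: --feature-gates=Failover=false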

Patrick0308 commented 4 weeks ago

Thanks @XiShanYongYe-Chang, I disabled the Failover feature to test, and the resources are no longer recreated.

If I want to keep the Failover feature enabled, how can I avoid resources being recreated in this situation? Some resources cannot simply be recreated, such as a LoadBalancer Service.

XiShanYongYe-Chang commented 4 weeks ago

Hi @Patrick0308, you may need to re-trigger the scheduling of resources to achieve your goal. We now have APIs that can proactively trigger the rescheduling of resources.
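This presumably refers to the WorkloadRebalancer API available in recent Karmada releases; a minimal sketch targeting the ConfigMap from this issue (the exact spec shape may vary by version):

apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: rebalance-test-config
spec:
  workloads:
    - apiVersion: v1
      kind: ConfigMap
      name: test
      namespace: default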

For the Failover feature, we plan to iterate on it to improve the user experience, ref #5150. Your comments and suggestions are welcome.