karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io

Propagation after failover #5206

Closed · oussexist closed this issue 3 months ago

oussexist commented 3 months ago

Hello there, I am using Karmada to federate 2 clusters (a local one and a cloud one), and I use the local cluster as the control plane and as a member at the same time, so whenever I want to talk to the Karmada control plane all I need to do is point the kubeconfig flag at the Karmada API server config file.

Anyway, my issue is this: I deployed 2 nginx deployments with a PropagationPolicy and propagated 2 replicas, one deployment for each cluster, just fine. Since failover is configured by default when using a PropagationPolicy, I stopped the cloud cluster and waited about 8 minutes from when it stopped until the cloud replica appeared ready in the local cluster (which is kind of too much; I don't know if it's related to Toleration Seconds: 300, in which case I would put it to 0 if possible xD). So that's fine, the failover works fine, but when I start the cloud cluster back up, the replica doesn't propagate back, even though I waited a long time and nothing happened ...
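To be concrete, this is the change I mean: a sketch that just sets the clusterTolerations fields (the same ones visible in the policy below) explicitly, assuming they can be overridden like this:

# Sketch: override the default cluster tolerations so eviction starts
# immediately instead of after 300s. Assumes the fields from my policy
# below can simply be set in the PropagationPolicy spec.
spec:
  placement:
    clusterTolerations:
    - key: cluster.karmada.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 0
    - key: cluster.karmada.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 0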

So this is the PropagationPolicy:

ubuntu@master:~$ kubectl --kubeconfig /etc/karmada/karmada-apiserver.config describe propagationpolicy
Name:         my-nginx-propagation
Namespace:    default
Labels:       propagationpolicy.karmada.io/permanent-id=3bd40d7a-cee4-4437-9366-d08bf9e09734
Annotations:  <none>
API Version:  policy.karmada.io/v1alpha1
Kind:         PropagationPolicy
Metadata:
  Creation Timestamp:  2024-07-11T15:57:40Z
  Finalizers:
    karmada.io/propagation-policy-controller
  Generation:        2
  Resource Version:  81258
  UID:               6f64a953-7418-4529-b33c-9c82c1595cea
Spec:
  Conflict Resolution:  Abort
  Placement:
    Cluster Affinity:
      Cluster Names:
        member1
        k8s
    Cluster Tolerations:
      Effect:              NoExecute
      Key:                 cluster.karmada.io/not-ready
      Operator:            Exists
      Toleration Seconds:  300
      Effect:              NoExecute
      Key:                 cluster.karmada.io/unreachable
      Operator:            Exists
      Toleration Seconds:  300
    Replica Scheduling:
      Replica Division Preference:  Weighted
      Replica Scheduling Type:      Divided
      Weight Preference:
        Static Weight List:
          Target Cluster:
            Cluster Names:
              member1
          Weight:  1
          Target Cluster:
            Cluster Names:
              k8s
          Weight:  1
  Preemption:      Never
  Priority:        0
  Propagate Deps:  true
  Resource Selectors:
    API Version:   apps/v1
    Kind:          Deployment
    Name:          my-nginx
    Namespace:     default
  Scheduler Name:  default-scheduler
Events:            <none>
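For reference, the same policy as an apply-able manifest would look roughly like this (reconstructed by hand from the describe output above, so treat it as a sketch):

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: my-nginx-propagation
  namespace: default
spec:
  conflictResolution: Abort
  propagateDeps: true
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: my-nginx
    namespace: default
  placement:
    clusterAffinity:
      clusterNames:
      - member1
      - k8s
    clusterTolerations:
    - key: cluster.karmada.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: cluster.karmada.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    replicaScheduling:
      replicaSchedulingType: Divided
      replicaDivisionPreference: Weighted
      weightPreference:
        staticWeightList:
        - targetCluster:
            clusterNames:
            - member1
          weight: 1
        - targetCluster:
            clusterNames:
            - k8s
          weight: 1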

And here are the event logs of the Karmada deployment:

Events:
  Type    Reason                           Age                 From                                           Message
  ----    ------                           ----                ----                                           -------
  Normal  SyncWorkSucceed                  53m (x7 over 65m)   binding-controller                             Sync work of resourceBinding(default/my-nginx-deployment) successful.
  Normal  GetDependenciesSucceed           53m (x7 over 65m)   dependencies-distributor                       Get dependencies([]) succeed.
  Normal  SyncSucceed                      53m (x2 over 65m)   execution-controller                           Successfully applied resource(default/my-nginx) to cluster member1
  Normal  ScheduleBindingSucceed           53m                 default-scheduler                              Binding has been scheduled successfully. Result: {member1:2}
  Normal  EvictWorkloadFromClusterSucceed  53m                 resource-binding-graceful-eviction-controller  Evict from cluster k8s succeed.
  Normal  AggregateStatusSucceed           44m (x14 over 65m)  resource-binding-status-controller             Update resourceBinding(default/my-nginx-deployment) with AggregatedStatus successfully.

Edit: when manually running scale commands against Karmada, the replicas propagate back just fine:

kubectl --kubeconfig /etc/karmada/karmada-apiserver.config scale deployment my-nginx --replicas=0
sleep 30
kubectl --kubeconfig /etc/karmada/karmada-apiserver.config scale deployment my-nginx --replicas=2

Also, I have an important question: if the control-plane cluster goes down, we can't talk about HA anymore. Is there any way, if one cluster goes down, for the other to become the control plane?

chaosi-zju commented 3 months ago

the failover works fine, but when I start the cloud cluster back up, the replica doesn't propagate back, even though I waited a long time and nothing happened ...

Yes, it will not propagate back

You can refer to the Workload Rebalancer doc for a resolution:
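Basically, the doc has you create a WorkloadRebalancer in the Karmada control plane that names the workloads you want rescheduled, roughly like this (a sketch; the object name is just an example, and you should double-check the fields against the doc for your Karmada version):

# Sketch of a WorkloadRebalancer asking Karmada to reschedule the
# my-nginx Deployment across the candidate clusters again.
apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: my-nginx-rebalancer   # example name
spec:
  workloads:
  - apiVersion: apps/v1
    kind: Deployment
    name: my-nginx
    namespace: default

You apply it against the karmada-apiserver kubeconfig, the same way as your other Karmada resources.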

chaosi-zju commented 3 months ago

If the control-plane cluster goes down, we can't talk about HA anymore. Is there any way, if one cluster goes down, for the other to become the control plane?

I'm not quite sure.

Maybe this is the architecture you are interested in: https://github.com/karmada-io/karmada/issues/5103

oussexist commented 3 months ago

As the documentation says:

As a cluster administrator, I hope the replicas redistribute to two clusters when member1 cluster recovered, so that the resources of the member1 cluster will be re-utilized, also for the sake of high availability.

I am wondering why it's not enabled by default, because it's the same situation as failover: when using a PropagationPolicy, failover is added by default, so shouldn't it also add a rebalance? Also, should I apply that WorkLoadRebalance.yaml file every time I want the deployments to recover back to the cluster that failed? When I tried it, it worked the first time, but when I took the cloud cluster down again to test, the workload didn't get rebalanced automatically (since I had already applied the WorkloadRebalancer), and even if I apply the file again it just says "unchanged", so nothing happens. I think I need some help now!
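Is the intended pattern to delete and recreate the rebalancer each time? Something like this (the resource name here is just my guess at how the CRD is registered):

# Guess at a re-trigger workaround: remove the finished rebalancer,
# then apply the same manifest again so it counts as a new object.
kubectl --kubeconfig /etc/karmada/karmada-apiserver.config delete workloadrebalancer my-nginx-rebalancer
kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f WorkLoadRebalance.yaml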

And about high availability: yeah, that's kind of what I want to try, but I think due to resource limitations I won't be able to ensure HA at the cluster level.

oussexist commented 3 months ago

@chaosi-zju Also, I have a little question, since I'm new to Karmada; I would be thankful if you could help me with it. I wonder: if I expose the deployment from the Karmada control plane with a Service of type LoadBalancer, for example, will it be able to balance between the deployments, so that a service user won't have a problem if one of the clusters goes down? I used to use MetalLB as a load balancer, for example, and whenever I wanted to access the service I just needed to access its IP. I thought I could do the same thing with the Karmada cluster, but I think it can't be done, since there are no CRDs for that or something like that! Searching the documentation I found this: https://karmada.io/docs/userguide/service/multi-cluster-service/ and this: https://karmada.io/docs/next/tutorials/access-service-across-clusters/, but I'm not sure that's what I'm looking for, because those ensure that a member can access another member's service, whereas I am talking about exposing the service to users; for example, with an nginx deployment, when I access its service I get "Welcome to nginx". I don't know if Karmada, as the control plane, ensures the link between the two services in the member clusters. Regards
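From the first doc, it looks like the object involved would be a MultiClusterService; is something like this sketch the right direction (assuming my Service is also named my-nginx, and I'm not sure I have the fields right)?

# Sketch based on my reading of the multi-cluster-service doc.
apiVersion: networking.karmada.io/v1alpha1
kind: MultiClusterService
metadata:
  name: my-nginx        # as I understand it, this must match the Service name
  namespace: default
spec:
  types:
  - CrossCluster        # the docs also mention a LoadBalancer type, which may be closer to what I want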