karmada-io / karmada


Ability to suspend work #4688


a7i commented 1 month ago

What would you like to be added: Ability to suspend work to ensure that changes are not being reconciled.

Why is this needed:

User Story 1: As a cluster admin, we may get stuck in a reconciliation loop where the karmada controller-manager updates a resource but, for some unknown reason (perhaps a controller on the member cluster), the resource is reverted. The operator can decide to suspend the Work while debugging the issue.

User Story 2: We're using Karmada to migrate workloads from a blue cluster to a green cluster. Once we move a workload, we want to suspend any updates from blue to green until we cut over to green.

Persona

Cluster Admin who is oncall and has permission to modify karmada resources

Implementation Details

There are three ways that we can go about this:

1. Expose suspend on Work CRD

This is the simplest approach but requires the Cluster Admin to identify the Work objects in the karmada-es-${cluster} namespace before patching them. Once the field is set, the controller returns early. In the code here https://github.com/karmada-io/karmada/blob/master/pkg/controllers/status/work_status_controller.go#L93-L96, we would add the following:

    // Skip reconciliation entirely while the Work is suspended.
    if work.Spec.Suspend {
        return controllerruntime.Result{}, nil
    }

Effort: Low
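
For illustration, a suspended Work might look like the sketch below. Note that spec.suspend is the field proposed here and does not exist in the current Work API; the name, namespace, and manifest are placeholders.

# Hypothetical sketch: spec.suspend is the proposed field, not an existing API field.
apiVersion: work.karmada.io/v1alpha1
kind: Work
metadata:
  name: nginx-xxxxxxxx          # placeholder; the actual Work name must be looked up in the execution namespace
  namespace: karmada-es-member1 # execution namespace for the target member cluster
spec:
  suspend: true                 # proposed field: when true, the controller returns early and stops syncing
  workload:
    manifests:
      - apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: nginx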

2. Expose suspend on ResourceBinding CRD

This requires a suspend field on ResourceBinding, which would also be propagated down to the Work. The Cluster Admin will have to identify the ResourceBinding in the workload namespace before patching it. The name of a ResourceBinding is more predictable than that of a Work.

Effort: Medium
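
As a sketch, assuming the same proposed spec.suspend field, the Cluster Admin would patch the ResourceBinding that lives alongside the workload; all names below are placeholders.

# Hypothetical sketch: spec.suspend is the proposed field, not an existing API field.
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: nginx-deployment   # placeholder; derived from the workload name and kind, hence predictable
  namespace: default       # same namespace as the workload
spec:
  suspend: true            # proposed field; the controller would carry it down to the generated Work objects
  resource:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
    namespace: default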

3. Expose suspend on PropagationPolicy and ClusterPropagationPolicy CRD

The Cluster Admin will have to know which PP or CPP has the highest priority and decide to suspend it. Changes to suspend will have to be propagated down to the ResourceBinding and Work.

Effort: X-Large

a7i commented 1 month ago

Created a PR to get feedback: https://github.com/karmada-io/karmada/pull/4689

chaunceyjiang commented 1 month ago

Related to it https://github.com/karmada-io/karmada/pull/1859

RainbowMango commented 1 month ago

User Story 1: As a cluster admin, we may get stuck in a reconciliation loop where the karmada controller-manager updates a resource but, for some unknown reason (perhaps a controller on the member cluster), the resource is reverted. The operator can decide to suspend the Work while debugging the issue.

+1 on this user story. This feature would be helpful for debugging.

User Story 2: We're using Karmada to migrate workloads from a blue cluster to a green cluster. Once we move a workload, we want to suspend any updates from blue to green until we cut over to green.

May I ask for more detailed info about how you do the migration? I don't understand how the workload can be synced from blue to green; my understanding is that both the blue and green clusters take the karmada-apiserver as the source.

Just a guess, the process of migrating a workload would be something like:

Step 1: Create a PropagationPolicy to take over an application from blue cluster, like

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - blue

Step 2: Add green cluster to the same PropagationPolicy, so that the application can be synced:

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - blue
        - green

Step 3: Test against the green cluster.

Step 4: Remove the application from the blue cluster by removing blue from the PropagationPolicy:

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - green

RainbowMango commented 1 month ago
  1. Expose suspend on Work CRD This is the simplest approach but requires the Cluster Admin to identify Work in the karmada-es-${cluster} namespace before patching them. Once the field is set, the controller will get out early. In the code here https://github.com/karmada-io/karmada/blob/master/pkg/controllers/status/work_status_controller.go#L93-L96, we would add the following:

I totally agree that Work deserves a field to indicate whether the propagation should be suspended.

Actually, we already have an annotation (propagation.karmada.io/instruction: suppressed) for the same purpose: https://github.com/karmada-io/karmada/blob/9ccc8be46c135005289367b8867f4b2c82ca44c0/pkg/util/constants.go#L41-L47 But this annotation is used internally by Karmada itself, in the scenario where a Work is not managed by a ResourceBinding.
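
For reference, suppressing a Work with that existing annotation looks roughly like the snippet below (the Work name and namespace are placeholders):

# Existing internal mechanism referenced above; names are placeholders.
apiVersion: work.karmada.io/v1alpha1
kind: Work
metadata:
  name: nginx-work              # placeholder
  namespace: karmada-es-member1
  annotations:
    propagation.karmada.io/instruction: suppressed   # instructs Karmada to stop propagating this Work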

But, in your case, even after Work introduces .spec.suspend, your modification may be overridden by the ResourceBinding controller; see https://github.com/karmada-io/karmada/blob/master/pkg/util/helper/work.go#L81.

But don't worry, there are also two approaches to eliminate the risk:

  1. Expose suspend on ResourceBinding CRD
  2. Mutate each field when dealing with the conflict. See what we have done at the detector.

I tend toward introducing the functionality on ResourceBinding or PropagationPolicy; looking forward to hearing more about your use case.

XiShanYongYe-Chang commented 2 weeks ago

Hi @a7i, we are planning to push this feature forward soon. Can you help provide some feedback on @RainbowMango's comments above?

a7i commented 2 weeks ago

May I ask for more detailed info about how you do the migration? I don't understand how the workload can be synced from blue to green; my understanding is that both the blue and green clusters take the karmada-apiserver as the source.

While we have long-term plans to use the karmada-apiserver for handling our multi-cluster / blue-green use-cases, we currently reuse our kube-apiserver (on the blue cluster). We essentially bootstrap the karmada controller-manager, webhook, scheduler, and aggregated-apiserver onto an existing cluster (which reuses the kube-apiserver). We then use policies to migrate workloads gradually to another cluster. I realize that the majority of Karmada use-cases do not follow this pattern.

Actually we already have an annotation(propagation.karmada.io/instruction:suppressed) for the same purpose:

Ideally we would deprecate this annotation in favor of the same .spec.suspend field, or alternatively keep using this annotation for consistency.

Mutate each field when dealing with the conflict. See what we have done at the detector. I tend toward introducing the functionality on ResourceBinding or PropagationPolicy; looking forward to hearing more about your use case.

I agree with your recommendation. Let me pick this back up and have a PR ready this week.

a7i commented 2 weeks ago

@XiShanYongYe-Chang I have updated the PR to reflect the requested changes. I look forward to your feedback

XiShanYongYe-Chang commented 2 weeks ago

We then use policies to migrate workloads gradually to another cluster.

Can you elaborate on this step a little bit?

a7i commented 2 weeks ago

Can you elaborate on this step a little bit?

Given that we have two clusters:

  1. cluster-blue: this is where the original workloads are (and karmada controller-manager also runs here in this setup)
  2. cluster-green: this is the target cluster we're moving workloads to

Let's take a simple workload (Deployment with 3 replicas and Service) as an example.

  1. Create a MultiClusterService
  2. Create an OverridePolicy with replicas as 0 targeting cluster-green (a sketch of this policy appears after this list)
  3. Create a PropagationPolicy (with cluster-green in the placement.clusterAffinity.clusterNames[0]) -- observe that it created a Deployment in cluster-green with zero replicas -- observe that the original Deployment in cluster-blue is still 3 replicas
  4. Update OverridePolicy plaintext to /spec/replicas 1 -- we observe that cluster-blue still has a deployment with 3 replicas (no change) -- we observe that cluster-green now has a deployment with 1 replica
  5. Decrement replicas in cluster-blue to 2 -- we observe that cluster-blue now has a deployment with 2 replicas -- we observe that cluster-green still has a deployment with 1 replica (no change)
  6. Update OverridePolicy to /spec/replicas 2 -- we observe that cluster-blue still has a deployment with 2 replicas (no change) -- we observe that cluster-green now has a deployment with 2 replicas
  7. Decrement replicas in cluster-blue to 1 -- we observe that cluster-blue now has a deployment with 1 replica -- we observe that cluster-green still has a deployment with 2 replicas (no change)
  8. Update OverridePolicy to /spec/replicas 3 -- we observe that cluster-blue still has a deployment with 1 replica (no change) -- we observe that cluster-green now has a deployment with 3 replicas
  9. Decrement replicas in cluster-blue to 0 -- we observe that cluster-blue now has a deployment with 0 replicas -- we observe that cluster-green still has a deployment with 3 replicas (no change)
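
For reference, the OverridePolicy driving steps 2, 4, 6, and 8 would look roughly like the sketch below; the policy name, namespace, and replica value are illustrative.

# Illustrative sketch of the OverridePolicy used during the migration; names and values are placeholders.
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: nginx-replicas
  namespace: default
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  overrideRules:
    - targetCluster:
        clusterNames:
          - cluster-green
      overriders:
        plaintext:
          - path: /spec/replicas
            operator: replace
            value: 1   # bumped step by step (0 -> 1 -> 2 -> 3) as replicas drain from cluster-blue
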
XiShanYongYe-Chang commented 2 weeks ago

Thanks for the detailed explanation.

Do you need to suspend the Work at each interval between steps 3 and 9? If so, what condition is the suspended Work waiting on?

a7i commented 2 weeks ago

No; between those steps we still rely on the karmada-controller-manager to handle propagation syncs. After step 9, however, propagation needs to be suspended.

I think this feature request is generic enough that our use-case doesn't have to be the only one. I believe that "User Story 1" from the issue description is a valid use-case, and as a cluster admin I want to have this capability at my disposal.

XiShanYongYe-Chang commented 1 week ago

Thanks @a7i.

I wonder if Karmada can do more to provide users with out-of-the-box capabilities, such as canary releases, in addition to providing APIs for suspending work propagation (which can be considered an atomic capability).

I may be wrong, please correct me if so.

XiShanYongYe-Chang commented 1 week ago

Hi @a7i, I would like to know: do you use argo-cd for canary releases? I was wondering if we could do some combination with argo-cd by suspending work.