RainbowMango opened this issue 2 weeks ago
Looks great, thank you!
Could we add a checklist item to include a default failoverType label onto the resource that has been failed over?
I don't have a strong feeling that we need it, because according to the draft design, you can declare the label name to be whatever you expect. For instance, you can declare the label name as karmada.io/failover-flink-checkpoint.
Then, you can configure Kyverno with that label. Am I right?
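To make that concrete, here is a minimal sketch (not Karmada's actual code) of what a user-declared label key means in practice: the key is arbitrary, and whatever tooling sits downstream (Kyverno in this discussion) only needs to match on the same key. The label key and checkpoint value below are assumptions for illustration.

```go
package main

import "fmt"

func main() {
	// Hypothetical, user-declared label key; the design does not fix the name,
	// so something like "karmada.io/failover-flink-checkpoint" works fine.
	const aliasLabelKey = "karmada.io/failover-flink-checkpoint"

	// Hypothetical value captured from the application before failover.
	checkpointPath := "s3://flink-checkpoints/job-42/chk-118"

	// Labels that would end up on the failed-over resource in the new cluster.
	// A Kyverno mutation policy (or any other tooling) can select on this same
	// key to restore the application from the checkpoint.
	labels := map[string]string{aliasLabelKey: checkpointPath}
	fmt.Println(labels)
}
```

The point is simply that nothing in the draft design hard-codes the key, so a use-case-specific name can be chosen per workload.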
@mszacillo I'm trying to split the whole feature into small pieces, hoping more people could get involved and accelerate development.
For now, it's a work in progress, but I'm glad you noticed it. Let me know if you have any comments or questions.
@RainbowMango I think that's a good idea, and having this feature available faster would be great. :)
Do you have a preference on who will be working on which task? If not I can pick up the introduction of PurgeMode to the GracefulEvictionTask today.
In addition, could we start a Slack working group channel? Given the time differences, being able to have more rapid conversations on Slack would improve the implementation pace.
I don't have a strong feeling that we need it, because according to the draft design, you can declare the label name to be whatever you expect.
That's true, we can simply declare our own label name for the use case. In the case of a failover, it might be helpful to distinguish between cluster and application failovers, and only Karmada has that context. But perhaps I'm creating a use case before it has even appeared.
Do you have a preference on who will be working on which task? If not I can pick up the introduction of PurgeMode to the GracefulEvictionTask today.
Sure, go for it! I've assigned this task to you. Since you are the feature owner, it would be great if you could work on it :) Generally speaking, anyone can take a task without an assignment by leaving a comment here. The issue owner (me, in this case) will assign it by adding the name to the end of the task.
In the case of a failover, it might be helpful to distinguish between cluster and application failovers, and only Karmada has that context. But perhaps I'm creating a use case before it has even appeared.
Yeah, the only benefit I can see is that it might help to distinguish failover types, but I think there is no rush to do it until there is a solid use case. I added a checklist item for this; we can revisit it later.
Double confirm whether we need to introduce a default label to distinguish the failover type. (Waiting for a real-world use case.)
Make changes to the RB application failover controller and CRB application failover controller to build eviction task for PurgeMode Immediately. (@mszacillo)
@mszacillo assigned this task to you according to the discussion on https://github.com/karmada-io/karmada/pull/5821#pullrequestreview-2438835388.
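As a rough illustration of this task, the sketch below shows how an application failover controller might build an eviction task that carries PurgeMode Immediately. The types, field names, and helper below are simplified stand-ins rather than the actual Karmada API, since the PurgeMode field on the eviction task is exactly what this checklist item introduces.

```go
package main

import "fmt"

// PurgeMode mirrors the values discussed in this thread (assumed names).
type PurgeMode string

const (
	// PurgeModeImmediately: remove the workload from the faulty cluster right away.
	PurgeModeImmediately PurgeMode = "Immediately"
	// PurgeModeGraciously: keep the workload until the replacement is healthy.
	PurgeModeGraciously PurgeMode = "Graciously"
)

// GracefulEvictionTask is a simplified stand-in for the real API type,
// extended with the PurgeMode field this checklist item introduces.
type GracefulEvictionTask struct {
	FromCluster string
	Reason      string
	PurgeMode   PurgeMode
}

// buildEvictionTask mimics what the RB/CRB application failover controllers
// might do when evicting a workload from a cluster that triggered the failover.
func buildEvictionTask(cluster string) GracefulEvictionTask {
	return GracefulEvictionTask{
		FromCluster: cluster,
		Reason:      "ApplicationFailure",
		PurgeMode:   PurgeModeImmediately,
	}
}

func main() {
	task := buildEvictionTask("member-cluster-1")
	fmt.Printf("evict from %s with purgeMode=%s\n", task.FromCluster, task.PurgeMode)
}
```

In this sketch the controller would append such a task to the binding's graceful eviction queue when an application failover is detected.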
Summary
Karmada's scheduling logic runs on the assumption that resources that are scheduled and rescheduled are stateless. In some cases, users may want to preserve a certain state so that applications can resume from where they left off in the previous cluster.
For CRDs dealing with data-processing (such as Flink or Spark), it can be particularly useful to restart applications from a previous checkpoint. That way applications can seamlessly resume processing data while avoiding double processing.
This feature aims to introduce a generalized way for users to define application state preservation in the context of cluster-to-cluster failovers.
Proposal
Iteration Tasks -- Part-1: Ensure scheduler skips clusters that triggered the failover
- Make changes to the RB application failover controller and CRB application failover controller to build eviction task for PurgeMode Immediately. (@mszacillo)
- Build eviction tasks with PurgeMode Graciously by default as a compromise. ??

Iteration Tasks -- Part-2: state preservation and feed
- Introduce StatePreservation to PropagationPolicy. (See the API design here)
- Introduce PreservedLabelState to ResourceBinding. (See the API design here)
- RB application failover controller: populate PreservedLabelState when triggering eviction.
- CRB application failover controller: populate PreservedLabelState when triggering eviction.
- Feed PreservedLabelState to new clusters (failover to). (A rough type sketch of these fields follows below.)

Iteration Tasks -- Part-3: failover history
The failover history might be optional as we don't rely on it. TBD: based on #5251
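To tie the Part-2 items together, here is a rough Go sketch of the shapes being discussed under the draft design: a StatePreservation section in the PropagationPolicy declares which status fields to capture and under which label key, and PreservedLabelState on the ResourceBinding records the captured values at eviction time so they can be fed to the new cluster. All type and field names here are assumptions, not the final API.

```go
package main

import "fmt"

// StatePreservationRule captures one piece of application state: a JSONPath
// into the workload's status and the label key (alias) under which the value
// is preserved and later re-injected.
type StatePreservationRule struct {
	AliasLabelName string // user-declared label key
	JSONPath       string // where to read the state from
}

// StatePreservation would sit under the PropagationPolicy's failover behavior.
type StatePreservation struct {
	Rules []StatePreservationRule
}

// PreservedLabelState would be recorded on the ResourceBinding when the
// application failover controllers trigger an eviction.
type PreservedLabelState map[string]string

func main() {
	// Hypothetical policy: preserve the latest Flink checkpoint path.
	policy := StatePreservation{
		Rules: []StatePreservationRule{{
			AliasLabelName: "karmada.io/failover-flink-checkpoint",
			JSONPath:       "{ .status.jobStatus.checkpointPath }", // illustrative path only
		}},
	}

	// At eviction time a controller would resolve each JSONPath against the
	// workload in the failed cluster and record the result on the binding.
	preserved := PreservedLabelState{
		policy.Rules[0].AliasLabelName: "s3://flink-checkpoints/job-42/chk-118",
	}
	fmt.Println(preserved)
}
```

Under this sketch, the Part-2 controllers would fill PreservedLabelState during eviction, and the values would later be injected as labels on the resource in the cluster it fails over to.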