feature: settle debate about the semantics of singleton status return

kubestellar / kubestellar

KubeStellar - a flexible solution for challenges associated with multi-cluster configuration management for edge, multi-cloud, and hybrid cloud

https://kubestellar.io

Apache License 2.0


feature: settle debate about the semantics of singleton status return #1734

Open MikeSpreitzer opened 9 months ago

MikeSpreitzer commented 9 months ago

Feature Description

The semantics now differ from those in release-0.14. They now require something below the Placement (BindingPolicy) interface (you might call it a "scheduler", whatever it is) to make choices about distribution. This is a camel's nose in the tent that not everyone is happy about.

Proposed Solution

Discuss and reach agreement.


ezrasilvera commented 8 months ago

Let's use this issue for proposals. The discussion is about the definition, not about how we can enforce it. My proposal is:

Singleton: Singleton is defined as a flag in the placement, and the interpretation is that each object included in this placement should be deployed into a single WEC. If two different placements with overlapping object(s) deploy them to different WECs, this violates the singleton pattern. In practice the singleton is controlled at the deployed-object level by using an automatically generated label on the objects.

Singleton status: The status of a singleton object is returned as a WorkStatus object and then copied into the original object in the WDS.
Current state: if the singleton pattern was violated and such an object was deployed into two WECs, the status addon on each WEC will generate a WorkStatus, resulting in multiple WorkStatus objects. Today the controller copies one of these statuses into the object in the WDS. This behavior should be changed: either enforce the singleton so that there is only a single object instance with such a label, or produce an error and don't update the WDS object.

ezrasilvera commented 8 months ago

cc: @nirrozenbaum @pdettori @MikeSpreitzer

nirrozenbaum commented 8 months ago

I suggest leaving aside the current state, because once we agree on the definition we can do the work required to implement it.

My thinking about this topic is a bit different.

observation - singleton status is a special case of status summarization. explanation: if we had a status summarization feature, would it be valid to define distribution to multiple clusters (e.g., cluster1, cluster2, .. clusterN) and define a policy to collect full status only from cluster1? sounds to me like a totally valid scenario. for the long term, we need to consider how we would settle this debate if we had a status summarization feature, since this is where we're heading. in the long term I think we all agree we want status summarization in KubeStellar.

considering the above, I do not think we should couple distribution with status collection. these are two DIFFERENT and TOTALLY SEPARATED features of KubeStellar.

so my proposal is: objects that are defined for distribution using a BindingPolicy should be distributed to the selected clusters. it should not matter if there are overlapping BindingPolicies and should not matter if status is required or not.

regarding the singleton status itself - I think the current definition is problematic. if a user wants status from a single cluster, the user should specify which cluster. so IMO the definition should be something like - WantSingletonStatus: cluster1 (true is not enough, it should specify WHICH cluster). how would it work: every such singleton status definition would label the selected object(s) with the required cluster. if there is a match and the required object was distributed to the specified WEC, the status addon reports the status back to the center using WorkStatus. in the center - if there is a match such that singleton status was required AND a single WorkStatus exists (from the correct cluster), it copies the status to the original object in WDS. otherwise, we produce an error in the origin object and mention that singleton status was required while more than one cluster reported the work status.

This proposal leaves room for decoupling status policy definition from BindingPolicy.
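
The center-side check in the WantSingletonStatus proposal above can be sketched roughly as follows. This is only an illustration, not KubeStellar code: the `WorkStatus` class and the `resolve_singleton_status` function are hypothetical stand-ins, and treating "no WorkStatus yet" as pending rather than as an error is an assumption that the proposal does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WorkStatus:
    """Hypothetical stand-in for the status reported by one WEC."""
    cluster: str
    status: dict


def resolve_singleton_status(
    wanted_cluster: str, work_statuses: List[WorkStatus]
) -> Tuple[Optional[dict], Optional[str]]:
    """Return (status_to_copy, error_message) for one workload object.

    The status is copied only when exactly one WorkStatus exists and it
    comes from the cluster named in the singleton status request.
    """
    if not work_statuses:
        # Assumption: nothing has been reported yet, so there is nothing
        # to copy and nothing to complain about.
        return None, None
    if len(work_statuses) == 1 and work_statuses[0].cluster == wanted_cluster:
        return work_statuses[0].status, None
    reporters = ", ".join(sorted(ws.cluster for ws in work_statuses))
    return None, (
        f"singleton status was requested from {wanted_cluster} "
        f"but WorkStatus was reported by: {reporters}"
    )
```

With one report from `cluster1` the status is copied; with reports from both `cluster1` and `cluster2` the function returns no status and an error message to record on the origin object.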

pdettori commented 8 months ago

I agree with the principle of separating the distribution aspect of the binding policy from the status collection aspect. We should perhaps consider other parts of the policy for constraining distribution to one cluster if that is a requirement (we discussed before specifying the max number of clusters that can be selected, but decided to leave that discussion for later). I like the idea of allowing the user to name a specific cluster (we can discuss exactly how that is specified, but just the cluster name seems like a good start at this point).

MikeSpreitzer commented 8 months ago

I also agree that we should separate the concepts of (a) making distribution choices below the BindingPolicy interface and (b) special summarization for the special case of numDestinations==1.

I think that if we are going to have a scheduler then users will want many more possibilities of what they can ask from the scheduler and we will need an architecture that can deliver that, and possibly cope with different sorts of scheduling requirements. All that is well beyond our near-term needs. I think we should not, at present, have any distribution choices made by the implementation of BindingPolicy.

As for the special case to recognize, I think it is unnecessarily burdensome to require the user to specify the destination in two places. I think it is enough to have a boolean flag on each workload object, where setting the flag to "true" means that IF that object goes to exactly one destination THEN return that object's reported state into that object in its WDS.

That leaves the question of how the user attaches those boolean values to objects. We could specify a label or annotation on the workload object to carry this value. That would be simplest and easy to implement. However, it is not good in use cases where the workload objects come from third parties --- except when those workload objects come in through clients (ArgoCD? Helm?) that can be told to add a label or annotation to every object they are conveying.

Another simple interface for how to attach these boolean values is to let the BindingPolicy state it, with an answer to the question about overlapping BindingPolicies. I think "OR" is a simple answer. I think that "AND" is also a simple answer, but I expect it will be a less useful answer in general.
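
To make the "OR" versus "AND" choice concrete, here is a small sketch of how the flags from overlapping BindingPolicies could be combined for one workload object. The function name and the `mode` parameter are made up for illustration, and returning `False` when no policy selects the object is an assumption.

```python
from typing import Iterable


def wants_singleton_status(flags: Iterable[bool], mode: str = "or") -> bool:
    """Combine the singleton-status flags of every BindingPolicy that
    selects a given workload object.

    mode="or":  one requesting policy is enough.
    mode="and": every selecting policy must request it.
    """
    flags = list(flags)
    if not flags:
        # Assumption: an object selected by no policy requests nothing.
        return False
    if mode == "or":
        return any(flags)
    if mode == "and":
        return all(flags)
    raise ValueError(f"unknown mode: {mode!r}")
```

Under "OR", a policy that asks for singleton status is not silently overridden by an unrelated policy that happens to select the same object, which is one reason it may be the more useful answer in general.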

nirrozenbaum commented 8 months ago

since singleton status touches the concept of status summarization and is a special case of it, I suggest starting with the simplest approach. I agree with @MikeSpreitzer's suggestion that instead of putting WantSingletonStatus in BindingPolicy, we should add a label on the workload object in WDS to specify that we want the singleton status for that object.

as Mike wrote, setting the flag to "true" means that IF that object goes to exactly one destination THEN return that object's reported state into that object in its WDS. but it has additional meaning - it also means that ALL clusters that get this object will report their status in WorkStatus. so if there are two receiving clusters, we have two WorkStatus objects, with no status synced into the origin workload object in WDS (which I think is absolutely ok at this stage, we need to handle this in status summarization design).

that would completely decouple status from bindingpolicy and is the simplest approach out of our current options. if this answers the requirement of MCAD (which I think it does) - I vote for this option as a starting point.

once we get to status summarization design, we need to make sure we address this point one way or another.

MikeSpreitzer commented 8 months ago

@nirrozenbaum : BTW, currently EVERY object gets its status returned to a WorkStatus object (if the ocm-status-addon is installed). This is an implementation detail, not relevant to the definition of BindingPolicy. The BindingPolicy should not be trying to say anything about intermediate objects (e.g., ManifestWork or WorkStatus); those are all implementation details of the transport.

ezrasilvera commented 8 months ago

I want to clarify my statement above. As I also stated back in KS 0.14, I believe that singleton is a special case of summarization. Back then I called this the "copy mode" of summarization (i.e., we copy the status as is, without any operation on it). This shouldn't be limited to a "single WEC object": we can have a vector of N full statuses in case users want "copy mode" and the object is deployed to multiple WECs. We can also have the first element in that vector copied into the object status, so we effectively meet the requirement the MCAD people wanted. Note that summarization needs to be done on the WEC in order to reduce traffic from the WEC.
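
As a rough illustration of the "copy mode" idea, the sketch below builds the vector of full statuses and copies its first element into the object status. The function name, the `all_statuses` field name, and ordering the vector by WEC name are all assumptions made for the example; nothing in this thread fixes them.

```python
from typing import Dict, List, Optional


def copy_mode_summary(statuses_by_wec: Dict[str, dict]) -> dict:
    """Build a 'copy mode' summary: every WEC's status kept as is.

    Assumption: the vector is ordered by WEC name so that the element
    copied into the object status is deterministic.
    """
    ordered: List[dict] = [statuses_by_wec[name] for name in sorted(statuses_by_wec)]
    first: Optional[dict] = ordered[0] if ordered else None
    return {"status": first, "all_statuses": ordered}
```

With a single WEC this reduces to the singleton case: the vector has one element and it is the one copied into the object.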

The main problem is that the MCAD integration requires us to have support NOW, before we even have a design for summarization. Because of this requirement we need to support pushing status into the object in the current releases, and we don't have the option to wait. I suggest that we not try to optimize a solution now, since we expect to change most of it through summarization. We shouldn't introduce new user-facing UX, because we would need to keep supporting it even after we switch to a different mechanism. So for the coming releases, until we have the summarization design, we don't try to enforce the pattern or warn the user. The user is instructed not to define overlapping policies. We can report the error after it has happened: it's very easy to add an error message to the object's status as soon as the status controller detects 2 WorkStatus objects for the same object.
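
The after-the-fact detection described above could look roughly like this. It is a sketch under stated assumptions: WorkStatus objects are reduced to hypothetical `(object_key, wec_name)` pairs, and the error text is invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_singleton_violations(
    work_statuses: Iterable[Tuple[str, str]]
) -> Dict[str, str]:
    """Map each workload object that has WorkStatus objects from two or
    more WECs to an error message for its status.

    work_statuses: (object_key, wec_name) pairs, one per WorkStatus.
    """
    wecs_by_object: Dict[str, Set[str]] = defaultdict(set)
    for object_key, wec_name in work_statuses:
        wecs_by_object[object_key].add(wec_name)
    return {
        key: (
            "singleton status requested but WorkStatus objects exist "
            f"from {len(wecs)} WECs: {', '.join(sorted(wecs))}"
        )
        for key, wecs in wecs_by_object.items()
        if len(wecs) >= 2
    }
```

Objects that appear with only one WEC are left out of the result, so the controller would keep copying their status as it does today.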

MikeSpreitzer commented 3 months ago

A couple of updates.

The current API does not have a way for a user to mark a workload object with a bit indicating that singleton status return is desired. The only way to request that in the current API is in BindingPolicy objects.

The Binding objects should be the resolutions of the corresponding BindingPolicy objects and not lose modulation information from them. Thus, the Binding objects should also carry the singleton status return request bit.

My understanding/recollection is that discussions have been converging on the following points.

Requesting singleton status return should not be taken to mean that KubeStellar has scheduler-like responsibilities for choosing 1 among several matching WECs.
Requesting singleton status return ONLY means: status is returned to the workload object in the WDS if and only if all the BindingPolicy (although I would like to change this to Binding, as stated above) objects together collectively associate exactly one WEC with the workload object.
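
The "if and only if" rule above can be sketched as a pure function over Binding objects. The `Binding` class here is a simplified, hypothetical model rather than the real API type. Two assumptions are made: the request bit is OR-combined across Bindings, and every Binding that lists the object contributes its WECs to the count, whether or not it carries the request bit.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Binding:
    """Simplified, hypothetical model of a resolved BindingPolicy."""
    want_singleton_status: bool
    objects: FrozenSet[str]       # keys of the workload objects it lists
    destinations: FrozenSet[str]  # names of the WECs it targets


def singleton_status_source(obj: str, bindings: Iterable[Binding]) -> Optional[str]:
    """Return the single WEC whose reported status should be copied into
    the workload object in the WDS, or None if no status is returned.

    No WEC is ever chosen among several: more than one means None.
    """
    relevant = [b for b in bindings if obj in b.objects]
    if not any(b.want_singleton_status for b in relevant):
        return None
    wecs = set()
    for b in relevant:
        wecs |= b.destinations
    return next(iter(wecs)) if len(wecs) == 1 else None
```

Note that the function never picks one WEC out of several, which matches the point that requesting singleton status return gives KubeStellar no scheduler-like responsibility.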