kubernetes-sigs / cluster-api-provider-nested

Cluster API Provider for Nested Clusters
Apache License 2.0

VirtualCluster should support to sync specific conditions of PodStatus from tenant to super #281

Closed · FillZpp closed this issue 1 year ago

FillZpp commented 2 years ago

User Story

As a developer, I would like VirtualCluster to support syncing specific conditions of PodStatus from the tenant cluster to the super cluster, because some user-defined condition types are designed to be added by users or controllers in the tenant cluster.

For example, OpenKruise provides workloads that support in-place update for Pods, so we define an InPlaceUpdateReady readinessGate type and add its condition to the Pod status. However, the syncer currently only syncs Pod status from super to tenant, which overwrites the custom condition and leaves the Pod permanently NotReady, because the kubelet cannot see this condition in the super cluster.

Detailed Description

The VirtualCluster syncer should support syncing specific conditions of PodStatus from tenant to super.

/kind feature

christopherhein commented 2 years ago

Interesting idea. I'm trying to think of how we could accomplish this; it pushes us a little toward a more split-brain model where both super and tenant hold context that matters, versus making the tenant authoritative for .spec and the super authoritative for .status. How does this function under the hood when an update actually happens? Would the syncer need to push patches/updates to the pod specs for that in-place upgrade to be fulfilled? I don't believe we support that as of today at all.

Fei-Guo commented 2 years ago

Thanks for filing the issue. I understand the request and I think we should address it. One problem is that both the tenant control plane and the super cluster may update the conditions specified in the readiness gate. The bigger problem is that the status then has two sources of truth in two etcds, which makes it difficult to avoid overwrites under races.

Fei-Guo commented 2 years ago

@FillZpp, a workaround is for the Kruise controller to add a Pod label when the readiness gate is ready, indicating to the syncer that it should set the condition in the super cluster, which I admit is definitely a hack. I cannot think of a more elegant solution at the moment.

FillZpp commented 2 years ago

Thanks for all your replies.

Would the syncer need to push patches/updates to the pod specs for that in-place upgrade to be fulfilled? I don't believe we support that as of today at all.

Alright, forget about the in-place upgrade, which is just one case we noticed. This is actually a general request, since Kubernetes provides the readinessGates feature to let users control whether a Pod should be ready or not.

kind: Pod
...
spec:
  readinessGates:
    - conditionType: "www.example.com/feature-1"
status:
  conditions:
    - type: Ready                              # a built in PodCondition
      status: "False"
      lastProbeTime: null
      lastTransitionTime: 2018-01-01T00:00:00Z
    - type: "www.example.com/feature-1"        # an extra PodCondition
      status: "False"
      lastProbeTime: null
      lastTransitionTime: 2018-01-01T00:00:00Z
  containerStatuses:
    - containerID: docker://abcd...
      ready: true
...

When we set a custom conditionType in spec.readinessGates, the kubelet will mark the Pod as Ready only after a condition of that type with status True has been added to status.conditions.
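For context, here is a minimal Go sketch (plain client-go, not VirtualCluster code) of what a tenant-side controller typically does to satisfy such a readiness gate; the condition type matches the example above and the helper name is illustrative.

```go
// Minimal sketch: a tenant-side controller marking the custom readiness-gate
// condition True on a Pod via the status subresource.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const featureGate = corev1.PodConditionType("www.example.com/feature-1")

func setReadinessGate(ctx context.Context, c kubernetes.Interface, ns, name string) error {
	pod, err := c.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}

	cond := corev1.PodCondition{
		Type:               featureGate,
		Status:             corev1.ConditionTrue,
		LastTransitionTime: metav1.Now(),
	}

	// Replace the condition if it already exists, otherwise append it.
	replaced := false
	for i := range pod.Status.Conditions {
		if pod.Status.Conditions[i].Type == cond.Type {
			pod.Status.Conditions[i] = cond
			replaced = true
			break
		}
	}
	if !replaced {
		pod.Status.Conditions = append(pod.Status.Conditions, cond)
	}

	// Conditions live in the status subresource, so write via UpdateStatus.
	_, err = c.CoreV1().Pods(ns).UpdateStatus(ctx, pod, metav1.UpdateOptions{})
	return err
}
```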

However, if the syncer can only push the whole status from super to tenant, the custom condition that the user adds in the tenant cluster will never be synced to super, which means the Pod will never become Ready.

Since this is a basic feature provided by Kubernetes, it would be nice if VirtualCluster could support it.

a workaround is for the Kruise controller to add a Pod label when the readiness gate is ready, indicating to the syncer that it should set the condition in the super cluster, which I admit is definitely a hack. I cannot think of a more elegant solution at the moment.

Yeah, a workaround is to deploy two controllers, one in each cluster. The one in the tenant adds a specific label to the Pod, and the one in the super watches for the label and adds the condition to the status.

I don't know much about the implementation of VirtualCluster, so maybe this is impossible... I'm wondering if we could have an optional whitelist to let users choose which condition types should be synced from tenant to super?
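To make the whitelist idea concrete, here is a hypothetical Go sketch of how a status-sync path could merge whitelisted tenant conditions before writing a full status; the variable and function names are invented for illustration and are not existing syncer APIs.

```go
// Hypothetical sketch of the whitelist idea: condition types listed here are
// owned by the tenant and overlaid onto the status about to be written, so
// they survive a full-status copy in either sync direction.
package sketch

import corev1 "k8s.io/api/core/v1"

// upwardSyncedConditions would be configurable, e.g. via a syncer flag.
var upwardSyncedConditions = map[corev1.PodConditionType]bool{
	"www.example.com/feature-1": true,
}

// mergeTenantConditions copies whitelisted condition types from the tenant Pod
// status into the target status.
func mergeTenantConditions(target, tenant *corev1.PodStatus) {
	for _, tc := range tenant.Conditions {
		if !upwardSyncedConditions[tc.Type] {
			continue
		}
		replaced := false
		for i := range target.Conditions {
			if target.Conditions[i].Type == tc.Type {
				target.Conditions[i] = tc
				replaced = true
				break
			}
		}
		if !replaced {
			target.Conditions = append(target.Conditions, tc)
		}
	}
}
```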

wondywang commented 2 years ago

Hi, all. Maybe we can use a trick like protectedMetaPrefixes to set up a list that determines which status conditions should be synced from the tenant cluster to the super cluster? Or, to keep things simpler, should we only consider status conditions that originate in the tenant cluster and need to be synced to the super cluster?

Fei-Guo commented 2 years ago

@FillZpp You don't need another controller in the super cluster, because the syncer will be the controller that performs the reconciliation. This is a common request to support the native k8s readiness gate, hence I think the solution should be provided by the syncer. Also, the syncer already performs special logic to mutate metadata (mainly for Pod) rather than simply copying it. We only need a mechanism to inform the syncer to add particular conditions. I suggest starting with a special label/annotation. Of course, this requires code changes for controllers that use readiness gates, like Kruise, in order to integrate with VC. Hope this is acceptable.

@wondywang I don't recommend changing the VC CRD to support this feature like what we did for labels/annotations using prefixes. The reasons are: 1) The VC CRD specifies cluster-wide properties or policies, whereas the readiness gate is a per-workload specification. There can potentially be conflicts if multiple workloads with readiness gates are configured in the same VC. For example, both controllers A and B use condition X for readiness. One day controller A still uses the condition but decides not to sync condition X to the super cluster for some reason, and it cannot request a change to the VC CR because controller B still needs the condition. It is problematic to put a per-workload specification in a CRD that represents a whole cluster.

2) The workflow of supporting a new workload in VC gets more complicated. For example, VC users need to ask the cluster administrator to mutate the VC spec in order to make their controllers work.

Given the above, I think a special label/annotation is a reasonable solution for this feature, considering the flexibility and the complexity.
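As a rough illustration of the label/annotation approach (the annotation key below is invented and is not an existing VirtualCluster convention), the controller that owns the readiness gate could declare its condition types on the Pod, and the syncer would upward-sync only those:

```go
// Hypothetical sketch of the annotation-driven variant.
package sketch

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// conditionsAnnotation is where a tenant-side controller (e.g. Kruise) would
// declare the condition types it owns, as a comma-separated list such as
// "InPlaceUpdateReady,www.example.com/feature-1". Illustrative key only.
const conditionsAnnotation = "tenancy.x-k8s.io/upward-synced-conditions"

// tenantOwnedConditionTypes parses that annotation from a tenant Pod so the
// syncer knows which conditions to upward-sync and preserve.
func tenantOwnedConditionTypes(pod *corev1.Pod) map[corev1.PodConditionType]bool {
	out := map[corev1.PodConditionType]bool{}
	raw, ok := pod.Annotations[conditionsAnnotation]
	if !ok {
		return out
	}
	for _, t := range strings.Split(raw, ",") {
		if t = strings.TrimSpace(t); t != "" {
			out[corev1.PodConditionType(t)] = true
		}
	}
	return out
}
```

The syncer could feed this set into a merge like the one sketched earlier, so each workload opts in per Pod without any change to the VC CR.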

ThunderYe commented 2 years ago

@Fei-Guo @FillZpp I agree there should be a mechanism to configure the downward/upward syncing behavior, but can we have a default configuration that supports the most common scenario? The most common one is: Kruise is installed only in the tenant cluster and works there just as it would in a super cluster, without Kruise needing any changes.
For other scenarios, for example controllers that run in the super cluster, or in both tenant and super, we can support them by configuring some special label to guide the more complex behavior.

zhuangqh commented 2 years ago

This is already supported by this PR: https://github.com/kubernetes-sigs/multi-tenancy/pull/1294. Did you hit a bug in testing?

wondywang commented 2 years ago

Thanks @zhuangqh. I'll check it out. There is one case of feedback received so far.

Fei-Guo commented 2 years ago

This is already supported by this PR: kubernetes-sigs/multi-tenancy#1294. Did you hit a bug in testing?

This implementation is straightforward, but in theory there is a very small race window in which the tenant condition can be overwritten. If the readiness condition is updated by the tenant between CheckUWPodStatusEquality and vPod.Update(), it can be overwritten by the vPod.Update() call. The syncer was not designed to handle every kind of race; it relies on periodic scanning to find mismatches and fix them. The problem here is that once the readiness condition is accidentally removed, the tenant controller may not reconcile and restore it, since it assumes the condition cannot be removed once set. To eliminate the race, the tenant controller could notify the syncer and let the syncer update the readiness conditions in the super cluster and reconcile them.

Not sure if Kruise controller hits this race though.
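For illustration only, a common client-go pattern that shrinks this kind of lost-update window is to re-read the object and re-apply tenant-owned conditions inside a conflict-retry loop just before the status write. The helper below is hypothetical and does not mirror the actual syncer implementation; it also still requires the syncer to know which condition types the tenant owns.

```go
// Hypothetical sketch, not real syncer code: re-read the tenant Pod on every
// attempt and re-apply tenant-owned conditions before overwriting the status.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// preserveFunc copies tenant-owned conditions from the freshly read status
// into the status the syncer intends to write.
type preserveFunc func(fresh *corev1.PodStatus, desired *corev1.PodStatus)

func writeTenantPodStatus(ctx context.Context, tenant kubernetes.Interface,
	ns, name string, desired corev1.PodStatus, preserve preserveFunc) error {

	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Re-read so conditions written by tenant controllers in the meantime
		// are seen before the whole status is replaced.
		vPod, err := tenant.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		newStatus := *desired.DeepCopy()
		preserve(&vPod.Status, &newStatus)
		vPod.Status = newStatus
		_, err = tenant.CoreV1().Pods(ns).UpdateStatus(ctx, vPod, metav1.UpdateOptions{})
		return err // a conflict here triggers another re-read and merge
	})
}
```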

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-nested/issues/281#issuecomment-1326697553):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.