Closed: erikgb closed this issue 1 year ago
I think our `Condition` struct is compatible with the standard (see here), so that's a good start.

As for replacing the existing conditions, the issue I have is that `HierarchyConfiguration` currently has one `Condition` that means "this entire namespace is in a bad state" (`ActivitiesHalted`) and another that means "something's gone wrong but it's not a disaster" (`BadConfiguration`). I think kstatus would lump these both into `Failed`, and I don't particularly want to lose the existing distinction. But I'm totally on board with adding a kstatus `Stalled` condition if either of the above is true. (I'd rather not use the `Ready=true/false` condition, since namespaces aren't exactly ever "ready".)
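For concreteness, here is a minimal Go sketch (not HNC's actual code) of how both existing conditions could be surfaced through a single kstatus-recognized `Stalled` condition without losing the distinction, by carrying it in the `Reason` field. It assumes a hypothetical `[]metav1.Condition` slice on the CR status and uses the standard `meta` helpers from apimachinery.

```go
package reconcilers

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// syncStalledCondition mirrors HNC's existing conditions into one
// kstatus-style "Stalled" condition. The ActivitiesHalted/BadConfiguration
// distinction is preserved in Reason, and the condition is removed entirely
// when everything is healthy (kstatus' "abnormal-true" polarity).
func syncStalledCondition(conds *[]metav1.Condition, gen int64, activitiesHalted, badConfiguration bool) {
	if !activitiesHalted && !badConfiguration {
		meta.RemoveStatusCondition(conds, "Stalled")
		return
	}
	reason, msg := "BadConfiguration", "something has gone wrong, but it's not a disaster"
	if activitiesHalted {
		reason, msg = "ActivitiesHalted", "this entire namespace is in a bad state"
	}
	meta.SetStatusCondition(conds, metav1.Condition{
		Type:               "Stalled",
		Status:             metav1.ConditionTrue,
		Reason:             reason,
		Message:            msg,
		ObservedGeneration: gen,
	})
}
```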
Similarly, `HNCConfiguration` has three conditions: `BadConfiguration`, `OutOfSync` and the poorly named `NamespaceCondition` (there's a problem in a namespace in HNC). I think kstatus would map the first two of these to `Stalled` as well. The third one is more informational and doesn't have anything to do with the `HNCConfiguration` object itself, so it probably doesn't need a kstatus-compliant condition.
How does that sound?
I think it could make sense to keep the existing conditions when the HNC resources are not fully reconciled. AFAIK the most important requirements in kstatus seem to be:

- `status.observedGeneration`, to be able to detect when the controller is aware of the resource state and has started working with it.
- `status.observedGeneration` equal to `metadata.generation`, and no conditions, will be considered ready and operational.

I thought kstatus ignored any condition it didn't recognize. The only conditions it recognizes are `Ready`, `Stalled` and `Reconciling`. Is that not correct? (If not, can you point me to the line of code where it looks for those?)
HNC doesn't actually know when anything finishes reconciling except `SubnamespaceAnchor`, so we could add it there. Adding it anywhere else would be a very large task, not just a minor API reporting convention change.

I'm ok with adding `status.observedGeneration`, of course.
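As a rough illustration (assuming each CR status gains an `observedGeneration` field; the interface below is hypothetical, not part of HNC), the bookkeeping is just copying `metadata.generation` into the status at the end of a reconcile:

```go
package reconcilers

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// generationRecorder is a hypothetical interface satisfied by any HNC custom
// resource whose status carries an observedGeneration field.
type generationRecorder interface {
	client.Object
	SetObservedGeneration(gen int64)
}

// recordObservedGeneration copies metadata.generation into the status and
// writes the status subresource. The API server bumps metadata.generation on
// every spec change, so mirroring it tells kstatus/Flux/Argo that the
// controller has seen (and acted on) the latest spec.
func recordObservedGeneration(ctx context.Context, c client.Client, inst generationRecorder) error {
	inst.SetObservedGeneration(inst.GetGeneration())
	return c.Status().Update(ctx, inst)
}
```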
The `Ready` condition is a bit special, and I understand the other conditions/statuses as just recommendations. The most important aspect is noted in the docs:

> The conditions defined in the library are designed to adhere to the "abnormal-true" polarity pattern (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties), i.e. that conditions are present and with a value of true whenever something unusual happens. So the absence of any conditions means everything is normal. Normal in this situation simply means that the latest observed generation of the resource manifest by the controller have been fully reconciled with the actual state.
Understood, but the actual code (here and here) seems to ignore any condition except one of those three (except for K8s built-in objects).
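One way to check this behaviour is to run `status.Compute` from the kstatus library against a hand-built object. The sketch below (with a made-up `HierarchyConfiguration` payload) shows an HNC-specific condition being ignored: with `observedGeneration` matching `metadata.generation` and no `Ready`/`Stalled`/`Reconciling` condition, kstatus should report the resource as `Current`.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
)

func main() {
	// A hand-built HierarchyConfiguration carrying only an HNC-specific
	// condition (ActivitiesHalted), which kstatus does not recognize.
	obj := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "hnc.x-k8s.io/v1alpha2",
		"kind":       "HierarchyConfiguration",
		"metadata": map[string]interface{}{
			"name":       "hierarchy",
			"namespace":  "team-a",
			"generation": int64(2),
		},
		"status": map[string]interface{}{
			"observedGeneration": int64(2),
			"conditions": []interface{}{
				map[string]interface{}{
					"type":   "ActivitiesHalted",
					"status": "True",
					"reason": "InCycle",
				},
			},
		},
	}}

	res, err := status.Compute(obj)
	if err != nil {
		panic(err)
	}
	// Expected: Current, because for custom resources the generic check only
	// looks at deletionTimestamp, observedGeneration and the
	// Ready/Stalled/Reconciling conditions.
	fmt.Println(res.Status)
}
```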
Anyway, I'd be in favour of adding a `Reconciling` condition to the Anchor while it's waiting for the subnamespace to be created, and adding a `Stalled` condition to the `HierarchyConfiguration` and `HNCConfiguration` (I'd also be ok with `Ready: False`, but it seems like this is less preferred). I don't want to add a `Reconciling` condition to the `*Configuration` objects since we don't actually have the ability to set that accurately, at least not right now, and possibly not ever (the `observedGeneration` trick doesn't help when the reconciler's actually working on a completely different object).
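Sketching the anchor side of that (again a hypothetical helper, not HNC's reconciler, and parallel to the earlier `Stalled` sketch): the `Reconciling` condition would be present only while the subnamespace is still being created, and removed once it exists, matching the abnormal-true polarity.

```go
package reconcilers

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// syncAnchorReconciling sets a kstatus-style "Reconciling" condition on a
// SubnamespaceAnchor while its subnamespace has not been created yet, and
// clears it once the subnamespace exists.
func syncAnchorReconciling(conds *[]metav1.Condition, gen int64, subnamespaceExists bool) {
	if subnamespaceExists {
		meta.RemoveStatusCondition(conds, "Reconciling")
		return
	}
	meta.SetStatusCondition(conds, metav1.Condition{
		Type:               "Reconciling",
		Status:             metav1.ConditionTrue,
		Reason:             "CreatingSubnamespace",
		Message:            "waiting for the subnamespace to be created",
		ObservedGeneration: gen,
	})
}
```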
@adrianludwin I'd like to push this forward. This is what I suggest to do, in the following order (and as separate pull requests):

1. Add `observedGeneration` on all HNC custom resources that are watched/reconciled.
2. Add conditions to the `SubnamespaceAnchor` status, since conditions are missing completely now.
3. Add a `Stalled` condition to `HNCConfiguration` and `HierarchyConfiguration` in case anything is wrong. This will complement the existing conditions, at least for now.

When this is done, I think the resources are compatible with kstatus, and this issue can be closed.
But depending on if/when we can break the current status API, there are two tasks that I would suggest: drop the `SubnamespaceAnchor` `status.status`, and instead use the suggested semantics in kstatus.

Yes, each reason has exactly one condition type, so I believe they could be removed. The bigger problem is that some of these conditions are basically warnings, and others are errors. Can kstatus represent that? Or can we add a "Warning" condition that gets ignored by kstatus (see my last comment from Feb 7)?
Are you ok with never being able to add a `Reconciling` condition to `HierarchyConfiguration`? We have no ability to set or unset that right now, and may never have that ability. E.g. is a namespace finished reconciling when we've guaranteed that the structure is correct, or when all objects have been propagated into it, or when all objects have been propagated out of it? And how do we feasibly implement that?
What does `observedGeneration` mean in the context of `HierarchyConfiguration`? An HC may need to be reconciled due to a change in a different namespace, so just because `observed generation == actual generation` doesn't mean that we're in sync. I mean, we can add it to make kstatus and Flux happy, but it won't give the assurances that I think it's supposed to give.
It might be good to collect all this in a Google doc we can iterate on rather than leaving it in a discussion format in this issue. wdyt?
... with all that said, adding all this to the existing API sgtm, and then when we next bump the API to v1beta1, we can drop the custom fields.
/assign
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
Still valid, to support CI/CD tools like Flux and Argo.
/remove-lifecycle stale
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
As a user of tools like FluxCD, I would like Flux to know if/when my HNC resources are fully reconciled and operational (ready/healthy). Flux supports health assessment for all/most in-tree resource types, but it turns out that Flux can also support custom resources, as long as the resource exposes a status compatible with kstatus.

I think this should be fairly easy to implement, and I would be happy to submit a PR to fix it. But we need to decide what to do with the current content of the resource statuses: can we delete something (breaking), or should we avoid breaking changes and keep duplicated status content with the same meaning?