k8ssandra / k8ssandra-operator

The Kubernetes operator for K8ssandra
https://k8ssandra.io/
Apache License 2.0

K8SSAND-824 ⁃ K8ssandraCluster status should be aggregated and summarized by the operator #88

Closed jdonenine closed 6 months ago

jdonenine commented 3 years ago

What is missing? It would be useful for the underlying status of the resources managed by k8ssandra-operator to be distilled and summarized as opposed to being a complete dump of the underlying resource status.

To achieve this we need to better understand the statuses exposed by the underlying resources and correlate those to higher-level status.

An outcome of this task should be a definition and documentation of the statuses from those resources we will use, and a mapping of what statuses those would combine to create at the top-level.

Why do we need it? To make it easier to understand the overall status of a K8ssandra deployment.

Proposed solution: See https://github.com/k8ssandra/k8ssandra-operator/issues/88#issuecomment-936677047

┆Issue is synchronized with this Jira Task by Unito
┆epic: Status Summary Enhancements
┆fixVersions: k8ssandra-operator-v1.1.0
┆friendlyId: K8SSAND-824
┆priority: Medium

jdonenine commented 3 years ago

This is an ask that spun out of #71

jsanda commented 3 years ago

Here is what the status currently looks like:

status:
  dc1:
    cassandra:
      cassandraOperatorProgress: Ready
      conditions:
        - lastTransitionTime: "2021-08-03T13:46:00Z"
           message: ""
           reason: ""
           status: "False"
           type: ScalingUp
      lastServerNodeStarted: "2021-08-03T13:45:24Z"
      nodeStatuses:
        test-dc1-default-sts-0:
          hostID: 5a530a24-d72d-412f-82c9-a304e0c03c37
      observedGeneration: 1
      quietPeriod: "2021-08-03T13:46:10Z"
      superUserUpserted: "2021-08-03T13:46:05Z"
      usersUpserted: "2021-08-03T13:46:05Z"
    stargate:
      availableReplicas: 1
      conditions:
        - lastTransitionTime: "2021-08-03T13:47:00Z"
          status: "True"
          type: Ready
      deploymentRef: test-dc1-stargate-deployment
      progress: Running
      readyReplicas: 1
      readyReplicasRatio: 1/1
      replicas: 1
      serviceRef: test-dc1-stargate-service
      updatedReplicas: 1

CassandraDatacenter

The CassandraDatacenter API defines a number of condition types: Ready, Initialized, ReplacingNodes, ScalingUp, ScalingDown, Updating, Stopped, Resuming, RollingRestart, and Valid.

Instead of reporting these verbatim for each DC we could condense these down in the K8ssandraCluster status. Initialized, Ready, Stopped and possibly Resuming could be covered by a single Ready condition. If the Ready condition is false, you could then go query the status of that particular CassandraDatacenter to see why exactly.

ReplacingNodes, ScalingUp, ScalingDown, Updating, and RollingRestart could all be covered by a single Updating condition.

I only see Valid used in one place. When a node is decommissioned cass-operator first performs a check to see if there is enough capacity in the cluster to absorb the data from the node. If there isn't enough space the condition is set to false. I don't think we need to worry about this one.

We will need to add a Decommissioning type when we add support for decommissioning DCs, but we don't need it right now.
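
To make the condensing described above concrete, here is a minimal sketch. It assumes cass-operator's DatacenterConditionType constants and its GetConditionStatus helper; the summarizeDcConditions function itself is hypothetical, not existing operator code.

// Assumes: corev1 "k8s.io/api/core/v1" and
// cassdcapi "github.com/k8ssandra/cass-operator/apis/cassandra/v1beta1".

// summarizeDcConditions (hypothetical) collapses the CassandraDatacenter
// condition types into the two summarized signals described above.
func summarizeDcConditions(dc *cassdcapi.CassandraDatacenter) (ready, updating bool) {
    // Initialized, Ready, Stopped, and Resuming are covered by the single
    // Ready signal; checking Ready alone answers "is the DC usable?".
    ready = dc.GetConditionStatus(cassdcapi.DatacenterReady) == corev1.ConditionTrue

    // ReplacingNodes, ScalingUp, ScalingDown, Updating, and RollingRestart
    // are covered by the single Updating signal.
    for _, t := range []cassdcapi.DatacenterConditionType{
        cassdcapi.DatacenterReplacingNodes,
        cassdcapi.DatacenterScalingUp,
        cassdcapi.DatacenterScalingDown,
        cassdcapi.DatacenterUpdating,
        cassdcapi.DatacenterRollingRestart,
    } {
        if dc.GetConditionStatus(t) == corev1.ConditionTrue {
            updating = true
            break
        }
    }
    return ready, updating
}

With this shape, Ready false plus Updating true tells the user an operation is still converging, while both false tells them the DC warrants a closer look.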

Stargate

Stargate currently defines a single Ready condition that we should include.

Super user replicated secret

The operator creates a ReplicatedSecret object that points to the superuser secret. The ReplicatedSecret specifies to which clusters the actual secret should be replicated. It has a single condition type named Replication that specifies the cluster where the secret has been replicated.

For now we may want to include the Replication condition, but I think it should be temporary. I think we need to change cass-operator to make the superuser creation optional. It only needs to be done by the first DC. When that change is made it might be sufficient to just report that the super user has been created. If we still need to replicate the secret, then we should keep the Replication condition.

jsanda commented 3 years ago

After giving it some more thought, an Initialized condition makes sense. The operations involved for adding a DC to an existing cluster are different from those for creating a new cluster. An Initialized condition will help distinguish these cases.

jsanda commented 3 years ago

Proposal

The status should summarize the latest observed state of the K8ssandraCluster with the minimum information necessary. Ideally the user should be able to easily determine whether the K8ssandraCluster is in a healthy state (i.e., all green), updating, or being deleted. A set of conditions should be included to provide further context.

Because a K8ssandraCluster is a heterogeneous mix of components that may exist across multiple Kubernetes clusters, the status should provide some additional insights. I am going to introduce the term K8ssandraDatacenter (or kdatacenter for short) here to avoid confusion with a CassandraDatacenter. A kdatacenter is a logical grouping of a CassandraDatacenter, Stargate, Reaper (when we add it), and any other related resources that the operator creates and/or manages. K8ssandraDatacenters can exist across multiple Kubernetes clusters as well as within the same cluster in different namespaces.

The K8ssandraCluster status should provide visibility into the latest observed state of each kdatacenter. It should include the namespace and Kubernetes context in which the kdatacenter is deployed, the phase of each of its components, and any failure details (these fields are described under Details below).

Let's look at an example:

status:
  phase: Updating
  conditions:
  - type: Initialized
    status: "False"
  - type: Ready
    status: "False"
  dc1:
    namespace: dev
    k8sContext: east
    cassandra:
      phase: Running
    stargate:
      phase: Running
  dc2:
    namespace: dev
    k8sContext: central
    cassandra:
      phase: Running
    stargate:
      phase: Running
  dc3:
    namespace: dev
    k8sContext: west
    cassandra:
      phase: Running
    stargate:
      phase: NotRunning
    failureReason: SomeStargateFailureReason
    failureMessage: "Stargate failed because..."

Details

Phase

The phase basically represents a state machine for the state of the object. It is primarily intended for human consumption as a means to quickly and easily determine the overall state of the object. Possible values include Running, Updating, Terminating, and probably more.

Conditions

The phase by itself does not necessarily give you the complete picture. This is where conditions come into play. Condition types might include:

Initialized Indicates that the K8ssandraCluster has been created and that the operator has completed its initial reconciliation work. This is needed because we need to differentiate between adding a DC to a new cluster vs an existing cluster. The operations involved are different.

Ready The cluster has been initialized and is ready for use.

Terminating The cluster is being deleted.

ScalingUp Adding a K8ssandraDatacenter.

ScalingDown Removing a K8ssandraDatacenter.

Updating There is an update happening in one or more K8ssandraDatacenters. This could be a configuration change to an object which causes a StatefulSet update, scaling up/down of Stargate or the CassandraDatacenter, and more.

K8ssandraDatacenter map

Keyed by the CassandraDatacenter name. As mentioned previously, a K8ssandraDatacenter is a logical grouping. K8ssandraDatacenters can exist in different Kubernetes clusters, in the same cluster, or both.

namespace The namespace in which the objects are deployed.

k8sContext The Kubernetes cluster where the objects are deployed. If omitted this would be the local cluster, i.e., the same cluster in which the K8ssandraCluster object exists.

cassandra Added when the CassandraDatacenter is created. The phase is similar to what was described previously.

stargate Added when the Stargate is created. Removed when the Stargate is terminated. The phase is similar to what was described previously.

failureReason An error code from the most recent reconciliation, primarily intended for use by the operator.

failureMessage A human-readable error message describing the error from the most recent reconciliation in the K8ssandraDatacenter.
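
Rendered as Go, the per-kdatacenter entry described above might look roughly like this sketch; it merely transcribes the proposal, so the final API may differ, and the CassandraStatus/StargateStatus types here are placeholders:

// Sketch of the proposed per-kdatacenter status entry; not the final API.
type K8ssandraDatacenterStatus struct {

    // The namespace in which the objects are deployed.
    Namespace string `json:"namespace,omitempty"`

    // The Kubernetes cluster where the objects are deployed. If omitted,
    // this is the local cluster.
    K8sContext string `json:"k8sContext,omitempty"`

    // Added when the CassandraDatacenter is created.
    Cassandra *CassandraStatus `json:"cassandra,omitempty"`

    // Added when the Stargate is created; removed when it is terminated.
    Stargate *StargateStatus `json:"stargate,omitempty"`

    // An error code from the most recent reconciliation, primarily intended
    // for use by the operator.
    FailureReason string `json:"failureReason,omitempty"`

    // A human-readable error message from the most recent reconciliation.
    FailureMessage string `json:"failureMessage,omitempty"`
}

// Placeholder component statuses; each holds just a phase for now.
type CassandraStatus struct {
    Phase string `json:"phase,omitempty"`
}

type StargateStatus struct {
    Phase string `json:"phase,omitempty"`
}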

jdonenine commented 3 years ago

This is nice, gives a good visual example to discuss.

A few questions:

  1. You've organized here around the idea of the k8ssandra datacenter - dc1, dc2, dc3... Would this level have additional details provided for it via "conditions" or is that only at the full cluster level?

  2. How do you imagine that other future components would be represented here, namely reaper and medusa -- would be good to have an example of what we think the future might look like as well, would it be like this?

  dc2:
    namespace: dev
    k8sContext: central
    cassandra:
      phase: Running
    stargate:
      phase: Running
    medusa:
      phase: Running
    reaper:
      phase: Running

Does that imply that we would always consider other resources "attached" to a datacenter?

  3. Are failureReasons associated with the kdatacenter or with a particular component within that? The spacing makes it seem like it's at the kdatacenter level, want to be sure that's the intent - the example content seems more specific to a certain component perhaps?

  dc3:
    namespace: dev
    k8sContext: west
    cassandra:
      phase: Running
    stargate:
      phase: NotRunning
    failureReason: SomeStargateFailureReason
    failureMessage: "Stargate failed because..."

I'd say it makes sense at the kdatacenter level if it's context for the overall state of the kdatacenter. But that begs the question, should there be something representing the aggregate status of a kdatacenter?

jsanda commented 3 years ago

You've organized here around the idea of the k8ssandra datacenter - dc1, dc2, dc3... Would this level have additional details provided for it via "conditions" or is that only at the full cluster level?

I was not thinking of providing additional details, only enough information to know where to dig deeper if necessary.

jsanda commented 3 years ago

How do you imagine that other future components would be represented here, namely reaper and medusa -- would be good to have an example of what we think the future might look like as well, would it be like this?

Yes, I envision Reaper being added similarly, not necessarily Medusa though since it runs inside the Cassandra pod.

Does that imply that we would always consider other resources "attached" to a datacenter?

I don't think so. We may introduce something in K8ssandraCluster that only exists in the control plane. If it needed to be included in the status I wouldn't expect it to be attached to a datacenter.

jsanda commented 3 years ago

Are failureReasons associated with the kdatacenter or with a particular component within that? The spacing makes it seem like it's at the kdatacenter level, want to be sure that's the intent - the example content seems more specific to a certain component perhaps?

They are associated with the kdatacenter and can apply to any component within. The goal is to make it easy to see 1) if there was an error and 2) where that error occurred.

I'd say it makes sense at the kdatacenter level if it's context for the overall state of the kdatacenter. But that begs the question, should there be something representing the aggregate status of a kdatacenter?

I am on the fence about it. If we had a K8ssandraDatacenter custom type with a controller managing instances, then I would say definitely yes. We don't have that though.

sync-by-unito[bot] commented 2 years ago

➤ Jeff DiNoto commented:

We discussed this a bunch more this morning and ended in a place favoring the “Phase” summary approach discussed in the preceding comments. We got here because there is no safe way to make assumptions about the underlying resources beyond the operational status that is exposed by each resource – for example, from the data we have from cass-operator and the CassandraDatacenter we can’t tell if the DB itself is available from an application perspective, so we won’t try to represent that.

What we can tell is what operations the k8s resource is going through – that’s what we’ll summarize and make sense of for operators.

The CassandraDatacenter is the best example of this because it’s the most complicated resource – there are many condition status types in play; if we just dump it and forget it, we’d be asking the user to understand all of those. Instead we will aggregate those and ask the user to understand only the “phases” at the top level.

Our resulting structure would look something like this (exact terms will need to be nailed down):

K8ssandraCluster

jsanda commented 2 years ago

In my above comment I listed the CassandraDatacenter status conditions. As mentioned previously, I don't think we need to worry about the Valid condition.

Initialized is set true once and remains true for the duration of the lifetime of the CassandraDatacenter. I think this is somewhat of an implementation detail. From the user's perspective, the Ready condition is what matters. Initialized gets set at the same time/place that Ready does.

This leaves us with the remaining status conditions: Ready, Stopped, ReplacingNodes, ScalingUp, ScalingDown, Updating, RollingRestart, and Resuming.

All of the conditions other than Ready and Stopped represent some sort of operation or maintenance being performed. Only one of those conditions can be true at any given point. With that in mind, the phases for a CassandraDatacenter could mirror those conditions: Ready, Stopped, ReplacingNodes, ScalingUp, ScalingDown, Updating, RollingRestart, and Resuming.

Any of the phases that involve some operation, e.g., ScalingUp, also imply Running.

jsanda commented 2 years ago

For K8ssandraCluster conditions we already have Initialized, which is really more of an implementation detail. Similar to what cass-operator does, we probably want a Ready condition. We also need conditions for scaling up/down. The scaling here though refers to the addition or removal of a CassandraDatacenter vs a single C* pod.

It could be useful to have conditions indicating if there is a backup or restore operation in progress. This could be done at a later time.

jdonenine commented 2 years ago

That makes sense @jsanda thanks for the follow up info. One question on the CassandraDatacenter conditions... what does Resuming represent?

jsanda commented 2 years ago

Resuming is set true when the CassandraDatacenter is scaling back up after it had been stopped. Starting might be more intuitive, but honestly I think we can do without it because when Resuming goes back to false, Ready goes to true. I think Ready is sufficient.

jdonenine commented 2 years ago

@jsanda or @burmanm what will the conditions look like in the CassandraDatacenter when adding or decommissioning operations are taking place? We've got #268 that brings in the requirements to represent those states, but we didn't really talk about how that would be realized from the underlying status from cass-operator?

jsanda commented 2 years ago

With https://github.com/k8ssandra/cass-operator/pull/243 a couple of things were introduced. First, there is a new CassandraTask type. Its spec includes a datacenter field, and its status includes fields to determine success/failure of the job.

The CassandraDatacenter status has a new trackedTasks array where each element is just a reference to a running task. I believe once a task completes cass-operator removes it from the array. I can try to put together a yaml example if you'd like.
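
For reference, the shape described above would look roughly like this sketch; the element type of trackedTasks is an assumption here, so see k8ssandra/cass-operator#243 for the authoritative definitions.

// Assumes: corev1 "k8s.io/api/core/v1".

// Sketch only: the trackedTasks array described above, assumed to hold plain
// object references to running CassandraTasks.
type CassandraDatacenterStatus struct {
    // ... existing status fields ...

    // TrackedTasks references the CassandraTasks currently running against
    // this datacenter; cass-operator removes an entry once a task completes.
    TrackedTasks []corev1.ObjectReference `json:"trackedTasks,omitempty"`
}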

For the K8ssandraCluster status, we need a few things. First we need two additional phases for Cassandra, e.g., rebuilding and decommissioning.

We need a status condition for the K8ssandraCluster itself to indicate if a DC is being added. We need this in addition to the phase because there is other work that k8ssandra-operator performs besides the rebuild. Similarly, it needs a status condition for a DC being removed. Arguably, these status conditions should be added in #289.
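
A hypothetical sketch of those two additional phases (const names are illustrative, not final):

// CassandraPhase is defined elsewhere in this thread as: type CassandraPhase string.
// Hypothetical additions for DC add/remove operations; names are not final.
const (
    CassandraPhaseRebuilding      CassandraPhase = "Rebuilding"
    CassandraPhaseDecommissioning CassandraPhase = "Decommissioning"
)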

jsanda commented 2 years ago

Proposed structure for aggregated status in K8ssandraCluster:

dc2:
    cassandra:
      phase: Running
    stargate:
      phase: Running
    reaper:
      phase: Running
    k8ssandra:
      phase: Running

adutra commented 2 years ago

I just started working on this and wanted to get your early feedback before proceeding:

Root-level status struct

type K8ssandraDatacenterStatusMap map[string]K8ssandraDatacenterStatus

type K8ssandraClusterStatus struct {

    // +optional
    Conditions []K8ssandraClusterCondition `json:"conditions,omitempty"`

    // A map of K8ssandraDatacenterStatus keyed by datacenter name.
    K8ssandraDatacenterStatusMap `json:",inline,omitempty"`
}

K8ssandraDatacenterStatus struct

This is the value type for each entry in a K8ssandraDatacenterStatusMap. It contains status for all components in the "K8ssandraDatacenter".

type K8ssandraDatacenterStatus struct {

    // +optional
    Cassandra *CassandraStatus `json:"cassandra,omitempty"`

    // +optional
    Stargate *StargateStatus `json:"stargate,omitempty"`

    // +optional
    Reaper *ReaperStatus `json:"reaper,omitempty"`

    // An error code added from the most recent reconciliation primarily intended for use by the operator.
    // +optional
    FailureReason *FailureReason `json:"failureReason,omitempty"`

    // A human-readable error message describing the error from the most recent reconciliation in this datacenter.
    // +optional
    FailureMessage *string `json:"failureMessage,omitempty"`
}

type CassandraStatus struct {
    Phase CassandraPhase `json:"phase,omitempty"`
}

type StargateStatus struct {
    Phase stargateapi.StargatePhase `json:"phase,omitempty"`
}

type ReaperStatus struct {
    Phase reaperapi.ReaperPhase `json:"phase,omitempty"`
}

Conditions vs Phases

I read the comments above carefully, and unfortunately it's not always clear to me whether we are talking about conditions or phases.

I went with the following assumptions, described in the sections below:

Stargate and Reaper phases

Phases for Reaper and Stargate are imported from their respective API phases. Note that they used to be called "Progress" before; I changed the names for consistency. Everything is called "Phase" now.

Cassandra phases

These phases are for individual CassandraDatacenter status inside a K8ssandraDatacenter. They simply map 1:1 to cass-operator's DatacenterConditionType values, except for Valid, which we don't use.

type CassandraPhase string

const (
    CassandraPhaseReady          CassandraPhase = "Ready"
    CassandraPhaseInitialized    CassandraPhase = "Initialized"
    CassandraPhaseReplacingNodes CassandraPhase = "ReplacingNodes"
    CassandraPhaseScalingUp      CassandraPhase = "ScalingUp"
    CassandraPhaseScalingDown    CassandraPhase = "ScalingDown"
    CassandraPhaseUpdating       CassandraPhase = "Updating"
    CassandraPhaseStopped        CassandraPhase = "Stopped"
    CassandraPhaseResuming       CassandraPhase = "Resuming"
    CassandraPhaseRollingRestart CassandraPhase = "RollingRestart"
)

Root-level condition types

Root-level condition types are defined according to John's comment. Again, they apply to the entire cluster.

Note that ConditionInitialized already exists (with a different name, I'm trying to homogenize const names).

type K8ssandraClusterConditionType string

const (
    // ConditionInitialized is set to true when the Cassandra cluster becomes ready for the first time.
    // During the lifetime of the C* cluster CassandraDatacenters may have their readiness condition change back and
    // forth. Once set, this condition however does not change.
    ConditionInitialized K8ssandraClusterConditionType = "Initialized"

    // The cluster has been initialized and is ready for use.
    ConditionReady K8ssandraClusterConditionType = "Ready"

    // The cluster is being deleted.
    ConditionTerminating K8ssandraClusterConditionType = "Terminating"

    // Adding a K8ssandraDatacenter.
    ConditionScalingUp K8ssandraClusterConditionType = "ScalingUp"

    // Removing a K8ssandraDatacenter.
    ConditionScalingDown K8ssandraClusterConditionType = "ScalingDown"

    // There is an update happening in one or more K8ssandraDatacenters. This could be a configuration change to an
    // object which causes a StatefulSet update, scaling up/down of Stargate or the CassandraDatacenter, and more.
    ConditionUpdating K8ssandraClusterConditionType = "Updating"
)

I struggle to see when a condition like ConditionTerminating could be useful, but I don't mind adding it.

@jsanda is that a good starting point?

adutra commented 2 years ago

@jsanda I just noticed this in your example:

dc2:
    [...]
    k8ssandra:
      phase: Running   

What does k8ssandra represent here?

jsanda commented 2 years ago

One thing I wonder about with K8ssandraDatacenterStatus: would it be good to have a failure reason/message per component? I don't think it is necessary; curious to see what you think.

These phases are for individual CassandraDatacenter status inside a K8ssandraDatacenter. They simply map 1:1 to cass-operator's DatacenterConditionType values, except for Valid, which we don't use.

As mentioned in some of my earlier comments, I question whether we need to include all of these condition types. It is probably the easiest thing to do and will be familiar to cass-operator users.

Just to make sure we're all on the same page let's talk about a particular scenario - scaling up. When a CassandraDatacenter is scaling up, both its Ready and ScalingUp conditions will be true. I would expect the CassandraStatus.Phase field to be set to CassandraPhaseScalingUp.

For the root level condition types, what is meant by scaling up? I assume it means the addition of any component. Furthermore, I guess it could also mean scaling up a particular component, e.g., increasing C* or Stargate nodes.

I agree about Terminating. I suppose if something goes wrong with the finalizer bit, it could be helpful. I can live without it though.

We might want a condition for secrets replication. Just something to think about.

What does k8ssandra represent here?

I think the idea was to provide some status for things that are not covered by the components. Telemetry is one example. Secret replication is another.

is that a good starting point?

yup :)

adutra commented 2 years ago

One thing I wonder about with K8ssandraDatacenterStatus: would it be good to have a failure reason/message per component? I don't think it is necessary; curious to see what you think.

Since the DC is a natural boundary for Cassandra clusters, I'd go with a dc-level failure reason/message, at least as a starting point.

adutra commented 2 years ago

Just to make sure we're all on the same page let's talk about a particular scenario - scaling up. When a CassandraDatacenter is scaling up, both its Ready and ScalingUp conditions will be true. I would expect the CassandraStatus.Phase field to be set to CassandraPhaseScalingUp.

👍

The tricky part here is that we'll need an algorithm to convert from conditions to phases (the former can overlap, the latter cannot), but I'll sort that out.
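
For illustration, one possible shape for that conversion, assuming cass-operator's condition constants and a fixed precedence among the operation conditions; the cassandraPhase helper and its ordering are assumptions, not the final implementation.

// Assumes: corev1 "k8s.io/api/core/v1" and
// cassdcapi "github.com/k8ssandra/cass-operator/apis/cassandra/v1beta1".

// cassandraPhase (hypothetical) derives a single, non-overlapping phase from
// the possibly overlapping CassandraDatacenter conditions. Operation
// conditions are checked first because Ready remains true while, e.g.,
// ScalingUp is in progress.
func cassandraPhase(dc *cassdcapi.CassandraDatacenter) CassandraPhase {
    operations := []struct {
        cond  cassdcapi.DatacenterConditionType
        phase CassandraPhase
    }{
        {cassdcapi.DatacenterReplacingNodes, CassandraPhaseReplacingNodes},
        {cassdcapi.DatacenterScalingUp, CassandraPhaseScalingUp},
        {cassdcapi.DatacenterScalingDown, CassandraPhaseScalingDown},
        {cassdcapi.DatacenterRollingRestart, CassandraPhaseRollingRestart},
        {cassdcapi.DatacenterResuming, CassandraPhaseResuming},
        {cassdcapi.DatacenterUpdating, CassandraPhaseUpdating},
    }
    for _, op := range operations {
        if dc.GetConditionStatus(op.cond) == corev1.ConditionTrue {
            return op.phase
        }
    }
    if dc.GetConditionStatus(cassdcapi.DatacenterStopped) == corev1.ConditionTrue {
        return CassandraPhaseStopped
    }
    // No operation in progress and not stopped: report Ready.
    return CassandraPhaseReady
}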

adutra commented 2 years ago

For the root level condition types, what is meant by scaling up? I assume it means the addition of any component. Furthermore, I guess it could also mean scaling up a particular component, e.g., increasing C* or Stargate nodes.

Agreed.

I agree about Terminating. I suppose if something goes wrong with the finalizer bit, it could be helpful. I can live without it though.

👍 , will leave it out for now.

We might want a condition for secrets replication. Just something to think about. I think the idea was to provide some status for things that are not covered by the components. Telemetry is one example. Secret replication is another.

👍

adutra commented 2 years ago

Telemetry is one example. Secret replication is another.

One caveat though: while telemetry is indeed deployed per-dc, and a per-dc status+phase could be added as you suggested, secret replication is not. I can add a top-level condition for secret replication instead.

adutra commented 2 years ago

@jsanda a question for you:

I'm struggling to adapt existing code to the new design when the existing code attempts to check the status of a cassdcapi.DatacenterConditionType stored in K8ssandraCluster.Status, since we don't store anything from the CassandraDatacenter there anymore.

A typical example is:

for _, dc := range kc.Spec.Cassandra.Datacenters {
    // FIXME there is no condition in status.Cassandra
    if status, found := kc.Status.Datacenters[dc.Meta.Name]; found &&
        status.Cassandra.GetConditionStatus(cassdcapi.DatacenterReady) == corev1.ConditionTrue {
        // do something with the dc template
    }
}

What do you suggest as a replacement? Fetch the CassandraDatacenter resource each time we need to check its conditions?

jsanda commented 2 years ago

I am using the conditions even more with my work for removing a DC. The overhead for fetching it should be minimal since it's only hitting the cache. Alternatively, what do you think about some sort of request scoped var?
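
Sketched, the fetch-each-time approach might look like this hypothetical helper (assuming the usual context, k8s.io/apimachinery/pkg/types, and sigs.k8s.io/controller-runtime/pkg/client imports); with controller-runtime, Get is served from the informer cache:

// dcIsReady (hypothetical) re-fetches the CassandraDatacenter and checks its
// Ready condition. remoteClient is the client for the Kubernetes cluster in
// which the DC lives.
func dcIsReady(ctx context.Context, remoteClient client.Client, namespace, name string) (bool, error) {
    dc := &cassdcapi.CassandraDatacenter{}
    key := types.NamespacedName{Namespace: namespace, Name: name}
    if err := remoteClient.Get(ctx, key, dc); err != nil {
        return false, err
    }
    return dc.GetConditionStatus(cassdcapi.DatacenterReady) == corev1.ConditionTrue, nil
}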

adutra commented 2 years ago

Alternatively, what do you think about some sort of request scoped var?

Could you elaborate on what you would store there? The contents of actualDcs?

One thing that comes to mind is that kc.Status.Datacenters can potentially contain info set by previous requests. E.g. we could have reconciled dc1 and dc2 and now we are re-building dc1; but kc.Status.Datacenters would still contain an entry for dc2, whereas the request-scoped var probably wouldn't.

jsanda commented 2 years ago

Could you elaborate on what you would store there? The contents of actualDcs?

Yeah, along with the K8ssandraCluster and logger.

we could have reconciled dc1 and dc2 and now we are re-building dc1; but kc.Status.Datacenters would still contain an entry for dc2, whereas the request-scoped var probably wouldn't.

If we are rebuilding dc1 that means we have already reconciled dc2 in the current request, assuming the order for reconciliation is dc2 then dc1. If dc2 is not ready, then we requeue and will start over in a later request.
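
For illustration, such a request-scoped holder might look roughly like this; the type and field names are assumptions, with api assumed to alias the K8ssandraCluster API package and logr providing the logger type.

// reconcileContext (hypothetical) carries state for a single reconciliation
// request. Unlike kc.Status.Datacenters, actualDcs would only contain the DCs
// fetched during the current request, in reconciliation order.
type reconcileContext struct {
    kc        *api.K8ssandraCluster
    logger    logr.Logger
    actualDcs map[string]*cassdcapi.CassandraDatacenter
}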