k8ssandra / cass-operator

The DataStax Kubernetes Operator for Apache Cassandra
https://docs.datastax.com/en/cass-operator/doc/cass-operator/cassOperatorGettingStarted.html
Apache License 2.0
187 stars 66 forks source link

Revert label override values #691

Closed burmanm closed 1 week ago

burmanm commented 2 months ago

What is missing?

The original behavior was that cassandra.datastax.com/datacenter has a value that's identical to the CassandraDatacenter.ObjectMeta.Name. In PR #475 the behavior was changed to include CassandraDatacenter.DatacenterName() which could be either the .Name or .Spec.DatacenterName.

My proposal is to revert this part of the PR with basically the following fix in cass-operator:

diff --git a/apis/cassandra/v1beta1/cassandradatacenter_types.go b/apis/cassandra/v1beta1/cassandradatacenter_types.go
index ec9aa20..fc4bc2d 100644
--- a/apis/cassandra/v1beta1/cassandradatacenter_types.go
+++ b/apis/cassandra/v1beta1/cassandradatacenter_types.go
@@ -596,7 +596,7 @@ func (dc *CassandraDatacenter) SetCondition(condition DatacenterCondition) {
 // GetDatacenterLabels ...
 func (dc *CassandraDatacenter) GetDatacenterLabels() map[string]string {
        labels := dc.GetClusterLabels()
-       labels[DatacenterLabel] = CleanLabelValue(dc.DatacenterName())
+       labels[DatacenterLabel] = CleanLabelValue(dc.GetName())
        return labels
 }

Why is this needed?

This is because right now we're unable to use the labels for querying the parent objects. If we look at a pod/service/StatefulSet/any object when override name is in use it has a label like: cassandra.datastax.com/datacenter=overrideName yet using the Kubernetes API this label value is entirely useless. We can't fetch the actual CassandraDatacenter with this information at all.

It also does not necessarily match the value in Cassandra itself either as the value stored in the labels is cleaned up removing certain characters. Thus, both the user and the systems are unable to use these values for anything. And we need this feature, we need to be able to link resources from bottom to top so we can fetch the related resources. Right now only the top to bottom is doable since CassandraDatacenter knows what labels we created to sub resources, but sub resources do not know what labels the parts above (or next to) have.

Reverting this change requires some work also when updating since we do not allow StatefulSet updates automatically without user consent. We can however make an exception and update labels, since this doesn't require a rolling restart of the pods as all labels can be changed in Pods/StatefulSets without rolling update.

For k8ssandra-operator, we are in a more complex scenario. There are already some resources that use the .Name (like Prometheus ServiceMonitors) and some that use .DatacenterName(). This will require some manual upgrade process to enforce the correct format we want everywhere. While most of them should be automatically updated in the upgrade process, there are some that require manual work, namely the MedusaBackup objects as they were created with the old values and are not actively maintained.

The k8ssandra-operator also has the k8ssandra.io/cluster-name which is different from cassandra.datastax.com/cluster. The latter is always using the .Spec.ClusterNameand there's no need to change this behavior. However, for k8ssandra-operator we probably want to enforce the .Name instead. That gives us both values to use for cluster seeking operations.

One raised question was that the update has issues knowing which value we have in the datacenter label when updating from previous versions. This is always problematic no matter if we rollback or not, since there are already versions of cass-operator & k8ssandra-operator adding either value, depending on where you're updating from.

It might feel painful, but I think we should do the right thing here and revert the change to simplify our labels usage as we ourselves can't even keep track of what values are stored where. If we always keep the Kubernetes names, it's very clear. It's also what we internally always process, whatever is in Cassandra is not really relevant to our management processes.

┆Issue is synchronized with this Jira Story by Unito ┆Reviewer: Miles Garnsey ┆Fix Versions: 2024-10,2024-11 ┆Issue Number: CASS-4

adejanovski commented 2 months ago

namely the MedusaBackup objects as they were created with the old values and are not actively maintained

I don't think that's really a problem, those labels aren't used as selectors I think.

@Miles-Garnsey @olim7t, please chime in here.