k8ssandra / k8ssandra-operator

The Kubernetes operator for K8ssandra
https://k8ssandra.io/
Apache License 2.0

Deletion of Stargate or Reaper should be done earlier in reconciliation #374

Open jsanda opened 2 years ago

jsanda commented 2 years ago

What is missing? If I remove Stargate or Reaper from the cluster or a DC, the cleanup and deletion of underlying resources can potentially take a while to happen. Here is a brief summary of the orchestration workflow in the K8ssandraCluster controller.

We first check to see if the K8ssandraCluster has been deleted. If it has, proceed with finalizer cleanup.

Next we reconcile a number of secrets depending on what components are enabled. The secrets include:

Then we move on to reconcile CassandraDatacenters. We iterate over all of the DCs. First, we check to see if the DC has been removed from the K8ssandraCluster. If it has, we proceed with deletion and cleanup. After the DC is reconciled (i.e., either its Ready or Stopped condition is true), we perform some schema checks. Then we move on to the next DC. This is a rather abbreviated version of what is involved.

It should also be noted that several places along the way can result in a requeue. This is important to keep in mind, as each requeue further delays reconciling other components. Stargate and Reaper get reconciled only after all CassandraDatacenters have been reconciled. We handle create/update/delete all at the same point in the orchestration. Deletions of Stargate and Reaper should be handled sooner, definitely before we reconcile the CassandraDatacenters.
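To make the delay concrete, here is a minimal Go sketch of the ordering described above. It is not the operator's actual code: the `result` and `step` types, the step names, and the `currentOrder` helper are all hypothetical, and the real controller has many more phases. It only illustrates that a requeue in the CassandraDatacenter phase stops the pass before Stargate and Reaper are reached.

```go
package main

import "fmt"

// result is a hypothetical stand-in for a reconcile step outcome:
// either continue to the next step or requeue the whole request.
type result struct {
	requeue bool
}

// step is one hypothetical phase of the K8ssandraCluster reconcile pass.
type step struct {
	name string
	run  func() result
}

// runSteps executes steps in order and stops at the first requeue.
// It returns the names of the steps that ran and whether a requeue occurred.
func runSteps(steps []step) (ran []string, requeued bool) {
	for _, s := range steps {
		ran = append(ran, s.name)
		if s.run().requeue {
			return ran, true
		}
	}
	return ran, false
}

func done() result    { return result{} }
func requeue() result { return result{requeue: true} }

// currentOrder models the ordering summarized in this issue (assumed, simplified).
// dcReady says whether every CassandraDatacenter is already reconciled.
func currentOrder(dcReady bool) []step {
	return []step{
		{"finalizer", done},
		{"secrets", done},
		{"datacenters", func() result {
			if !dcReady {
				return requeue() // e.g. a new DC is still bootstrapping
			}
			return done()
		}},
		{"stargate", done}, // create/update/delete all happen here
		{"reaper", done},   // create/update/delete all happen here
	}
}

func main() {
	ran, requeued := runSteps(currentOrder(false))
	fmt.Println(ran, requeued) // [finalizer secrets datacenters] true
}
```

As long as the datacenters step keeps requeueing, the stargate and reaper steps never run, so a requested removal waits for the DC to become ready.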

Why do we need it? Suppose the user wants to undeploy Reaper or Stargate because it is causing some instability in the cluster, or a security risk has been discovered, or because of billing concerns with a cloud provider. Now let's say we are adding a DC. Depending on the number of C* nodes in the DC and the amount of data in the cluster, it could take a while (easily several hours) before the new CassandraDatacenter is ready. Removing Stargate and/or Reaper, on the other hand, is relatively fast and easy.

Just as we first check for and reconcile deletion of the K8ssandraCluster, we should also reconcile deletion of child objects early on in the reconcile process.
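One possible shape for this, sketched in Go under loud assumptions: `componentsToDelete` and `reconcileOrder` are hypothetical helpers, components are modeled as plain names, and the phase strings are illustrative only. The idea is to compare what the spec still asks for against what exists, and schedule the deletions right after the finalizer check, ahead of secrets and CassandraDatacenters.

```go
package main

import (
	"fmt"
	"sort"
)

// componentsToDelete returns components that still exist in the cluster
// but are no longer enabled in the desired spec. Hypothetical helper.
func componentsToDelete(desired map[string]bool, existing []string) []string {
	var out []string
	for _, name := range existing {
		if !desired[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out) // deterministic order for the sketch
	return out
}

// reconcileOrder returns the phase names for one reconcile pass with
// child deletions moved ahead of the slow CassandraDatacenter phase.
func reconcileOrder(desired map[string]bool, existing []string) []string {
	phases := []string{"finalizer"}
	for _, name := range componentsToDelete(desired, existing) {
		phases = append(phases, "delete:"+name)
	}
	phases = append(phases, "secrets", "datacenters")

	// Create/update of components that are still wanted stays at the end.
	var wanted []string
	for name, enabled := range desired {
		if enabled {
			wanted = append(wanted, name)
		}
	}
	sort.Strings(wanted)
	for _, name := range wanted {
		phases = append(phases, "reconcile:"+name)
	}
	return phases
}

func main() {
	desired := map[string]bool{"stargate": true, "reaper": false}
	existing := []string{"reaper", "stargate"}
	fmt.Println(reconcileOrder(desired, existing))
	// [finalizer delete:reaper secrets datacenters reconcile:stargate]
}
```

With this ordering, a requeue while a new DC bootstraps no longer blocks the removal of Reaper, while create and update of the remaining components still wait for the datacenters.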

Environment

v0.4.0

┆Issue is synchronized with this Jira Story by Unito ┆Issue Number: K8OP-132

jsanda commented 2 years ago

Please add your planning poker estimate with ZenHub @adutra

adutra commented 2 years ago

Sorry for the late reply, estimate done.