Closed lfrancke closed 1 year ago
I have put some shower thoughts into this and roughed it out in code that may or may not compile or work :)
In operator-rs, have something like this:
use std::collections::BTreeMap;
/// The name of the annotation that controls whether or not a cluster may be reconciled at this time
pub const PAUSE_RECONCILIATION_ANNOTATION: &str =
    concatcp!(APP_KUBERNETES_LABEL_BASE, "pause-reconciliation");
pub fn reconciliation_paused<T>(obj: &T) -> bool
where
    T: Resource,
{
    // `annotations` is an Option<BTreeMap<String, String>> on ObjectMeta
    obj.meta()
        .annotations
        .as_ref()
        .and_then(|annotations| annotations.get(PAUSE_RECONCILIATION_ANNOTATION))
        .map_or(false, |value| value.to_lowercase() == "true")
}
And in the operator, at the beginning of the reconcile fn:
// Check if the CR has the annotation to pause reconciliation set to true and abort
if reconciliation_paused(&hdfs) {
    tracing::info!("Reconciliation for this cluster has been paused, aborting ..");
    return Ok(Action::await_change());
}
Most of our operators (though we saw that e.g. HDFS does not) have a top-level stopped boolean field that just sets the StatefulSet replicas to zero. Would this be removed then?
I think stopped and pause reconciliation are two different things.
What do you think @soenkeliebau ?
That is something different: this would stop the operator from making any changes to the cluster, not stop the cluster itself.
Ah, I get it. Yeah, that is something different and worthwhile for debugging :)
Unless I missed the point completely with my code, it's also fairly low-hanging fruit.
What we could at some point consider doing is to have all operators call a "preflight" method in the operator framework as first step and that can do stuff like this. That would enable us to roll out new functionality without any changes in the operators themselves (apart from updating the framework obviously).
Not sure if it's worthwhile though ..
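A minimal, stdlib-only sketch of what such a preflight hook could look like. All names here (the Preflight enum, the preflight function, and the annotation key) are illustrative assumptions, not the real operator-rs API:

```rust
use std::collections::BTreeMap;

/// Outcome of the hypothetical framework-level preflight check.
#[derive(Debug, PartialEq)]
enum Preflight {
    Continue,
    Paused,
}

/// Hypothetical preflight hook: inspects the resource's annotations and
/// decides whether the operator should reconcile at all. The key used here
/// is a placeholder.
fn preflight(annotations: &BTreeMap<String, String>) -> Preflight {
    let paused = annotations
        .get("stackable.tech/pause-reconciliation")
        .map_or(false, |v| v.eq_ignore_ascii_case("true"));
    if paused {
        Preflight::Paused
    } else {
        Preflight::Continue
    }
}

fn main() {
    let mut annotations = BTreeMap::new();
    assert_eq!(preflight(&annotations), Preflight::Continue);
    annotations.insert(
        "stackable.tech/pause-reconciliation".to_string(),
        "True".to_string(),
    );
    assert_eq!(preflight(&annotations), Preflight::Paused);
    println!("preflight ok");
}
```

The operator's reconcile fn would call this first and return early on Paused, so new preflight checks could be rolled out by updating only the framework.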
I did not check the code; I misunderstood the issue. We should have this functionality.
I like the preflight idea, and we recently talked about splitting the reconcile more into parts like resolve data, build resources, apply resources etc. Would fit in nicely there?
That would fit in that concept quite nicely I think, yes.
One thing that just occurred to me is that we are inconsistent with the stopped functionality here. That is a flag in our CRDs, whereas here we use an annotation. Annotations are probably preferable, as they don't require CRD changes and can be rolled out to all operators fairly easily - not sure though..
Yeah, not touching the CRD makes sense. Stopped should be easily transformed into an annotation though.
We could even go a step further and implement the "stop" functionality in the framework, as it can select the necessary statefulsets and deployments via labels.
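A stdlib-only sketch of that idea. In reality the framework would list StatefulSets and Deployments through the kube API, but the selection-by-label logic would look roughly like this (the Workload type and stop_cluster helper are assumptions for illustration):

```rust
use std::collections::BTreeMap;

/// Stand-in for a StatefulSet/Deployment; the real implementation would
/// operate on k8s-openapi types fetched via the kube Api.
struct Workload {
    labels: BTreeMap<String, String>,
    replicas: i32,
}

/// Hypothetical framework helper: scale every workload belonging to the
/// given cluster instance to zero, selected via the recommended
/// app.kubernetes.io/instance label.
fn stop_cluster(workloads: &mut [Workload], instance: &str) {
    for w in workloads.iter_mut() {
        if w.labels.get("app.kubernetes.io/instance").map(String::as_str) == Some(instance) {
            w.replicas = 0;
        }
    }
}

fn main() {
    let mut workloads = vec![
        Workload {
            labels: BTreeMap::from([(
                "app.kubernetes.io/instance".to_string(),
                "my-hdfs".to_string(),
            )]),
            replicas: 3,
        },
        Workload {
            labels: BTreeMap::from([(
                "app.kubernetes.io/instance".to_string(),
                "other-cluster".to_string(),
            )]),
            replicas: 2,
        },
    ];
    stop_cluster(&mut workloads, "my-hdfs");
    assert_eq!(workloads[0].replicas, 0); // selected and scaled down
    assert_eq!(workloads[1].replicas, 2); // different instance, untouched
    println!("stop ok");
}
```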
I'll add this ticket to the board now but I consider it an epic which means: Please refine this epic, check if the steps make sense that Sönke has described and please create an issue in operator-rs and one operator and link them in the tasklist above.
When that has happened I will remove this issue from the board and will prioritize the subtasks instead.
I think annotations for both makes sense, as both are "meta" and not related to the product spec.
I think I'd like a framework function where you just pass in your XyCluster as a Resource, and the framework then retrieves the annotation from there and returns true/false for whether you should reconcile or not. This way the operator doesn't need to know or care about the name of the annotation, its value, or even whether an annotation is involved at all. But control flow would still be in the operator, which I think is better in our case than moving control flow into the framework.
I've said the opposite in the past, but since we're still figuring stuff out I think it's better like this.
Reviving the conversation:
I like the preflight call. I'd only change the key to "stackable.tech/pause-reconciliation". I am missing one final piece of information: which use case do we want to pick, so we can judge whether the implemented pause feature works as expected?
Once that is resolved, we can try pausing one operator of choice, move the functionality to operator-rs and roll it out by adding ~4 lines to all operators.
I think one of the main pain points we want to address with this feature is:
Instead it should be
That's just one scenario that came to mind. Feel free to replace HDFS with any product of choice.
Picking up the annotation vs crd point again for a second, I'd personally prefer making these fields part of the CRD itself, as this would give us benefits like auto completion in IDEs, proper validation when applying to k8s, and enforce proper schema evolution on us.
Annotations would be less overhead, that is true, but they offer no assistance against stuff like:
Yes, putting this in the CRD itself would make us jump a little higher, but I personally consider this a good thing, because it will enforce proper versioning practice, instead of just yoloing it in a hashmap (overstating it a bit here :) ).
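To make the "no assistance" point concrete, here is a small illustration (the annotation key and the spec field name are assumptions): a typo in an annotation key is silently ignored at runtime, while a typo in a typed spec field is caught at compile time (or by schema validation on the cluster):

```rust
use std::collections::BTreeMap;

fn main() {
    // With annotations, a typo in the key is silently swallowed:
    let mut ann: BTreeMap<String, String> = BTreeMap::new();
    // note the missing "i" in "reconcilation"
    ann.insert(
        "stackable.tech/pause-reconcilation".to_string(),
        "true".to_string(),
    );
    let paused = ann
        .get("stackable.tech/pause-reconciliation")
        .map_or(false, |v| v == "true");
    assert!(!paused); // the user's intent is lost without any error

    // With a typed spec field, the same mistake fails to compile:
    struct ClusterSpec {
        reconciliation_paused: bool,
    }
    // `reconcilation_paused: true` here would be a compile error.
    let spec = ClusterSpec {
        reconciliation_paused: true,
    };
    assert!(spec.reconciliation_paused);
    println!("crd field ok");
}
```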
I see the benefits of adding a field to the CRD. It should make it harder to get wrong with all the checks we have in place.
I adjusted the epic description, and added a CRD proposal to the architecture meeting agenda.
Did you discuss this? Can we move this forward?
The CRD change proposal was discussed and adjusted. There was no strong consensus whether a CRD change or an annotation is the way to go, but I would propose to move forward with the CRD option and create a POC.
Official decision (by myself): We will go forward with the CRD approach and not consider annotations any further for now.
Some blocking issues and questions have come up, which need to be clarified before this epic can be worked on.
1) Blocker: Coupling to status epic. Does the reconciliationPaused flag mean that we don't touch the resource at all, or do we still want to update the status fields? If the fields need to be reliable, that would make the status epic blocking for this one, and we would need to consider the requirement here. (cc @maltesander @razvan)
2) Concern: breaking changes in CRD. A breaking change in our CRD for an operator upgrade could lead to the paused CR not being parsed successfully, flooding the operator logs with errors and effectively circumventing the pause CRD field. Is this something we are okay with? If not, it looks like we could avoid parsing the CR by checking for a reconciliationPaused annotation. (brought up and well articulated by @siegfriedweber)
3) Question: do orphaned resources get cleaned up in paused state? This connects to the point above, but is something of an edge case. (brought up by @razvan)
Clarification of the above points is necessary to inform further work on this epic. (cc @lfrancke @soenkeliebau) It seems like it can be finished by the next release in all cases.
As I will be on vacation next week, and the work on this epic needs to be finished by the next release, Razvan will take over the role of the epic owner so we can make sure that progress on this topic happens as necessary.
After another discussion with @razvan and a good point by @siegfriedweber, we came up with a mixed solution in terms of labels and CRD:
Detached (@siegfriedweber )
This will be done in the watch (main.rs) for the CRD via a fixed label from the operator-rs like "stackable.tech/operation: detached" and the ListParams:
let controller = Controller::new(
    some_api,
    ListParams {
        label_selector: Some("stackable.tech/operation != detached".to_string()),
        ..ListParams::default()
    },
)
.owns(...)
.owns(...);
This will result in not reconciling at all. It is very easy to implement and is non-breaking.
ReconcilePaused
This will be done via the same label as Detached, but with a different value: "stackable.tech/operation: reconcilePaused".
Here we stop reconciling the "spec" of the CRD but still update the cluster status etc. This would be non-breaking as well.
The name could be adapted then probably.
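The difference between the two modes can be sketched as control flow (illustrative only; the enum and function names are assumptions): a Detached resource never reaches reconcile because the watch's label selector filters it out, while ReconcilePaused still maintains the status:

```rust
/// The operation modes sketched above; names are assumptions. "Detached" is
/// not modeled here because the label selector on the watch already filters
/// such resources out, so reconcile is never called for them.
#[derive(Debug, PartialEq)]
enum Operation {
    Normal,
    ReconcilePaused,
}

/// What the reconcile fn would do per mode.
fn reconcile(op: &Operation) -> &'static str {
    match op {
        Operation::ReconcilePaused => {
            // update_status(...) only; spec-derived resources are left alone.
            "status-only"
        }
        Operation::Normal => {
            // apply_resources(...) and update_status(...).
            "full-reconcile"
        }
    }
}

fn main() {
    assert_eq!(reconcile(&Operation::ReconcilePaused), "status-only");
    assert_eq!(reconcile(&Operation::Normal), "full-reconcile");
    println!("modes ok");
}
```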
ClusterStopped
This is basically a setting that can be achieved already by just setting replicas to 0 in the custom resource (or use a bad nodeselector for daemonsets etc.)
So this should probably just be a top-level setting in the CRD. This will be a breaking change in the sense that an update of the CRD is required to use the flag (it would be non-breaking if the flag is not used).
EDIT: I did not consider apiVersion while writing this; you can disregard the text below. It only applies if we have incompatible CRDs between releases which have the same apiVersion.
A point which just came up in a discussion: the user story which motivated this epic implies the need for CRs written for a previous release of our platform to remain functional in the next release.
Otherwise, applying a new CRD version after pausing the operator would need to either
1) fail, or 2) make existing CRs invalid due to missing required fields for example.
(I'm currently not sure what happens when a CRD is applied which is not compatible with existing CRs.)
Versioned CRDs would resolve this. Otherwise we need to discuss ensuring a guarantee that CRs made for one release (23.1) will always remain functional with CRDs from the following release (23.4). This is the only way to provide a smooth upgrade path as described in the user story which Sebastian sketched out above.
Done 🚀
Some Kubernetes operators out there have a pause field in their spec which just skips reconciliation no matter what; others use an annotation. Different methods (via annotations or labels) are viable, and a CRD field works as well. We want the same feature, and will be moving forward with adding a new CRD field for this functionality.
Examples:
TODO
Framework
Products
Internal