Closed lfrancke closed 1 year ago
I have put some shower thoughts into this and roughed it out in code that may or may not compile or work :)
In operator-rs, have something like this:
use std::collections::BTreeMap;
/// The name of the annotation that controls whether or not a cluster may be reconciled at this time
pub const PAUSE_RECONCILIATION_ANNOTATION: &str =
    concatcp!(APP_KUBERNETES_LABEL_BASE, "pause-reconciliation");
pub fn reconciliation_paused<T>(obj: &T) -> bool
where
    T: Resource,
{
    // `annotations` is an Option<BTreeMap<String, String>> on ObjectMeta
    obj.meta()
        .annotations
        .as_ref()
        .and_then(|annotations| annotations.get(PAUSE_RECONCILIATION_ANNOTATION))
        .map_or(false, |value| value.to_lowercase() == "true")
}
And in the operator, at the beginning of the reconcile fn:
// Check if the CR has the annotation to pause reconciliation set to true and abort
if reconciliation_paused(&hdfs) {
    tracing::info!("Reconciliation for this cluster has been paused, aborting ..");
    return Ok(Action::await_change());
}
Most of our operators (though we saw that e.g. HDFS does not) have a top-level stopped boolean field that just sets the StatefulSet replicas to zero. Would this be removed then?
I think stopped and pause reconciliation are two different things.
What do you think @soenkeliebau ?
That is something different: this would stop the operator from making any changes to the cluster, not stop the cluster itself.
Ah, I get it. Yeah, that is something different and worthwhile for debugging :)
Unless I missed the point completely with my code, it's also fairly low-hanging fruit.
What we could at some point consider doing is to have all operators call a "preflight" method in the operator framework as first step and that can do stuff like this. That would enable us to roll out new functionality without any changes in the operators themselves (apart from updating the framework obviously).
Not sure if it's worthwhile though ..
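A minimal, stdlib-only sketch of what such a preflight hook could look like. All names here (the Preflight enum, the preflight function, and the annotation key) are illustrative assumptions, not the real operator-rs API:

```rust
use std::collections::BTreeMap;

/// Outcome of the hypothetical framework-level preflight check.
#[derive(Debug, PartialEq)]
enum Preflight {
    Continue,
    Paused,
}

/// Hypothetical preflight hook: inspects the resource's annotations and
/// decides whether the operator should reconcile at all. The key used here
/// is a placeholder.
fn preflight(annotations: &BTreeMap<String, String>) -> Preflight {
    let paused = annotations
        .get("stackable.tech/pause-reconciliation")
        .map_or(false, |v| v.eq_ignore_ascii_case("true"));
    if paused {
        Preflight::Paused
    } else {
        Preflight::Continue
    }
}

fn main() {
    let mut annotations = BTreeMap::new();
    assert_eq!(preflight(&annotations), Preflight::Continue);
    annotations.insert(
        "stackable.tech/pause-reconciliation".to_string(),
        "True".to_string(),
    );
    assert_eq!(preflight(&annotations), Preflight::Paused);
    println!("preflight ok");
}
```

The operator's reconcile fn would call this first and return early on Paused, so new preflight checks could be rolled out by updating only the framework.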
I did not check the code; I misunderstood the issue. We should have this functionality.
I like the preflight idea, and we recently talked about splitting the reconcile more into parts like resolve data, build resources, apply resources etc. Would fit in nicely there?
That would fit in that concept quite nicely I think, yes.
One thing that just occurred to me is that we are inconsistent with the stopped functionality here. That is a flag in our CRDs, whereas here we use an annotation. Annotations are probably preferable, as they don't require CRD changes and can be rolled out to all operators fairly easily - not sure though..
Yeah, not touching the CRD makes sense. Stopped should be easily transformed into an annotation though.
We could even go a step further and implement the "stop" functionality in the framework, as it can select the necessary statefulsets and deployments via labels.
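A stdlib-only sketch of that idea. In reality the framework would list StatefulSets and Deployments through the kube API, but the selection-by-label logic would look roughly like this (the Workload type and stop_cluster helper are assumptions for illustration):

```rust
use std::collections::BTreeMap;

/// Stand-in for a StatefulSet/Deployment; the real implementation would
/// operate on k8s-openapi types fetched via the kube Api.
struct Workload {
    labels: BTreeMap<String, String>,
    replicas: i32,
}

/// Hypothetical framework helper: scale every workload belonging to the
/// given cluster instance to zero, selected via the recommended
/// app.kubernetes.io/instance label.
fn stop_cluster(workloads: &mut [Workload], instance: &str) {
    for w in workloads.iter_mut() {
        if w.labels.get("app.kubernetes.io/instance").map(String::as_str) == Some(instance) {
            w.replicas = 0;
        }
    }
}

fn main() {
    let mut workloads = vec![
        Workload {
            labels: BTreeMap::from([(
                "app.kubernetes.io/instance".to_string(),
                "my-hdfs".to_string(),
            )]),
            replicas: 3,
        },
        Workload {
            labels: BTreeMap::from([(
                "app.kubernetes.io/instance".to_string(),
                "other-cluster".to_string(),
            )]),
            replicas: 2,
        },
    ];
    stop_cluster(&mut workloads, "my-hdfs");
    assert_eq!(workloads[0].replicas, 0); // selected and scaled down
    assert_eq!(workloads[1].replicas, 2); // different instance, untouched
    println!("stop ok");
}
```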
I'll add this ticket to the board now but I consider it an epic which means: Please refine this epic, check if the steps make sense that Sönke has described and please create an issue in operator-rs and one operator and link them in the tasklist above.
When that has happened I will remove this issue from the board and will prioritize the subtasks instead.
I think annotations for both makes sense, as both are "meta" and not related to the product spec.
I think I'd like a framework function where you just pass in your XyCluster as a Resource, and the framework then retrieves the annotation from there and returns true/false for whether you should reconcile or not. This way the operator doesn't need to know or care about the name of the annotation, its value, or even whether an annotation is involved at all. But control flow would still be in the operator, which I think is better in our case than moving control flow into the framework.
I've said the opposite in the past, but since we're still figuring stuff out I think it's better like this.
Reviving the conversation:
I like the preflight call. I'd only change the key to "stackable.tech/pause-reconciliation". I am missing one final piece of information: which use case do we want to pick, so we can judge whether the implemented pause feature works as expected?
Once that is resolved, we can try pausing one operator of choice, move the functionality to operator-rs and roll it out by adding ~4 lines to all operators.
I think one of the main pain points we want to address with this feature is:
Instead it should be
That's just one scenario that came to mind. Feel free to replace HDFS with any product of choice.
Picking up the annotation vs crd point again for a second, I'd personally prefer making these fields part of the CRD itself, as this would give us benefits like auto completion in IDEs, proper validation when applying to k8s, and enforce proper schema evolution on us.
Annotations would be less overhead, that is true, but they offer no assistance against stuff like:
Yes, putting this in the CRD itself would make us jump a little higher, but I personally consider this a good thing, because it will enforce proper versioning practice, instead of just yoloing it in a hashmap (overstating it a bit here :) ).
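To make the "no assistance" point concrete, here is a small illustration (the annotation key and the spec field name are assumptions): a typo in an annotation key is silently ignored at runtime, while a typo in a typed spec field is caught at compile time (or by schema validation on the cluster):

```rust
use std::collections::BTreeMap;

fn main() {
    // With annotations, a typo in the key is silently swallowed:
    let mut ann: BTreeMap<String, String> = BTreeMap::new();
    // note the missing "i" in "reconcilation"
    ann.insert(
        "stackable.tech/pause-reconcilation".to_string(),
        "true".to_string(),
    );
    let paused = ann
        .get("stackable.tech/pause-reconciliation")
        .map_or(false, |v| v == "true");
    assert!(!paused); // the user's intent is lost without any error

    // With a typed spec field, the same mistake fails to compile:
    struct ClusterSpec {
        reconciliation_paused: bool,
    }
    // `reconcilation_paused: true` here would be a compile error.
    let spec = ClusterSpec {
        reconciliation_paused: true,
    };
    assert!(spec.reconciliation_paused);
    println!("crd field ok");
}
```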
I see the benefits of adding a field to the CRD. It should make it harder to get wrong with all the checks we have in place.
I adjusted the epic description, and added a CRD proposal to the architecture meeting agenda.
Did you discuss this? Can we move this forward?
The CRD change proposal was discussed and adjusted. There was no strong consensus whether a CRD change or an annotation is the way to go, but I would propose to move forward with the CRD option and create a POC.
Official decision (by myself): We will go forward with the CRD approach and not consider annotations any further for now.
Some blocking issues and questions have come up, which need to be clarified before this epic can be worked on.
1) Blocker: Coupling to status epic. Does the reconciliationPaused flag mean that we don't touch the resource at all, or do we still want to update the status fields? If the fields need to be reliable, that would make the status epic blocking for this one, and we would need to consider the requirement here. (cc @maltesander @razvan)
2) Concern: breaking changes in CRD. A breaking change in our CRD for an operator upgrade could lead to the paused CR not being parsed successfully, flooding the operator logs with errors and effectively circumventing the pause CRD field. Is this something we are okay with? If not, it looks like we could avoid parsing the CR by checking for a reconciliationPaused annotation. (brought up and well articulated by @siegfriedweber)
3) Question: do orphaned resources get cleaned up in paused state? This connects to the point above, but is something of an edge case. (brought up by @razvan)
Clarification of the above points is necessary to inform further work on this epic. (cc @lfrancke @soenkeliebau) It seems like it can be finished by the next release in all cases.
As I will be on vacation next week, and the work on this epic needs to be finished by the next release, Razvan will take over the role of the epic owner so we can make sure that progress on this topic happens as necessary.
After another discussion with @razvan and a good point by @siegfriedweber, we came up with a mixed solution in terms of labels and CRD:
Detached (@siegfriedweber )
This will be done in the watch (main.rs) for the CRD via a fixed label from the operator-rs like "stackable.tech/operation: detached" and the ListParams:
let controller = Controller::new(
    some_api,
    ListParams {
        label_selector: Some("stackable.tech/operation != detached".to_string()),
        ..ListParams::default()
    },
)
.owns(...)
.owns(...);
This will result in not reconciling at all. It is very easy to implement and is non-breaking.
ReconcilePaused
This will be done via the same label as Detached, but with a different value: "stackable.tech/operation: reconcilePaused".
Here we stop reconciling the "spec" of the CRD but still update the cluster status etc. This would be non-breaking as well.
The name could be adapted then probably.
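The difference between the two modes can be sketched as control flow (illustrative only; the enum and function names are assumptions): a Detached resource never reaches reconcile because the watch's label selector filters it out, while ReconcilePaused still maintains the status:

```rust
/// The operation modes sketched above; names are assumptions. "Detached" is
/// not modeled here because the label selector on the watch already filters
/// such resources out, so reconcile is never called for them.
#[derive(Debug, PartialEq)]
enum Operation {
    Normal,
    ReconcilePaused,
}

/// What the reconcile fn would do per mode.
fn reconcile(op: &Operation) -> &'static str {
    match op {
        Operation::ReconcilePaused => {
            // update_status(...) only; spec-derived resources are left alone.
            "status-only"
        }
        Operation::Normal => {
            // apply_resources(...) and update_status(...).
            "full-reconcile"
        }
    }
}

fn main() {
    assert_eq!(reconcile(&Operation::ReconcilePaused), "status-only");
    assert_eq!(reconcile(&Operation::Normal), "full-reconcile");
    println!("modes ok");
}
```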
ClusterStopped
This is basically a setting that can be achieved already by just setting replicas to 0 in the custom resource (or use a bad nodeselector for daemonsets etc.)
So this should probably just be a top-level setting in the CRD. This will be a breaking change in the sense that an update of the CRD is required to use the flag (it would be non-breaking if the flag is not used).
EDIT: I did not consider apiVersion while writing this; you can disregard the text below. It only applies if we have incompatible CRDs between releases which have the same apiVersion.
A point which just came up in a discussion: the user story which motivated this epic implies the need for CRs written for a previous release of our platform to remain functional in the next release.
Otherwise, applying a new CRD version after pausing the operator would need to either
1) fail, or 2) make existing CRs invalid due to missing required fields for example.
(I'm currently not sure what happens when a CRD is applied which is not compatible with existing CRs.)
Versioned CRDs would resolve this. Otherwise we need to discuss ensuring a guarantee that CRs made for one release (23.1) will always remain functional with CRDs from the following release (23.4). This is the only way to provide a smooth upgrade path as described in the user story which Sebastian sketched out above.
Done 🚀
Some Kubernetes operators out there have a pause field in their spec which just skips reconciliation no matter what; others use an annotation. Different methods (via annotations or labels) are viable, and a CRD field works as well. We want the same feature, and will be moving forward with adding a new CRD field for this functionality.
Examples:
TODO
Framework
Products
Internal