The operator normally deletes k8s resources it creates. Can you please provide more details on the problems with deletion you encountered?
The linked function doesn't delete services, for one. Additionally, the operator can always be killed in the middle of Delete(), leaving dangling resources. It should also be easier to maintain since you won't have to write the cleanup code or test it; Kubernetes will always clean up the resources for you no matter what.
The linked function doesn't delete services, for one.
Services are deleted, but the overall point is valid
Isn't there also a -config service that should be deleted as well?
Yeah, those come from the headless services via Patroni.
Guess we need some thinking here: who cleans up what, what expectations we have about what deletes what, and how one mechanism prevents mistakes. Let's think about this cascade and how to make it safe and clean.
I created the linked #886 mostly because of external tooling that we have for cluster monitoring and maintenance. I know that the operator will clean up after itself, but having the reference allows us to better manage the state of our clusters.
... because of external tooling that we have for cluster monitoring and maintenance. ... having the reference allows us to better manage the state of our clusters.
This is true for us as well
Apart from internal management, this would be very important for those using projects like ArgoCD for GitOps, since that's how they know which custom resource caused resources to be deployed... otherwise it looks pretty ugly in the UI to track "orphaned" resources (which ArgoCD thinks were deployed manually, but are actually managed in the Git repo via the custom resource).
While I support this idea of creating some sort of relationship / ownership among the resources the operator creates and manages: @sdudoladov, regarding the dangling resources @aermakov-zalando referred to, which the operator might leave behind if it is interrupted (or not running at all, also a source of inconsistencies here), the only solution is a finalizer blocking the removal / garbage collection of the postgresql custom resource until the operator is done with it and gives its green light to remove it - see https://github.com/zalando/postgres-operator/issues/450
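For illustration, a minimal finalizer sketch in controller-runtime style; the operator itself uses plain client-go, so this shows the generic pattern rather than actual operator code, and the finalizer name and cleanupClusterResources are hypothetical placeholders:

```go
package cleanup

import (
	"context"

	acidv1 "github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Hypothetical finalizer name; any domain-qualified string works.
const finalizerName = "acid.zalan.do/cleanup"

// reconcileDeletion blocks garbage collection of the postgresql custom
// resource until the operator has finished its own cleanup.
func reconcileDeletion(ctx context.Context, c client.Client, pg *acidv1.Postgresql) error {
	if pg.ObjectMeta.DeletionTimestamp.IsZero() {
		// Object is live: make sure our finalizer is present.
		if !controllerutil.ContainsFinalizer(pg, finalizerName) {
			controllerutil.AddFinalizer(pg, finalizerName)
			return c.Update(ctx, pg)
		}
		return nil
	}
	// Object is being deleted: run cleanup first, then release the
	// finalizer so the API server can actually remove the resource.
	if controllerutil.ContainsFinalizer(pg, finalizerName) {
		if err := cleanupClusterResources(ctx, pg); err != nil {
			// Retried on the next reconcile; the CR stays visible until
			// cleanup succeeds, so nothing dangles silently.
			return err
		}
		controllerutil.RemoveFinalizer(pg, finalizerName)
		return c.Update(ctx, pg)
	}
	return nil
}

// cleanupClusterResources stands in for the operator's existing deletion
// logic (services, endpoints, secrets, ...).
func cleanupClusterResources(ctx context.Context, pg *acidv1.Postgresql) error {
	return nil
}
```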
So far we've been reluctant to add ownerReferences and finalizers as we fear it makes deleting DBs by accident too easy. Of course, I see that you want things to be cleaned up in case something goes wrong during (intended) deletions or creations of clusters.
A good compromise might be to make the assignment of ownerReferences configurable. Maybe someone wants to come up with a PR for this. We are already using ownerReferences between the statefulSet and the new connectionPooler deployment, so it shouldn't be too hard to implement.
Somehow KubeDB manages the deletion concern with a few states: https://kubedb.com/docs/v0.13.0-rc.0/guides/postgres/quickstart/quickstart/#wipeout-dormantdatabase. Is it really a concern if someone sets the state consciously?
So far we've been reluctant to add ownerReferences and finalizers as we fear it makes deleting DBs by accident too easy. Of course, I see that you want things to be cleaned up in case something goes wrong during (intended) deletions or creations of clusters.
A good compromise might be to make the assignment of ownerReferences configurable. Maybe someone wants to come up with a PR for this. We are already using ownerReferences between the statefulSet and the new connectionPooler deployment, so it shouldn't be too hard to implement.
@FxKu How are references making it "easier" to accidentally delete a database cluster? Deleting the "postgresql" resource is all it takes anyway.
Let me refer to yet another aspect of using Finalizers in issue https://github.com/zalando/postgres-operator/issues/387, which @sdudoladov brought up.
Certainly you can't have it both ways, but there should be a clear strategy about whether the Operator is the master of its resources, meaning that only changes that go via the CR (be it reconfigurations or deletions of a cluster) are actually supported.
Certainly a human with sufficient access to the Kubernetes API has to "understand" that an Operator and therefore a CR is in the loop when dealing with such resources. Trying to reconfigure or delete those manually is always a bad idea. There is just no easy way to protect those resources from being fiddled with. But what you can and should ensure is that the Operator always keeps the managed resources in line with what the CR says. Otherwise the operator would just be a template engine doing the initial conversion from the CR to the set of resources required to have the PostgreSQL cluster, but then lets everybody and their dog mess things up manually. An operator, however, usually manages the whole life-cycle of such an abstract resource as "postgresql" and the resulting "native" building blocks of Kubernetes. To me the best analogy is configuration management: certainly you can ssh into a box, change configuration files manually and stop the puppet agent - but then you are going to have a bad time. The job of an Operator, as with configuration management in general, is to keep the variance and complexity of the states things can be in low.
That being said, it really should not make much of a difference whether the cascading delete is done by the Kubernetes API server via the discussed owner references or by the Operator itself. It might be nicer to log and "run final steps" when the operator does things explicitly, as with Finalizers doing "a last base backup" like AWS RDS does. The end result - no CR = no resources - should actually be the same.
At least, if someone does not want this relationship, it could be made optional and enabled through the Helm chart.
Chiming in here as well: this is a hard blocker for our team to adopt this operator, since we use ArgoCD.
@FxKu I'd love to revive the conversation and potentially provide a patch for this.
Although I disagree with making owner references optional, I came up with two solutions supporting that:
Any ideas/comments?
The StatefulSet should not be the owner. Sometimes a statefulset has to be recreated due to immutable fields. A delete of the statefulset should therefore not delete other resources.
A delete of the statefulset should therefore not delete other resources.
Well, I guess it already does, according to @FxKu's earlier comment.
But I agree with you, the other options seem way better (except making owner references optional).
This was the winning operator for us until we created a PoC with ArgoCD and found out about this issue. This is definitely a deal breaker for people doing GitOps with ArgoCD or Flux.
This is definitely a deal breaker for people doing GitOps with ArgoCD or Flux
Can definitely confirm for us.
A must-have for us as well.
Looking at how the RabbitMQ operator handles this, the postgresql resource should be the owner of the other resources.
I'd be willing to work on this given there is a consensus about the solution. We should have a conversation about the potential solutions.
Let's have a conversation about this; we will try to catch up on this topic quickly.
It would be good to understand the complexity of a feature toggle around the owner reference. Potentially making it non-optional is the very right thing, but this needs some elaboration on what happens during sync & migration.
https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/
Maybe more informed users can confirm or correct that this would be a very good first step that future work on cleanup and/or delete protection can build on. So far it seems that owner references should be implemented, but that does not immediately mean cascading delete. Depending on the answer here, we could skip the "optional".
Earlier objections to adding ownership to the StatefulSet are ok, so proceeding with ownership pointing to the custom resource seems good to me.
Edit: https://kubernetes.io/docs/concepts/architecture/garbage-collection/#owners-dependents
Reading more sheds some light on the apparent default of cascading delete.
To be completely honest: I never really understood the fear of "accidental deletion". I mean, what if I "accidentally" delete the StatefulSet? How is that any different?
Perhaps we should address that question first, so that we can come up with an optimal solution.
One potential argument I could make is that accidentally deleting the CRD would also lead to deleting all CRs and databases. If that's the concern, the operator could apply a finalizer to the CRD as long as there are CRs in the cluster. That would prevent the CRD from being deleted as long as it has any CRs.
Other than the above, I simply don't see how "accidentally" deleting the CR is different from deleting the StatefulSet.
Now, adding an owner reference will result in a cascading delete by default, unless an orphan deletion policy is applied to the delete request. However, that has to be explicit, so it doesn't protect you from accidental deletion either.
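To make the explicit orphaning concrete, a hedged client-go sketch (namespace and cluster name are placeholders; the GVR matches the operator's postgresql resource). The kubectl equivalent is `kubectl delete postgresql <name> --cascade=orphan`:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	pgGVR := schema.GroupVersionResource{Group: "acid.zalan.do", Version: "v1", Resource: "postgresqls"}
	// Orphan the dependents: the API server strips the owner references
	// instead of cascading the delete to the owned resources.
	orphan := metav1.DeletePropagationOrphan
	if err := dc.Resource(pgGVR).Namespace("default").Delete(
		context.TODO(), "acid-minimal-cluster", metav1.DeleteOptions{PropagationPolicy: &orphan}); err != nil {
		panic(err)
	}
}
```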
Applying finalizers to CRs could also prevent deletions, but these would have to be applied "manually" and directly to the CRs, and thus would have nothing to do with the operator itself.
Again, I'm trying to understand the accidental deletion argument. Like others before me, I don't see the difference between the CR and the StatefulSet from this perspective, and as a result I can't see why owner references shouldn't be the default behavior.
I've seen on occasion that a Custom Resource allowed configuring whether child objects should be orphaned: https://external-secrets.io/v0.5.7/guides-ownership-deletion-policy/#orphan
The problem with a controller-level configuration value is exactly what makes this issue hard: how do you handle config changes?
When the CR explicitly changes the behavior, you can react to the change. When you have a controller that starts with a new config value, you have no idea what the initial state was. We can guess based on whether ownerReferences are present or not, but migrating between the two becomes significantly harder (especially since we are talking about owner references; it wouldn't be this difficult IMO if we were talking about any other field).
Admittedly, I haven't looked into how the current lifecycle is managed, so it may not be that complicated to move between the two states. We might also just add the creationPolicy flag mentioned above with a default value first and add support for owner references in a next step. We have to figure out a step-by-step plan here.
But the first step, I believe, is deciding the default behavior and how we want to control deviations from it. If, for some reason, adding owner references by default (+ finalizers for whoever wants to prevent deleting CRs) is not enough, I would suggest adding a creationPolicy flag to the spec.
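To sketch what such a flag could look like (the field name creationPolicy and its values are a proposal from this thread, not an existing API of the operator):

```go
package spec

// OwnershipPolicy is a hypothetical enum backing the proposed
// creationPolicy flag.
type OwnershipPolicy string

const (
	// Owner: created resources carry an ownerReference back to the
	// postgresql object, so deleting the CR cascades to them.
	OwnershipPolicyOwner OwnershipPolicy = "Owner"
	// Orphan: no owner references are set (today's behavior); the
	// operator's own cleanup code stays responsible for deletion.
	OwnershipPolicyOrphan OwnershipPolicy = "Orphan"
)

// PostgresSpec shows only the proposed addition; all existing fields of
// the real spec are elided.
type PostgresSpec struct {
	// CreationPolicy controls whether resources created for this cluster
	// are owned by the postgresql object. Proposed default: "Owner".
	CreationPolicy OwnershipPolicy `json:"creationPolicy,omitempty"`
}
```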
You are right, anyone can delete things we do not want them to delete.
Originally we had two steps here: the operator ignoring CRD deletes, and, even better, we now have admission-controller-level delete protection. So we could consider this solved. The difference is mostly "risk" related: people get into the habit of cleaning up and are more likely to delete CRDs than StatefulSets. We also see that it takes time until migrations or deletes happen, but we can consider this a bit void now - just as proper owner references, once we have them, make the implemented "ignore CRD delete" void as well.
It seems many here favor proper ownership over protection against accidental deletes, and that can be the right approach.
You raise a concern about going back and forth between owner references on and off, something we might also downvote. The migration should be forward, towards owner references. And owner references can also become the default.
Not sure I fully understand your last comment. From what I can tell, you propose adding owner references and it should be enabled by default.
Do you still want to be able to "turn it off"? If so, which approach: controller option or CRD flag?
We can limit migration between the two in some cases (e.g. make the CRD flag immutable), but not handling those situations could potentially mean that if you change the config, you may accidentally destroy your Postgres cluster. (Yeah, I see the irony :) )
In my opinion, if some delete protection should be available, it should be on the data, i.e. the PersistentVolume(Claim). This is possible, for example, for Elasticsearch managed by the ECK operator.
And, in my opinion, the upside to this sort of "delete protection" is that on CR (postgresql instance) deletion, the undesired state becomes apparent immediately (because the database becomes unavailable). That means such a mistake cannot remain unnoticed for a prolonged time (which unfortunately happened to some of our users). Yet a restore only requires a recreation of the postgresql object (for which the PVC template will be fulfilled by the still existing old PVCs).
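One way to get this kind of data-level protection today, independent of the operator, is to switch the backing PersistentVolume to the Retain reclaim policy, so deleting the PVC (or the whole cluster) does not remove the volume. A client-go sketch, with the PV name as a placeholder:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// "pvc-1234" is a placeholder for the PV bound to the cluster's PVC.
	pv, err := cs.CoreV1().PersistentVolumes().Get(context.TODO(), "pvc-1234", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	// Retain keeps the volume (and the data) even after its PVC is
	// deleted, so a recreated cluster can be pointed back at it.
	pv.Spec.PersistentVolumeReclaimPolicy = corev1.PersistentVolumeReclaimRetain
	if _, err := cs.CoreV1().PersistentVolumes().Update(context.TODO(), pv, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}
}
```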
Yes, I propose to work on owner references, given the strong need and a concept employed by others. This will hopefully also reduce our own cleanup code at some future point.
I do think we need an off state and a migration path to "on". The new default can be "on". I do not think we need to invest in a path from "on" to "off".
I do not think we need to invest in a path from "on" to "off".
That's only possible if on->off is not even possible (i.e. an immutable field in the spec). If there is a way to move from one to the other (the user can change the config) but we don't handle it, the result is unknown, most probably a disaster.
So I'd suggest that we either find a way to make sure a user can't change that config (the only way I can imagine right now is a field in the CR, not a controller level config) or add some sort of support for moving between the two states.
If nothing else, to make sure the issue tracker is not flooded with bug reports after the feature is released.
Some high-level steps to implement this feature, off the top of my head:
From the Argo CD UI you can open a terminal and exec into pods. Some developers need terminal access to run psql commands in the Zalando-created pods. Because the ownerReference is missing, we can't see the Postgres pod and thus can't open a terminal from Argo CD.
Because of this we need to create another form of access to the k8s cluster instead of simply relying on the Argo CD UI.
This is a big issue because we need to change our access processes just for a single use case - accessing the Postgres pod.
This pull request aligns this amazing operator with the standards of Kubernetes, making it a very important improvement, especially with the rise of GitOps. How can we make progress on this matter? It's a big barrier to adoption.
I can agree that this matter is a big barrier to adoption
+1
Many other operators use ownerReference very successfully, like Vitess (a CNCF graduated project), the ECK operator, etc. Can't wait to see this adopted.
It's not pretty, but this workaround can be used while this PR is not yet merged: https://github.com/zalando/postgres-operator/issues/1766#issuecomment-2182855392
If I delete a database, resources created by it (services, statefulsets, etc.) should be deleted automatically. This can be easily implemented by adding the Postgres resource as an ownerReference (with controller: true). Kubernetes will then automatically delete the resources once the master database resource is gone.
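A minimal sketch of what that could look like in Go, assuming the operator's acidv1 API package (github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1); the helper names are illustrative, not actual operator code:

```go
package resources

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	acidv1 "github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1"
)

// ownedBy builds an ownerReference pointing back at the postgresql
// object. With Controller and BlockOwnerDeletion set, the Kubernetes
// garbage collector deletes the dependent once the owner is gone.
func ownedBy(pg *acidv1.Postgresql) metav1.OwnerReference {
	controller := true
	return metav1.OwnerReference{
		APIVersion:         acidv1.SchemeGroupVersion.String(), // "acid.zalan.do/v1"
		Kind:               "postgresql",
		Name:               pg.Name,
		UID:                pg.UID,
		Controller:         &controller,
		BlockOwnerDeletion: &controller,
	}
}

// serviceFor shows the wiring on one of the created resources; every
// other object (StatefulSet, secrets, endpoints, ...) would get the
// same ownerReference.
func serviceFor(pg *acidv1.Postgresql) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:            pg.Name,
			Namespace:       pg.Namespace,
			OwnerReferences: []metav1.OwnerReference{ownedBy(pg)},
		},
		// Spec elided; only the ownership wiring matters here.
	}
}
```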