vladimirfx opened this issue 3 years ago
This is the expected behavior. Changes are not dynamically loaded. Ideally you have a deployment pipeline with operator and configuration and run that whenever there's a change, so that the operator pod is replaced.
Thank you for the response! Actually, we have a GitOps pipeline, which is declarative, so a change to a resource should be enough to apply it. That works for 20+ operators/controllers, but not for Postgres Operator. An operator (more generally, a controller: https://kubernetes.io/docs/concepts/architecture/controller/) is by definition a state-synchronization mechanism:
> A controller tracks at least one Kubernetes resource type. These objects have a spec field that represents the desired state. The controller(s) for that resource are responsible for making the current state come closer to that desired state.
So if postgres-operator is actually a controller for the OperatorConfiguration resource, it should sync the state of the resources it controls, or at least do so in a configurable way. Alternatively, this could be implemented via the (now deprecated) ConfigMap configuration mechanism.
Please consider implementing a control loop for the operator, enabled by an env variable on the operator pod (disabled by default). I can provide a PR if desired.
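For illustration, here is a minimal sketch (not the operator's actual code) of what such an opt-in watch could look like, using client-go's dynamic client against the operator's CRD (acid.zalan.do/v1, resource operatorconfigurations). The ENABLE_CONFIG_WATCH and POD_NAMESPACE variables and the reload callback are hypothetical, not existing operator options:

```go
// Hypothetical sketch: watch the OperatorConfiguration CR and invoke a
// reload callback on every modification. ENABLE_CONFIG_WATCH is a
// proposed (not existing) env var; disabled by default as suggested above.
package main

import (
	"context"
	"log"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// GVR of the operator's own configuration CRD.
var operatorConfigGVR = schema.GroupVersionResource{
	Group:    "acid.zalan.do",
	Version:  "v1",
	Resource: "operatorconfigurations",
}

// watchOperatorConfig blocks, calling reload whenever the configuration
// object is modified. A production version would use a shared informer
// instead of a bare watch, so it survives watch expiry and API server
// restarts.
func watchOperatorConfig(ctx context.Context, namespace string, reload func()) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		return err
	}
	w, err := client.Resource(operatorConfigGVR).Namespace(namespace).
		Watch(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		if ev.Type == watch.Modified {
			log.Println("OperatorConfiguration changed, resyncing")
			reload()
		}
	}
	return ctx.Err()
}

func main() {
	if os.Getenv("ENABLE_CONFIG_WATCH") != "true" { // opt-in, off by default
		return
	}
	err := watchOperatorConfig(context.Background(), os.Getenv("POD_NAMESPACE"), func() {
		// A real implementation would re-read the spec here and resync clusters.
	})
	if err != nil {
		log.Fatal(err)
	}
}
```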
> This is the expected behavior. Changes are not dynamically loaded. Ideally you have a deployment pipeline with operator and configuration and run that whenever there's a change, so that the operator pod is replaced.
Within the Helm chart, https://github.com/zalando/postgres-operator/blob/master/charts/postgres-operator/templates/deployment.yaml#L23 appears to attempt to achieve that. But Helm does not order the resource updates to ensure this actually works: https://github.com/helm/helm/blob/main/pkg/releaseutil/kind_sorter.go#L28-L68. Custom resources themselves are updated last.
We've observed the operator pod roll due to a combination of a config change and an operator image change, but when it starts up, the OperatorConfiguration is still being updated (likely partly because our chart is a larger bundle of charts for bootstrapping, so there are lots of resources and custom resources). Because the OperatorConfiguration it reads is outdated, the operator does not see the config changes, and we have had very confusing, "non-deterministic" windows for when our Postgres clusters might upgrade: if the operator later restarts for any reason (e.g. a node going down), it suddenly applies the new configuration and updates the Postgres clusters. Paired with a serious bug in newer images like https://github.com/zalando/postgres-operator/issues/2747, this has nearly been catastrophic for us, and we were lucky to notice it all at once.
So I would assert that a fix could be an env var enabling new behavior like @vladimirfx suggested, but rather than going to the trouble of a full control loop, the operator would simply exit gracefully and let Kubernetes restart it via the regular container lifecycle. On startup, the operator loads the new configuration through its regular startup procedure and does not have to account for dynamic config changes during in-flight reconciles at this time.
Would that be an acceptable solution while the maintainers consider how a control loop could/should be implemented?
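For concreteness, a minimal sketch of that exit-and-restart approach, assuming a dynamic informer from client-go; the EXIT_ON_CONFIG_CHANGE gate is hypothetical, not an existing operator option:

```go
// Hypothetical sketch: instead of a full control loop, exit on any real
// change to the OperatorConfiguration and let the kubelet restart the
// container, so the fresh config is read through the normal startup path.
package main

import (
	"log"
	"os"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	if os.Getenv("EXIT_ON_CONFIG_CHANGE") != "true" { // proposed opt-in gate
		select {} // stand-in for the operator's normal run loop
	}
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	gvr := schema.GroupVersionResource{
		Group: "acid.zalan.do", Version: "v1", Resource: "operatorconfigurations",
	}
	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 30*time.Second)
	informer := factory.ForResource(gvr).Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			o := oldObj.(*unstructured.Unstructured)
			n := newObj.(*unstructured.Unstructured)
			if o.GetResourceVersion() == n.GetResourceVersion() {
				return // periodic resync, not an actual change
			}
			log.Println("OperatorConfiguration changed, exiting for restart")
			os.Exit(0) // K8s restarts the container; startup re-reads config
		},
	})
	stop := make(chan struct{})
	factory.Start(stop)
	<-stop // block forever; the update handler terminates the process
}
```

Comparing resource versions in the update handler avoids exiting on the informer's periodic resyncs, where the handler fires without an actual change.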
Unfortunately, I can't provide a PR even if the idea is accepted. We migrated away from the Zalando operator mainly because of this issue (it can't be used with GitOps without ugly workarounds) and because of Spilo lock-in (it is a tough task to move away from Spilo-based deployments). Overall, the Zalando operator is a robust and full-featured solution - thanks to its maintainers!
Happy to make the contribution; I will likely pose the same question in the form of a PR.
The operator successfully reads its own OperatorConfiguration resource on start but ignores subsequent updates to it. I can't find any params/env vars in the docs to configure this behavior.
As a workaround, we are forced to delete the operator pod so it re-reads the configuration on restart.