Open neeraj-laad opened 1 year ago
I'd like to add a +1 for this as well. Would it make sense to have something that can be configured in the Subscription along with these other settings? https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md
We have a need for this also for the EDB Operator deployment, where it manages the failover of primary postgres to postgres replica pods. EDB operator already uses leader-election.
If the EDB operator and the postgres primary pod are on the same worker node and we kill that worker node at the VM level (mimicking a disaster), the EDB operator takes 5 to 7 minutes to come back up and perform the postgres failover. https://medium.com/tailwinds-navigator/kubernetes-tip-how-to-make-kubernetes-react-faster-when-nodes-fail-1e248e184890 explains where the 5-minute delay comes from.
The replicas field can be set in the CSV pre-install (that is, when you create the CSV). Not sure if I misunderstood something here.
While we have not documented this yet, this repository is effectively in maintenance mode and we are trying to avoid adding new features. Instead, our new feature efforts are focused on the "OLM v1" effort in rukpak, catalogd, and operator-controller.
We are hoping to add support for some type of templating to rukpak. @joelanford and I are working on high level plans, and we will share them as soon as they're ready.
The potential rukpak changes are at a layer below the operator API, so we'll also need to add a way to the operator API to allow users to specify values to pass through to rukpak. As soon as we have more details on this, we'll share those too.
Per the OLM Dev call today (Aug 15, 2023), we discussed how this "could" be fixed by adding a Subscription config per https://github.com/operator-framework/operator-lifecycle-manager/issues/2923#issuecomment-1480191733.
To address this, an RFE would need to be created, along with an implementation PR to back it up. If those were created and included the complete use cases, the OLM team feels that it "may" be acceptable.
I think the main use case is similar to setting the requests & limits in the Subscription config. Depending on the customer's environment, it may not always be suitable to have 1, 2 or 3 replicas. The customer must understand the contract and capabilities of the Operator to know how to safely "override" the operator controller defaults using the Subscription config.
To make use of the Subscription config, the Operator could not be managed as a dependency, since the Subscription is auto-created for dependencies. Subscriptions for such dependencies would need to be pre-created.
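To make the proposal concrete, here is a minimal sketch of what such a Subscription might look like, written as a Python dict. The `resources` block under `spec.config` is an existing Subscription config setting; the `replicas` key is purely hypothetical (it is the field being requested in this thread), and the operator, catalog, and namespace names are placeholders.

```python
import copy

# A Subscription manifest expressed as a plain dict. All names are placeholders.
subscription = {
    "apiVersion": "operators.coreos.com/v1alpha1",
    "kind": "Subscription",
    "metadata": {"name": "my-operator", "namespace": "operators"},
    "spec": {
        "name": "my-operator",
        "channel": "stable",
        "source": "my-catalog",
        "sourceNamespace": "olm",
        "config": {
            # Existing Subscription config setting: resource overrides.
            "resources": {
                "requests": {"cpu": "100m", "memory": "128Mi"},
                "limits": {"cpu": "500m", "memory": "256Mi"},
            },
        },
    },
}


def with_replicas(sub, replicas):
    """Return a copy of the Subscription with a HYPOTHETICAL config.replicas field.

    This field is the one proposed in this thread; it is not assumed to exist.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    updated = copy.deepcopy(sub)
    updated["spec"].setdefault("config", {})["replicas"] = replicas
    return updated


ha_subscription = with_replicas(subscription, 3)
```

Placing the value next to `resources` mirrors the comparison made above: both are per-environment overrides of defaults that the operator author ships in the CSV.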
There are several workarounds:
Feature Request
Is your feature request related to a problem? Please describe. I would like a user to be able to configure how many replicas of my operator are deployed. This can help provide HA for operator deployments.
This is critical for use cases where the operator is responsible for managing the deployments of a critical system. When the operator is down, the system loses the ability to cope with/manage further failures or updates effectively. It would be very useful for operators to use leader election (https://sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/), so that if one operator instance is down, another can become the leader and manage the deployments.
Describe the solution you'd like The only way to do this today is by updating /spec/install/spec/deployments/0/spec/replicas in the CSV post-install, which is not ideal. It would be much nicer if there were a mechanism at operator install time to set this value, so a user can decide upfront how many replicas are suitable for their use case. Also, it would be nice to provide a way to update this post-installation, to allow scale-up/down like a regular deployment.
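For reference, the post-install workaround amounts to a JSON Patch "replace" on that path. The sketch below builds such a patch and applies it to a stripped-down, made-up CSV dict so the effect is visible without a cluster; the `apply_replace` helper is an illustrative stand-in that handles only "replace" operations, not a full JSON Patch implementation.

```python
import copy
import json

# Path quoted in the feature request above.
REPLICAS_PATH = "/spec/install/spec/deployments/0/spec/replicas"


def replicas_patch(replicas):
    """Build a JSON Patch document that sets the replica count in a CSV."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [{"op": "replace", "path": REPLICAS_PATH, "value": replicas}]


def apply_replace(doc, patch):
    """Apply 'replace' operations to nested dicts/lists and return a new document."""
    result = copy.deepcopy(doc)
    for op in patch:
        if op["op"] != "replace":
            raise ValueError("only 'replace' is supported in this sketch")
        keys = op["path"].lstrip("/").split("/")
        target = result
        # Walk to the parent of the final key; list segments are integer indexes.
        for key in keys[:-1]:
            target = target[int(key)] if isinstance(target, list) else target[key]
        last = keys[-1]
        if isinstance(target, list):
            target[int(last)] = op["value"]
        else:
            if last not in target:
                raise KeyError(last)  # 'replace' requires an existing member
            target[last] = op["value"]
    return result


# Minimal, made-up CSV containing only the fields the patch touches.
csv = {
    "spec": {
        "install": {
            "spec": {
                "deployments": [{"name": "my-operator", "spec": {"replicas": 1}}]
            }
        }
    }
}

patch = replicas_patch(3)
patched_csv = apply_replace(csv, patch)
patch_json = json.dumps(patch)
```

On a real cluster, the string in `patch_json` could presumably be passed to something like `kubectl patch csv <name> --type=json -p '<patch>'`. This is exactly the manual post-install step that the request hopes to avoid.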