joelanford opened this issue 3 years ago
I'd like to suggest deprecating this package in favor of controller-runtime/pkg/leaderelection, or at least adding a note that it has this bug until it is fixed, to warn users away. client-go's leader-with-lease (and controller-runtime's wrapper around it) are quite stable and easy to use now (they were not when this leader-for-life library was originally written), and even though lease-based election does not guarantee that two leaders never overlap, it seems to be the de facto standard upstream.
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/lifecycle frozen
Is anyone working on this? What I would love to see is this leader-for-life feature available in controller-runtime! A pluggable leader election mechanism could be useful on its own, but I think getting leader-for-life into controller-runtime would be more sustainable.
Feature Request
Is your feature request related to a problem? Please describe. Yes. It isn't possible to use leader-for-life leader election with controller-runtime's manager when also using liveness and readiness probes.
Using controller-runtime's manager out of the box, the following sequence of events happens when manager.Start() is called:
1) The liveness and readiness probe servers are started.
2) Leader election runs and the manager becomes the leader.
3) The controllers are started.
When using leader-for-life from this repo, the leader-for-life election must be run before manager.Start() is called, since controller-runtime doesn't support pluggable leader election implementations. The sequence of events in this case is:
1) Leader-for-life election runs and the operator becomes the leader.
2) The liveness and readiness probe servers are started.
3) The controllers are started.
Notice that 1) and 2) are swapped. This swap causes deadlocks when upgrading operator deployments that use leader-for-life. When the deployment attempts to roll out a new version, the new pod starts up and first tries to become the leader, failing indefinitely until the old pod relinquishes ownership. However, the old pod will not relinquish ownership until it is deleted, and it won't be deleted until the new pod reports that it's healthy. Unfortunately, the new pod can never report that it's healthy, because it must be the leader before it starts its liveness and readiness probe servers.
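The deadlock can be reproduced with a small, deliberately simplified simulation. This is a hypothetical model, not real Kubernetes or controller-runtime code: `simulateRollout` and its variables are illustrative names, the Deployment controller is reduced to "delete the old pod once the new pod is ready", and deleting the old pod is assumed to release its leader-for-life lock (the lock object being garbage-collected along with the pod).

```go
package main

// simulateRollout is a hypothetical, simplified model of a Deployment
// rolling update of an operator that uses leader-for-life. It reports
// whether the new pod becomes leader within maxSteps iterations.
func simulateRollout(probesBeforeLeader bool, maxSteps int) bool {
	lockHolder := "old-pod" // leader-for-life lock owned by the old pod
	oldPodRunning := true
	newPodReady := false

	for step := 0; step < maxSteps; step++ {
		// New pod, default controller-runtime order: probes start before
		// leader election, so the pod can report ready right away.
		if probesBeforeLeader {
			newPodReady = true
		}
		// New pod tries to become leader. A leader-for-life lock never
		// expires, so this only succeeds after the old pod is gone.
		// (With leader election first, probes would only start here.)
		if lockHolder == "" {
			return true
		}
		// Deployment controller: delete the old pod only once the new pod
		// is ready. Deleting the pod releases its lock.
		if oldPodRunning && newPodReady {
			oldPodRunning = false
			lockHolder = ""
		}
	}
	// The old pod still holds the lock: the rollout is stuck.
	return false
}
```

With probes first, the rollout completes on the second iteration. With leader election first, `newPodReady` never becomes true, the old pod is never deleted, and the loop exhausts `maxSteps`, which is the deadlock described in this issue.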
Describe the solution you'd like: Work upstream to make controller-runtime support a pluggable leader election implementation, so that the manager can use leader-for-life.