zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Retry unsuccessful failover on unschedulable nodes #2749

Open simonklb opened 2 months ago

simonklb commented 2 months ago

When a node running the leader is drained before any replica has become ready, the failover does not succeed. That is good. However, if a replica becomes ready afterwards, the failover is never retried, and you have to uncordon the node and redo the drain for it to succeed.

I believe the relevant part is here: https://github.com/zalando/postgres-operator/blob/2e398120d2d0b3bb2b8bb239c6d49011ebe37e88/pkg/controller/node.go#L68-L72

Would you be open to changing this behavior? Is there any harm in letting the failover retry if the node is still not ready?
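
To illustrate the difference, here is a minimal sketch in Go. It is not the operator's actual code: `nodeState`, `transitionOnly`, and `retryWhileNotReady` are hypothetical names, and the single `Ready` flag is an assumed simplification of whatever the operator checks. The first function models the behavior described above, where the failover is attempted only when a node changes from ready to not ready. The second models the proposed behavior, where it is attempted on every node update for as long as the node is not ready and still hosts a leader.

```go
package main

import "fmt"

// nodeState is a hypothetical, simplified view of a Kubernetes node.
// Ready is assumed to mean "schedulable and suitable for Postgres pods".
type nodeState struct {
	Ready bool
}

// transitionOnly models the described behavior: a failover is attempted
// only on the update where the node goes from ready to not ready.
func transitionOnly(prev, cur nodeState) bool {
	return prev.Ready && !cur.Ready
}

// retryWhileNotReady models the proposed behavior: a failover is attempted
// on any update while the node is not ready and leader pods remain on it.
func retryWhileNotReady(cur nodeState, leaderPodsOnNode int) bool {
	return !cur.Ready && leaderPodsOnNode > 0
}

func main() {
	drained := nodeState{Ready: false}

	// A later update for a node that was already cordoned, with one
	// leader pod left behind by the earlier failed failover.
	fmt.Println("transition only:", transitionOnly(drained, drained))
	fmt.Println("retry while not ready:", retryWhileNotReady(drained, 1))
}
```

With the second check, a retry would still need something to trigger it, for example a later node update event. Once no leader pods are left on the node, the check returns false and nothing more is attempted.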