Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.12.2
Where do you run it - cloud or metal? Kubernetes or OpenShift? Kubernetes
Are you running Postgres Operator in production? yes
Type of issue? Feature request
When a node running the leader is drained before any replica has become ready, the failover does not succeed, which is good. However, if a replica becomes ready afterwards, the failover is never retried: you have to uncordon the node and drain it again for the failover to succeed.
I believe the relevant part is here: https://github.com/zalando/postgres-operator/blob/2e398120d2d0b3bb2b8bb239c6d49011ebe37e88/pkg/controller/node.go#L68-L72
Would you be open to changing this behavior? Is there any harm in letting the failover be retried if the node is still not ready?
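To make the request concrete, here is a minimal sketch of the difference as I understand it. It is not the operator's code: `nodeState`, `shouldMigrateOnTransition` and `shouldMigrateWithRetry` are hypothetical names, and the "current" variant reflects my reading of the linked lines (the handler skips the update when the previous node state was already not ready), which may be incomplete.

```go
package main

import "fmt"

// nodeState is a simplified, hypothetical stand-in for the node properties
// that matter here. It is not a type from the operator.
type nodeState struct {
	Ready bool // true while the node is schedulable and not being decommissioned
}

// shouldMigrateOnTransition models my reading of the current behavior:
// the leader is moved only on the update where the node goes from
// ready to not ready. Later updates of a still-drained node are skipped.
func shouldMigrateOnTransition(prev, cur nodeState) bool {
	return prev.Ready && !cur.Ready
}

// shouldMigrateWithRetry models the requested behavior: every update that
// sees the node not ready triggers another attempt, so a failover that
// failed earlier can succeed once a replica is ready.
func shouldMigrateWithRetry(prev, cur nodeState) bool {
	return !cur.Ready
}

func main() {
	// Assumed sequence of node updates: drain, a later update while
	// still drained, then uncordon.
	updates := []struct {
		name      string
		prev, cur nodeState
	}{
		{"drain", nodeState{Ready: true}, nodeState{Ready: false}},
		{"update while drained", nodeState{Ready: false}, nodeState{Ready: false}},
		{"uncordon", nodeState{Ready: false}, nodeState{Ready: true}},
	}

	for _, u := range updates {
		fmt.Printf("%-22s transition-only=%v with-retry=%v\n",
			u.name,
			shouldMigrateOnTransition(u.prev, u.cur),
			shouldMigrateWithRetry(u.prev, u.cur))
	}
}
```

In this sketch only the second update differs between the two variants, and that is exactly the case where a replica may have become ready in the meantime. A real change would presumably also need to avoid repeated attempts once the leader has already moved off the node.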