hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability

Ability to enable synchronous replica clear-selection mode in pg_auto_failover #919

Closed: xinferum closed this issue 1 year ago

xinferum commented 1 year ago

Hello.

Currently, pg_auto_failover implements ANY-based (quorum) synchronous replication. In a cluster with more than one replica (3+ data nodes), this means there is no clear way to tell at a given moment which replica currently satisfies the synchronous requirement (that is, which one acknowledged the primary's commit when a transaction completed).
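A toy model of the ANY semantics may help make the ambiguity concrete. It assumes a setting of the form `synchronous_standby_names = 'ANY 1 (...)'`, which is roughly what pg_auto_failover derives from number_sync_standbys (the exact standby names it generates are not shown here, so `node_2`..`node_4` are hypothetical). Under ANY, a commit is released by acknowledgements from any `num_sync` of the listed standbys, so several different subsets are equally valid and none of them is "the" synchronous replica.

```python
from itertools import combinations


def any_quorum_satisfied(num_sync: int, listed: list[str], acked: set[str]) -> bool:
    """ANY num_sync (...): the commit returns once at least num_sync of the
    listed standbys have confirmed it, regardless of which ones they are."""
    return len(acked & set(listed)) >= num_sync


def satisfying_sets(num_sync: int, listed: list[str]) -> list[set[str]]:
    """Every minimal subset of standbys that can, on its own, release a commit."""
    return [set(combo) for combo in combinations(listed, num_sync)]


# Hypothetical standby names, for illustration only.
standbys = ["node_2", "node_3", "node_4"]
setting = f"ANY 1 ({', '.join(standbys)})"
print("synchronous_standby_names =", setting)

# Each single standby is enough, so a different one may answer each COMMIT.
for subset in satisfying_sets(1, standbys):
    print(sorted(subset), any_quorum_satisfied(1, standbys, subset))
```

Because every row above satisfies the quorum, observing a successful commit does not tell you which replica acknowledged it, which is the behavior the request describes.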

I would like to know whether it is possible to implement in pg_auto_failover a mode in which the cluster, depending on the configured number of synchronous standbys (number_sync_standbys), selects and assigns a specific replica (or replicas) whose acknowledgement it expects on commit. If a synchronous replica disconnects, the cluster would assign a new one from the available replicas.
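For comparison, PostgreSQL's own `synchronous_standby_names` also accepts a priority-based `FIRST num_sync (...)` form: the synchronous standbys are the first `num_sync` names in the list that are currently connected and streaming, and the next name in the list takes over when one of them disconnects. The sketch below models only that selection rule; it assumes every connected standby is already caught up, and the node names and helper function are hypothetical. Whether pg_auto_failover could expose such a mode is exactly what this issue asks and is not implied here.

```python
def first_sync_set(num_sync: int, priority: list[str], connected: set[str]) -> list[str]:
    """FIRST num_sync (...): the synchronous standbys are the first num_sync
    names, in list (priority) order, that are currently connected.
    Simplification: assumes connected standbys are already streaming."""
    return [name for name in priority if name in connected][:num_sync]


# Hypothetical standby names, listed from highest to lowest priority.
priority = ["node_2", "node_3", "node_4"]
print(f"synchronous_standby_names = FIRST 1 ({', '.join(priority)})")

connected = {"node_2", "node_3", "node_4"}
print(first_sync_set(1, priority, connected))  # ['node_2']

connected.discard("node_2")  # node_2 disconnects
print(first_sync_set(1, priority, connected))  # ['node_3'] takes over
```

Here the currently synchronous replica is always identifiable, at the cost of depending on the priority order chosen in the list.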

With this option, I would also like pg_autoctl show state to mark the replica(s) selected by the cluster as guaranteed synchronous in the "Reported State" and "Assigned State" columns. For example, such replicas could show a "synchronous" state instead of "secondary".

That is, a mode similar to how Patroni works. In some cases I would like to know clearly which replica is currently synchronous, at least for various maintenance scenarios. The ANY option is good, but I would like to be able to bind the synchronous replica in the cluster more tightly.

Thank you.

DimCitus commented 1 year ago

Hi @xinferum; thanks for your comments. Unfortunately, that's NOT at all how Postgres synchronous replication works. If you select a single standby node to acknowledge all the commits, you reduce your availability: what happens when this selected standby node is down? Certainly, you can't commit anymore... or you have to wait until another replica is up to date with the primary.

The way Postgres is implemented, every replica gets a chance to acknowledge each COMMIT. This is better for HA: when any secondary node is down, there is no intervention to schedule, and with the same setup, operations just continue normally.
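The availability argument can be illustrated with a toy model. It assumes a commit completes only when at least `num_sync` of the standbys eligible to acknowledge it are up, and it ignores catch-up time, timeouts, and any failover logic; the node names and the helper are hypothetical.

```python
def commit_can_complete(num_sync: int, eligible: list[str], up: set[str]) -> bool:
    """A commit waiting on synchronous replication completes only if at least
    num_sync of the standbys eligible to acknowledge it are up."""
    return len(set(eligible) & up) >= num_sync


up = {"node_3", "node_4"}  # node_2 is down

# Hypothetical pinned mode: only node_2 may acknowledge commits.
print(commit_can_complete(1, ["node_2"], up))  # False: commits block

# ANY 1 over all standbys: either remaining replica can acknowledge.
print(commit_can_complete(1, ["node_2", "node_3", "node_4"], up))  # True
```

With a single pinned standby, its failure stalls every commit until something reassigns the role; with ANY, the same configuration keeps accepting commits.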

For maintenance scenarios with pg_auto_failover, please use the MAINTENANCE mode and the commands pg_autoctl enable maintenance and pg_autoctl disable maintenance.