hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability

Edit maintenance nodes out of synchronous_standby_names. #798

Closed DimCitus closed 2 years ago

DimCitus commented 3 years ago

When a node is undergoing maintenance, the operator can use our maintenance state to stop our automation from driving the node. In effect, we're asked to take pg_auto_failover's hands off the wheel.

In that situation, we want to prevent the node in maintenance from staying connected to the primary and acknowledging transaction commits. Otherwise, if we later had to fail over to a new node, we would risk selecting a node that doesn't have all the reported commits: a commit could have been acknowledged only by a node in maintenance, and that node does not take part in the failover process, neither as a candidate nor as a WAL source.
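The hazard can be illustrated with a small sketch. This is not pg_auto_failover code: the `Standby` class, the integer LSNs, and both function names are hypothetical, and candidate selection is simplified to "most advanced standby that is not in maintenance".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Standby:
    name: str
    received_lsn: int      # last WAL position received (simplified to an integer)
    in_maintenance: bool


def best_failover_candidate(standbys: List[Standby]) -> Optional[Standby]:
    """Return the most advanced standby that is allowed to take part in a failover."""
    eligible = [s for s in standbys if not s.in_maintenance]
    return max(eligible, key=lambda s: s.received_lsn, default=None)


def commit_survives_failover(commit_lsn: int, standbys: List[Standby]) -> bool:
    """True when the chosen failover candidate already holds the commit."""
    candidate = best_failover_candidate(standbys)
    return candidate is not None and candidate.received_lsn >= commit_lsn


# node_b is in maintenance but still connected: it has received the commit
# at LSN 120, while node_c, the only eligible candidate, lags at LSN 100.
standbys = [
    Standby("node_b", received_lsn=120, in_maintenance=True),
    Standby("node_c", received_lsn=100, in_maintenance=False),
]
lost = not commit_survives_failover(120, standbys)
```

Here `lost` is `True`: the commit was reported to the client as durable, yet the only node allowed to be promoted does not have it.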

To ensure that commits are never acknowledged by nodes in maintenance, we now set synchronous_standby_names to 'pgautofailover_maintenance_blocks_writes', a standby name that is otherwise never used. This blocks all commits on the primary for as long as every standby node is in maintenance.
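As a rough sketch of the rule described above, under stated assumptions: the function below is hypothetical and is not the monitor's actual implementation, the standby names in the example are illustrative, and the quorum computation is a simplification. Only the placeholder name comes from this change, and `ANY n (...)` is standard Postgres syntax for synchronous_standby_names.

```python
from typing import List, Tuple

# Placeholder standby name introduced by this change; no real standby uses it.
MAINTENANCE_BLOCKS_WRITES = "pgautofailover_maintenance_blocks_writes"


def synchronous_standby_names(
    standbys: List[Tuple[str, bool]], number_sync_standbys: int
) -> str:
    """Compute a synchronous_standby_names value (hypothetical helper).

    standbys is a list of (application_name, in_maintenance) pairs.
    """
    if not standbys:
        # No standby at all: nothing to wait for.
        return ""

    # Nodes in maintenance are edited out of the list.
    active = [name for name, in_maintenance in standbys if not in_maintenance]

    if active:
        # Assumption: wait for at least one standby among the active ones.
        quorum = max(1, number_sync_standbys)
        return f"ANY {quorum} ({', '.join(active)})"

    if number_sync_standbys == 0:
        # The operator accepted running without synchronous standbys.
        return ""

    # Every standby is in maintenance: block commits on the primary.
    return MAINTENANCE_BLOCKS_WRITES


all_in_maintenance = [
    ("pgautofailover_standby_2", True),
    ("pgautofailover_standby_3", True),
]
blocked = synchronous_standby_names(all_in_maintenance, 1)
unblocked = synchronous_standby_names(all_in_maintenance, 0)
```

In this sketch `blocked` is the placeholder name, so no commit can be acknowledged, while `unblocked` is the empty string, which corresponds to setting number-sync-standbys to 0.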

To unblock the situation, either run pg_autoctl disable maintenance on a standby node, or run pg_autoctl set formation number-sync-standbys 0.