When a node is undergoing maintenance, the operator can use our maintenance
state to stop our automation from driving the node. We're asked to get
pg_auto_failover out of the way.
In that situation, we want to prevent the node in maintenance from being
connected to the primary and acknowledging transaction commits: if we later
had to fail over to a new node, we would risk selecting a node that doesn't
have all the reported commits. That's because a commit could have been
acknowledged only by a node in maintenance, and this node will not take part
in the failover process, neither as a candidate nor as a WAL source.
To ensure that commits won't get acknowledged by nodes in maintenance, we
now set synchronous_standby_names to
'pgautofailover_maintenance_blocks_writes', a standby name that is otherwise
never used. This blocks all commits on the primary for as long as every
single one of the standby nodes is in maintenance.
To unblock the situation, either run pg_autoctl disable maintenance on a
standby node, or run pg_autoctl set formation number-sync-standbys 0.
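The decision described above can be sketched roughly as follows. This is an
illustration only, not the code in this change: the function name, its
parameters, the standby names, and the ANY quorum computation are all
assumptions made for the example. Only the placeholder name and the two ways
to unblock come from the description above.

```python
# Illustrative sketch only: the function, its parameters and the quorum
# computation are assumptions, not pg_auto_failover's actual implementation.
MAINTENANCE_BLOCKS_WRITES = "pgautofailover_maintenance_blocks_writes"


def synchronous_standby_names(number_sync_standbys, standbys):
    """Return a value for synchronous_standby_names.

    standbys is a list of (standby_name, in_maintenance) pairs.
    """
    # Nodes in maintenance must never acknowledge commits, so leave them out.
    active = [name for name, in_maintenance in standbys if not in_maintenance]

    if active:
        # Assumed quorum rule: wait for at least one active standby.
        quorum = max(1, number_sync_standbys)
        return f"ANY {quorum} ({', '.join(active)})"

    if number_sync_standbys == 0:
        # The operator accepted running without synchronous standbys.
        return ""

    # Every standby is in maintenance: use a name that no standby ever
    # registers with, so that commits on the primary block.
    return MAINTENANCE_BLOCKS_WRITES
```

In this sketch, taking one standby out of maintenance puts it back in the
ANY list, and setting number-sync-standbys to 0 yields an empty setting.
Either way the primary stops waiting on the placeholder name.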