hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability

Adjust formation->number_sync_standbys seems not as expected #932

Open yanboer opened 1 year ago

Question

Hello, we are testing some scenarios using pg_auto_failover with multi-node architectures.

After reducing a multi-standby cluster back to a single standby, I expected number_sync_standbys to return to 0, but I ended up with a single-standby cluster where number_sync_standbys is still 1.

Version

Some output from our scenario follows, for your perusal.

PostgreSQL version: 12.12

pg_auto_failover version: 1.6.4

pg_autoctl version 1.6.4
pg_autoctl extension version 1.6
compiled with PostgreSQL 12.12 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
compatible with Postgres 10, 11, 12, 13, and 14

Steps

First, I have a cluster with 1 monitor and 2 PostgreSQL nodes, with the following settings:

  Context |    Name |                   Setting | Value                             
----------+---------+---------------------------+-----------------------------------
formation | primary |      number_sync_standbys | 0                                 
  primary |  node_1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2)'
     node |  node_1 |        candidate priority | 50                                
     node |  node_2 |        candidate priority | 50                                
     node |  node_1 |        replication quorum | true                              
     node |  node_2 |        replication quorum | true

When I add a third PostgreSQL node to the cluster, number_sync_standbys is adjusted to 1, and the settings are as follows:

  Context |    Name |                   Setting | Value                                                       
----------+---------+---------------------------+-------------------------------------------------------------
formation | primary |      number_sync_standbys | 1                                                           
  primary |  node_1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
     node |  node_1 |        candidate priority | 50                                                          
     node |  node_2 |        candidate priority | 50                                                          
     node |  node_3 |        candidate priority | 50                                                          
     node |  node_1 |        replication quorum | true                                                        
     node |  node_2 |        replication quorum | true                                                        
     node |  node_3 |        replication quorum | true                                          

Then I removed a node from the cluster. number_sync_standbys is still 1, and the settings are as follows:

  Context |    Name |                   Setting | Value                             
----------+---------+---------------------------+-----------------------------------
formation | primary |      number_sync_standbys | 1                                 
  primary |  node_1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2)'
     node |  node_1 |        candidate priority | 50                                
     node |  node_2 |        candidate priority | 50                                
     node |  node_1 |        replication quorum | true                              
     node |  node_2 |        replication quorum | true
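
To summarize the three states above, here is a minimal sketch of how synchronous_standby_names appears to relate to number_sync_standbys in these tables. It is only a model fitted to the output shown here, not the pg_auto_failover implementation: the function name, the `max(1, ...)` rule, and the empty-string result for zero standbys are all assumptions.

```python
def synchronous_standby_names(number_sync_standbys, quorum_standbys):
    """Hypothetical model of the value shown in the settings tables above."""
    if not quorum_standbys:
        # Assumption: with no standby in the replication quorum, nothing is required.
        return ""
    # Assumption: at least one standby is always required once a quorum
    # standby exists, which matches the first table (0 -> 'ANY 1 (...)').
    required = max(1, number_sync_standbys)
    return "ANY {} ({})".format(required, ", ".join(quorum_standbys))


# The three states from this report: (number_sync_standbys, quorum standbys).
states = [
    (0, ["pgautofailover_standby_2"]),
    (1, ["pgautofailover_standby_2", "pgautofailover_standby_3"]),
    (1, ["pgautofailover_standby_2"]),
]

for number_sync_standbys, standbys in states:
    print(number_sync_standbys, synchronous_standby_names(number_sync_standbys, standbys))
```

Under this model, the first and third states give the same synchronous_standby_names, so the only visible difference after removing the node is the formation-level number_sync_standbys value itself.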

I would like to know whether this is expected, and when number_sync_standbys will be adjusted.