We're testing out the newly added support in pg_auto_failover for a Citus formation. In our Citus formation, we use pg_dist_poolinfo alongside pg_dist_node so that the internal connections between Citus nodes are made through a connection pooler (pgbouncer). It does not appear that pg_auto_failover supports pg_dist_poolinfo though. In our testing, when pg_auto_failover triggers a failover, it makes the appropriate change to pg_dist_node, but not to pg_dist_poolinfo.
Before failover:
ourdb=# select * from pg_dist_node order by nodeid;
nodeid | groupid | nodename | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards
--------+---------+-------------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------
1 | 1 | citus-01-a.stg | 5432 | default | t | t | primary | default | t | t
2 | 2 | citus-02-a.stg | 5432 | default | t | t | primary | default | t | t
3 | 0 | citus-coord-a.stg | 5432 | default | t | t | primary | default | t | f
(3 rows)
ourdb=# select * from pg_dist_poolinfo order by nodeid;
nodeid | poolinfo
--------+-------------------------------------------------
1 | host=citus-01-a.stg port=6432 dbname=ourdb
2 | host=citus-02-a.stg port=6432 dbname=ourdb
3 | host=citus-coord-a.stg port=6432 dbname=ourdb
(3 rows)
After failover of citus-02-a to citus-02-b:
ourdb=# select * from pg_dist_node order by nodeid;
nodeid | groupid | nodename | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards
--------+---------+-------------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------
1 | 1 | citus-01-a.stg | 5432 | default | t | t | primary | default | t | t
2 | 2 | citus-02-b.stg | 5432 | default | t | t | primary | default | t | t
3 | 0 | citus-coord-a.stg | 5432 | default | t | t | primary | default | t | f
(3 rows)
ourdb=# select * from pg_dist_poolinfo order by nodeid;
nodeid | poolinfo
--------+-------------------------------------------------
1 | host=citus-01-a.stg port=6432 dbname=ourdb
2 | host=citus-02-a.stg port=6432 dbname=ourdb
3 | host=citus-coord-a.stg port=6432 dbname=ourdb
(3 rows)
So, after failover, citus-02-b has taken the place of citus-02-a in pg_dist_node, but the corresponding row in pg_dist_poolinfo still points at citus-02-a. This renders the cluster inoperable.
We're testing out the newly added support in pg_auto_failover for a Citus formation. In our Citus formation, we use
pg_dist_poolinfo
alongsidepg_dist_node
so that the internal connections between Citus nodes are made through a connection pooler (pgbouncer). It does not appear that pg_auto_failover supportspg_dist_poolinfo
though. In our testing, when pg_auto_failover triggers a failover, it makes the appropriate change topg_dist_node
, but not topg_dist_poolinfo
.Before failover:
After failover of
citus-02-a
tocitus-02-b
:So, after failover,
citus-02-b
has taken the place ofcitus-02-a
inpg_dist_node
, but the corresponding row inpg_dist_poolinfo
still points atcitus-02-a
. This renders the cluster inoperable.