Step 1. Generate load with pgbench on the primary and secondary.
Step 2. Run the pg_autoctl perform failover command periodically.
Failover succeeded repeatedly and then stopped working.
The log is:
#### check current state of the formation
$ pg_autoctl show state --formation test
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
-------------+-------+--------------------+------------------+--------------+---------------------+--------------------
dev-pgf200003 | 3 | dev-pgf200003:5432 | 514: E2/710000D8 | read-write | wait_primary | wait_primary
dev-pgf200002 | 21 | dev-pgf200002:5432 | 1: 0/0 | none ! | wait_standby | catchingup
#### after dropping the node, run the "pg_autoctl create postgres" command on the secondary
$ pg_autoctl create postgres \
--pgctl $CmdPath \
--pgdata $PGDATA \
--pghost `hostname` \
--name `hostname` \
--pgport 5432 \
--hostname `hostname` \
--formation test --skip-pg-hba --no-ssl --maximum-backup-rate 1024M --monitor postgres://autoctl_node@dev-pgf200001:5432/pg_auto_failover
10:32:59 130213 WARN PG_REGRESS_SOCK_DIR is set to "$path", and our setup is using "dev-pgf200002"
10:32:59 130213 INFO Continuing from a previous `pg_autoctl create` failed attempt
10:32:59 130213 INFO PostgreSQL state at registration time was: PGDATA does not exist
10:32:59 130213 INFO FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
10:32:59 130213 INFO Initialising PostgreSQL as a hot standby
10:32:59 130213 WARN PG_REGRESS_SOCK_DIR is set to "$path", and our setup is using "dev-pgf200003"
10:32:59 130213 ERROR history file "00000202.history" contains 1024 lines, pg_autoctl only supports up to 1023 lines
10:32:59 130213 ERROR Failed to connect to the primary with a replication connection string. See above for details
10:32:59 130213 ERROR Failed to initialize standby server, see above for details
10:32:59 130213 ERROR Failed to transition from state "wait_standby" to state "catchingup", see above.
10:33:00 130203 ERROR pg_autoctl service node-init exited with exit status 12
10:33:00 130203 FATAL pg_autoctl service node-init has already been restarted 5 times in the last 1 seconds, stopping now
10:33:00 130205 INFO Postgres controller service received signal SIGTERM, terminating
10:33:00 130203 FATAL Something went wrong in sub-process supervision, stopping now. See above for details.
10:33:00 130203 INFO Stop pg_autoctl
#### check current state of the formation
$ pg_autoctl show state --formation test
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
-------------+-------+--------------------+------------------+--------------+---------------------+--------------------
dev-pgf200003 | 3 | dev-pgf200003:5432 | 514: E2/710000D8 | read-write | wait_primary | wait_primary
dev-pgf200002 | 21 | dev-pgf200002:5432 | 1: 0/0 | none ! | wait_standby | catchingup
#### check timeline history file on primary
$ cat 00000202.history | tail -10
509 D0/3714D748 no recovery target specified
510 D0/B09A8F40 no recovery target specified
511 D0/FC976330 no recovery target specified
512 D1/A50307D8 no recovery target specified
513 D1/F89E3FB8 no recovery target specified
$ cat 00000202.history | wc -l
1025
#### remove empty lines
$ sed -i '/^$/d' 00000202.history
#### retry the "pg_autoctl create postgres" command on the secondary
$ pg_autoctl drop node
$ pg_autoctl create postgres \
--pgctl $CmdPath \
--pgdata $PGDATA \
--pghost `hostname` \
--name `hostname` \
--pgport 5432 \
--hostname `hostname` \
--formation test --skip-pg-hba --no-ssl --maximum-backup-rate 1024M --monitor postgres://autoctl_node@dev-pgf200001:5432/pg_auto_failover
$ nohup pg_autoctl run >> /home1/postgres/db/pglog/pg_autoctl.log 2>&1 &
$ pg_autoctl show state --formation test
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
-------------+-------+--------------------+------------------+--------------+---------------------+--------------------
dev-pgf200003 | 3 | dev-pgf200003:5432 | 514: E2/73000110 | read-write | primary | primary
dev-pgf200002 | 21 | dev-pgf200002:5432 | 514: E2/73000110 | read-only | secondary | secondary
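The `wc -l` and `sed` steps above can be wrapped in a read-only check, so the history file is inspected before anything is edited in place. This is only a sketch: the function names are made up for illustration, the 1023 figure is copied from the error message in the log, and pg_autoctl may count lines slightly differently from this script (the log reports 1024 lines where `wc -l` reports 1025).

```shell
#!/bin/sh
# Largest line count pg_autoctl accepts, as quoted in the error message above.
# Assumption: this script's count is close to, but not necessarily the same
# as, the count pg_autoctl computes internally.
MAX_SUPPORTED_LINES=1023

# Total number of lines in the file, blank ones included.
history_total_lines() {
    awk 'END { print NR }' "$1"
}

# Lines that hold an actual entry. Whitespace-only lines are treated as
# blank too, which is slightly broader than sed '/^$/d'.
history_entry_lines() {
    awk 'NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Print a short report and return non-zero when the file is over the limit.
check_history_file() {
    total=$(history_total_lines "$1")
    entries=$(history_entry_lines "$1")
    echo "total=$total entries=$entries blank=$((total - entries))"
    if [ "$total" -gt "$MAX_SUPPORTED_LINES" ]; then
        echo "over the pg_autoctl limit of $MAX_SUPPORTED_LINES lines"
        return 1
    fi
    echo "within the pg_autoctl limit"
}

# Example call (the path is an assumption; in PostgreSQL 13 history files
# normally live under $PGDATA/pg_wal):
# check_history_file "$PGDATA/pg_wal/00000202.history"
```

If the blank count is roughly equal to the entry count, removing the empty lines about halves the file, which matches what the `sed` workaround did here.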
According to the source code, the maximum number of lines in a .history file is 1024.
I would like to know why PG_AUTOCTL_MAX_TIMELINES is set to 1024.
Entries recorded in the timelineID.history file are never deleted, so the file grows with every failover.
In this test the history file apparently contained empty lines (1025 lines for 513 entries), so failover worked only up to 513 times.
If there is no particular reason for the value 1024, could you raise PG_AUTOCTL_MAX_TIMELINES to a much larger value (e.g. 1048576 (2^20))?
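For anyone matching the numbers in the output: timeline 514 in the state table and the file 00000202.history refer to the same timeline, because PostgreSQL names the history file after the timeline ID written as 8 hexadecimal digits. The file for timeline 514 lists its 513 ancestor timelines, which is why the last entry shown by `tail` is 513. A small sketch of the conversion (the function names are made up for illustration):

```shell
#!/bin/sh
# Timeline ID (decimal) -> history file name (8 uppercase hex digits).
timeline_history_name() {
    printf '%08X.history\n' "$1"
}

# History file name -> timeline ID (decimal).
timeline_from_history_name() {
    hex=${1%.history}
    printf '%d\n' "0x$hex"
}

timeline_history_name 514                      # prints 00000202.history
timeline_from_history_name 00000202.history    # prints 514
```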
I ran repeated failover tests.
Test versions: pg_auto_failover 2.0, PostgreSQL 13.10
The limit is defined in pgsql.h:
#define PG_AUTOCTL_MAX_TIMELINES 1024
https://github.com/hapostgres/pg_auto_failover/blob/10c62c247b34ca6515f3bbf17008a4a31a2eb16b/src/bin/pg_autoctl/pgsql.h#L196-L210