Closed vitabaks closed 1 week ago
Result:
TASK [patroni : Stop patroni service on the Replica servers (if running)] ***************************************************************************************************************************************
changed: [172.31.33.111]
changed: [172.31.42.255]
TASK [patroni : Stop patroni service on the Master server (if running)] *****************************************************************************************************************************************
changed: [172.31.39.47]
TASK [patroni : Check that PostgreSQL is stopped] ***************************************************************************************************************************************************************
ok: [172.31.33.111]
ok: [172.31.39.47]
ok: [172.31.42.255]
TASK [patroni : Remove patroni cluster "vitabaks-pgcluster" from DCS (if exist)] ********************************************************************************************************************************
changed: [172.31.39.47]
TASK [patroni : Run "/usr/bin/pgbackrest --stanza=vitabaks-pgcluster --type=time "--target=2024-11-01 12:47:47+00" --delta restore" on Master] ******************************************************************
changed: [172.31.39.47]
TASK [patroni : Run "/usr/bin/pgbackrest --stanza=vitabaks-pgcluster --type=time "--target=2024-11-01 12:47:47+00" --delta restore" on Replica] *****************************************************************
changed: [172.31.33.111]
changed: [172.31.42.255]
FAILED - RETRYING: [172.31.42.255]: Waiting for restore from backup (2880 retries left).
FAILED - RETRYING: [172.31.39.47]: Waiting for restore from backup (2880 retries left).
FAILED - RETRYING: [172.31.33.111]: Waiting for restore from backup (2880 retries left).
TASK [patroni : Waiting for restore from backup] ****************************************************************************************************************************************************************
changed: [172.31.39.47] => (item=True)
changed: [172.31.42.255] => (item=True)
changed: [172.31.33.111] => (item=True)
TASK [patroni : Start PostgreSQL for Recovery] ******************************************************************************************************************************************************************
changed: [172.31.33.111]
changed: [172.31.39.47]
changed: [172.31.42.255]
FAILED - RETRYING: [172.31.39.47]: Wait for the PostgreSQL start command to complete (360 retries left).
FAILED - RETRYING: [172.31.33.111]: Wait for the PostgreSQL start command to complete (360 retries left).
FAILED - RETRYING: [172.31.42.255]: Wait for the PostgreSQL start command to complete (360 retries left).
TASK [patroni : Wait for the PostgreSQL start command to complete] **********************************************************************************************************************************************
changed: [172.31.39.47]
changed: [172.31.33.111]
changed: [172.31.42.255]
FAILED - RETRYING: [172.31.39.47]: Wait for PostgreSQL recovery to complete (WAL apply) (2880 retries left).
TASK [patroni : Wait for PostgreSQL recovery to complete (WAL apply)] *******************************************************************************************************************************************
ok: [172.31.39.47]
TASK [patroni : Get PostgreSQL recovery log] ********************************************************************************************************************************************************************
ok: [172.31.39.47]
TASK [patroni : Print PostgreSQL recovery log] ******************************************************************************************************************************************************************
ok: [172.31.39.47] => {
"msg": [
"2024-11-01 12:37:52.272 UTC [11910] LOG: recovery stopping before commit of transaction 248047, time 2024-10-21 11:57:58.000145+00",
"2024-11-01 12:37:52.272 UTC [11910] LOG: redo done at 0/6591C308 system usage: CPU: user: 8.84 s, system: 5.19 s, elapsed: 45.81 s",
"2024-11-01 12:37:52.272 UTC [11910] LOG: last completed transaction was at log time 2024-10-21 11:57:57.999688+00"
]
}
TASK [patroni : Check that PostgreSQL is stopped] ***************************************************************************************************************************************************************
ok: [172.31.33.111]
ok: [172.31.39.47]
ok: [172.31.42.255]
TASK [patroni : Stop PostgreSQL] ********************************************************************************************************************************************************************************
changed: [172.31.39.47]
TASK [patroni : Check the patroni.dynamic.json exists] **********************************************************************************************************************************************************
ok: [172.31.39.47]
ok: [172.31.33.111]
ok: [172.31.42.255]
TASK [patroni : Start patroni service on the Master server] *****************************************************************************************************************************************************
changed: [172.31.39.47]
TASK [patroni : Wait for port 8008 to become open on the host] **************************************************************************************************************************************************
ok: [172.31.39.47]
TASK [patroni : Check PostgreSQL is started and accepting connections] ******************************************************************************************************************************************
ok: [172.31.39.47]
TASK [patroni : Wait for the cluster to initialize (master is the leader with the lock)] ************************************************************************************************************************
ok: [172.31.39.47]
TASK [patroni : Make sure the Master is not in recovery mode] ***************************************************************************************************************************************************
ok: [172.31.39.47]
TASK [patroni : Make sure the superuser and replication users are present, and password does not differ from the specified] *************************************************************************************
changed: [172.31.39.47] => (item=postgres)
changed: [172.31.39.47] => (item=replicator)
TASK [patroni : Make sure the postgresql users are present, and password does not differ from the specified] ****************************************************************************************************
ok: [172.31.39.47] => (item=pgbouncer)
TASK [patroni : Update postgresql authentication in patroni.yml] ************************************************************************************************************************************************
changed: [172.31.39.47]
changed: [172.31.33.111]
changed: [172.31.42.255]
TASK [patroni : Start patroni service on Replica servers] *******************************************************************************************************************************************************
changed: [172.31.33.111]
changed: [172.31.42.255]
TASK [patroni : Wait for port 8008 to become open on the host] **************************************************************************************************************************************************
ok: [172.31.42.255]
ok: [172.31.33.111]
TASK [patroni : Check that the patroni is healthy on the replica server] ****************************************************************************************************************************************
ok: [172.31.42.255]
ok: [172.31.33.111]
TASK [deploy-finish : Get Postgres users] ***********************************************************************************************************************************************************************
ok: [172.31.39.47]
TASK [deploy-finish : Get Postgres databases] *******************************************************************************************************************************************************************
ok: [172.31.39.47]
TASK [deploy-finish : Get Postgres cluster info] ****************************************************************************************************************************************************************
ok: [172.31.39.47]
TASK [deploy-finish : Postgres list of users] *******************************************************************************************************************************************************************
ok: [172.31.39.47] => {
"msg": [
" List of roles",
" Role name | Attributes ",
"------------+------------------------------------------------------------",
" pgbouncer | ",
" postgres | Superuser, Create role, Create DB, Replication, Bypass RLS",
" replicator | Replication"
]
}
TASK [deploy-finish : Postgres list of databases] ***************************************************************************************************************************************************************
ok: [172.31.39.47] => {
"msg": [
" List of databases",
" Name | Owner | Encoding | Locale Provider | Collate | Ctype | ICU Locale | ICU Rules | Access privileges ",
"-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-----------------------",
" postgres | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | | | ",
" template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | | | =c/postgres +",
" | | | | | | | | postgres=CTc/postgres",
" template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | | | =c/postgres +",
" | | | | | | | | postgres=CTc/postgres",
"(3 rows)"
]
}
TASK [deploy-finish : Postgres Cluster info] ********************************************************************************************************************************************************************
ok: [172.31.39.47] => {
"msg": [
"+ Cluster: vitabaks-pgcluster (7428194193594549507) ---------------+----+-----------+",
"| Member | Host | Role | State | TL | Lag in MB |",
"+------------------+---------------+---------+---------------------+----+-----------+",
"| ip-172-31-33-111 | 172.31.33.111 | Replica | in archive recovery | 3 | 0 |",
"| ip-172-31-39-47 | 172.31.39.47 | Leader | running | 5 | |",
"| ip-172-31-42-255 | 172.31.42.255 | Replica | in archive recovery | 3 | 0 |",
"+------------------+---------------+---------+---------------------+----+-----------+"
]
}
PLAY RECAP ******************************************************************************************************************************************************************************************************
172.31.33.111 : ok=38 changed=7 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
172.31.39.47 : ok=58 changed=10 unreachable=0 failed=0 skipped=72 rescued=0 ignored=0
172.31.42.255 : ok=38 changed=7 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
localhost : ok=19 changed=0 unreachable=0 failed=0 skipped=206 rescued=0 ignored=0
This PR introduces a configurable
cluster_restore_timeout
variable to replace hardcoded values and provide flexibility. The timeout has been increased to 24 hours, making it more suitable for restoring large databases, including both backup and WAL recovery.Additionally, we've added the task "Wait for the Standby cluster initialization to complete". This task allows tracking the progress of Standby cluster initialization in the Ansible logs. The same 24-hour timeout is applied to this task to ensure it has ample time to complete for large databases.
Variables:
cluster_restore_timeout
- backup and WAL restore timeout in seconds, default: 86400 (24 hours)pg_ctl_timeout
- pg_ctl start and stop timeout, default: 3600 (1 hour)Additionally
pgbackrest_patroni_cluster_restore_command
variable.--config-file
option.restore_command
option yo be able to restore WAL from the archive even if the postgresql_restore_command variable is not set.archive_command
option to eliminate the risk of archiving WAL immediately after the promotion. It makes sense for cloning when we don't want to send the WAL to the main cluster storage.