to avoid a situation where the database is stopped and the playbook then fails during the package update (for example, because of dependency problems), leaving the database stopped on one of the cluster servers.
Fixed:
PLAY [update_pgcluster.yml | Update PostgreSQL HA Cluster (based on "Patroni")] ***
TASK [Gathering Facts] *********************************************************
ok: [10.172.0.22]
ok: [10.172.0.21]
ok: [10.172.0.20]
TASK [Include main variables] **************************************************
ok: [10.172.0.20]
ok: [10.172.0.21]
ok: [10.172.0.22]
TASK [[Prepare] Get Patroni Cluster Leader Node] *******************************
ok: [10.172.0.21]
ok: [10.172.0.20]
ok: [10.172.0.22]
TASK [[Prepare] Add host to group "primary" (in-memory inventory)] *************
ok: [10.172.0.20] => (item=10.172.0.20)
TASK [[Prepare] Add hosts to group "secondary" (in-memory inventory)] **********
ok: [10.172.0.20] => (item=10.172.0.21)
ok: [10.172.0.20] => (item=10.172.0.22)
TASK [Print Patroni Cluster info] **********************************************
ok: [10.172.0.20] => {
    "msg": [
        "Cluster Name: postgres-cluster",
        "Cluster Leader: pgnode01"
    ]
}
PLAY [(1/4) PRE-UPDATE: Perform Pre-Checks] ************************************
TASK [Include main variables] **************************************************
ok: [10.172.0.20]
ok: [10.172.0.21]
ok: [10.172.0.22]
TASK [Running Pre-Checks] ******************************************************
TASK [update : [Pre-Check] (ALL) Test PostgreSQL DB Access] ********************
ok: [10.172.0.20]
ok: [10.172.0.22]
ok: [10.172.0.21]
TASK [update : [Pre-Check] Make sure that physical replication is active] ******
ok: [10.172.0.20]
TASK [update : [Pre-Check] Make sure there is no high replication lag (more than 10.00 MB)] ***
ok: [10.172.0.20]
TASK [update : [Pre-Check] Make sure there are no long-running transactions (more than 15 seconds)] ***
ok: [10.172.0.21]
ok: [10.172.0.20]
ok: [10.172.0.22]
PLAY [(2/4) UPDATE: Secondary] *************************************************
TASK [Include main variables] **************************************************
ok: [10.172.0.21]
TASK [Include OS-specific variables] *******************************************
ok: [10.172.0.21]
TASK [Stop read-only traffic] **************************************************
TASK [update : Edit patroni.yml | enable noloadbalance, nosync, nofailover] ****
changed: [10.172.0.21] => (item=noloadbalance: true)
changed: [10.172.0.21] => (item=nosync: true)
changed: [10.172.0.21] => (item=nofailover: true)
TASK [update : Reload patroni service] *****************************************
changed: [10.172.0.21]
FAILED - RETRYING: [10.172.0.21]: Make sure replica endpoint is unavailable (30 retries left).
FAILED - RETRYING: [10.172.0.21]: Make sure replica endpoint is unavailable (29 retries left).
TASK [update : Make sure replica endpoint is unavailable] **********************
ok: [10.172.0.21]
TASK [update : Wait for active transactions to complete] ***********************
ok: [10.172.0.21]
TASK [Stop Services] ***********************************************************
TASK [update : Check PostgreSQL is started and accepting connections] **********
ok: [10.172.0.21]
TASK [update : Execute CHECKPOINT before stopping PostgreSQL] ******************
changed: [10.172.0.21]
TASK [update : Stop Patroni service on the Cluster Replica (pgnode02)] *********
changed: [10.172.0.21]
TASK [Update PostgreSQL] *******************************************************
TASK [update : Update dnf cache] ***********************************************
changed: [10.172.0.21]
TASK [update : Install the latest version of PostgreSQL packages] **************
ok: [10.172.0.21] => (item=postgresql16)
ok: [10.172.0.21] => (item=postgresql16-server)
ok: [10.172.0.21] => (item=postgresql16-contrib)
TASK [Update Patroni] **********************************************************
TASK [update : Install the latest version of Patroni] **************************
ok: [10.172.0.21]
TASK [Update all system packages] **********************************************
TASK [update : Update dnf cache] ***********************************************
changed: [10.172.0.21]
fatal: [10.172.0.21]: FAILED! => {"attempts": 3, "changed": false, "failures": [], "msg": "Depsolve Error occurred: \n Problem: package iptables-legacy-1.8.8-6.el9.2.x86_64 from @System requires (iptables-libs(x86-64) = 1.8.8-6.el9 or iptables-libs(x86-64) = 1.8.8-6.el9_1), but none of the providers can be installed\n - cannot install both iptables-libs-1.8.10-2.el9.x86_64 from baseos and iptables-libs-1.8.8-6.el9.x86_64 from @System\n - cannot install both iptables-libs-1.8.8-6.el9.x86_64 from baseos and iptables-libs-1.8.10-2.el9.x86_64 from baseos\n - cannot install the best update candidate for package iptables-libs-1.8.8-6.el9.x86_64\n - cannot install the best update candidate for package iptables-legacy-1.8.8-6.el9.2.x86_64", "rc": 1, "results": []}
FAILED - RETRYING: [10.172.0.21]: Update all system packages (3 retries left).
FAILED - RETRYING: [10.172.0.21]: Update all system packages (2 retries left).
FAILED - RETRYING: [10.172.0.21]: Update all system packages (1 retries left).
TASK [update : Update all system packages] *************************************
NO MORE HOSTS LEFT *************************************************************
PLAY RECAP *********************************************************************
10.172.0.20 : ok=241 changed=88 unreachable=0 failed=0 skipped=706 rescued=0 ignored=0
10.172.0.21 : ok=208 changed=89 unreachable=0 failed=1 skipped=679 rescued=0 ignored=0
10.172.0.22 : ok=195 changed=83 unreachable=0 failed=0 skipped=665 rescued=0 ignored=0
Improve the error handling
so that update errors are reported after the playbook completes.
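
One way to defer error reporting like this is Ansible's `block`/`rescue` pattern. The fragment below is only a minimal sketch and not necessarily how the playbook implements it: the fact name `update_errors`, the registered variable `update_system_result`, and the task names are hypothetical. Only `ansible.builtin.package`, `ansible.builtin.set_fact`, `ansible.builtin.fail`, and the `ansible_failed_result` variable are standard Ansible.

```yaml
- name: Update all system packages (errors are recorded, not fatal)
  block:
    - name: Update all system packages
      ansible.builtin.package:
        name: "*"
        state: latest
      register: update_system_result  # hypothetical variable name
      until: update_system_result is success
      retries: 3
      delay: 5
  rescue:
    # Runs only if the task above still fails after all retries.
    # The error is stored so the play can continue and restart services.
    - name: Remember the update error
      ansible.builtin.set_fact:
        update_errors: >-
          {{ update_errors | default([])
             + [ansible_failed_result.msg | default('unknown error')] }}

# ... tasks that start the services again would go here ...

# Placed at the end of the playbook: fail only after everything else has run.
- name: Report update errors
  ansible.builtin.fail:
    msg: "Update finished with errors: {{ update_errors | join('; ') }}"
  when: update_errors | default([]) | length > 0
```

With this layout, a dependency error such as the `Depsolve Error` shown above would no longer abort the play while the database is stopped; it would be reported by the final task instead.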
Fixed: