vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on Patroni). Automating with Ansible.
https://postgresql-cluster.org
MIT License

deprecated tag point_in_time_recovery? #402

Closed chuegel closed 1 year ago

chuegel commented 1 year ago

Hi,

I've noticed that the tag point_in_time_recovery is not referenced in any role. Is this recovery type deprecated? Also, it is not listed in the tags.md file.

Thanks

vitabaks commented 1 year ago

not deprecated

Use it to restore the cluster from a backup. Details: https://github.com/vitabaks/postgresql_cluster#restore-and-cloning

vitabaks commented 1 year ago

Also, it is not listed in the tags.md file.

Added https://github.com/vitabaks/postgresql_cluster/commit/08b684010b0ae016de0daacf35393b1e375c1b8f

chuegel commented 1 year ago

Hmmm... but how is this supposed to work?

ansible-playbook deploy_pgcluster.yml --tags "point_in_time_recovery" --ask-vault-pass

...
TASK [patroni : Start patroni service on the Master server] **********************************************************************************************************************************************************************************
changed: [192.168.100.101]

TASK [patroni : Wait for port 8008 to become open on the host] *******************************************************************************************************************************************************************************
ok: [192.168.100.101]

TASK [patroni : Check PostgreSQL is started and accepting connections on Master] *************************************************************************************************************************************************************
ok: [192.168.100.101]
FAILED - RETRYING: [192.168.100.101]: Wait for the cluster to initialize (master is the leader with the lock) (10 retries left).
FAILED - RETRYING: [192.168.100.101]: Wait for the cluster to initialize (master is the leader with the lock) (9 retries left).
FAILED - RETRYING: [192.168.100.101]: Wait for the cluster to initialize (master is the leader with the lock) (8 retries left).
FAILED - RETRYING: [192.168.100.101]: Wait for the cluster to initialize (master is the leader with the lock) (7 retries left).
FAILED - RETRYING: [192.168.100.101]: Wait for the cluster to initialize (master is the leader with the lock) (6 retries left).
FAILED - RETRYING: [192.168.100.101]: Wait for the cluster to initialize (master is the leader with the lock) (5 retries left).
FAILED - RETRYING: [192.168.100.101]: Wait for the cluster to initialize (master is the leader with the lock) (4 retries left).
FAILED - RETRYING: [192.168.100.101]: Wait for the cluster to initialize (master is the leader with the lock) (3 retries left).
FAILED - RETRYING: [192.168.100.101]: Wait for the cluster to initialize (master is the leader with the lock) (2 retries left).
FAILED - RETRYING: [192.168.100.101]: Wait for the cluster to initialize (master is the leader with the lock) (1 retries left).

TASK [patroni : Wait for the cluster to initialize (master is the leader with the lock)] *****************************************************************************************************************************************************
fatal: [192.168.100.101]: FAILED! => {"attempts": 10, "changed": false, "content_type": "application/json", "date": "Sat, 08 Jul 2023 15:57:13 GMT", "elapsed": 0, "json": {"database_system_identifier": "7252770419591467073", "dcs_last_seen": 1688831826, "patroni": {"scope": "postgres-cluster", "version": "3.0.3"}, "postmaster_start_time": "2023-07-08 15:56:38.014399+00:00", "role": "replica", "server_version": 150003, "state": "running", "timeline": 4, "xlog": {"paused": false, "received_location": 503350816, "replayed_location": 503350816, "replayed_timestamp": null}}, "msg": "Status code was 503 and not [200]: HTTP Error 503: Service Unavailable", "redirected": false, "server": "BaseHTTP/0.6 Python/3.10.6", "status": 503, "url": "http://192.168.100.101:8008/leader"}

NO MORE HOSTS LEFT ***************************************************************************************************************************************************************************************************************************

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
192.168.100.101            : ok=20   changed=1    unreachable=0    failed=1    skipped=42   rescued=0    ignored=0
192.168.100.102            : ok=13   changed=0    unreachable=0    failed=0    skipped=44   rescued=0    ignored=0
192.168.100.103            : ok=13   changed=0    unreachable=0    failed=0    skipped=44   rescued=0    ignored=0
192.168.100.104            : ok=8    changed=0    unreachable=0    failed=0    skipped=11   rescued=0    ignored=0
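For context on the failure above: the 503 is Patroni's REST API reporting that this node is not currently the leader. GET /leader returns 200 only on the member that holds the leader lock and 503 on every other member, which is why the wait task keeps retrying and finally fails. A small sketch of reading that health check by hand (host and port taken from the log above; the helper function is only illustrative):

```shell
# Map the HTTP status from Patroni's GET /leader endpoint to a role.
# Per Patroni's REST API, only the current leader answers 200; any
# other member answers 503, which is what the failed task received.
leader_status() {
  case "$1" in
    200) echo "leader" ;;
    503) echo "not the leader" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Example usage against a node (host/port from the log above):
# code=$(curl -s -o /dev/null -w '%{http_code}' http://192.168.100.101:8008/leader)
# leader_status "$code"
```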

Shouldn't the first steps be to stop the master/replicas?

1. Stop patroni service on the Replica servers (if running);
2. Stop patroni service on the Master server;
3. ....

Instead, the role tries to start the master (which is already running), thus demoting the master to a replica and leading to the error above.

vitabaks commented 1 year ago

Have you read the documentation?

What is the value of the "patroni_cluster_bootstrap_method" variable and other pgbackrest or wal-g variables?
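For anyone hitting the same wall: the restore flow only kicks in when the bootstrap method is switched away from its default. A minimal sketch of the relevant setting, assuming the variable lives in the inventory or vars file as in this thread (the accepted values "pgbackrest"/"wal-g" should be verified against the repository's vars/main.yml):

```yaml
# Sketch: switch Patroni's bootstrap method from the default "initdb"
# to a backup tool so the point_in_time_recovery tag actually performs
# a restore. Verify the exact accepted values in vars/main.yml.
patroni_cluster_bootstrap_method: "pgbackrest"   # or "wal-g"
```

With the default "initdb" left in place, the tagged run simply (re)starts the existing cluster, which matches the behavior chuegel observed.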

chuegel commented 1 year ago

You're right. I overlooked that value in the config.