Hi @Hannsre,
This error indicates that the replica is trying to join a cluster, but its system ID doesn't match the primary, meaning the replica belongs to another cluster.
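If you want to verify this, you can compare the identifiers on both nodes with pg_controldata, e.g. (paths assumed from a Debian/PostgreSQL 16 layout like in your logs):
# run on the primary and on the replica and compare the "Database system identifier" values
sudo -u postgres /usr/lib/postgresql/16/bin/pg_controldata /var/lib/postgresql/16/main | grep "system identifier"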
Thanks for the quick reply!
Not sure how to check whether all the necessary data is there tbh.
After removing the data directory and starting patroni I get at least a different error, so this seems connected to pgbackrest?
That's at least the only occurrence of repo1-path I could find in the playbook.
Sep 24 10:40:18 pgcn02 patroni[49819]: 2024-09-24 10:40:18,812 INFO: Selected new etcd server http://192.168.88.103:2379
Sep 24 10:40:18 pgcn02 patroni[49819]: 2024-09-24 10:40:18,817 INFO: No PostgreSQL configuration items changed, nothing to reload.
Sep 24 10:40:18 pgcn02 patroni[49819]: 2024-09-24 10:40:18,863 INFO: Lock owner: pgcn01; I am pgcn02
Sep 24 10:40:18 pgcn02 patroni[49819]: 2024-09-24 10:40:18,911 INFO: trying to bootstrap from leader 'pgcn01'
Sep 24 10:40:18 pgcn02 patroni[49835]: P00 ERROR: [031]: option 'repo1-path' cannot be set multiple times
Sep 24 10:40:18 pgcn02 patroni[49819]: 2024-09-24 10:40:18,921 ERROR: Error creating replica using method pgbackrest: /usr/bin/pgbackrest --stanza=postgresql-cluster --delta restore exited with code=31
Sep 24 10:40:18 pgcn02 patroni[49819]: 2024-09-24 10:40:18,921 ERROR: failed to bootstrap from leader 'pgcn01'
Sep 24 10:40:18 pgcn02 patroni[49819]: 2024-09-24 10:40:18,921 INFO: Removing data directory: /var/lib/postgresql/16/main
That's also one of the major changes I made, setting pgbackrest_install: true instead of false. I also added an S3 bucket to the config according to the docs by adding this block (with our data, of course):
- { option: "repo1-s3-key", value: "YOUR_AWS_S3_ACCESS_KEY" } # change this value
- { option: "repo1-s3-key-secret", value: "YOUR_AWS_S3_SECRET_KEY" } # change this value
- { option: "repo1-s3-bucket", value: "YOUR_BUCKET_NAME" } # change this value
- { option: "repo1-s3-endpoint", value: "s3.us-east-1.amazonaws.com" } # change this value
- { option: "repo1-s3-region", value: "us-east-1" } # change this value
I could also use MinIO S3 or SSH, as I have both available, but I'm not sure if that would make a difference. I couldn't locate the pgbackrest_auto_conf variable though, so I'm not sure what it's currently set to.
Not sure how to check whether all the necessary data is there tbh.
Hopefully, we are dealing with a test database cluster. The fact that you somehow have two different 'system IDs' within the same cluster indicates that there may be discrepancies between the data on replica servers and the Primary. Therefore, it is crucial to conduct a thorough analysis of which data is valuable to you. This becomes especially important if such a scenario were to occur in a production environment.
After removing the data directory and starting patroni I get at least a different error, so this seems connected to pgbackrest?
Which version of postgresql_cluster are you using? Could you attach the configuration file /etc/pgbackrest/pgbackrest.conf?
@Hannsre If you're using pgBackRest to prepare replicas, then you already have a backup stored, correct? Otherwise, you should only specify basebackup in the patroni_create_replica_methods variable.
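For example, roughly like this in vars/main.yml (just a sketch):
patroni_create_replica_methods:
  - basebackup   # build replicas with pg_basebackup only, no pgBackRest repository required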
Hopefully, we are dealing with a test database cluster.
Yes, this is primarily for testing, there is no data involved yet. This setup is mostly for learning, so I'm fine with setting it up and tearing it down again until I get it right and also get it to restore from an actual backup for disaster recovery and so on.
I should be on Release 2.0, as I just cloned the repo a few days ago.
I figured out the differing IDs, because this was my mistake:
If you're using pgBackRest to prepare replicas, then you already have a backup stored, correct?
I did not; I had misunderstood the config in the playbook and have fixed it now, so creating the cluster seems to be working. At least the Patroni logs look fine and primary and replica see each other.
Here's the pgbackrest.conf content:
[global]
log-level-file=detail
log-path=/var/log/pgbackrest
repo1-type=s3
repo1-path=/pgbackrest
repo1-s3-key=redacted
repo1-s3-key-secret=redacted
repo1-s3-bucket=name of our bucket like shown in minio
repo1-s3-endpoint=our minio domain
repo1-s3-region=our minio region
repo1-retention-full=4
repo1-retention-archive=4
repo1-bundle=y
repo1-block=y
start-fast=y
stop-auto=y
link-all=y
resume=n
spool-path=/var/spool/pgbackrest
archive-async=y
archive-get-queue-max=1GiB
[stanza-name]
process-max=4
log-level-console=info
recovery-option=recovery_target_action=promote
pg1-socket-path=/var/run/postgresql
pg1-path=/var/lib/postgresql/16/main
What I'm still struggling with is pgbackrest in general. I've set it up as above, but get an error that the minio domain can't be found, because the URL pgbackrest creates/uses is bucketname.domain.com instead of domain.com/bucketname, which doesn't make much sense to me.
[HostConnectError] on 2 retries from 120100-180201ms: unable to get address for 'bucketname.minio-domain.com': [-2] Name or service not known",
Also there is "unable to acquire lock on file '/tmp/pgbackrest/stnaza-name-archive-1.lock': Resource temporarily unavailable", although I can see the lock file being created on the hosts.
Here's the pgbackrest config part from main.yml:
# pgBackRest
pgbackrest_install: true # or 'true' to install and configure backups using pgBackRest
pgbackrest_install_from_pgdg_repo: true # or 'false'
pgbackrest_stanza: "{{ patroni_cluster_name }}" # specify your --stanza
pgbackrest_repo_type: "s3" # or "s3", "gcs", "azure"
pgbackrest_repo_host: "" # dedicated repository host (optional)
pgbackrest_repo_user: "postgres" # if "repo_host" is set (optional)
pgbackrest_conf_file: "/etc/pgbackrest/pgbackrest.conf"
# config https://pgbackrest.org/configuration.html
pgbackrest_conf:
  global: # [global] section
    - { option: "log-level-file", value: "detail" }
    - { option: "log-path", value: "/var/log/pgbackrest" }
    - { option: "repo1-type", value: "{{ pgbackrest_repo_type | lower }}" }
    - { option: "repo1-path", value: "/pgbackrest" } # logical path in bucket
    - { option: "repo1-s3-key", value: "{{ s3_access_key }}" } # change this value
    - { option: "repo1-s3-key-secret", value: "{{ s3_secret_key }}" } # change this value
    - { option: "repo1-s3-bucket", value: "bucketname" } # change this value
    - { option: "repo1-s3-endpoint", value: "minio URL" } # change this value
    - { option: "repo1-s3-region", value: "Minio Region" } # change this value
    - { option: "repo1-retention-full", value: "4" }
    - { option: "repo1-retention-archive", value: "4" }
    - { option: "repo1-bundle", value: "y" }
    - { option: "repo1-block", value: "y" }
    - { option: "start-fast", value: "y" }
    - { option: "stop-auto", value: "y" }
    - { option: "link-all", value: "y" }
    - { option: "resume", value: "n" }
    - { option: "spool-path", value: "/var/spool/pgbackrest" }
    - { option: "archive-async", value: "y" } # Enables asynchronous WAL archiving (details: https://pgbackrest.org/user-guide.html#async-archiving)
    - { option: "archive-get-queue-max", value: "1GiB" }
    # - { option: "archive-push-queue-max", value: "100GiB" }
    # - { option: "backup-standby", value: "y" } # When set to 'y', standby servers will be automatically added to the stanza section.
    # - { option: "", value: "" }
  stanza: # [stanza_name] section
    - { option: "process-max", value: "4" }
    - { option: "log-level-console", value: "info" }
    - { option: "recovery-option", value: "recovery_target_action=promote" }
    - { option: "pg1-socket-path", value: "{{ postgresql_unix_socket_dir }}" }
    - { option: "pg1-path", value: "{{ postgresql_data_dir }}" }
    # - { option: "", value: "" }
s3_access_key and s3_secret_key are set in an Ansible vault and passed from there.
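Roughly like this, in a vault file of my own naming (values redacted):
# group_vars/vault.yml, encrypted with: ansible-vault encrypt group_vars/vault.yml
s3_access_key: "REDACTED"
s3_secret_key: "REDACTED"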
My goal in general is to be able to deploy and recreate the cluster from backups using Ansible, so in case anything goes wrong it can be restored to a working state quickly. But I'm not sure what's still not right in the playbook to get pgbackrest to actually deploy to a working state.
Quick Update:
I had to add - { option: "repo1-s3-uri-style", value: "path" } to the pgbackrest_conf, which apparently I had forgotten.
I also had to add - { option: "pg1-user", value: "postgres" } to the stanza to get a connection.
So the repository now got created in our minio bucket.
Stuck now at [073]: unable to list file info for missing path '/var/lib/postgresql/16/main/archive_status' while trying to create the first backup.
So it's mostly me fighting pgbackrest config now I guess.
Yes, according to the configuration example for MinIO in the documentation (https://postgresql-cluster.org/management/backup#command-line), the "repo1-s3-uri-style" option is required.
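For reference, the option only switches the request URL layout pgBackRest builds (per the pgBackRest docs):
# repo1-s3-uri-style=host (default): https://<bucket>.<endpoint>/...
# repo1-s3-uri-style=path:           https://<endpoint>/<bucket>/...
repo1-s3-uri-style=path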
Name or service not known
Add the minio domain in the etc_hosts variable.
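Roughly like this in vars/main.yml (IP address and hostname are placeholders):
etc_hosts:
  - "192.168.88.50 minio.example.com"   # placeholder entry added to /etc/hosts on all nodes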
Add the minio domain in the etc_hosts variable.
It's an FQDN with proper DNS set up; the error only occurred because I hadn't added the repo1-s3-uri-style option.
I think I got it working now; at least the bucket is now getting populated with archives on an archive run, and there are no more errors in the logs.
The error was sitting in front of the terminal after all.
Thanks for your time and patience!
Hey,
sorry to reopen this, but after some more testing (and a vacation) some more questions came up, specifically about backups and restore. I didn't want to open a new issue as this is still connected to my former issues. Deploying a new cluster works fine now, as well as creating the backups and backup tasks. The backups do work in theory, but are a bit flaky.
The restore command:
Sometimes the restore process simply fails at the step "Waiting for PostgreSQL Recovery to complete (WAL apply)" with a pretty generic error:
fatal: [192.168.88.103]: FAILED! => {"changed": true, "cmd": ["/usr/lib/postgresql/16/bin/pg_ctl", "start", "-D", "/var/lib/postgresql/16/main", "-o", "-c hot_standby=off", "-w", "-t", "1800"], "delta": "0:00:00.205953", "end": "2024-10-10 08:52:21.402731", "msg": "non-zero return code", "rc": 1, "start": "2024-10-10 08:52:21.196778", "stderr": "pg_ctl: could not start server\nExamine the log output.",
I know this can happen if there's already a postgres process running, but in those cases it usually said exactly that. Here I get no more details than this.
There are no more errors in any of the logs, all of the restore processes show success. Last messages before the failure above are:
"2024-10-10 08:52:21.395 P00 INFO: found 000000020000000000000021 in the archive asynchronously", "2024-10-10 08:52:21.395 P00 INFO: archive-get command end: completed successfully (1ms)", "2024-10-10 08:52:21.396 UTC [449893] LOG: restored log file \"000000020000000000000021\" from archive", "2024-10-10 08:52:21.399 UTC [449893] LOG: completed backup recovery with redo LSN 0/1E000028 and end LSN 0/1E000138", "2024-10-10 08:52:21.399 UTC
[449893] LOG: consistent recovery state reached at 0/1E000138", "2024-10-10 08:52:21.399 UTC [449893] LOG: recovery stopping after reaching consistency", "2024-10-10 08:52:21.400 UTC [449890] LOG: shutdown at recovery target"
"2024-10-10 08:52:21.400 UTC [449891] LOG: shutting down", "2024-10-10 08:52:21.401 UTC [449890] LOG: database system is shut down", " stopped waiting"]}
If I run the restore again, it mostly works afterwards for that node, but then it hangs for a very long time after the replicas have been restored while waiting for the master. There is just no output at all from Ansible, even at the highest verbosity level. I can see the command in htop on the master, but can't see anything happening. The last successful recovery took about 1.5 hours of waiting for the master node.
Settings in main.yml:
# https://patroni.readthedocs.io/en/latest/replica_bootstrap.html#bootstrap
patroni_cluster_bootstrap_method: "pgbackrest" # or "wal-g", "pgbackrest", "pg_probackup"
# https://patroni.readthedocs.io/en/latest/replica_bootstrap.html#building-replicas
patroni_create_replica_methods:
  - pgbackrest
  # - wal_g
  # - pg_probackup
  - basebackup

pgbackrest:
  - { option: "command", value: "{{ pgbackrest_patroni_cluster_restore_command }}" }
  - { option: "keep_data", value: "True" }
  - { option: "no_params", value: "True" }
basebackup:
  - { option: "max-rate", value: "1000M" }
  - { option: "checkpoint", value: "fast" }
  # - { option: "waldir", value: "{{ postgresql_wal_dir }}" }
and
postgresql_restore_command: "/usr/bin/pgbackrest --stanza={{ pgbackrest_stanza }} archive-get %f %p" # restore WAL-s using pgbackrest
I've changed the cluster restore command as per the docs to
pgbackrest_patroni_cluster_restore_command:
'/usr/bin/pgbackrest --stanza={{ pgbackrest_stanza }} --type=immediate --delta restore'
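For reference, what the two flags do (per the pgBackRest docs):
# --type=immediate : stop recovery as soon as a consistent state is reached, without replaying further WAL
# --delta          : reuse files already present in the data directory and restore only what differs
/usr/bin/pgbackrest --stanza={{ pgbackrest_stanza }} --type=immediate --delta restore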
I can also post the full main.yml if you prefer.
Then I ran the playbook with -t point_in_time_recovery -e "disable_archive_command=false".
Since it generally works but just takes very long, I wonder if any setting on our systems could be blocking the process? All nodes are created using Ansible and Terraform, so they should be absolutely identical.
And a general question: after reading the docs multiple times and trying different approaches, I can't seem to find the correct way to restore a cluster from backup after a complete loss of the cluster, e.g. when you have to redeploy to a whole new set of hosts because of a catastrophic failure. PITR seems like the wrong approach as it expects an existing cluster. Deploying a new cluster and restoring like above always led to a system ID mismatch during/after restore, or it failed due to mismatching stanzas.
Cloning didn't work either for some reason (I forgot to write down the error, sorry, but I'll try that way again).
So what would be the best/correct approach here?
I'm sure I'm missing something to properly restore it from a complete loss. This setup is a lot to take in, so any input is much appreciated.
Hi @Hannsre
We are currently working on PITR-related improvements (https://github.com/vitabaks/postgresql_cluster/pull/765). Could you test the restore-timeout branch?
@SDV109 I also ask you to take a look at this problem.
Hey,
sure, giving it a go right now.
Just noticed that there is a pg_ctl_timeout set in patroni/tasks/main.yml with a default of 3600, but it's not defined in vars/main.yml like the new cluster_restore_timeout is.
I've set both timeouts rather short to test, our test-data isn't huge so it shouldn't matter. The restore process itself is usually done within a minute.
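For the test I'm just overriding them as extra vars, something like this (assuming both variables can be overridden that way):
ansible-playbook deploy_pgcluster.yml -t point_in_time_recovery \
  -e "disable_archive_command=false" \
  -e "pg_ctl_timeout=180" -e "cluster_restore_timeout=180"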
Will report back once done.
I'm not sure why, as my Ansible knowledge isn't that deep yet, but it seems the pg_ctl_timeout is ignored because Ansible adds a newline at the end of the command:
changed: [192.168.88.102] => {
"changed": true,
"cmd": "/usr/lib/postgresql/16/bin/pg_ctl start -D /var/lib/postgresql/16/main -o '-c hot_standby=off' -w -t 180\n",
"delta": "0:00:00.240277",
"end": "2024-10-11 07:42:01.847932",
"invocation": {
"module_args": {
"_raw_params": "/usr/lib/postgresql/16/bin/pg_ctl start -D /var/lib/postgresql/16/main -o '-c hot_standby=off' -w -t 180\n",
"_uses_shell": true,
"argv": null,
"chdir": null,
"creates": null,
"executable": "/bin/bash",
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true
}
},
So it's like before: it finishes restoring on the replicas, but gets stuck at the master node. No Ansible output whatsoever, and even with the timeout set to 180 seconds it waits much longer than that.
Hey guys,
sorry to bug you again, but any new ideas on this? I'm really stuck here, because while deploying the cluster does work, running it without working backup & restore sounds like a bad time. I know this is free support in your spare time, so don't take this as critique, I'm simply all out of ideas...
This week I plan to try to reproduce the problem and suggest an improvement as part of the PR.
Thanks! I also just noticed I forgot to add another error message that might or might not be helpful. When running the restore, it will succeed on the replicas, but on the master node it fails with:
"2024-10-28 12:07:46.168 P00 ERROR: [038]: unable to restore while PostgreSQL is running", " HINT: presence of 'postmaster.pid' in '/var/lib/postgresql/16/main' indicates PostgreSQL is running.", " HINT: remove 'postmaster.pid' only if PostgreSQL is not running.", "2024-10-28 12:07:46.168 P00 INFO: restore command end: aborted with exception [038]"]}
I then have to stop postgres and remove the postmaster.pid manually before I can run the playbook again to attempt a restore. Then it'll run until it times out (or sometimes doesn't) on the master node.
Not sure what to make of that, but it seems related, since it should stop postgres like it does on the replicas before restoring.
So I did some more testing, and the whole issue seems to revolve around the leader node not being able to properly restart postgres after the restore. I stopped postgres and removed the PID file as in my comment above by running
sudo systemctl stop postgresql
sudo rm /var/lib/postgresql/16/main/postmaster.pid
run the playbook
ansible-playbook deploy_pgcluster.yml -t point_in_time_recovery -e "disable_archive_command=false" -e '@pgcn.yml' --ask-vault-pass -v
and wait. When Ansible is stuck waiting for Postgres to finish and restart on the leader, I again run
sudo systemctl stop postgresql
sudo rm /var/lib/postgresql/16/main/postmaster.pid
and afterwards
sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl start -D /var/lib/postgresql/16/main -o '-c hot_standby=off' -w -t 180
So basically I do manually what Ansible is supposed to do, from what I gathered. The playbook will then continue and eventually finish successfully:
TASK [deploy-finish : Postgres Cluster info] *************************************************************************************************************************************************************************************************************************************************
ok: [192.168.88.101] => {
"msg": [
"+ Cluster: psql-dev (7423735485093921234) +----+-----------+",
"| Member | Host | Role | State | TL | Lag in MB |",
"+--------+----------------+---------+---------+----+-----------+",
"| pgcn01 | 192.168.88.101 | Leader | running | 4 | |",
"| pgcn02 | 192.168.88.102 | Replica | running | 2 | 0 |",
"| pgcn03 | 192.168.88.103 | Replica | running | 2 | 0 |",
"+--------+----------------+---------+---------+----+-----------+"
]
}
PLAY RECAP ***********************************************************************************************************************************************************************************************************************************************************************************
192.168.88.101 : ok=59 changed=6 unreachable=0 failed=0 skipped=68 rescued=0 ignored=0
192.168.88.102 : ok=38 changed=4 unreachable=0 failed=0 skipped=64 rescued=0 ignored=0
192.168.88.103 : ok=38 changed=4 unreachable=0 failed=0 skipped=64 rescued=0 ignored=0
localhost : ok=2 changed=0 unreachable=0 failed=0 skipped=222 rescued=0 ignored=0
I'm unsure what to make of this, but maybe it helps you find any potential issues.
It also does look like the restore is successful; at least I get no errors from the application when I point it to the cluster instead of its local DB, and the data seems to be all there.
Small Addition/Edit:
While running the restore for the first time I don't have to stop/remove postgres and the PID file, but simply kill the running process postgres /usr/lib/postgresql/16/bin/pg_ctl start -D /var/lib/postgresql/16/main -o '-c hot_standby=off and run it again to get the playbook to finish.
I also deleted some entries from the DB, then ran the restore like above, and it all came back as expected.
Thanks for the details provided, it's really helpful.
@Hannsre I have made some important improvements here https://github.com/vitabaks/postgresql_cluster/pull/765, please check.
Thanks, this seems to work fine now! I updated the patroni main.yml and ran the restore like before, and it went through just fine and pretty quickly as well.
For my understanding: you changed the handling of the leader node so it now runs a check to see whether postgres is (still) up and stops it if so, on lines 704-727? And then added further checks for starting it back up.
One last question, now that restore is working fine: what is the best way to recover in case of a complete loss of the cluster? I'll back up the nodes in general as well, so I can restore those, but I want to cover all my bases before going into production. You never know what kind of catastrophic failure may happen ;)
I've had issues recreating the cluster nodes after deleting them on purpose, because it complained either about postgres not existing (when running with the PITR command) or that the stanza already exists when creating a new cluster. Setting the bootstrap method to pgbackrest also did not work (I forgot the error, tbh). But this might have been connected to the former issue, because from my understanding this seems to be the correct way. Or did I get this wrong and there's another, correct way?
Thanks, this seems to work fine now! Updated the patroni main.yml and ran the restore like before and it went through just fine and pretty quick as well.
Great, I'm glad it worked out.
For my understanding: You changed the handling of the leader node so it now runs a check to see if postgres is (still) up, then stop it if true on Lines 704-727? Then added further checks for starting it back up.
Yes, we have added additional checks to ensure the reliability of the recovery procedure.
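Conceptually, the pre-restore check on the leader is along these lines (an illustration only, not the exact tasks from the PR):
# is postgres still running in this data directory? (pg_ctl exit code 0 = running, 3 = not running)
sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl status -D /var/lib/postgresql/16/main
# if it is, stop it cleanly before pgBackRest restores, so no stale postmaster.pid remains
sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl stop -D /var/lib/postgresql/16/main -m fast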
What is the best way to recover in case of a complete loss of the cluster?
This is described in the documentation https://postgresql-cluster.org/docs/management/restore
I've had issues recreating the cluster nodes after deleting them on purpose, because it complained either about postgres not existing (when running with PITR command) or that the stanza already exist if creating a new cluster. Setting the bootstrap method to pgbackrest also did not work (I forgot the error tbh). But this might've been connected to this former issue, because this seems to be the correct way from my understanding. Or did I get this wrong and there's another, correct way?
I recommend performing more recoveries to practice and build confidence in the restoration process. I personally manage hundreds of clusters and have been using this method for over 5 years for in-place recovery or cloning to a new cluster.
Additionally, to ensure you always have reliable and timely assistance, you may consider purchasing an individual support package.
I am closing this issue; if you have any problems or questions, feel free to open a new one.
Hey,
first of all thanks for this project! Really appreciate the time and effort.
I'm kinda stuck though, not really sure how to proceed. I've had a cluster running already, but since we're still testing and I discovered some issues, I tore it down to redeploy it. The first deployment was on an older version of the ansible playbook though, so I'm not sure what changed. I compared my old version with the new and couldn't really spot much of a difference.
Basis are Debian 12 VMs on Proxmox 8 deployed via Terraform, so they are absolutely identical.
The Patroni master node apparently works, but the replicas won't connect. According to the logs they have a different cluster system ID.
I've checked and tried the solution in #747, but unfortunately it didn't do the trick for me. After removing the configs and cluster, it couldn't connect to postgresql anymore at all.
This is the output from the playbook:
And the logs on both replica hosts:
This is the relevant part of my inventory (I think):
I've replaced the cluster completely twice now, changing the cluster name in between installs, but to no avail. Not sure what else you need to troubleshoot, just let me know which logs or config you need.
Thanks in advance!