vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on Patroni). Automating with Ansible.
https://postgresql-cluster.org
MIT License

How to configure WAL-G? #692

Closed · algoritmsystems closed this issue 1 month ago

algoritmsystems commented 1 month ago

How do I configure WAL-G for backup and restore? I have a MinIO server at 172.20.20.22.

cat /etc/default/minio

MINIO_ROOT_USER=myminioadmin
MINIO_ROOT_PASSWORD=myminioadmin
MINIO_VOLUMES="/home/ubuntu/data"
MINIO_OPTS="--console-address :9001"
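
For reference, a target bucket for WAL-G can be created with the MinIO client (mc); the alias and bucket name below are illustrative, and the credentials come from the /etc/default/minio file above:

mc alias set local http://172.20.20.22:9000 myminioadmin myminioadmin
mc mb local/walg-backups   # "walg-backups" is an illustrative bucket name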

vars/main.yml

patroni_cluster_bootstrap_method: "wal-g" 
#...
patroni_create_replica_methods:
- wal_g
- basebackup
#...
postgresql_restore_command: "{{ wal_g_path }} wal-fetch %f %p"
#...
wal_g_install: true 
wal_g_version: "3.0.0"
wal_g_path: "/usr/local/bin/wal-g"
wal_g_json: 
  - { option: "PGDATA", value: "{{ postgresql_data_dir }}" }
  - { option: "PGHOST", value: "{{ postgresql_unix_socket_dir }}" }
  - { option: "PGPORT", value: "{{ postgresql_port }}" 
  - { option: "PGUSER", value: "{{ patroni_superuser_username }}" }
  - { option: "AWS_S3_FORCE_PATH_STYLE", value: "true" }  
  - { option: "AWS_ENDPOINT", value: "http://172.20.20.22:9000" }
wal_g_archive_command: "{{ wal_g_path }} wal-push %p"
wal_g_patroni_cluster_bootstrap_command: "{{ wal_g_path }} backup-fetch {{ postgresql_data_dir }} LATEST"
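
Note that for an S3-compatible endpoint such as MinIO, WAL-G also needs credentials and a storage prefix. A sketch of the additional wal_g_json entries (the key values and bucket name are illustrative placeholders):

wal_g_json:
  # ... the options above, plus:
  - { option: "AWS_ACCESS_KEY_ID", value: "myminioadmin" }      # illustrative: your MinIO access key
  - { option: "AWS_SECRET_ACCESS_KEY", value: "myminioadmin" }  # illustrative: your MinIO secret key
  - { option: "WALG_S3_PREFIX", value: "s3://walg-backups" }    # illustrative bucket name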

ERROR:

FAILED! => {"changed": false, "cmd": "/usr/bin/gmake deps USE_BROTLI=1 USE_LIBSODIUM=1", "msg": "CMake Deprecation Warning at CMakeLists.txt:5 (cmake_minimum_required):\n  Compatibility with CMake < 2.8.12 will be removed from a future version of\n  CMake.\n\n  Update the VERSION argument <min> value or use a ...<max> suffix to tell\n  CMake that the project does not need compatibility with older versions.\n\n\n-- Compiler is not EMSCRIPTEN\n\ngzip: stdin: not in gzip format\ntar: Child returned status 1\ntar: Error is not recoverable: exiting now\ngmake: *** [Makefile:296: link_libsodium]
vitabaks commented 1 month ago

Hi @algoritmsystems

Try version 3.0.2, which includes a fix for link_libsodium.sh:

wal_g_version: "3.0.2"

Related issue: https://github.com/vitabaks/postgresql_cluster/issues/678

algoritmsystems commented 1 month ago

Hi @vitabaks! I installed MinIO object storage on the 172.20.20.22 server. Is this configuration correct:

wal_g_install: true  
wal_g_version: "3.0.2"
wal_g_path: "/usr/local/bin/wal-g"
wal_g_json:  # config https://github.com/wal-g/wal-g#configuration
  - { option: "PGDATA", value: "{{ postgresql_data_dir }}" }
  - { option: "PGHOST", value: "{{ postgresql_unix_socket_dir }}" }
  - { option: "PGPORT", value: "{{ postgresql_port }}" }
  - { option: "PGUSER", value: "{{ patroni_superuser_username }}" }
  - { option: "AWS_S3_FORCE_PATH_STYLE", value: "true" }  # to use Minio.io S3-compatible storage
  - { option: "AWS_ENDPOINT", value: "http://172.20.20.22:9000" }

I updated the version, and there is a new error:

ERROR:
fatal: [172.20.20.25]: FAILED! => {"changed": false, "cmd": "/usr/bin/gmake deps USE_BROTLI=1 USE_LIBSODIUM=1", "msg": "go: downloading .....
....."go: downloading golang.org/x/term v0.18.0", "go: updates to go.mod needed; to update it:", "\tgo mod tidy", "gmake: *** [Makefile:288: go_deps] Error 1"], "stdout": "git submodule update --init\ngo mod vendor\n", "stdout_lines": ["git submodule update --init", "go mod vendor"]} 
vitabaks commented 1 month ago

Fixed here https://github.com/vitabaks/postgresql_cluster/pull/693

Please check.

algoritmsystems commented 1 month ago

Thanks @vitabaks! Now it is working, but there is a new issue:

TASK [patroni : Wait for port 8008 to become open on the host] ******************************************************************************************************************************************************************************
ok: [172.20.20.24]
FAILED - RETRYING: [172.20.20.24]: Wait for PostgreSQL Recovery to complete (WAL apply) (1200 retries left).
FAILED - RETRYING: [172.20.20.24]: Wait for PostgreSQL Recovery to complete (WAL apply) (1199 retries left).
FAILED - RETRYING: [172.20.20.24]: Wait for PostgreSQL Recovery to complete (WAL apply) (1198 retries left).
FAILED - RETRYING: [172.20.20.24]: Wait for PostgreSQL Recovery to complete (WAL apply) (1197 retries left).
FAILED - RETRYING: [172.20.20.24]: Wait for PostgreSQL Recovery to complete (WAL apply) (1196 retries left).

This happens even though Patroni is running on the leader:

tcp            LISTEN          0               5                              0.0.0.0:8008                         0.0.0.0:*             users:(("patroni",pid=1557454,fd=5))                                                                

systemctl status patroni

● patroni.service - Runners to orchestrate a high-availability PostgreSQL - Patroni
     Loaded: loaded (/etc/systemd/system/patroni.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-07-12 11:45:02 +05; 22s ago
    Process: 78373 ExecStartPre=/usr/bin/sudo /sbin/modprobe softdog (code=exited, status=0/SUCCESS)
    Process: 78378 ExecStartPre=/usr/bin/sudo /bin/chown postgres /dev/watchdog (code=exited, status=0/SUCCESS)
   Main PID: 78380 (patroni)
      Tasks: 33 (limit: 19134)
     Memory: 51.4M
        CPU: 1.060s
     CGroup: /system.slice/patroni.service
             ├─70941 /usr/local/bin/wal-g backup-fetch /var/lib/postgresql/14/main LATEST
             ├─78380 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni/patroni.yml
             └─78392 /usr/local/bin/wal-g backup-fetch /var/lib/postgresql/14/main LATEST
Jul 12 11:45:03 pdb20i24c2 patroni[78380]: 2024-07-12 11:45:03,056 INFO: trying to bootstrap a new cluster
Jul 12 11:45:03 pdb20i24c2 patroni[78380]: 2024-07-12 11:45:03,058 INFO: Running custom bootstrap script: /usr/local/bin/wal-g backup-fetch /var/lib/postgresql/14/main LATEST
Jul 12 11:45:03 pdb20i24c2 patroni[78392]: INFO: 2024/07/12 11:45:03.180539 Selecting the latest backup...
Jul 12 11:45:03 pdb20i24c2 patroni[78392]: INFO: 2024/07/12 11:45:03.181841 Backup to fetch will be searched in storages: [default]
Jul 12 11:45:12 pdb20i24c2 patroni[78380]: 2024-07-12 11:45:12,904 INFO: Lock owner: None; I am pdb20i24c2
Jul 12 11:45:12 pdb20i24c2 patroni[78380]: 2024-07-12 11:45:12,904 INFO: not healthy enough for leader race
Jul 12 11:45:13 pdb20i24c2 patroni[78380]: 2024-07-12 11:45:13,002 INFO: bootstrap in progress
Jul 12 11:45:22 pdb20i24c2 patroni[78380]: 2024-07-12 11:45:22,904 INFO: Lock owner: None; I am pdb20i24c2
Jul 12 11:45:22 pdb20i24c2 patroni[78380]: 2024-07-12 11:45:22,904 INFO: not healthy enough for leader race
Jul 12 11:45:22 pdb20i24c2 patroni[78380]: 2024-07-12 11:45:22,904 INFO: bootstrap in progress

systemctl status postgresql

● postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; preset: enabled)
     Active: active (exited) since Fri 2024-07-12 11:48:28 +05; 2s ago
    Process: 79808 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 79808 (code=exited, status=0/SUCCESS)
        CPU: 5ms
Jul 12 11:48:28 pdb20i24c2 systemd[1]: Starting postgresql.service - PostgreSQL RDBMS...
Jul 12 11:48:28 pdb20i24c2 systemd[1]: Finished postgresql.service - PostgreSQL RDBMS.

but port 5432 is not up.

vitabaks commented 1 month ago

Judging by the logs, the cluster is being restored from a backup:

Jul 12 11:45:03 pdb20i24c2 patroni[78380]: 2024-07-12 11:45:03,056 INFO: trying to bootstrap a new cluster
Jul 12 11:45:03 pdb20i24c2 patroni[78380]: 2024-07-12 11:45:03,058 INFO: Running custom bootstrap script: /usr/local/bin/wal-g backup-fetch /var/lib/postgresql/14/main LATEST
Jul 12 11:45:03 pdb20i24c2 patroni[78392]: INFO: 2024/07/12 11:45:03.180539 Selecting the latest backup...
Jul 12 11:45:03 pdb20i24c2 patroni[78392]: INFO: 2024/07/12 11:45:03.181841 Backup to fetch will be searched in storages: [default]

So you specified the variable patroni_cluster_bootstrap_method: "wal-g". If you don't have a backup yet (check with wal-g backup-list), change it to "initdb" to create a new (empty) cluster.
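
For example, a quick way to check whether any backups exist, assuming the role wrote the WAL-G config to /var/lib/postgresql/.walg.json as in this playbook:

# run on a database node
sudo -u postgres wal-g backup-list --config /var/lib/postgresql/.walg.json

# if the list is empty, create a new (empty) cluster instead (vars/main.yml):
# patroni_cluster_bootstrap_method: "initdb"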

algoritmsystems commented 1 month ago

So you specified the variable patroni_cluster_bootstrap_method: "wal-g". If you don't have a backup yet (check with wal-g backup-list), change it to "initdb" to create a new (empty) cluster.

  1. Does patroni_cluster_bootstrap_method: "wal-g" perform a backup during the PostgreSQL cluster deployment?
  2. If I use patroni_cluster_bootstrap_method: "initdb", can I then use WAL-G for backup and restore? Thanks!
vitabaks commented 1 month ago

Does patroni_cluster_bootstrap_method: "wal-g" perform a backup during the PostgreSQL cluster deployment?

No, restore. If "wal-g" is specified for bootstrap, the cluster will be created from a backup.

If I use patroni_cluster_bootstrap_method: "initdb", can I then use WAL-G for backup and restore?

Yes. After you already have a cluster and a backup copy, you can change it to "wal-g" to be able to restore the cluster.

See details https://github.com/vitabaks/postgresql_cluster#restore-and-cloning
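
A minimal vars/main.yml sketch of that two-step approach:

# first deployment: create a new, empty cluster
patroni_cluster_bootstrap_method: "initdb"

# later, once "wal-g backup-list" shows at least one backup, switch to
# restoring from backup:
# patroni_cluster_bootstrap_method: "wal-g"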

algoritmsystems commented 1 month ago

Yes. After you already have a cluster and a backup copy, you can change it to "wal-g" to be able to restore the cluster.

If I first deploy the cluster with bootstrap="initdb", then to switch from "initdb" to "wal-g", should I remove the cluster first and then run the deployment one more time:

ansible-playbook remove_cluster.yml
ansible-playbook deploy_pgcluster.yml

Or is it done by installing wal-g on each node? How do I add a new service to the existing cluster? Is it possible to edit vars/main.yml, change to bootstrap="wal-g", and deploy? Thanks!

vitabaks commented 1 month ago

If I first deploy the cluster with bootstrap="initdb", then to switch from "initdb" to "wal-g", should I remove the cluster first and then run the deployment one more time:

No. You don't need to delete the cluster.

How do I add a new service to the existing cluster? Is it possible to edit vars/main.yml, change to bootstrap="wal-g", and deploy?

See Cluster Scaling

change to bootstrap="wal-g", and deploy?

Yes, if you have a backup copy, you can prepare a replica (patroni_create_replica_methods) from it or deploy a new cluster (patroni_cluster_bootstrap_method).
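
For reference, the replica-creation setup shown earlier in this thread, where the WAL-G backup is tried first and pg_basebackup is the fallback:

patroni_create_replica_methods:
  - wal_g       # restore the replica from the latest WAL-G backup
  - basebackup  # fall back to pg_basebackup from the leader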

Jamic28 commented 1 month ago

Thanks @vitabaks! How do I configure an hourly asynchronous backup from the cluster to the MinIO server:

synchronous_mode: false
# ...
postgresql_parameters:
  - { option: "archive_mode", value: "on" }
  - { option: "archive_command", value: "{{ wal_g_archive_command }}" }
  - { option: "wal_level", value: "replica" }
# ...
wal_g_cron_jobs:
  - name: "WAL-G: Create hourly backup"
    user: "postgres"
    file: /etc/cron.d/walg
    minute: "0"
    hour: "*"
    day: "*"
    month: "*"
    weekday: "*"
    job: "{{ wal_g_backup_command | join('') }}"

Running it manually works, but it does not back up automatically.
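
For comparison, with that schedule the rendered /etc/cron.d/walg should look roughly like this (the backup-push line is illustrative; the actual job string is whatever wal_g_backup_command expands to):

# m h dom mon dow user command
0 * * * * postgres /usr/local/bin/wal-g backup-push /var/lib/postgresql/14/main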

vitabaks commented 1 month ago

Please attach /etc/cron.d/walg content.

Jamic28 commented 1 month ago

Please attach /etc/cron.d/walg content.

# WAL-G
wal_g_install: true
wal_g_version: "3.0.2"
wal_g_path: "/usr/local/bin/wal-g"
wal_g_json:
  # ...
wal_g_archive_command: "{{ wal_g_path }} wal-push %p"
wal_g_patroni_cluster_bootstrap_command: "{{ wal_g_path }} backup-fetch {{ postgresql_data_dir }} LATEST"
wal_g_backup_command:
  # ...
wal_g_cron_jobs:
  # ...

vitabaks commented 1 month ago

Perhaps the latest version of WAL-G requires us to explicitly specify the config path.

Try this option:

wal_g_path: "/usr/local/bin/wal-g --config /var/lib/postgresql/.walg.json"

https://github.com/vitabaks/postgresql_cluster/issues/696

algoritmsystems commented 1 month ago

Thanks!