vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on Patroni). Automating with Ansible.
https://postgresql-cluster.org
MIT License
1.52k stars 399 forks source link

TASK [patroni : Wait for port 8008 to become open on the host] #391

Closed fatmaAliGamal closed 1 year ago

fatmaAliGamal commented 1 year ago

TASK [patroni : Prepare PostgreSQL | reload for apply the pg_hba.conf] ***** changed: [10.0.4.133] ok: [10.0.4.175] ok: [10.0.5.94]

TASK [patroni : Start patroni service on Replica servers] ** changed: [10.0.4.175] changed: [10.0.5.94]

TASK [patroni : Wait for port 8008 to become open on the host] ***** ok: [10.0.4.175] ok: [10.0.5.94] FAILED - RETRYING: [10.0.4.175]: Check that the patroni is healthy on the replica server (1200 retries left). FAILED - RETRYING: [10.0.5.94]: Check that the patroni is healthy on the replica server (1200 retries left). FAILED - RETRYING: [10.0.4.175]: Check that the patroni is healthy on the replica server (1199 retries left). FAILED - RETRYING: [10.0.5.94]: Check that the patroni is healthy on the replica server (1199 retries left). FAILED - RETRYING: [10.0.4.175]: Check that the patroni is healthy on the replica server (1198 retries left). FAILED - RETRYING: [10.0.5.94]: Check that the patroni is healthy on the replica server (1198 retries left). FAILED - RETRYING: [10.0.4.175]: Check that the patroni is healthy on the replica server (1197 retries left). FAILED - RETRYING: [10.0.5.94]: Check that the patroni is healthy on the replica server (1197 retries left). FAILED - RETRYING: [10.0.4.175]: Check that the patroni is healthy on the replica server (1196 retries left). FAILED - RETRYING: [10.0.5.94]: Check that the patroni is healthy on the replica server (1196 retries left). FAILED - RETRYING: [10.0.5.94]: Check that the patroni is healthy on the replica server (1195 retries left). FAILED - RETRYING: [10.0.4.175]: Check that the patroni is healthy on the replica server (1195 retries left). FAILED - RETRYING: [10.0.4.175]: Check that the patroni is healthy on the replica server (1194 retries left). FAILED - RETRYING: [10.0.5.94]: Check that the patroni is healthy on the replica server (1194 retries left). FAILED - RETRYING: [10.0.4.175]: Check that the patroni is healthy on the replica server (1193 retries left). FAILED - RETRYING: [10.0.5.94]: Check that the patroni is healthy on the replica server (1193 retries left). FAILED - RETRYING: [10.0.4.175]: Check that the patroni is healthy on the replica server (1192 retries left). FAILED - RETRYING: [10.0.5.94]: Check that the patroni is healthy on the replica server (1192 retries left). FAILED - RETRYING: [10.0.4.175]: Check that the patroni is healthy on the replica server (1191 retries left). FAILED - RETRYING: [10.0.5.94]: Check that the patroni is healthy on the replica server (1191 retries left).

after trying to use - { option: "archive_command", value: "wal-g wal-push %p" } # archive WAL-s using WAL-G to backup i use wal-g

main.yml

---
# ---------------------------------------------------------------------
# Proxy variables (optional) for download packages using a proxy server
proxy_env: {}  # yamllint disable rule:braces
#  http_proxy: http://10.128.64.9:3128
#  https_proxy: http://10.128.64.9:3128
# ---------------------------------------------------------------------

# Cluster variables
cluster_vip: ""  # ip address for client access to databases in the cluster (optional)
vip_interface: "{{ ansible_default_ipv4.interface }}"  # interface name (ex. "ens32")

patroni_cluster_name: "postgres-cluster"  # the cluster name (must be unique for each cluster)
patroni_install_version: "latest"  # or specific version (example 1.5.6)

patroni_superuser_username: "postgres"
patroni_superuser_password: "postgres-pass"  # please change password
patroni_replication_username: "replicator"
patroni_replication_password: "replicator-pass"  # please change password

synchronous_mode: false  # or 'true' for enable synchronous database replication
synchronous_mode_strict: false  # if 'true' then block all client writes to the master, when a synchronous replica is not available
synchronous_node_count: 1  # number of synchronous standby databases

# Load Balancing
with_haproxy_load_balancing: false  # or 'true' if you want to install and configure the load-balancing
haproxy_listen_port:
  master: 5000
  replicas: 5001
  replicas_sync: 5002
  replicas_async: 5003
  stats: 7000
haproxy_maxconn:
  global: 100000
  master: 10000
  replica: 10000
haproxy_timeout:
  client: "60m"
  server: "60m"

# vip-manager (if cluster_vip is specified and with_haproxy_load_balancing: false)
vip_manager_version: "2.1.0"  # version to install
vip_manager_conf: "/etc/patroni/vip-manager.yml"
vip_manager_interval: "1000"  # time (in milliseconds) after which vip-manager wakes up and checks if it needs to register or release ip addresses.
vip_manager_iface: "{{ vip_interface }}"  # interface to which the virtual ip will be added
vip_manager_ip: "{{ cluster_vip }}"  # the virtual ip address to manage
vip_manager_mask: "24"  # netmask for the virtual ip

# DCS (Distributed Consensus Store)
dcs_exists: false  # or 'true' if you don't want to deploy a new etcd cluster
dcs_type: "etcd"  # or 'consul'

# if dcs_type: "etcd" and dcs_exists: false
etcd_version: "3.5.9"  # version for deploy etcd cluster
etcd_data_dir: "/var/lib/etcd"
etcd_cluster_name: "etcd-{{ patroni_cluster_name }}"  # ETCD_INITIAL_CLUSTER_TOKEN

# if dcs_type: "etcd" and dcs_exists: true
patroni_etcd_hosts: []  # list of servers of an existing etcd cluster
#  - { host: "10.128.64.140", port: "2379" }
#  - { host: "10.128.64.142", port: "2379" }
#  - { host: "10.128.64.143", port: "2379" }

# more options you can specify in the roles/patroni/templates/patroni.yml.j2
# https://patroni.readthedocs.io/en/latest/SETTINGS.html#etcd
# https://patroni.readthedocs.io/en/latest/SETTINGS.html#consul

# if dcs_type: "consul"
consul_version: "1.14.7"
consul_config_path: "/etc/consul"
consul_configd_path: "{{ consul_config_path }}/conf.d"
consul_data_path: "/var/lib/consul"
consul_domain: "consul"  # Consul domain name
consul_datacenter: "dc1"  # Datacenter label (can be specified for each host in the inventory)
consul_disable_update_check: true  # Disables automatic checking for security bulletins and new version releases
consul_enable_script_checks: true  # This controls whether health checks that execute scripts are enabled on this agent
consul_enable_local_script_checks: true  # Enable them when they are defined in the local configuration files
consul_ui: false  # Enable the consul UI?
consul_syslog_enable: true  # Enable logging to syslog
consul_iface: "{{ ansible_default_ipv4.interface }}"  # specify the interface name with a Private IP (ex. "enp7s0")
# TLS
# You can enable TLS encryption by dropping a CA certificate, server certificate, and server key in roles/consul/files/
consul_tls_enable: false
consul_tls_ca_crt: "ca.crt"
consul_tls_server_crt: "server.crt"
consul_tls_server_key: "server.key"
# DNS
consul_recursors: []  # List of upstream DNS servers
consul_dnsmasq_enable: true  # Enable DNS forwarding with Dnsmasq
consul_dnsmasq_cache: 0  # dnsmasq cache-size (0 - disable caching)
consul_dnsmasq_servers:  # Upstream DNS servers used by dnsmasq
  - "8.8.8.8"
  - "9.9.9.9"
consul_join: []  # List of LAN servers of an existing consul cluster, to join.
# - "10.128.64.140"
# - "10.128.64.142"
# - "10.128.64.143"

# https://developer.hashicorp.com/consul/docs/discovery/services
consul_services:
  - name: "{{ patroni_cluster_name }}"
    id: "{{ patroni_cluster_name }}-master"
    tags: ['master']
    port: "{{ pgbouncer_listen_port }}"  # or "{{ postgresql_port }}" if pgbouncer_install: false
    checks:
      - { http: "http://{{ inventory_hostname }}:{{ patroni_restapi_port }}/master", interval: "2s" }
      - { args: ["systemctl", "status", "pgbouncer"], interval: "5s" }  # comment out this check if pgbouncer_install: false
  - name: "{{ patroni_cluster_name }}"
    id: "{{ patroni_cluster_name }}-replica"
    tags: ['replica']
    port: "{{ pgbouncer_listen_port }}"
    checks:
      - { http: "http://{{ inventory_hostname }}:{{ patroni_restapi_port }}/replica", interval: "2s" }
      - { args: ["systemctl", "status", "pgbouncer"], interval: "5s" }
#  - name: "{{ patroni_cluster_name }}"
#    id: "{{ patroni_cluster_name }}-sync-replica"
#    tags: ['sync-replica']
#    port: "{{ pgbouncer_listen_port }}"
#    checks:
#      - { http: "http://{{ inventory_hostname }}:{{ patroni_restapi_port }}/sync", interval: "2s" }
#      - { args: ["systemctl", "status", "pgbouncer"], interval: "5s" }
#  - name: "{{ patroni_cluster_name }}"
#    id: "{{ patroni_cluster_name }}-async-replica"
#    tags: ['async-replica']
#    port: "{{ pgbouncer_listen_port }}"
#    checks:
#      - { http: "http://{{ inventory_hostname }}:{{ patroni_restapi_port }}/async", interval: "2s" }
#      - { args: ["systemctl", "status", "pgbouncer"], interval: "5s" }

# PostgreSQL variables
postgresql_version: "15"
# postgresql_data_dir: see vars/Debian.yml or vars/RedHat.yml
postgresql_port: "5432"
postgresql_encoding: "UTF8"  # for bootstrap only (initdb)
postgresql_locale: "en_US.UTF-8"  # for bootstrap only (initdb)
postgresql_data_checksums: true  # for bootstrap only (initdb)
postgresql_password_encryption_algorithm: "scram-sha-256"  # or "md5" if your clients do not work with passwords encrypted with SCRAM-SHA-256

# (optional) list of users to be created (if not already exists)
postgresql_users: []
#  - { name: "mydb-user", password: "mydb-user-pass", flags: "SUPERUSER" }
#  - { name: "", password: "", flags: "NOSUPERUSER" }
#  - { name: "", password: "", flags: "NOSUPERUSER" }
#  - { name: "", password: "", flags: "NOLOGIN" }

# (optional) list of databases to be created (if not already exists)
postgresql_databases: []
#  - { db: "mydatabase", encoding: "UTF8", lc_collate: "ru_RU.UTF-8", lc_ctype: "ru_RU.UTF-8", owner: "mydb-user" }
#  - { db: "mydatabase2", encoding: "UTF8", lc_collate: "ru_RU.UTF-8", lc_ctype: "ru_RU.UTF-8", owner: "mydb-user", conn_limit: "50" }
#  - { db: "", encoding: "UTF8", lc_collate: "en_US.UTF-8", lc_ctype: "en_US.UTF-8", owner: "" }
#  - { db: "", encoding: "UTF8", lc_collate: "en_US.UTF-8", lc_ctype: "en_US.UTF-8", owner: "" }

# (optional) list of schemas to be created (if not already exists)
postgresql_schemas: []
#  - { schema: "myschema", db: "mydatabase", owner: "mydb-user" }

# (optional) list of database extensions to be created (if not already exists)
postgresql_extensions: []
#  - { ext: "pg_stat_statements", db: "postgres" }
#  - { ext: "pg_stat_statements", db: "mydatabase" }
#  - { ext: "pg_stat_statements", db: "mydatabase", schema: "myschema" }
#  - { ext: "pg_stat_statements", db: "" }
#  - { ext: "pg_stat_statements", db: "" }
#  - { ext: "pg_repack", db: "" }  # postgresql-<version>-repack package is required
#  - { ext: "pg_stat_kcache", db: "" }  # postgresql-<version>-pg-stat-kcache package is required
#  - { ext: "", db: "" }
#  - { ext: "", db: "" }

# postgresql parameters to bootstrap dcs (are parameters for example)
postgresql_parameters:
  - { option: "max_connections", value: "500" }
  - { option: "superuser_reserved_connections", value: "5" }
  - { option: "password_encryption", value: "{{ postgresql_password_encryption_algorithm }}" }
  - { option: "max_locks_per_transaction", value: "512" }
  - { option: "max_prepared_transactions", value: "0" }
  - { option: "huge_pages", value: "try" }  # or "on" if you set "vm_nr_hugepages" in kernel parameters
  - { option: "shared_buffers", value: "{{ (ansible_memtotal_mb * 0.25)|int }}MB" }  # by default, 25% of RAM
  - { option: "effective_cache_size", value: "{{ (ansible_memtotal_mb * 0.75)|int }}MB" }  # by default, 75% of RAM
  - { option: "work_mem", value: "128MB" }  # please change this value
  - { option: "maintenance_work_mem", value: "256MB" }  # please change this value
  - { option: "checkpoint_timeout", value: "15min" }
  - { option: "checkpoint_completion_target", value: "0.9" }
  - { option: "min_wal_size", value: "2GB" }
  - { option: "max_wal_size", value: "8GB" }  # or 16GB/32GB
  - { option: "wal_buffers", value: "32MB" }
  - { option: "default_statistics_target", value: "1000" }
  - { option: "seq_page_cost", value: "1" }
  - { option: "random_page_cost", value: "4" }  # "1.1" for SSD storage. Also, if your databases fits in shared_buffers
  - { option: "effective_io_concurrency", value: "2" }  # "200" for SSD storage
  - { option: "synchronous_commit", value: "on" }  # or 'off' if you can you lose single transactions in case of a crash
  - { option: "autovacuum", value: "on" }  # never turn off the autovacuum!
  - { option: "autovacuum_max_workers", value: "5" }
  - { option: "autovacuum_vacuum_scale_factor", value: "0.01" }  # or 0.005/0.001
  - { option: "autovacuum_analyze_scale_factor", value: "0.01" }
  - { option: "autovacuum_vacuum_cost_limit", value: "500" }  # or 1000/5000
  - { option: "autovacuum_vacuum_cost_delay", value: "2" }
  - { option: "autovacuum_naptime", value: "1s" }
  - { option: "max_files_per_process", value: "4096" }
  - { option: "archive_mode", value: "on" }
  - { option: "archive_timeout", value: "1800s" }
  # - { option: "archive_command", value: "cd ." }  # not doing anything yet with WAL-s
  - { option: "archive_command", value: "wal-g wal-push %p" }  # archive WAL-s using WAL-G
#  - { option: "archive_command", value: "pgbackrest --stanza={{ pgbackrest_stanza }} archive-push %p" }  # archive WAL-s using pgbackrest
  - { option: "wal_level", value: "replica" }
  - { option: "wal_keep_size", value: "2GB" }
  - { option: "max_wal_senders", value: "10" }
  - { option: "max_replication_slots", value: "10" }
  - { option: "hot_standby", value: "on" }
  - { option: "wal_log_hints", value: "on" }
  - { option: "wal_compression", value: "on" }
  - { option: "shared_preload_libraries", value: "pg_stat_statements,auto_explain" }
  - { option: "pg_stat_statements.max", value: "10000" }
  - { option: "pg_stat_statements.track", value: "all" }
  - { option: "pg_stat_statements.track_utility", value: "false" }
  - { option: "pg_stat_statements.save", value: "true" }
  - { option: "auto_explain.log_min_duration", value: "10s" }  # enable auto_explain for 10-second logging threshold. Decrease this value if necessary
  - { option: "auto_explain.log_analyze", value: "true" }
  - { option: "auto_explain.log_buffers", value: "true" }
  - { option: "auto_explain.log_timing", value: "false" }
  - { option: "auto_explain.log_triggers", value: "true" }
  - { option: "auto_explain.log_verbose", value: "true" }
  - { option: "auto_explain.log_nested_statements", value: "true" }
  - { option: "auto_explain.sample_rate", value: "0.01" }  # enable auto_explain for 1% of queries logging threshold
  - { option: "track_io_timing", value: "on" }
  - { option: "log_lock_waits", value: "on" }
  - { option: "log_temp_files", value: "0" }
  - { option: "track_activities", value: "on" }
  - { option: "track_activity_query_size", value: "4096" }
  - { option: "track_counts", value: "on" }
  - { option: "track_functions", value: "all" }
  - { option: "log_checkpoints", value: "on" }
  - { option: "logging_collector", value: "on" }
  - { option: "log_truncate_on_rotation", value: "on" }
  - { option: "log_rotation_age", value: "1d" }
  - { option: "log_rotation_size", value: "0" }
  - { option: "log_line_prefix", value: "'%t [%p-%l] %r %q%u@%d '" }
  - { option: "log_filename", value: "postgresql-%a.log" }
  - { option: "log_directory", value: "{{ postgresql_log_dir }}" }
  - { option: "hot_standby_feedback", value: "on" }  # allows feedback from a hot standby to the primary that will avoid query conflicts
  - { option: "max_standby_streaming_delay", value: "30s" }
  - { option: "wal_receiver_status_interval", value: "10s" }
  - { option: "idle_in_transaction_session_timeout", value: "10min" }  # reduce this timeout if possible
  - { option: "jit", value: "off" }
  - { option: "max_worker_processes", value: "24" }
  - { option: "max_parallel_workers", value: "8" }
  - { option: "max_parallel_workers_per_gather", value: "2" }
  - { option: "max_parallel_maintenance_workers", value: "2" }
  - { option: "tcp_keepalives_count", value: "10" }
  - { option: "tcp_keepalives_idle", value: "300" }
  - { option: "tcp_keepalives_interval", value: "30" }
#  - { option: "old_snapshot_threshold", value: "60min" }
#  - { option: "", value: "" }
#  - { option: "", value: "" }

# Set this variable to 'true' if you want the cluster to be automatically restarted
# after changing the 'postgresql_parameters' variable that requires a restart in the 'config_pgcluster.yml' playbook.
# By default, the cluster will not be automatically restarted.
pending_restart: false

# specify additional hosts that will be added to the pg_hba.conf
postgresql_pg_hba:
  - { type: "local", database: "all", user: "{{ patroni_superuser_username }}", address: "", method: "trust" }
  - { type: "local", database: "all", user: "all", address: "", method: "peer" }
  - { type: "host", database: "all", user: "all", address: "127.0.0.1/32", method: "{{ postgresql_password_encryption_algorithm }}" }
  - { type: "host", database: "all", user: "all", address: "::1/128", method: "{{ postgresql_password_encryption_algorithm }}" }
#  - { type: "host", database: "mydatabase", user: "mydb-user", address: "192.168.0.0/24", method: "{{ postgresql_password_encryption_algorithm }}" }
#  - { type: "host", database: "all", user: "all", address: "192.168.0.0/24", method: "ident", options: "map=main" }  # use pg_ident

# list of lines that Patroni will use to generate pg_ident.conf
postgresql_pg_ident: []
#  - { mapname: "main", system_username: "postgres", pg_username: "backup" }
#  - { mapname: "", system_username: "", pg_username: "" }

# the password file (~/.pgpass)
postgresql_pgpass:
  - "localhost:{{ postgresql_port }}:*:{{ patroni_superuser_username }}:{{ patroni_superuser_password }}"
  - "{{ inventory_hostname }}:{{ postgresql_port }}:*:{{ patroni_superuser_username }}:{{ patroni_superuser_password }}"
#  - hostname:port:database:username:password

# PgBouncer parameters
pgbouncer_install: true  # or 'false' if you do not want to install and configure the pgbouncer service
pgbouncer_conf_dir: "/etc/pgbouncer"
pgbouncer_log_dir: "/var/log/pgbouncer"
pgbouncer_listen_addr: "0.0.0.0"
pgbouncer_listen_port: 6432
pgbouncer_max_client_conn: 10000
pgbouncer_max_db_connections: 1000
pgbouncer_default_pool_size: 20
pgbouncer_query_wait_timeout: 120
pgbouncer_default_pool_mode: "session"
pgbouncer_admin_users: "postgres"  # comma-separated list of users, who are allowed to change settings
pgbouncer_stats_users: "postgres"  # comma-separated list of users who are just allowed to use SHOW command
pgbouncer_generate_userlist: true  # generate the authentication file (userlist.txt) from the pg_shadow system table
pgbouncer_auth_type: "{{ postgresql_password_encryption_algorithm }}"
pgbouncer_ignore_startup_parameters: "extra_float_digits,geqo,search_path"

pgbouncer_pools:
  - { name: "postgres", dbname: "postgres", pool_parameters: "" }
#  - { name: "mydatabase", dbname: "mydatabase", pool_parameters: "pool_size=20 pool_mode=transaction" }
#  - { name: "", dbname: "", pool_parameters: "" }
#  - { name: "", dbname: "", pool_parameters: "" }

# Extended variables (optional)
patroni_restapi_port: 8008
patroni_ttl: 30
patroni_loop_wait: 10
patroni_retry_timeout: 10
patroni_maximum_lag_on_failover: 1048576
patroni_master_start_timeout: 300

# https://patroni.readthedocs.io/en/latest/SETTINGS.html?highlight=callbacks#dynamic-configuration-settings
patroni_callbacks: []
#  - {action: "on_role_change", script: ""}
#  - {action: "on_stop", script: ""}
#  - {action: "on_restart", script: ""}
#  - {action: "on_reload", script: ""}
#  - {action: "on_role_change", script: ""}

# https://patroni.readthedocs.io/en/latest/replica_bootstrap.html#standby-cluster
# Requirements:
# 1. the cluster name for Standby Cluster must be unique ('patroni_cluster_name' variable)
# 2. the IP addresses (or network) of the Standby Cluster servers must be added to the pg_hba.conf of the Main Cluster ('postgresql_pg_hba' variable).
patroni_standby_cluster:
  host: ""  # an address of remote master
  port: "5432"  # a port of remote master
#  primary_slot_name: ""  # which slot on the remote master to use for replication (optional)
#  restore_command: ""  # command to restore WAL records from the remote master to standby leader (optional)
#  recovery_min_apply_delay: ""  # how long to wait before actually apply WAL records on a standby leader (optional)

patroni_log_destination: stderr  # or 'logfile'
# if patroni_log_destination: logfile
patroni_log_dir: /var/log/patroni
patroni_log_level: info
patroni_log_traceback_level: error
patroni_log_format: "%(asctime)s %(levelname)s: %(message)s"
patroni_log_dateformat: ""
patroni_log_max_queue_size: 1000
patroni_log_file_num: 4
patroni_log_file_size: 25000000  # bytes
patroni_log_loggers_patroni_postmaster: warning
patroni_log_loggers_urllib3: warning  # or 'debug'

patroni_watchdog_mode: automatic  # or 'off', 'required'
patroni_watchdog_device: /dev/watchdog

patroni_postgresql_use_pg_rewind: true  # or 'false'
# try to use pg_rewind on the former leader when it joins cluster as a replica.

patroni_remove_data_directory_on_rewind_failure: false  # or 'true' (if use_pg_rewind: 'true')
# avoid removing the data directory on an unsuccessful rewind
# if 'true', Patroni will remove the PostgreSQL data directory and recreate the replica.

patroni_remove_data_directory_on_diverged_timelines: false  # or 'true'
# if 'true', Patroni will remove the PostgreSQL data directory and recreate the replica
# if it notices that timelines are diverging and the former master can not start streaming from the new master.

# https://patroni.readthedocs.io/en/latest/replica_bootstrap.html#bootstrap
patroni_cluster_bootstrap_method: "initdb"  # or "wal-g", "pgbackrest", "pg_probackup"

# https://patroni.readthedocs.io/en/latest/replica_bootstrap.html#building-replicas
patroni_create_replica_methods:
#  - pgbackrest
 - wal_g
#  - pg_probackup
  # - basebackup

pgbackrest:
  - { option: "command", value: "/usr/bin/pgbackrest --stanza={{ pgbackrest_stanza }} --delta restore" }
  - { option: "keep_data", value: "True" }
  - { option: "no_params", value: "True" }
wal_g:
  - { option: "command", value: "wal-g backup-fetch {{ postgresql_data_dir }} LATEST" }
  - { option: "no_params", value: "True" }
basebackup:
  - { option: "max-rate", value: "100M" }
  - { option: "checkpoint", value: "fast" }
pg_probackup:
  - { option: "command", value: "pg_probackup-{{ pg_probackup_version }} restore -B {{ pg_probackup_dir }} --instance {{ pg_probackup_instance }} -j {{ pg_probackup_threads }} {{ pg_probackup_add_keys }}" }
  - { option: "no_params", value: "true" }

# "restore_command" written to recovery.conf when configuring follower (create replica)
postgresql_restore_command: ""
# postgresql_restore_command: "wal-g wal-fetch %f %p"  # restore WAL-s using WAL-G
# postgresql_restore_command: "pgbackrest --stanza={{ pgbackrest_stanza }} archive-get %f %p"  # restore WAL-s using pgbackrest
# postgresql_restore_command: "pg_probackup-{{ pg_probackup_version }} archive-get -B {{ pg_probackup_dir }} --instance {{ pg_probackup_instance }} --wal-file-path=%p --wal-file-name=%f"  # restore WAL-s using pg_probackup

# pg_probackup
pg_probackup_install: false  # or 'true'
pg_probackup_install_from_postgrespro_repo: true  # or 'false'
pg_probackup_version: "{{ postgresql_version }}"
pg_probackup_instance: "pg_probackup_instance_name"
pg_probackup_dir: "/mnt/backup_dir"
pg_probackup_threads: "4"
pg_probackup_add_keys: "--recovery-target=latest --skip-external-dirs --no-validate"
pg_probackup_patroni_cluster_bootstrap_command: "pg_probackup-{{ pg_probackup_version }} restore -B {{ pg_probackup_dir }} --instance {{ pg_probackup_instance }} -j {{ pg_probackup_threads }} {{ pg_probackup_add_keys }}"

# WAL-G
wal_g_install: true  # or 'true'
wal_g_version: "2.0.1"
wal_g_json:  # config https://github.com/wal-g/wal-g#configuration
  - { option: "AWS_ACCESS_KEY_ID", value: "****" }
  - { option: "AWS_SECRET_ACCESS_KEY", value: "****" }
  # - { option: "AWS_ENDPOINT", value: "http://172.26.9.200:9000" }
  - { option: "WALG_S3_PREFIX", value: "s3://soa-backup/soabackup/" }
  - { option: "AWS_S3_FORCE_PATH_STYLE", value: "true" }
  - { option: "WALG_COMPRESSION_METHOD", value: "gzip" } #"lz4", "zstd", "brotli", "gzip",
  - { option: "PGDATA", value: "{{ postgresql_data_dir }}" }
  - { option: "PGHOST", value: "{{ postgresql_unix_socket_dir }}" }
#  - { option: "AWS_REGION", value: "us-east-1" }
#  - { option: "WALG_S3_CA_CERT_FILE", value: "/path/to/custom/ca/file" }
#  - { option: "", value: "" }
wal_g_patroni_cluster_bootstrap_command: "wal-g backup-fetch {{ postgresql_data_dir }} LATEST"

# pgBackRest
pgbackrest_install: false  # or 'true'
pgbackrest_install_from_pgdg_repo: true  # or 'false'
pgbackrest_stanza: "{{ patroni_cluster_name }}"  # specify your --stanza
pgbackrest_repo_type: "posix"  # or "s3", "gcs", "azure"
pgbackrest_repo_host: ""  # dedicated repository host (optional)
pgbackrest_repo_user: "postgres"
pgbackrest_conf_file: "/etc/pgbackrest/pgbackrest.conf"
# see more options https://pgbackrest.org/configuration.html
pgbackrest_conf:
  global:  # [global] section
    - { option: "log-level-file", value: "detail" }
    - { option: "log-path", value: "/var/log/pgbackrest" }
#    - { option: "repo1-host", value: "{{ pgbackrest_repo_host }}" }
#    - { option: "repo1-host-user", value: "{{ pgbackrest_repo_user }}" }
    - { option: "repo1-type", value: "{{ pgbackrest_repo_type |lower }}" }
    - { option: "repo1-path", value: "/var/lib/pgbackrest" }
    - { option: "repo1-retention-full", value: "4" }
    - { option: "repo1-retention-archive", value: "4" }
    - { option: "start-fast", value: "y" }
    - { option: "stop-auto", value: "y" }
    - { option: "resume", value: "n" }
    - { option: "link-all", value: "y" }
#    - { option: "", value: "" }
  stanza:  # [stanza_name] section
    - { option: "pg1-path", value: "{{ postgresql_data_dir }}" }
    - { option: "recovery-option", value: "recovery_target_action=promote" }
    - { option: "log-level-console", value: "info" }
    - { option: "process-max", value: "4" }
#    - { option: "", value: "" }
# (optional) dedicated backup server config (if "repo_host" is set)
pgbackrest_server_conf:
  global:
    - { option: "log-level-file", value: "detail" }
    - { option: "log-level-console", value: "info" }
    - { option: "log-path", value: "/var/log/pgbackrest" }
    - { option: "repo1-type", value: "{{ pgbackrest_repo_type |lower }}" }
    - { option: "repo1-path", value: "/var/lib/pgbackrest" }
    - { option: "repo1-retention-full", value: "4" }
    - { option: "repo1-retention-archive", value: "4" }
    - { option: "repo1-bundle", value: "y" }
    - { option: "repo1-block", value: "y" }
    - { option: "start-fast", value: "y" }
    - { option: "stop-auto", value: "y" }
    - { option: "resume", value: "n" }
    - { option: "link-all", value: "y" }
    - { option: "archive-check", value: "y" }
    - { option: "archive-copy", value: "n" }
    - { option: "backup-standby", value: "y" }
#    - { option: "", value: "" }
# the stanza section will be generated automatically

pgbackrest_patroni_cluster_restore_command:
  '/usr/bin/pgbackrest --stanza={{ pgbackrest_stanza }} --delta restore'  # restore from latest backup
#  '/usr/bin/pgbackrest --stanza={{ pgbackrest_stanza }} --type=time "--target=2020-06-01 11:00:00+03" --delta restore'  # Point-in-Time Recovery (example)

# By default, the cron jobs is created on the database server.
# If 'repo_host' is defined, the cron jobs will be created on the pgbackrest server.
pgbackrest_cron_jobs:
  - name: "pgBackRest: Full Backup"
    file: /etc/cron.d/pgbackrest
    user: "postgres"
    minute: "30"
    hour: "6"
    day: "*"
    month: "*"
    weekday: "0"
    job: "pgbackrest --type=full --stanza={{ pgbackrest_stanza }} backup"
    # job: "if [ $(psql -tAXc 'select pg_is_in_recovery()') = 'f' ]; then pgbackrest --type=full --stanza={{ pgbackrest_stanza }} backup; fi"
  - name: "pgBackRest: Diff Backup"
    file: /etc/cron.d/pgbackrest
    user: "postgres"
    minute: "30"
    hour: "6"
    day: "*"
    month: "*"
    weekday: "1-6"
    job: "pgbackrest --type=diff --stanza={{ pgbackrest_stanza }} backup"
    # job: "if [ $(psql -tAXc 'select pg_is_in_recovery()') = 'f' ]; then pgbackrest --type=diff --stanza={{ pgbackrest_stanza }} backup; fi"

cron_jobs: 
# Example for walg
 - name: "WAL-G: Create daily backup"
   user: "postgres"
   file: /etc/cron.d/walg
   minute: "30"
   hour: "22"
   day: "*"
   month: "*"
   weekday: "*"
   job: "[ $(curl -s -o /dev/null -w '%{http_code}' http://{{ inventory_hostname }}:{{ patroni_restapi_port }}) = '200' ] && wal-g backup-push"

# PITR mode (if patroni_cluster_bootstrap_method: "pgbackrest" or "wal-g"):
# 1) The database cluster directory will be cleaned (for "wal-g") or overwritten (for "pgbackrest" --delta restore).
# 2) And also the patroni cluster "{{ patroni_cluster_name }}" will be removed from the DCS (if exist) before recovery.

disable_archive_command: true  # or 'false' to not disable archive_command after restore
keep_patroni_dynamic_json: true  # or 'false' to remove patroni.dynamic.json after restore (if exists)

# Netdata - https://github.com/netdata/netdata
netdata_install: false  # or 'true' for install Netdata on postgresql cluster nodes (with kickstart.sh)
netdata_install_options: "--stable-channel --disable-telemetry --dont-wait"
netdata_conf:
  web_bind_to: "*"
  # https://learn.netdata.cloud/docs/store/change-metrics-storage
  memory_mode: "dbengine"  # The long-term metrics storage with efficient RAM and disk usage.
  page_cache_size: 64  # Determines the amount of RAM in MiB that is dedicated to caching Netdata metric values.
  dbengine_disk_space: 1024  # Determines the amount of disk space in MiB that is dedicated to storing Netdata metric values.
vitabaks commented 1 year ago

I can assume that you don't have a backup copy of wal-g yet and since you commented out basebackup, the replica can't be initiated.
Check for a wal-g backup, check patroni logs on replicas.

I also recommend leaving basebackup second in the list, which means that if the first method was not successful (for example, there is no backup or access to the backup storage), the second method will be used

example:

patroni_create_replica_methods:
 - wal_g
 - basebackup
fatmaAliGamal commented 1 year ago

okey , i will try

fatmaAliGamal commented 1 year ago

thanks done