patroni / patroni

A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes
MIT License

Patroni 4.0.1: physical replication slots auto-created on replica nodes #3145

Closed Medzon888 closed 1 month ago

Medzon888 commented 1 month ago

What happened?

Hello

We have a typical Patroni installation without static slots. After installing the cluster, we noticed that physical replication slots are created on the replica side. They are, of course, inactive (active = f). I didn't see any entries in the PostgreSQL log about these slots being created, but they exist.

patroni list:


+ Cluster: main (743635804769961459) -----------------------------------------------------------------------+--------------+-----------+----+-----------+
| Member                         | Host                                                                      | Role         | State     | TL | Lag in MB |
+--------------------------------+---------------------------------------------------------------------------+--------------+-----------+----+-----------+
| p-server-patroni-05-db-01 | p-server-patroni-05-db-01.fqdn:5435                   | Replica      | streaming |  1 |         0 |
| p-server-patroni-05-db-02 | p-server-patroni-05-db-02.fqdn:5435                  | Leader       | running   |  1 |           |
| p-server-patroni-05-db-03 | p-server-patroni-05-db-03.fqdn:5435                  | Sync Standby | streaming |  1 |         0 |
+--------------------------------+---------------------------------------------------------------------------+--------------+-----------+----+-----------+

select * from pg_replication_slots on the replica side (p-server-patroni-05-db-03):


-[ RECORD 1 ]-------+-------------------------------
slot_name           | p_server_patroni_05_db_02
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | f
active_pid          |
xmin                |
catalog_xmin        |
restart_lsn         | 6/1A001490
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       | 115544686592
two_phase           | f
conflicting         |
-[ RECORD 2 ]-------+-------------------------------
slot_name           | p_server_patroni_05_db_01
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | f
active_pid          |
xmin                |
catalog_xmin        |
restart_lsn         | 6/1A001490
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       | 115544686592
two_phase           | f
conflicting         |

How can we reproduce it (as minimally and precisely as possible)?

Create a Patroni cluster with the same (or the default) configuration.

What did you expect to happen?

We expected physical replication slots not to be created on the replica side, and would like a fix to prevent it.

Patroni/PostgreSQL/DCS version

Patroni configuration file

scope: main
namespace: /service/
name: p-server-patroni-05-db-03

log:
  level: DEBUG
  format: '%(asctime)s %(levelname)s: %(message)s'
  max_queue_size: 1000
  dir: /datadir/pg_data/logs/patroni
  file_num: 4
  file_size: 25000000
  loggers:
    patroni.postmaster: DEBUG
    urllib3: DEBUG

restapi:
  connect_address: p-server-patroni-05-db-03.fqdn:8008
  listen: 0.0.0.0:8008
  authentication:
    username: patroni
    password: password

etcd3:
  hosts: 10.10.10.10:2379,10.10.10.10:2379,10.10.10.10:2379
  username: root
  password: password

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: true
    synchronous_mode_strict: false
    check_timeline: true
    postgresql:
      use_pg_rewind: false
      use_slots: true
      parameters:
        max_connections: 300
        max_locks_per_transaction: 64
        max_worker_processes: 8
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
  initdb:
    - encoding: UTF8
    - data-checksums
    - waldir: /datadir/pg_wal/postgres
  users:
    postgres:
      password: password
    replica:
      password: password
      options:
        - replication

postgresql:
  use_unix_socket: true
  listen: 0.0.0.0:5435
  connect_address: p-server-patroni-05-db-03.fqdn:5435
  data_dir: /datadir/pg_data/postgres
  bin_dir: /usr/lib/postgresql/16/bin
  config_dir: /etc/postgresql/16/main
  pgpass: /var/lib/postgresql/.pgpass
  authentication:
    superuser:
      username: postgres
      password: password
    replication:
      username: replica
      password: password
  create_replica_methods:
    - basebackup
  basebackup:
    waldir: /datadir/pg_wal/postgres
  parameters:
    unix_socket_directories: /var/run/postgresql
    shared_preload_libraries: pg_stat_statements,auto_explain
    archive_mode: on
    archive_command: wal-g wal-push %p
  pg_hba:
    - local all all trust
    - host all all 127.0.0.1/32 trust
    - host all all 0.0.0.0/0 scram-sha-256
    - host replication replica 127.0.0.1/32 scram-sha-256
    - host replication replica 10.10.10.10/32 scram-sha-256
    - host replication replica 10.10.10.10/32 scram-sha-256
    - host replication replica 10.10.10.10/32 scram-sha-256
  pg_ctl_timeout: 60

watchdog:
  mode: automatic
  device: /dev/watchdog
  safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

patronictl show-config

check_timeline: true
loop_wait: 10
master_start_timeout: 300
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    max_connections: 300
    max_locks_per_transaction: 64
    max_prepared_transactions: 0
    max_replication_slots: 10
    max_wal_senders: 10
    max_worker_processes: 8
    track_commit_timestamp: false
    wal_level: logical
    wal_log_hints: true
  use_pg_rewind: false
  use_slots: true
retry_timeout: 10
synchronous_mode: true
synchronous_mode_strict: false
ttl: 30

Patroni log files

Nothing special.

PostgreSQL log files

This is everything our log files contain since cluster creation for the keywords (slot names):

$ egrep "p_server_patroni_05_db_02|p_server_patroni_05_db_01" ./*
./postgresql.log-2024-09-02_1800:2024-09-02 16:23:31.164 MSK [14112] [0] [2/0]: user=postgres,db=postgres,app=Patroni heartbeat,client=[local] LOG:  duration: 25.315 ms  statement: SELECT pg_catalog.pg_create_physical_replication_slot('p_server_patroni_05_db_01', true) WHERE NOT EXISTS (SELECT 1 FROM pg_catalog.pg_replication_slots WHERE slot_type = 'physical' AND slot_name = 'p_server_patroni_05_db_01')


### Have you tried to use GitHub issue search?

- [X] Yes

### Anything else we need to know?

_No response_
rafaelthca commented 1 month ago

@Medzon888 this is not a bug. Patroni maintains those slots. The slots are inactive, but Patroni makes sure to advance their restart_lsn so that WAL does not accumulate.
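
Roughly, the mechanism can be illustrated with the kind of SQL Patroni runs on replicas (the first statement is the one visible in the PostgreSQL log above; the LSN in the second is illustrative, not a statement copied from Patroni's source):

```sql
-- Create a slot for a fellow cluster member only if it does not exist yet
SELECT pg_catalog.pg_create_physical_replication_slot('p_server_patroni_05_db_01', true)
 WHERE NOT EXISTS (SELECT 1 FROM pg_catalog.pg_replication_slots
                    WHERE slot_type = 'physical'
                      AND slot_name = 'p_server_patroni_05_db_01');

-- Periodically advance the slot's restart_lsn so it does not retain WAL
SELECT pg_catalog.pg_replication_slot_advance('p_server_patroni_05_db_01', '6/1A001490');
```

Because the restart_lsn keeps moving forward, the inactive slots do not pin old WAL segments on the replica.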

CyberDem0n commented 1 month ago

Yes, it is not a bug but a feature. If it worries you, you can disable this behavior by setting member_slots_ttl to 0.
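
For reference, a sketch of where that setting would go (member_slots_ttl is part of the dynamic DCS configuration, so it is edited with patronictl rather than in the local YAML file; the option was added in Patroni 4.0, so check the documentation for your version):

```yaml
# patronictl -c /etc/patroni/patroni.yml edit-config
member_slots_ttl: 0   # do not retain replication slots for other members on replicas
```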

Medzon888 commented 1 month ago

Thanks for the quick response! I didn't find that feature in the documentation =(