patroni / patroni

A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes
MIT License

The problem with etcd version #3211

Closed wxmeng04 closed 3 hours ago

wxmeng04 commented 4 hours ago

What happened?

When I ran patroni /etc/patroni/config.yml, the following error was printed on the command line:

2024-11-15 17:53:48,464 ERROR: Failed to get list of machines from http://etcd3.local:2379/v3beta: MaxRetryError("HTTPConnectionPool(host='etcd3.local', port=2379): Max retries exceeded with url: /version (Caused by ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))")

I checked the latest etcd documentation, looking for version 3.5, at the link below:
https://etcd.io/docs/v3.3/dev-guide/api_grpc_gateway/#notes
According to the docs, only [CLIENT-URL]/v3/* is used in that version.

gRPC gateway endpoint has changed since etcd v3.3:

etcd v3.2 or before uses only [CLIENT-URL]/v3alpha/*.
etcd v3.3 uses [CLIENT-URL]/v3beta/* while keeping [CLIENT-URL]/v3alpha/*.
etcd v3.4 uses [CLIENT-URL]/v3/* while keeping [CLIENT-URL]/v3beta/*.
    [CLIENT-URL]/v3alpha/* is deprecated.
etcd v3.5 or later uses only [CLIENT-URL]/v3/*.
    [CLIENT-URL]/v3beta/* is deprecated.
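
For quick reference, the notes quoted above can be written as a small lookup. This is only a sketch in Python that mirrors the quoted documentation. The function name gateway_prefixes is made up for illustration, and the sketch says nothing about which prefixes a particular etcd build actually serves or how Patroni chooses one.

```python
def gateway_prefixes(version: str) -> list[str]:
    """Return the gRPC gateway prefixes the quoted notes list for an etcd v3.x version."""
    major, minor = (int(part) for part in version.split(".")[:2])
    if major != 3:
        raise ValueError("the quoted notes only cover etcd v3.x")
    if minor <= 2:
        return ["/v3alpha"]             # v3.2 or before
    if minor == 3:
        return ["/v3beta", "/v3alpha"]  # v3.3 keeps /v3alpha
    if minor == 4:
        return ["/v3", "/v3beta"]       # v3.4 keeps /v3beta
    return ["/v3"]                      # v3.5 or later


if __name__ == "__main__":
    for example in ("3.2", "3.3", "3.4", "3.5"):
        print(example, gateway_prefixes(example))
```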

I'm not sure whether I have to use a lower version of etcd, for example 3.4.

How can we reproduce it (as minimally and precisely as possible)?

patroni /etc/patroni/config.yml

Just run the command above with the same versions of etcd and Patroni.

What did you expect to happen?

I expected Patroni to start without errors.

Patroni/PostgreSQL/DCS version

Patroni configuration file

scope: pg17_patroni
namespace: /postgresql/
name: pg17_patroni01

restapi:
  listen: pgsql1.local:8008
  connect_address: pgsql1.local:8008
#  cafile: /etc/ssl/certs/ssl-cacert-snakeoil.pem
#  certfile: /etc/ssl/certs/ssl-cert-snakeoil.pem
#  keyfile: /etc/ssl/private/ssl-cert-snakeoil.key
#  authentication:
#    username: username
#    password: password

#ctl:
#  insecure: false # Allow connections to Patroni REST API without verifying certificates
#  certfile: /etc/ssl/certs/ssl-cert-snakeoil.pem
#  keyfile: /etc/ssl/private/ssl-cert-snakeoil.key
#  cacert: /etc/ssl/certs/ssl-cacert-snakeoil.pem

etcd3:
  hosts:
  - etcd1.local:2379
  - etcd2.local:2379
  - etcd3.local:2379
  username: root
  password: JP35SHH62R41QK5P
  cacert: /etc/patroni/etcd-ca.crt
  cert: /etc/patroni/etcd.crt
  key: /etc/patroni/etcd.key

# The bootstrap configuration. Works only when the cluster is not yet initialized.
# If the cluster is already initialized, all changes in the `bootstrap` section are ignored!
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    primary_start_timeout: 300
    synchronous_mode: false
    #standby_cluster:
      #host: 127.0.0.1
      #port: 1111
      #primary_slot_name: patroni
    postgresql:
      use_pg_rewind: true
      pg_hba:
      - host replication replicator 127.0.0.1/32 md5
      - host all all 0.0.0.0/0 md5
#      use_slots: true
      parameters:
#        wal_level: hot_standby
#        hot_standby: "on"
#        max_connections: 100
#        max_worker_processes: 8
#        wal_keep_segments: 8
#        max_wal_senders: 10
#        max_replication_slots: 10
#        max_prepared_transactions: 0
#        max_locks_per_transaction: 64
#        wal_log_hints: "on"
#        track_commit_timestamp: "off"
#        archive_mode: "on"
#        archive_timeout: 1800s
#        archive_command: mkdir -p ../wal_archive && test ! -f ../wal_archive/%f && cp %p ../wal_archive/%f
#      recovery_conf:
#        restore_command: cp ../wal_archive/%f %p

  initdb:
  - encoding: UTF8
  - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.0.7:5432
  data_dir: /var/lib/postgresql/17/main
  bin_dir: /usr/lib/postgresql/17/bin
  config_dir: /etc/postgresql/17/main
  pgpass: /var/lib/postgresql/.pgpass
  authentication:
    replication:
      username: replica
      password: 1234
    superuser:
      username: postgres
      password: X9FV73H7BUJYX7Z2
    rewind:  # Has no effect on postgres 10 and lower
      username: rewind_user
      password: rewind_password
  parameters:
    listen_addresses: 0.0.0.0
    port: 5432
    max_connections: 3000
    superuser_reserved_connections: 100
    max_locks_per_transaction: 64
    max_worker_processes: 2
    max_prepared_transactions: 0
    wal_level: logical
    wal_log_hints: on
    track_commit_timestamp: off
    max_wal_senders: 10
    max_replication_slots: 10
    hot_standby: "on"
    cluster_name: "pg_cluster"
    archive_mode: on
    archive_command: "cp %p /var/lib/postgresql/17/main/backups/%f"
    synchronous_commit: off
    shared_buffers:  1495MB
    maintenance_work_mem: 512MB
    max_stack_depth: 7372kB
    vacuum_cost_delay: 10ms
    bgwriter_delay: 10ms
    wal_buffers: 16384kB
    effective_cache_size: 3829540kB
    log_destination: csvlog
    logging_collector: on
    log_checkpoints: on
    log_connections: on
    log_disconnections: on
    log_lock_waits: on
    log_timezone: 'Asia/Shanghai'
    log_statement: ddl
    log_autovacuum_min_duration: 0
    timezone: 'Asia/Shanghai'

tags:
    # failover_priority: 1
    noloadbalance: false
    clonefrom: false
    nosync: false
    nostream: false

patronictl show-config

patronictl  -c /etc/patroni/config.yml show-config pg17-patroni
2024-11-15 18:02:53,324 - ERROR - Failed to get list of machines from http://etcd1.local:2379/v3beta: MaxRetryError("HTTPConnectionPool(host='etcd1.local', port=2379): Max retries exceeded with url: /version (Caused by ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')))")
2024-11-15 18:02:53,325 - ERROR - Failed to get list of machines from http://etcd2.local:2379/v3beta: MaxRetryError("HTTPConnectionPool(host='etcd2.local', port=2379): Max retries exceeded with url: /version (Caused by ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))")
2024-11-15 18:02:53,327 - ERROR - Failed to get list of machines from http://etcd3.local:2379/v3beta: MaxRetryError("HTTPConnectionPool(host='etcd3.local', port=2379): Max retries exceeded with url: /version (Caused by ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))")

Patroni log files

2024-11-15 17:03:52,045 ERROR: Failed to get list of machines from http://192.168.0.11:2379/v3beta: MaxRetryError("HTTPConnectionPool(host='192.168.0.11', port=2379): Max retries exceeded with url: /version (Caused by ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')))")
2024-11-15 17:03:52,046 ERROR: Failed to get list of machines from http://192.168.0.13:2379/v3beta: MaxRetryError("HTTPConnectionPool(host='192.168.0.13', port=2379): Max retries exceeded with url: /version (Caused by ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))")
2024-11-15 17:03:57,048 ERROR: Failed to get list of machines from http://192.168.0.12:2379/v3beta: MaxRetryError("HTTPConnectionPool(host='192.168.0.12', port=2379): Max retries exceeded with url: /version (Caused by ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))")
2024-11-15 17:03:57,049 ERROR: Failed to get list of machines from http://192.168.0.11:2379/v3beta: MaxRetryError("HTTPConnectionPool(host='192.168.0.11', port=2379): Max retries exceeded with url: /version (Caused by ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')))")
2024-11-15 17:03:57,051 ERROR: Failed to get list of machines from http://192.168.0.13:2379/v3beta: MaxRetryError("HTTPConnectionPool(host='192.168.0.13', port=2379): Max retries exceeded with url: /version (Caused by ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))")

PostgreSQL log files

PostgreSQL was not started, so there is no log file.

Have you tried to use GitHub issue search?

Anything else we need to know?

No response

CyberDem0n commented 3 hours ago

I am pretty confident that Patroni works with etcd 3.5. The errors you are getting are not related to endpoint names. If etcd doesn't like the endpoint, it says so explicitly and sets the respective HTTP status code (e.g. 404). In your case the errors are totally different: Patroni simply can't connect to etcd.
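
The difference between "etcd rejected the endpoint" and "no connection at all" can be seen with a plain HTTP request. The following is a rough sketch using only the Python standard library, not Patroni's own client code. The function name probe and the result strings are invented for illustration. The /version path is the one that appears in the log lines above.

```python
import http.client
import sys
import urllib.error
import urllib.request

# Ignore proxy environment variables so the request goes straight to the host.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 3.0) -> str:
    """Classify one GET request as 'ok', 'http <code>' or 'no connection (...)'."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return "ok" if response.status == 200 else f"http {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered but rejected the request, e.g. 404 for an unknown path.
        return f"http {exc.code}"
    except (OSError, http.client.HTTPException) as exc:
        # Refused, reset, timed out, or closed without a response:
        # no HTTP status code was ever received.
        return f"no connection ({type(exc).__name__})"


if __name__ == "__main__":
    # Example: python probe.py http://etcd1.local:2379/version
    for target in sys.argv[1:]:
        print(target, "->", probe(target))
```

A result of the form "http 404" would point at the endpoint name, while "no connection (...)" matches the kind of errors shown in the logs above.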

I would advise you to check that the etcd hosts are accepting TCP connections on port 2379.
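
One way to run that check from the Patroni host is a short script like the one below. It is a minimal sketch using only the Python standard library. The function name tcp_reachable and the file name check_etcd.py are hypothetical, and the endpoints are passed as arguments (for example the hosts listed in the etcd3 section of the configuration).

```python
import socket
import sys


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Example (hypothetical file name):
    #   python check_etcd.py etcd1.local:2379 etcd2.local:2379 etcd3.local:2379
    for endpoint in sys.argv[1:]:
        host, _, port = endpoint.rpartition(":")
        status = "open" if tcp_reachable(host, int(port)) else "unreachable"
        print(f"{endpoint}: {status}")
```

Note that this only shows whether the port accepts connections. It says nothing about TLS or authentication on the etcd side.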