patroni / patroni

A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes
MIT License

patroni configuration validate failed with error '[Errno -8] Unrecognized service' #3144

Closed kviset closed 1 month ago

kviset commented 2 months ago

What happened?

When I try to validate the Patroni configuration with Python 3.12.3, I receive the following error:

/ # patroni --ignore-listen-port --validate-config /etc/patroni/patronictl.yml 
restapi.listen 0.0.0.0:8008 didn't pass validation: [Errno -8] Unrecognized service
postgresql.listen *:5432 didn't pass validation: [Errno -8] Unrecognized service

How can we reproduce it (as minimally and precisely as possible)?

Reproduce steps:

Create Dockerfile

FROM postgres:15.8-alpine3.20

ENV PATRONICTL_CONFIG_FILE=/etc/patroni/patronictl.yml

# hadolint ignore=DL3018
RUN set -xe \
    && apk add --no-cache \
        musl-locales \
        python3 \
        py3-pip \
        py3-psycopg \
        py3-psycopg-c \
        py3-psutil \
    && pip install --no-cache-dir --break-system-packages "patroni[psycopg3,all]"==4.0.1 \
    && mkdir -p /etc/patroni \
    && chmod 750 /var/lib/postgresql/data
COPY patronictl.yml /etc/patroni/

Create patronictl.yml

bootstrap:
  dcs:
    loop_wait: 10
    postgresql:
      parameters:
        hot_standby: 'on'
        max_connections: 100
        max_locks_per_transaction: 64
        max_prepared_transactions: 0
        max_replication_slots: 10
        max_wal_senders: 10
        max_worker_processes: 8
        track_commit_timestamp: 'off'
        wal_keep_size: 128MB
        wal_level: replica
        wal_log_hints: 'on'
      use_pg_rewind: true
      use_slots: true
    retry_timeout: 10
    ttl: 30
consul:
  register_service: true
  url: http://127.0.0.1:8500
log:
  format: '%(asctime)s %(levelname)s: %(message)s'
  level: DEBUG
  max_queue_size: 1000
  traceback_level: ERROR
  type: plain
name: node-1
postgresql:
  authentication:
    replication:
      password: password
      username: replicator
    rewind:
      password: password
      username: rewind_user
    superuser:
      password: password
      username: postgres
  bin_dir: ''
  connect_address: 10.0.0.1:5432
  data_dir: /var/lib/postgres/data
  listen: '*:5432'
  parameters:
    password_encryption: scram-sha-256
  pg_hba:
  - host all all all scram-sha-256
  - host replication replicator all scram-sha-256
restapi:
  connect_address: 10.0.0.1:8008
  listen: 0.0.0.0:8008
scope: celery
tags:
  clonefrom: true
  failover_priority: 1
  noloadbalance: false
  nostream: false
  nosync: false

Build the Docker image: docker build . -t patroni:test

Start a container from the image: docker run --rm -ti patroni:test sh

Run the Patroni validation: patroni --ignore-listen-port --validate-config /etc/patroni/patronictl.yml

What did you expect to happen?

Validation should complete successfully.

Patroni/PostgreSQL/DCS version

Patroni configuration file

Same as the patronictl.yml shown in the reproduction steps above.

patronictl show-config

Not applicable: Patroni was not run.

Patroni log files

Not applicable: Patroni was not run.

PostgreSQL log files

Not applicable: Patroni was not run.

Have you tried to use GitHub issue search?

Anything else we need to know?

I think the problem is in https://github.com/patroni/patroni/blame/v4.0.1/patroni/validator.py#L147, because passing an empty string as the port to socket.getaddrinfo fails with Python 3.12:

/ # python
Python 3.12.3 (main, Aug 23 2024, 06:10:48) [GCC 13.2.1 20240309] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> host = "0.0.0.0"
>>> port = "5432"
>>> proto = socket.getaddrinfo(host, "", 0, socket.SOCK_STREAM, 0, socket.AI_PASSIVE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.12/socket.py", line 963, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -8] Unrecognized service
>>> proto = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM, 0, socket.AI_PASSIVE)
>>> proto
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '0.0.0.0', ('0.0.0.0', 5432))]
CyberDem0n commented 2 months ago

I think None should be passed instead of an empty string. Interestingly, it doesn't fail on Ubuntu 24.04, which also has Python 3.12.

kviset commented 2 months ago

You're right, it works with None:

/ # python
Python 3.12.3 (main, Aug 23 2024, 06:10:48) [GCC 13.2.1 20240309] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> host = "0.0.0.0"
>>> proto = socket.getaddrinfo(host, None, 0, socket.SOCK_STREAM, 0, socket.AI_PASSIVE)
>>> proto
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '0.0.0.0', ('0.0.0.0', 0))]
kviset commented 2 months ago

I tried the image postgres:15.8-alpine3.19 with python3.11 and got the same error.

I think the cause is that Alpine uses musl instead of glibc.
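
For illustration, here is a minimal sketch of the workaround discussed in this thread: normalize an empty service string to None before calling socket.getaddrinfo. The helper name resolve_listen_address is hypothetical and is not Patroni's actual code; it only assumes, as the sessions above suggest, that the empty string is what triggers the error on Alpine.

```python
import socket
from typing import Union


def resolve_listen_address(host: str, service: Union[str, int, None] = None):
    """Resolve a host for a listening TCP socket.

    Hypothetical helper, not part of Patroni. An empty service string is
    replaced with None, because (per this thread) getaddrinfo on Alpine
    rejects "" with '[Errno -8] Unrecognized service'.
    """
    if service == "":
        service = None
    return socket.getaddrinfo(
        host, service, 0, socket.SOCK_STREAM, 0, socket.AI_PASSIVE
    )


if __name__ == "__main__":
    # Try the three variants from the thread: empty string, None, and a port.
    for service in ("", None, "5432"):
        infos = resolve_listen_address("0.0.0.0", service)
        print(repr(service), infos[0][4])
```

With this normalization, the empty-string case should behave like the None case on both the Alpine image and Ubuntu, since the empty string never reaches getaddrinfo.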