zalando / spilo

Highly available elephant herd: HA PostgreSQL cluster using Docker
Apache License 2.0

Failed to get list of machines using spilo, etcd on swarm #995

Closed: jocafi closed this issue 4 weeks ago

jocafi commented 1 month ago

@CyberDem0n, thanks for the amazing development work and for helping other developers.

I am creating a PostgreSQL cluster with etcd and Spilo and deploying it on Docker Swarm, but I get the error below in the Spilo container:

2024-05-25 14:10:15,696 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2024-05-25 14:10:15,704 - bootstrapping - INFO - No meta-data available for this provider
2024-05-25 14:10:15,705 - bootstrapping - INFO - Looks like you are running unsupported
2024-05-25 14:10:15,736 - bootstrapping - INFO - Configuring wal-e
2024-05-25 14:10:15,736 - bootstrapping - INFO - Configuring certificate
2024-05-25 14:10:15,736 - bootstrapping - INFO - Generating ssl self-signed certificate
2024-05-25 14:10:15,801 - bootstrapping - INFO - Configuring pam-oauth2
2024-05-25 14:10:15,801 - bootstrapping - INFO - No PAM_OAUTH2 configuration was specified, skipping
2024-05-25 14:10:15,801 - bootstrapping - INFO - Configuring pgbouncer
2024-05-25 14:10:15,801 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2024-05-25 14:10:15,801 - bootstrapping - INFO - Configuring standby-cluster
2024-05-25 14:10:15,802 - bootstrapping - INFO - Configuring log
2024-05-25 14:10:15,802 - bootstrapping - INFO - Configuring patroni
2024-05-25 14:10:15,810 - bootstrapping - INFO - Writing to file /run/postgres.yml
2024-05-25 14:10:15,810 - bootstrapping - INFO - Configuring bootstrap
2024-05-25 14:10:15,810 - bootstrapping - INFO - Configuring crontab
2024-05-25 14:10:15,811 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2024-05-25 14:10:15,811 - bootstrapping - INFO - Configuring pgqd
2024-05-25 13:01:46,035 ERROR: Failed to get list of machines from http://etcd1:2379/v2: EtcdException('Bad response : 404 page not found\n')
2024-05-25 13:01:46,038 ERROR: Failed to get list of machines from http://etcd2:2379/v2: EtcdException('Bad response : 404 page not found\n')
2024-05-25 13:01:46,038 INFO: waiting on etcd
2024-05-25 13:01:51,044 ERROR: Failed to get list of machines from http://etcd1:2379/v2: EtcdException('Bad response : 404 page not found\n')
2024-05-25 13:01:51,046 ERROR: Failed to get list of machines from http://etcd2:2379/v2: EtcdException('Bad response : 404 page not found

Here is my configuration:

Docker Network: docker network create --driver overlay net-postgres

Docker compose:

services:
  etcd1:
    image: docker.io/bitnami/etcd:3.5.13
    hostname: 'etcd1'
    networks:
      - net-postgres
    environment:
      #https://github.com/bitnami/containers/tree/main/bitnami/etcd
      - ALLOW_NONE_AUTHENTICATION=yes
      - ETCD_NAME=etcd1
      - ETCDCTL_API=2
      - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://etcd1:2380
      - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
      - ETCD_ADVERTISE_CLIENT_URLS=http://etcd1:2379
      - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster
      - ETCD_INITIAL_CLUSTER=etcd1=http://etcd1:2380,etcd2=http://etcd2:2380
      - ETCD_INITIAL_CLUSTER_STATE=new
    deploy:
      replicas: 1
      placement:
        constraints:
          - "node.role==manager"

  etcd2:
    image: docker.io/bitnami/etcd:3.5.13
    hostname: 'etcd2'
    networks:
      - net-postgres
    environment:
      - ALLOW_NONE_AUTHENTICATION=yes
      - ETCD_NAME=etcd2
      - ETCDCTL_API=2
      - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://etcd2:2380
      - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
      - ETCD_ADVERTISE_CLIENT_URLS=http://etcd2:2379
      - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster
      - ETCD_INITIAL_CLUSTER=etcd1=http://etcd1:2380,etcd2=http://etcd2:2380
      - ETCD_INITIAL_CLUSTER_STATE=new
    deploy:
      replicas: 1
      placement:
        constraints:
          - "node.role!=manager"

  postgres-node1:
    image: 127.0.0.1:5000/spilo:3.0-p1
    hostname: 'postgres-node1'
    environment:
      # https://github.com/zalando/spilo/blob/master/ENVIRONMENT.rst
      ETCD_HOSTS: '"etcd1:2379","etcd2:2379"'
      PGPASSWORD_STANDBY: 'secret'
      PGPASSWORD_ADMIN: 'secret'
      PGPASSWORD_SUPERUSER: 'secret'
      SCOPE: pgCluster
    depends_on:
      - etcd1
      - etcd2
    networks:
      - net-postgres
    volumes:
      - /root/data:/root/pgdata
    deploy:
      replicas: 1
      placement:
        constraints:
          - "node.role==manager"

  postgres-node2:
    image: 127.0.0.1:5000/spilo:3.0-p1
    hostname: 'postgres-node2'
    environment:
      ETCD_HOSTS: '"etcd1:2379","etcd2:2379"'
      PGPASSWORD_STANDBY: 'secret'
      PGPASSWORD_ADMIN: 'secret'
      PGPASSWORD_SUPERUSER: 'secret'
      SCOPE: pgCluster
    depends_on:
      - etcd1
      - etcd2
    networks:
      - net-postgres
    volumes:
      - /root/data:/root/pgdata
    deploy:
      replicas: 1
      placement:
        constraints:
          - "node.role!=manager"

networks:
  net-postgres:
    external: true

If I run patronictl list inside the Spilo container, I also get the same error as above:

ERROR: Failed to get list of machines from http://etcd1:2379/v2: EtcdException('Bad response : 404 page not found
ERROR: Failed to get list of machines from http://etcd2:2379/v2: EtcdException('Bad response : 404 page not found

What do you suggest to fix this issue?

shikharvashistha commented 4 weeks ago

Why is there /v2 at the end of the URL?

Does bitnami/etcd have that endpoint defined? The 404 response simply says it doesn't exist.

jocafi commented 4 weeks ago

Thanks, @shikharvashistha. I can reproduce the error now. I opened an issue at Bitnami: https://github.com/bitnami/containers/issues/67840

I will close this ticket.

shikharvashistha commented 4 weeks ago

No need to open it there; use the spilo/patroni etcd image instead.
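
For anyone hitting the same 404 on /v2: a likely explanation, not confirmed anywhere in this thread, is that etcd 3.4 and later ship with the v2 HTTP API disabled by default, while ETCD_HOSTS makes Patroni use its v2 client. ETCDCTL_API=2 only affects the etcdctl command-line tool, not the server. The fragment below sketches two possible workarounds against the compose file above. Both are assumptions to verify in your own setup: ETCD_ENABLE_V2 is the environment form of etcd's --enable-v2 flag, and ETCD3_HOSTS is the variable listed in Spilo's ENVIRONMENT.rst for the v3 API. Only the changed keys are shown; everything else stays as posted.

```yaml
services:
  etcd1:
    image: docker.io/bitnami/etcd:3.5.13
    environment:
      # Option A (assumption): re-enable the v2 API on the etcd server.
      # Apply the same setting to etcd2.
      - ETCD_ENABLE_V2=true

  postgres-node1:
    image: 127.0.0.1:5000/spilo:3.0-p1
    environment:
      # Option B (assumption): use this INSTEAD of ETCD_HOSTS so that
      # Patroni talks to etcd through the v3 API.
      # Apply the same setting to postgres-node2.
      ETCD3_HOSTS: '"etcd1:2379","etcd2:2379"'
```

Either option on its own should be enough if the assumption holds. Option B avoids depending on the deprecated v2 API.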