jakubhajek / traefik-consul-swarm

Example code how to deploy Traefik with Consul as the provider

Could not get this to work on eth1 #1

Open platomaniac opened 5 years ago

platomaniac commented 5 years ago

I ran into a problem after adapting the stack file slightly.

The Consul replicas consistently fail to connect to the Consul leader. What did I do wrong here? Could it be the UFW firewall? I really doubt it. Any ideas would be much appreciated.

Traefik error

The Traefik service gives this error:

level=error msg="Load config error: Get http://consul-leader:8500/v1/kv/traefik?consistent=&recurse=&wait=30000ms: dial tcp 10.0.2.12:8500: i/o timeout, retrying in 727.936874ms"

My stack file (adapted from this repo)

# Define which version of docker-compose file you are using.
version: '3.7'

# Traefik Docker Stack for Docker Swarm with Consul
services:
  consul-leader:
    image: consul:1.5
    command: agent -server -client=0.0.0.0 -bootstrap -ui
    volumes:
      - consul-data-leader:/consul/data
    environment:
      - CONSUL_BIND_INTERFACE=eth1
      - 'CONSUL_LOCAL_CONFIG={"leave_on_terminate": true}'
    networks:
      - traefik-consul
    deploy:
      labels:
        - traefik.frontend.rule=Host:consul.${DOMAIN}
        - traefik.backend=consul-leader
        - traefik.enable=true
        - traefik.port=8500
        - traefik.docker.network=proxy-net
        - traefik.frontend.auth.basic.users=${USERNAME}:${HASHED_PASSWORD}

  consul-replica:
    image: consul:1.5
    command: agent -server -client=0.0.0.0 -retry-join="consul-leader"
    volumes:
      - consul-data-replica:/consul/data
    environment:
      - CONSUL_BIND_INTERFACE=eth1
      - 'CONSUL_LOCAL_CONFIG={"leave_on_terminate": true}'
    networks:
      - traefik-consul
    deploy:
      replicas: ${CONSUL_REPLICAS:-2}
      placement:
        constraints:
          - node.role == manager # Data center, 'spread=node.labels.datacenter, datacenter=us-west,
  # Traefik Reverse-Proxy Service
  traefik:
    image: traefik:v1.7
    networks:
      - traefik-consul
      - traefik-proxy-network
    ports:
      - target: 80
        published: 80
        mode: host
      - target: 443
        published: 443
        mode: host
    # These options are only used in Docker Swarm via docker stack deploy.
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role == manager
      labels:
        - traefik.frontend.rule=Host:monitor.${DOMAIN}
        - traefik.enable=true
        - traefik.port=8080
        - traefik.docker.network=proxy-net
        - traefik.frontend.auth.basic.users=${USERNAME}:${HASHED_PASSWORD}
        - traefik.frontend.headers.forceSTSHeader=true
        - traefik.frontend.headers.STSSeconds=315360000
        - traefik.frontend.headers.STSIncludeSubdomains=true
        - traefik.frontend.headers.STSPreload=true
        - traefik.backend.loadbalancer.swarm=true
        - traefik.backend.loadbalancer.method=drr

    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

    command: >
      --docker
      --docker.swarmmode
      --docker.watch
      --docker.exposedbydefault=false
      --api
      --consul
      --consul.endpoint=consul-leader:8500
      --logLevel=INFO
      --accessLog
      --docker.swarmModeRefreshSeconds=5
      --entrypoints=Name:http Address::80 Redirect.EntryPoint:https
      --entrypoints=Name:https Address::443 TLS
      --acme
      --acme.email=${EMAIL}
      --acme.storage=traefik/acme/account
      --acme.entrypoint=https
      --acme.tlsChallenge=true
      --acme.onhostrule=true
      --acme.acmelogging=true

# Traefik Docker Networks for Docker Swarm
networks:
  traefik-proxy-network:
    name: proxy-net
    attachable: true
  traefik-consul:
    driver: overlay
    attachable: true
    name: traefik-consul
    driver_opts:
      encrypted: "true"

volumes:
  consul-data-leader:
  consul-data-replica:

Consul log (Leader)

Here is the Consul log:

traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    | ==> Found address '172.18.0.3' for interface 'eth1', setting bind option...
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    | bootstrap = true: do not enable unless necessary
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    | ==> Starting Consul agent...
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |            Version: 'v1.5.3'
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |            Node ID: 'd59afb82-d9af-425f-9380-cac26e5dac09'
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |          Node name: '8c1a5cc7ed3e'
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |         Datacenter: 'dc1' (Segment: '<all>')
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |             Server: true (Bootstrap: true)
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |        Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |       Cluster Addr: 172.18.0.3 (LAN: 8301, WAN: 8302)
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |            Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    | 
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    | ==> Log data will now stream in as it occurs:
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    | 
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:21 [INFO]  raft: Initial configuration (index=1): [{Suffrage:Voter ID:d59afb82-d9af-425f-9380-cac26e5dac09 Address:172.18.0.3:8300}]
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:21 [INFO] serf: EventMemberJoin: 8c1a5cc7ed3e.dc1 172.18.0.3
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:21 [INFO] serf: EventMemberJoin: 8c1a5cc7ed3e 172.18.0.3
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:21 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:21 [INFO]  raft: Node at 172.18.0.3:8300 [Follower] entering Follower state (Leader: "")
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:21 [INFO] consul: Handled member-join event for server "8c1a5cc7ed3e.dc1" in area "wan"
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:21 [INFO] consul: Adding LAN server 8c1a5cc7ed3e (Addr: tcp/172.18.0.3:8300) (DC: dc1)
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:21 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:21 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:21 [INFO] agent: started state syncer
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    | ==> Consul agent running!
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:28 [ERR] agent: failed to sync remote state: No cluster leader
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:28 [WARN]  raft: Heartbeat timeout from "" reached, starting election
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:28 [INFO]  raft: Node at 172.18.0.3:8300 [Candidate] entering Candidate state in term 2
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:28 [INFO]  raft: Election won. Tally: 1
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:28 [INFO]  raft: Node at 172.18.0.3:8300 [Leader] entering Leader state
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:28 [INFO] consul: cluster leadership acquired
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:28 [INFO] consul: New leader elected: 8c1a5cc7ed3e
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:28 [INFO] consul: member '8c1a5cc7ed3e' joined, marking health alive
traefik_consul-leader.1.8g0muy3wmzm8@zabo-dev-worker-02    |     2019/07/27 18:24:30 [INFO] agent: Synced node info

Consul Replica log

traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    | ==> Found address '172.18.0.4' for interface 'eth1', setting bind option...
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    | ==> Starting Consul agent...
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |            Version: 'v1.5.3'
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |            Node ID: '92fd8f94-853c-b916-69a9-ccefb76128c3'
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |          Node name: '7476db8ebed0'
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |         Datacenter: 'dc1' (Segment: '<all>')
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |             Server: true (Bootstrap: false)
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |        Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |       Cluster Addr: 172.18.0.4 (LAN: 8301, WAN: 8302)
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |            Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    | 
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    | ==> Log data will now stream in as it occurs:
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    | 
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO]  raft: Initial configuration (index=0): []
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO] serf: EventMemberJoin: 7476db8ebed0.dc1 172.18.0.4
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO] serf: EventMemberJoin: 7476db8ebed0 172.18.0.4
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO]  raft: Node at 172.18.0.4:8300 [Follower] entering Follower state (Leader: "")
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO] consul: Adding LAN server 7476db8ebed0 (Addr: tcp/172.18.0.4:8300) (DC: dc1)
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO] consul: Handled member-join event for server "7476db8ebed0.dc1" in area "wan"
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO] agent: started state syncer
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    | ==> Consul agent running!
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s mdns os packet scaleway softlayer triton vsphere
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO] agent: Joining LAN cluster...
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [INFO] agent: (LAN) joining: [consul-leader]
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [WARN] memberlist: Failed to resolve consul-leader: lookup consul-leader on 127.0.0.11:53: no such host
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [WARN] agent: (LAN) couldn't join: 0 Err: 1 error occurred:
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |  * Failed to resolve consul-leader: lookup consul-leader on 127.0.0.11:53: no such host
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    | 
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:18 [WARN] agent: Join LAN failed: <nil>, retrying in 30s
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:25 [ERR] agent: failed to sync remote state: No cluster leader
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:26 [WARN]  raft: no known peers, aborting election
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:48 [INFO] agent: (LAN) joining: [consul-leader]
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:48 [ERR] agent: Coordinate update error: No cluster leader
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:49 [ERR] agent: failed to sync remote state: No cluster leader
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |     2019/07/27 18:24:58 [WARN] agent: (LAN) couldn't join: 0 Err: 1 error occurred:
traefik_consul-replica.1.l7u4f5fn03hu@zabo-dev-03    |  * Failed to join 10.0.2.12: dial tcp 10.0.2.12:8301: i/o timeout

jakubhajek commented 5 years ago

Hello @platomaniac, thank you for trying out my stack files. This looks like a network connectivity problem, because I noticed a timeout in your attached logs:

* Failed to join 10.0.2.12: dial tcp 10.0.2.12:8301: i/o timeout

That may be why the cluster cannot be formed. I would suggest disabling UFW and then starting from scratch once again. Make sure that your volumes are also recreated from scratch.
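
To narrow down whether the firewall is involved, a quick TCP probe from a container attached to the `traefik-consul` network can help. The following is a minimal sketch, not part of this repo: `check_port` is a hypothetical helper, it assumes `bash` and coreutils `timeout` are available in the container you run it from, and it only tests TCP (Consul's LAN gossip on 8301 also uses UDP).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from this repo): report whether a TCP port accepts
# connections within 3 seconds. Assumes bash and coreutils `timeout`.
check_port() {
  local host="$1" port="$2"
  # bash opens a TCP connection when a redirection targets /dev/tcp/HOST/PORT;
  # `timeout` bounds the attempt so silently dropped packets do not hang it.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "open ${host}:${port}"
  else
    echo "unreachable ${host}:${port}"
    return 1
  fi
}

# Example (run from inside the overlay network); 8300 is Consul server RPC,
# 8301 is LAN gossip, 8500 is the HTTP API:
#   for p in 8300 8301 8500; do check_port consul-leader "$p"; done
```

Docker's swarm documentation lists 2377/tcp, 7946/tcp and udp, and 4789/udp as the ports that must be open between nodes, and overlay networks created with `encrypted: "true"` (as `traefik-consul` is here) additionally need IP protocol 50 (ESP), so those are worth checking in the UFW rules. The logs above also show Consul binding to 172.18.0.x on eth1 while the join attempt targets 10.0.2.12, so it may be worth confirming that eth1 is actually the interface attached to the overlay network.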

I've also updated the repo with the latest stack files, where Consul runs as a separate stack.

You should expect the following output:

/ # consul members
Node          Address          Status  Type    Build  Protocol  DC   Segment
3fc5fa01fde4  10.0.38.10:8301  alive   server  1.5.3  2         dc1  <all>
483b7a6c600d  10.0.38.3:8301   alive   server  1.5.3  2         dc1  <all>
8043b85f0081  10.0.38.4:8301   alive   server  1.5.3  2         dc1  <all>
f8cc0f5388bb  10.0.38.5:8301   alive   server  1.5.3  2         dc1  <all>
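
If you want to check this from a script rather than by eye, the table can be reduced to a count of live servers. This is a small sketch under stated assumptions: `count_alive_servers` is a hypothetical helper name, and it relies on the column layout shown above (Status in column 3, Type in column 4).

```shell
#!/bin/sh
# Hypothetical helper: read `consul members` output on stdin and print how many
# rows have Status "alive" and Type "server". The header row is skipped.
count_alive_servers() {
  awk 'NR > 1 && $3 == "alive" && $4 == "server" { n++ } END { print n + 0 }'
}

# Example: consul members | count_alive_servers
```

For the output above this prints 4. A number lower than the leader plus the configured replicas suggests that some servers have not joined yet.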