vfarcic / docker-flow-proxy

Docker Flow Proxy
http://proxy.dockerflow.com/

Proxy cannot resolve swarm-listener correctly on docker-compose-stack.yml #209

Closed dmelo closed 7 years ago

dmelo commented 7 years ago

I have a swarm with three managers and tried to use docker-compose-stack.yml, but it seems that the proxy container is not resolving swarm-listener. I have tried on a Docker swarm on AWS and on another swarm on my laptop. Both have the same problem.

Here is my docker version:

[dmelo@vblpso-nvirginia-00 ~]$ docker version
Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:14:43 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:14:43 2017
 OS/Arch:      linux/amd64
 Experimental: false

Here are the steps to reproduce the problem:

[dmelo@vblpso-nvirginia-00 ~]$ docker network create -d overlay proxy
nl03parbeg0g9u9fmoeay9szc
[dmelo@vblpso-nvirginia-00 ~]$ git clone https://github.com/vfarcic/docker-flow-proxy.git
Cloning into 'docker-flow-proxy'...
remote: Counting objects: 3559, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 3559 (delta 0), reused 0 (delta 0), pack-reused 3552
Receiving objects: 100% (3559/3559), 2.36 MiB | 0 bytes/s, done.
Resolving deltas: 100% (2429/2429), done.
Checking connectivity... done.
[dmelo@vblpso-nvirginia-00 ~]$ cd docker-flow-proxy/
[dmelo@vblpso-nvirginia-00 docker-flow-proxy]$ docker stack deploy --compose-file=docker-compose-stack.yml ttt
Creating service ttt_proxy
Creating service ttt_swarm-listener
[dmelo@vblpso-nvirginia-00 docker-flow-proxy]$ docker ps
CONTAINER ID        IMAGE                                                                                               COMMAND                  CREATED             STATUS              PORTS                       NAMES
017ff7cb2077        vfarcic/docker-flow-proxy@sha256:b004b9a824bcbd4d80c2ad42854c61514fc5d04ac8cf2cf7defa7933fd44b6d6   "/docker-entrypoin..."   46 seconds ago      Up 46 seconds       80/tcp, 443/tcp, 8080/tcp   ttt_proxy.1.xu7oj6699mmjivmgs3mqhs6ex
[dmelo@vblpso-nvirginia-00 docker-flow-proxy]$ docker logs 017ff7cb2077 -f
2017/04/20 14:59:45 Starting HAProxy
2017/04/20 14:59:45 Starting "Docker Flow: Proxy"
2017/04/20 14:59:50 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp: lookup swarm-listener on 127.0.0.11:53: no such host. Will retry in 5 seconds.
2017/04/20 14:59:55 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp: lookup swarm-listener on 127.0.0.11:53: no such host. Will retry in 5 seconds.
2017/04/20 15:00:00 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp: lookup swarm-listener on 127.0.0.11:53: no such host. Will retry in 5 seconds.
2017/04/20 15:00:05 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp: lookup swarm-listener on 127.0.0.11:53: no such host. Will retry in 5 seconds.
2017/04/20 15:00:10 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp: lookup swarm-listener on 127.0.0.11:53: no such host. Will retry in 5 seconds.
2017/04/20 15:00:15 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp: lookup swarm-listener on 127.0.0.11:53: no such host. Will retry in 5 seconds.
2017/04/20 15:00:20 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp: lookup swarm-listener on 127.0.0.11:53: no such host. Will retry in 5 seconds.
vfarcic commented 7 years ago

Can you post the docker-compose-stack.yml file?

dmelo commented 7 years ago

I'm on commit fdfb641 . Here is the file:

version: "3"

services:

  proxy:
    image: vfarcic/docker-flow-proxy
    ports:
      - 80:80
      - 443:443
    networks:
      - proxy
    environment:
      - LISTENER_ADDRESS=swarm-listener
      - MODE=swarm
    deploy:
      replicas: 2

  swarm-listener:
    image: vfarcic/docker-flow-swarm-listener
    networks:
      - proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DF_NOTIFY_CREATE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/reconfigure
      - DF_NOTIFY_REMOVE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/remove
    deploy:
      placement:
        constraints: [node.role == manager]

networks:
  proxy:
    external: true
vfarcic commented 7 years ago

I just deployed the same stack and it worked. Can you confirm that two replicas of the proxy and one of the listener are indeed running (e.g. docker stack ps ttt)?

The error indicates that the listener cannot be reached from the proxy. The most common culprit is networking. If it's not working correctly, other services will fail to communicate as well. Can you enter one of the proxy containers (e.g. docker exec -it [ID] sh) and drill the listener? The commands would be:

apk add --update drill
drill swarm-listener

Assuming that the listener is running, if drill does not answer, the problem is most likely in networking. In case only one of the proxy replicas is having the problem, please make sure that those commands are executed from that one.

Please post the output of the drill command.

dmelo commented 7 years ago

Is it possible that proxy resolves the dns of swarm-listener before the service is up (causing it to resolve to another IP) and then sticks with that IP even after the swarm-listener service is ready?

My DNS is resolving to IP 66.96.147.68 when it doesn't find the right IP.

When I deployed the stack again, I saw that the proxy service was up before swarm-listener. As for the test you suggested, when I run drill swarm-listener it correctly resolves the name. Furthermore, I'm able to curl the URL.

[root@swarm01 ~]# docker service ls
ID                  NAME                 MODE                REPLICAS            IMAGE
uxmsvk9hgq2u        ttt_swarm-listener   replicated          0/1                 vfarcic/docker-flow-swarm-listener:latest
v62e3zp1kqub        ttt_proxy            replicated          2/2                 vfarcic/docker-flow-proxy:latest
[root@swarm01 ~]# docker ps
CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS              PORTS                       NAMES
b28d85e26112        vfarcic/docker-flow-proxy:latest   "/docker-entrypoin..."   10 seconds ago      Up 9 seconds        80/tcp, 443/tcp, 8080/tcp   ttt_proxy.1.zgtgdn2n8ktjlfsbnh08ieky7
[root@swarm01 ~]# docker logs -f b28d85e26112
2017/04/20 20:20:28 Starting HAProxy
2017/04/20 20:20:29 Starting "Docker Flow: Proxy"
2017/04/20 20:20:34 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:39 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:44 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:49 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:54 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:59 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
^C
[root@swarm01 ~]# docker node ls
ID                           HOSTNAME        STATUS  AVAILABILITY  MANAGER STATUS
qhkpqwhaggbotpp1mlpojiyen    merov2          Ready   Active        Leader
t9ei33btf4tzx2v3e82rab3ff *  swarm01.merov2  Ready   Active        Reachable
[root@swarm01 ~]# docker service ls
ID                  NAME                 MODE                REPLICAS            IMAGE
uxmsvk9hgq2u        ttt_swarm-listener   replicated          1/1                 vfarcic/docker-flow-swarm-listener:latest
v62e3zp1kqub        ttt_proxy            replicated          2/2                 vfarcic/docker-flow-proxy:latest
[root@swarm01 ~]# docker logs -f b28d85e26112
2017/04/20 20:20:28 Starting HAProxy
2017/04/20 20:20:29 Starting "Docker Flow: Proxy"
2017/04/20 20:20:34 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:39 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:44 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:49 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:54 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:59 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:04 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:09 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:14 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:19 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:24 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:29 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:34 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:39 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:44 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:49 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:54 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:59 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:04 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:09 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:14 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:19 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:24 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:29 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:34 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:39 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.

2017/04/20 20:22:44 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
^C
[root@swarm01 ~]# docker exec -ti b28d85e26112 sh
/ # apk add --update drill
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
(1/2) Installing ldns (1.6.17-r4)
(2/2) Installing drill (1.6.17-r4)
Executing busybox-1.25.1-r0.trigger
OK: 8 MiB in 17 packages
/ # drill swarm-listener
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 53613
;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0 
;; QUESTION SECTION:
;; swarm-listener.  IN  A

;; ANSWER SECTION:
swarm-listener. 600 IN  A   10.0.0.5

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 0 msec
;; SERVER: 127.0.0.11
;; WHEN: Thu Apr 20 20:24:49 2017
;; MSG SIZE  rcvd: 62
/ # apk add --update curl
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
(1/4) Installing ca-certificates (20161130-r1)
(2/4) Installing libssh2 (1.7.0-r2)
(3/4) Installing libcurl (7.52.1-r2)
(4/4) Installing curl (7.52.1-r2)
Executing busybox-1.25.1-r0.trigger
Executing ca-certificates-20161130-r1.trigger
OK: 9 MiB in 21 packages
/ # curl http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services
[]/ # 
vfarcic commented 7 years ago

The order should not influence it (at least not from DFP/DFSL point of view). If DFSL is not up, DFP will repeat requests.

I'm confused about why DFP resolves swarm-listener to 66.96.147.68 while drill clearly returns the correct overlay address (10.0.0.5). There must be a networking issue, but I'm not sure what the cause is.

Can you try changing the LISTENER_ADDRESS env. var. in the stack to the full name of the service (e.g. ttt_swarm-listener)? You can also remove the proxy containers so that they are rescheduled by Swarm. That would rule out the possibility that the order matters. I'm convinced that it doesn't but, since I'm out of good ideas about what the problem is...

Other than that, I think I would have to take a closer look into your system. Would it be possible to have a screen-sharing session? Would Monday suit you?
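
One way to try the LISTENER_ADDRESS change suggested above without editing the compose file is to update the running service. This is only a sketch: the function name is hypothetical, and it assumes the services follow the [STACK]_proxy and [STACK]_swarm-listener naming seen in the docker ps output earlier in the thread. A later docker stack deploy with the unchanged file would presumably revert the variable.

```sh
# Hypothetical helper: point the proxy at the stack-qualified listener name.
# $1 is the stack name used with "docker stack deploy" (e.g. ttt).
use_full_listener_name() {
    stack=$1
    # --env-add sets or replaces an environment variable on the service;
    # changing it also causes the proxy tasks to be replaced.
    docker service update \
        --env-add "LISTENER_ADDRESS=${stack}_swarm-listener" \
        "${stack}_proxy"
}

# Usage (on a manager node):
#   use_full_listener_name ttt
```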

rslangham commented 7 years ago

Any resolution on this? I am having the same issue. The proxy and swarm-listener names both resolve to bogus IP addresses, and both services log errors when trying to communicate with each other.

Similar commands on my system:

$ docker service ls
time="2017-04-29T17:20:08-05:00" level=info msg="Unable to use system certificate pool: crypto/x509: system root pool is not available on Windows"
ID            NAME                  MODE        REPLICAS  IMAGE
fa67ooofab8r  proxy_proxy           replicated  2/2       vfarcic/docker-flow-proxy:latest
gueu6l3kcz40  proxy_swarm-listener  replicated  1/1       vfarcic/docker-flow-swarm-listener:latest
jnbcomqxbdmj  registry              replicated  1/1       registry:2
ptkf0bei2ugp  app_nodered           replicated  3/3       localhost:5000/noderedipaddress
ych165ntnxdk  app_githubpages       replicated  3/3       starefossen/github-pages:latest

$ docker-machine ssh ronswarm2
                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/
 _                 _   ____     _            _
| |__   ___   ___ | |_|___ \ __| | ___   ___| | _____ _ __
| '_ \ / _ \ / _ \| __| __) / _` |/ _ \ / __| |/ / _ \ '__|
| |_) | (_) | (_) | |_ / __/ (_| | (_) | (__|   <  __/ |
|_.__/ \___/ \___/ \__|_____\__,_|\___/ \___|_|\_\___|_|
Boot2Docker version 17.04.0-ce, build HEAD : c69677f - Thu Apr  6 16:26:16 UTC 2017
Docker version 17.04.0-ce, build 4845c56

docker@ronswarm2:~$ ping proxy
PING proxy (198.105.254.114): 56 data bytes
^C^C
--- proxy ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

docker@ronswarm2:~$ ping proxy_proxy
PING proxy_proxy (198.105.254.114): 56 data bytes
^C
--- proxy_proxy ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss

docker@ronswarm2:~$ ping proxy_swarm-listener
PING proxy_swarm-listener (198.105.254.114): 56 data bytes
^C
--- proxy_swarm-listener ping statistics ---
8 packets transmitted, 0 packets received, 100% packet loss
docker@ronswarm2:~$ ping swarm-listener
PING swarm-listener (198.105.254.114): 56 data bytes
^C
--- swarm-listener ping statistics ---
11 packets transmitted, 0 packets received, 100% packet loss

docker@ronswarm2:~$ docker ps
CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS              PORTS                       NAMES
ff993d038225        vfarcic/docker-flow-proxy:latest   "/docker-entrypoin..."   12 minutes ago      Up 12 minutes       80/tcp, 443/tcp, 8080/tcp   proxy_proxy.2.kbe7q1ws4jafbu9w3pu8ja1jy
319fef62211c        starefossen/github-pages:latest    "/bin/sh -c 'jekyl..."   32 minutes ago      Up 31 minutes       4000/tcp                    app_githubpages.2.3oxr1wi1kxecnt01zht1s2uvx

docker@ronswarm2:~$ docker exec -it ff99 sh

/ # apk add --update drill
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
(1/2) Installing ldns (1.6.17-r4)
(2/2) Installing drill (1.6.17-r4)
Executing busybox-1.25.1-r0.trigger
OK: 8 MiB in 17 packages

/ # drill swarm-listener
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 64438
;; flags: qr rd ra ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; swarm-listener.      IN      A

;; ANSWER SECTION:
swarm-listener. 10      IN      A       198.105.254.114
swarm-listener. 10      IN      A       198.105.244.114

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 16 msec
;; SERVER: 127.0.0.11
;; WHEN: Sat Apr 29 22:21:56 2017
;; MSG SIZE  rcvd: 64

/ # apk add --update curl
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
(1/4) Installing ca-certificates (20161130-r1)
(2/4) Installing libssh2 (1.7.0-r2)
(3/4) Installing libcurl (7.52.1-r3)
(4/4) Installing curl (7.52.1-r3)
Executing busybox-1.25.1-r0.trigger
Executing ca-certificates-20161130-r1.trigger
OK: 9 MiB in 21 packages

/ # curl http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services
**(NO RESPONSE HERE)**

My docker-compose-stack.yml is the same as above.

vfarcic commented 7 years ago

@rslangham It is normal that you cannot reach a service by its name from outside the Overlay network that the service is attached to.

Something is hijacking your Overlay network.

  1. When you execute drill [SERVICE_NAME], you should always receive only one IP (address of a service, not its replicas). If, on the other hand, you execute drill tasks.[SERVICE_NAME], you should see the IPs of all the tasks (replicas).
  2. IPs like 198.105.254.114 do not look like Overlay network internal addresses (unless you customized default IP ranges for the Overlay networking).
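
A quick way to apply the second check is to test whether a resolved address falls within the RFC 1918 private ranges. This is a rough sketch, not a definitive test: classify_ipv4 is a hypothetical helper, and it assumes the Overlay network uses a private range (as the 10.0.0.5 answer earlier in the thread does) rather than a customized one.

```sh
# Hypothetical helper: print "private" for RFC 1918 addresses
# (10/8, 172.16/12, 192.168/16) and "public" for anything else.
# Reporting everything else as "public" is a simplification.
classify_ipv4() {
    # Split the dotted quad into $1..$4 by temporarily using "." as separator.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    if [ $# -ne 4 ]; then
        echo "invalid"
        return 1
    fi
    case "$1" in
        10) echo "private" ;;
        172) if [ "$2" -ge 16 ] && [ "$2" -le 31 ]; then echo "private"; else echo "public"; fi ;;
        192) if [ "$2" -eq 168 ]; then echo "private"; else echo "public"; fi ;;
        *) echo "public" ;;
    esac
}

# Usage:
#   classify_ipv4 10.0.0.5          # address drill returned on the working lookup
#   classify_ipv4 198.105.254.114   # address from the failing lookup
```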

The problems you're facing do not seem to be related to DFP or DFSL but networking.

Are you using the commands from the book to create your cluster? If you do, is your AWS account "virgin" (unmodified with corporate services)?

Could you post commands or steps that I could use to reproduce the same result? If you can't, can we have a screen-sharing session so that I take a closer look at your system?

rslangham commented 7 years ago

Thanks for the quick reply. Yes, I'm using the commands from the book, or very similar ones, though I have stopped and started the Docker machines a few times. I am using Windows Hyper-V, and they eat up all my memory if I leave them running. Great book, btw. I have been playing around with the swarm and reverse proxy for several weeks now and have had other issues, with the proxy sometimes working and sometimes not.

Open to screen sharing, available now.

vfarcic commented 7 years ago

It would be complicated for me to connect now. It's 1 in the morning where I live, and I need to get some rest. Can you join http://slack.devops20toolkit.com/ and ping me tomorrow?

rslangham commented 7 years ago

absolutely, big time difference :) Joined, will ping tomorrow. thanks

mbranchnl commented 7 years ago

I've had the same issue and connected into the box. dig and drill gave back the correct IP.

After checking /etc/resolv.conf in the container, I saw that a search domain.net entry was configured.

After removing it, everything seems to work. Next up for me is finding a permanent solution.
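
To see whether a container has inherited a search domain, the search entries can be listed directly. A minimal sketch, assuming the file uses the usual "search [DOMAIN]..." line format; search_domains is a hypothetical helper name.

```sh
# Hypothetical helper: print every domain listed on "search" lines of a
# resolv.conf-style file, one per line. Defaults to /etc/resolv.conf.
search_domains() {
    awk '$1 == "search" { for (i = 2; i <= NF; i++) print $i }' \
        "${1:-/etc/resolv.conf}"
}

# Usage (inside the proxy container, e.g. after docker exec -it [ID] sh):
#   search_domains
# No output means no search domain is configured.
```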

jotunskij commented 7 years ago

Just to chime in on this: I had a very similar problem where I could dig from one container to the other, but they couldn't seem to talk to each other anyway. In my case it was because there were actually two proxy overlay networks created in the swarm. They both had the same ID but had different running containers (and therefore nodes).

Our error seems more like something that's related to dockerd, but I thought I might give people having this issue another thing to check.

vfarcic commented 7 years ago

Networking is the most common problem with DFP (and other services) in Swarm. There are too many combinations and causes that might keep the network from working as expected.

jotunskij commented 7 years ago

We seem to have fixed the issue on our end by adding a delay between network creation and the startup of the listener and proxy services. We're provisioning with Ansible and added a 10-second pause between "docker network create proxy" and starting the flow-proxy containers.

vfarcic commented 7 years ago

Some time is indeed needed for the network's DNS entries to propagate. 10 seconds is a sure bet, even though 1 second should be more than enough.
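
The pause described above could look roughly like this when done by hand rather than through Ansible. It is a sketch only: deploy_with_pause is a hypothetical name, and the network name, compose file, and stack name (ttt) are borrowed from the reproduction steps at the top of the thread.

```sh
# Hypothetical helper: create the overlay network, wait, then deploy.
# $1 is the pause in seconds (defaults to 10, the value reported to work).
deploy_with_pause() {
    pause_seconds=${1:-10}
    docker network create -d overlay proxy &&
        sleep "$pause_seconds" &&
        docker stack deploy --compose-file=docker-compose-stack.yml ttt
}

# Usage (on a manager node, from the docker-flow-proxy checkout):
#   deploy_with_pause 10
```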

robertgartman commented 7 years ago

We've faced the same issue as described by @dmelo. It surfaced both for docker stack and docker service deployments. In our case the issue turned out to be:

The Docker daemon picks up dns-search from the node's network config and propagates it to /etc/resolv.conf of each and every container. For example, when looking inside a docker-flow-proxy container we see public.domain:

root:# docker exec -ti flow_proxy.1.$(docker service ps -f 'name=flow_proxy.1' -f 'desired-state=RUNNING' flow_proxy -q --no-trunc) /bin/sh
/ # cat /etc/resolv.conf
search public.domain
nameserver 127.0.0.11
options ndots:0

The root cause in our case was that we have a public wildcard DNS record, *.public.domain, resolving to our public-facing proxy. During deployment, the proxy reaches out to swarm-listener (http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services) before swarm-listener is up and running. The DNS query for swarm-listener does not resolve, so the resolver falls back (as it should) to the dns-search domains. In our case that means swarm-listener.public.domain, which, due to the wildcard, resolves to our public IP. I have not bothered investigating why the Golang process then keeps this IP forever.
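
To check whether a search domain carries such a wildcard record, one can query a name that should not exist under it and count the answers. The helper below is a hypothetical sketch that only parses drill output in the format shown earlier in the thread; the probe name in the usage comment is made up.

```sh
# Hypothetical helper: read drill output on stdin and print the number of
# records in the answer, taken from the ";; flags: ... ANSWER: N, ..." line.
answer_count() {
    sed -n 's/.*ANSWER: \([0-9][0-9]*\),.*/\1/p'
}

# Usage (inside a container with drill installed):
#   drill no-such-name-12345.public.domain | answer_count
# A result greater than 0 for a name that should not exist suggests a
# wildcard record (or a resolver that rewrites failed lookups).
```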

Solution: As already discovered by @mbranchnl, the fix is related to the search instruction in /etc/resolv.conf. We changed the Docker daemon's /etc/docker/daemon.json:

{
...
"dns-search": ["internal.domain"]
}

Switching to internal.domain, which lacks the wildcard DNS record, makes the DNS lookup inside the proxy container(s) fail and retry until swarm-listener resolves through the Docker daemon's internal DNS.

W0rks commented 7 years ago

In case anyone else is still facing this issue (as I was all week), be sure to check your firewall/router's DNS lookup, if it has the capability. After a couple of days of pulling my hair out, it turned out my ISP had an incorrect A record for swarm-listener (automatically appending a top-level domain suffix), which had it going out to a publicly accessible and pingable address owned by a UK telco. I changed my DNS servers from the ones provided by the ISP to public ones (Google or OpenDNS), and everything is working as expected. Great work on the books!