Closed dmelo closed 7 years ago
Can you post the docker-compose-stack.yml file?
I'm on commit fdfb641. Here is the file:
version: "3"

services:

  proxy:
    image: vfarcic/docker-flow-proxy
    ports:
      - 80:80
      - 443:443
    networks:
      - proxy
    environment:
      - LISTENER_ADDRESS=swarm-listener
      - MODE=swarm
    deploy:
      replicas: 2

  swarm-listener:
    image: vfarcic/docker-flow-swarm-listener
    networks:
      - proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DF_NOTIFY_CREATE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/reconfigure
      - DF_NOTIFY_REMOVE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/remove
    deploy:
      placement:
        constraints: [node.role == manager]

networks:
  proxy:
    external: true
I just deployed the same stack and it worked. Can you confirm that two replicas of the proxy and one of the listener are indeed running (e.g. docker stack ps ttt)?
The error indicates that the listener cannot be reached from the proxy. The most common culprit is networking. If networking is not working correctly, other services will fail to communicate as well. Can you enter one of the proxy containers (e.g. docker exec -it [ID] sh) and drill the listener? The commands would be:
apk add --update drill
drill swarm-listener
Assuming that the listener is running, if drill does not answer, the problem is most likely in networking. If only one of the proxy replicas is having the problem, please make sure that those commands are executed from that one.
Please post the output of the drill command.
Is it possible that the proxy resolves the DNS name of swarm-listener before the service is up (causing it to resolve to another IP) and then sticks with that IP even after the swarm-listener service is ready?
My DNS is resolving to IP 66.96.147.68 when it doesn't find the right IP.
When I deployed the stack again, I saw that the proxy service was up before the swarm-listener. As for the test you suggested, when I run drill swarm-listener it correctly resolves the name. Furthermore, I'm able to curl the URL.
[root@swarm01 ~]# docker service ls
ID NAME MODE REPLICAS IMAGE
uxmsvk9hgq2u ttt_swarm-listener replicated 0/1 vfarcic/docker-flow-swarm-listener:latest
v62e3zp1kqub ttt_proxy replicated 2/2 vfarcic/docker-flow-proxy:latest
[root@swarm01 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b28d85e26112 vfarcic/docker-flow-proxy:latest "/docker-entrypoin..." 10 seconds ago Up 9 seconds 80/tcp, 443/tcp, 8080/tcp ttt_proxy.1.zgtgdn2n8ktjlfsbnh08ieky7
[root@swarm01 ~]# docker logs -f b28d85e26112
2017/04/20 20:20:28 Starting HAProxy
2017/04/20 20:20:29 Starting "Docker Flow: Proxy"
2017/04/20 20:20:34 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:39 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:44 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:49 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:54 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:59 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
^C
[root@swarm01 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
qhkpqwhaggbotpp1mlpojiyen merov2 Ready Active Leader
t9ei33btf4tzx2v3e82rab3ff * swarm01.merov2 Ready Active Reachable
[root@swarm01 ~]# docker service ls
ID NAME MODE REPLICAS IMAGE
uxmsvk9hgq2u ttt_swarm-listener replicated 1/1 vfarcic/docker-flow-swarm-listener:latest
v62e3zp1kqub ttt_proxy replicated 2/2 vfarcic/docker-flow-proxy:latest
[root@swarm01 ~]# docker logs -f b28d85e26112
2017/04/20 20:20:28 Starting HAProxy
2017/04/20 20:20:29 Starting "Docker Flow: Proxy"
2017/04/20 20:20:34 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:39 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:44 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:49 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:54 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:20:59 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:04 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:09 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:14 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:19 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:24 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:29 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:34 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:39 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:44 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:49 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:54 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:21:59 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:04 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:09 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:14 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:19 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:24 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:29 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:34 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:39 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
2017/04/20 20:22:44 Error: Fetching config from swarm listener failed: Get http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services: dial tcp 66.96.147.68:8080: getsockopt: connection refused. Will retry in 5 seconds.
^C
[root@swarm01 ~]# docker exec -ti b28d85e26112 sh
/ # apk add --update drill
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
(1/2) Installing ldns (1.6.17-r4)
(2/2) Installing drill (1.6.17-r4)
Executing busybox-1.25.1-r0.trigger
OK: 8 MiB in 17 packages
/ # drill swarm-listener
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 53613
;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; swarm-listener. IN A
;; ANSWER SECTION:
swarm-listener. 600 IN A 10.0.0.5
;; AUTHORITY SECTION:
;; ADDITIONAL SECTION:
;; Query time: 0 msec
;; SERVER: 127.0.0.11
;; WHEN: Thu Apr 20 20:24:49 2017
;; MSG SIZE rcvd: 62
/ # apk add --update curl
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
(1/4) Installing ca-certificates (20161130-r1)
(2/4) Installing libssh2 (1.7.0-r2)
(3/4) Installing libcurl (7.52.1-r2)
(4/4) Installing curl (7.52.1-r2)
Executing busybox-1.25.1-r0.trigger
Executing ca-certificates-20161130-r1.trigger
OK: 9 MiB in 21 packages
/ # curl http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services
[]/ #
The order should not influence it (at least not from DFP/DFSL point of view). If DFSL is not up, DFP will repeat requests.
I'm confused why DFP resolves swarm-listener to 66.96.147.68 while drill clearly indicates otherwise (10.0.0.5). There must be a networking issue, but I'm not sure what the cause is.
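One way to make that mismatch visible is to pull the dialed address out of the proxy logs and compare it with what drill returns inside the container. The following is a minimal sketch, not part of DFP; the function name dialed_ips is hypothetical, and it assumes the log lines keep the "dial tcp IP:PORT" form shown above.

```shell
#!/bin/sh
# dialed_ips (hypothetical helper): read proxy log lines on stdin and
# print each distinct "dial tcp IP:PORT" target exactly once.
dialed_ips() {
  grep -o 'dial tcp [0-9.]*:[0-9]*' | awk '{ print $3 }' | sort -u
}

# Possible usage (not executed here; replace [ID] with a proxy container ID):
#   docker logs [ID] 2>&1 | dialed_ips
# Then compare the result with the ANSWER SECTION of "drill swarm-listener"
# run inside the same container.
```

If the two addresses differ, the process is holding on to an address that the container's resolver no longer returns.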
Can you try changing the LISTENER_ADDRESS env. var. in the stack to the full name of the service (e.g. ttt_swarm-listener)? You can also remove the proxy containers so that they are rescheduled by Swarm. That would rule out the possibility that the order matters. I'm convinced that it doesn't but, since I'm out of good ideas about what the problem is...
Other than that, I think I would have to take a closer look at your system. Would it be possible to have a screen-sharing session? Would Monday suit you?
Any resolution on this? I am having the same issue. The proxy and swarm-listener DNS names both resolve to some bogus IP addresses, and both services log errors trying to communicate with each other.
Similar commands on my system:
$ docker service ls
time="2017-04-29T17:20:08-05:00" level=info msg="Unable to use system certificate pool: crypto/x509: system root pool is not available on Windows"
ID NAME MODE REPLICAS IMAGE
fa67ooofab8r proxy_proxy replicated 2/2 vfarcic/docker-flow-proxy:latest
gueu6l3kcz40 proxy_swarm-listener replicated 1/1 vfarcic/docker-flow-swarm-listener:latest
jnbcomqxbdmj registry replicated 1/1 registry:2
ptkf0bei2ugp app_nodered replicated 3/3 localhost:5000/noderedipaddress
ych165ntnxdk app_githubpages replicated 3/3 starefossen/github-pages:latest
$ docker-machine ssh ronswarm2
## .
## ## ## ==
## ## ## ## ## ===
/"""""""""""""""""\___/ ===
~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ / ===- ~~~
\______ o __/
\ \ __/
\____\_______/
_ _ ____ _ _
| |__ ___ ___ | |_|___ \ __| | ___ ___| | _____ _ __
| '_ \ / _ \ / _ \| __| __) / _` |/ _ \ / __| |/ / _ \ '__|
| |_) | (_) | (_) | |_ / __/ (_| | (_) | (__| < __/ |
|_.__/ \___/ \___/ \__|_____\__,_|\___/ \___|_|\_\___|_|
Boot2Docker version 17.04.0-ce, build HEAD : c69677f - Thu Apr 6 16:26:16 UTC 2017
Docker version 17.04.0-ce, build 4845c56
docker@ronswarm2:~$ ping proxy
PING proxy (198.105.254.114): 56 data bytes
^C^C
--- proxy ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
docker@ronswarm2:~$ ping proxy_proxy
PING proxy_proxy (198.105.254.114): 56 data bytes
^C
--- proxy_proxy ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
docker@ronswarm2:~$ ping proxy_swarm-listener
PING proxy_swarm-listener (198.105.254.114): 56 data bytes
^C
--- proxy_swarm-listener ping statistics ---
8 packets transmitted, 0 packets received, 100% packet loss
docker@ronswarm2:~$ ping swarm-listener
PING swarm-listener (198.105.254.114): 56 data bytes
^C
--- swarm-listener ping statistics ---
11 packets transmitted, 0 packets received, 100% packet loss
docker@ronswarm2:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff993d038225 vfarcic/docker-flow-proxy:latest "/docker-entrypoin..." 12 minutes ago Up 12 minutes 80/tcp, 443/tcp, 8080/tcp proxy_proxy.2.kbe7q1ws4jafbu9w3pu8ja1jy
319fef62211c starefossen/github-pages:latest "/bin/sh -c 'jekyl..." 32 minutes ago Up 31 minutes 4000/tcp app_githubpages.2.3oxr1wi1kxecnt01zht1s2uvx
docker@ronswarm2:~$ docker exec -it ff99 sh
/ # apk add --update drill
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
(1/2) Installing ldns (1.6.17-r4)
(2/2) Installing drill (1.6.17-r4)
Executing busybox-1.25.1-r0.trigger
OK: 8 MiB in 17 packages
/ # drill swarm-listener
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 64438
;; flags: qr rd ra ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; swarm-listener. IN A
;; ANSWER SECTION:
swarm-listener. 10 IN A 198.105.254.114
swarm-listener. 10 IN A 198.105.244.114
;; AUTHORITY SECTION:
;; ADDITIONAL SECTION:
;; Query time: 16 msec
;; SERVER: 127.0.0.11
;; WHEN: Sat Apr 29 22:21:56 2017
;; MSG SIZE rcvd: 64
/ # apk add --update curl
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
(1/4) Installing ca-certificates (20161130-r1)
(2/4) Installing libssh2 (1.7.0-r2)
(3/4) Installing libcurl (7.52.1-r3)
(4/4) Installing curl (7.52.1-r3)
Executing busybox-1.25.1-r0.trigger
Executing ca-certificates-20161130-r1.trigger
OK: 9 MiB in 21 packages
/ # curl http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services
**(NO RESPONSE HERE)**
My docker-compose-stack.yml is the same as above.
@rslangham It is normal that you cannot reach a service by its name from outside the Overlay network that the service is attached to.
Something is hijacking your Overlay network.
If you execute drill [SERVICE_NAME], you should always receive only one IP (the address of the service, not its replicas). If, on the other hand, you execute drill tasks.[SERVICE_NAME], you should see the IPs of all the tasks (replicas).
Addresses like 198.105.254.114 do not look like Overlay network internal addresses (unless you customized the default IP ranges for the Overlay networking).
The problems you're facing do not seem to be related to DFP or DFSL but to networking.
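A quick sanity check for a resolved address is whether it falls in a private IPv4 range at all. The sketch below assumes the overlay network uses an RFC 1918 range (such as the 10.0.0.5 seen earlier in this thread) and that the input is a well-formed dotted quad; the function name is_private_ipv4 is made up for illustration.

```shell
#!/bin/sh
# is_private_ipv4 ADDR (hypothetical helper): succeed if ADDR lies in
# 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16; fail otherwise.
# No validation is done: ADDR is assumed to be a well-formed dotted quad.
is_private_ipv4() {
  case "$1" in
    10.*) return 0 ;;
    192.168.*) return 0 ;;
    172.*)
      second=${1#172.}        # drop the leading "172."
      second=${second%%.*}    # keep only the second octet
      if [ "$second" -ge 16 ] && [ "$second" -le 31 ]; then
        return 0
      fi
      return 1
      ;;
    *) return 1 ;;
  esac
}

# Classify the addresses that appear in this thread.
for ip in 10.0.0.5 66.96.147.68 198.105.254.114; do
  if is_private_ipv4 "$ip"; then
    echo "$ip: private"
  else
    echo "$ip: public"
  fi
done
```

A service name that resolves to a public address from inside a container is a strong hint that the answer came from an upstream DNS server rather than from Swarm's internal DNS.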
Are you using the commands from the book to create your cluster? If you do, is your AWS account "virgin" (unmodified with corporate services)?
Could you post commands or steps that I could use to reproduce the same result? If you can't, can we have a screen-sharing session so that I take a closer look at your system?
Thanks for the quick reply. Yes, I'm using the commands from the book, or very similar ones, though I have stopped and started the docker machines a few times. I am using Windows Hyper-V, and they eat up all my memory if I leave them running. Great book, btw. I have been playing around with the swarm and reverse proxy for several weeks now, and have had other issues with the proxy working only some of the time.
Open to screen sharing, available now.
It would be complicated for me to connect now. It's 1 in the morning where I live. I need to get some rest. Can you join http://slack.devops20toolkit.com/ and ping me tomorrow?
absolutely, big time difference :) Joined, will ping tomorrow. thanks
I've had the same issue and connected into the box. dig and drill gave back the correct IP.
After checking the /etc/resolv.conf in the container I saw there was a:
search domain.net
configured.
After removing that it seems to work. Next thing up for me is to find out a permanent solution.
Just to chime in on this: I had a very similar problem where I could dig from one container to the other, but they couldn't seem to talk to each other anyway. In my case it was because there were actually two proxy overlay networks created in the swarm. They both had the same ID but had different running containers (and therefore nodes).
Our error seems more like something that's related to dockerd, but I thought I might give people having this issue another thing to check.
Networking is the most common problem with DFP (and other services) in Swarm. There are too many combinations and causes that might make the network not work as expected.
We seem to have fixed the issue on our end by adding a delay between network creation and startup of the listener and proxy services. We're provisioning with Ansible and adding a 10 second pause between "docker network create proxy" and starting the flow-proxy containers.
Some time is indeed needed for the network DNSes to propagate. 10 seconds is a sure bet even though 1 second should be more than enough.
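Instead of a fixed pause, a provisioning script could poll with an upper bound. The sketch below is a generic retry helper; the name retry and its arguments are made up for illustration, and which readiness check to run is an assumption. Note that a successful docker network inspect on one node does not prove that DNS has propagated everywhere, so a short pause may still be needed.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND... (hypothetical helper): run COMMAND until
# it succeeds, sleeping DELAY seconds between tries. Fails once COMMAND
# has failed ATTEMPTS times.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Possible usage (not executed here):
#   docker network create --driver overlay proxy
#   retry 10 1 docker network inspect proxy
#   docker stack deploy -c docker-compose-stack.yml ttt
```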
We've faced the same issue as described by @dmelo. It surfaced both for docker stack and docker service deployments. In our case the issue turned out to be:
The Docker daemon picks up the dns-search from the node's network config and propagates it to /etc/resolv.conf of each and every container. I.e. when looking inside a docker-flow-proxy container we see public.domain:
root:# docker exec -ti flow_proxy.1.$(docker service ps -f 'name=flow_proxy.1' -f 'desired-state=RUNNING' flow_proxy -q --no-trunc) /bin/sh
/ # cat /etc/resolv.conf
search public.domain
nameserver 127.0.0.11
options ndots:0
The root cause of the issue in our case was that we have a public wildcard DNS record, *.public.domain, resolving to our public-facing proxy. During deployment, the proxy reaches out to swarm-listener (http://swarm-listener:8080/v1/docker-flow-swarm-listener/get-services) before swarm-listener is up and running. The DNS query for swarm-listener does not resolve, and the fallback (as it should be) is the dns-search domains. In our case that means swarm-listener.public.domain, which, due to the wildcard, resolves to our public IP. I have not bothered investigating why the Golang process then keeps this IP forever.
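The fallback described here can be made concrete by listing the names a resolver would try for a given /etc/resolv.conf. This is a rough sketch of the common search-list rule (a name with at least ndots dots is tried as-is first, then with each search suffix; otherwise the suffixes come first). The function name candidate_names is hypothetical, names with a trailing dot are not handled, and actual behaviour depends on the resolver implementation in use.

```shell
#!/bin/sh
# candidate_names NAME RESOLV_CONF (hypothetical helper): print, in order,
# the names a resolver following the common search-list rule would try.
# ndots defaults to 1 when RESOLV_CONF has no "options ndots:N" entry.
candidate_names() {
  awk -v name="$1" '
    BEGIN { ndots = 1 }
    # The last "search" line wins; remember its suffixes in order.
    $1 == "search" {
      nsearch = 0
      for (i = 2; i <= NF; i++) search[++nsearch] = $i
    }
    $1 == "options" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^ndots:/) ndots = substr($i, 7) + 0
    }
    END {
      tmp = name
      dots = gsub(/\./, "", tmp)   # number of dots in NAME
      if (dots >= ndots) print name
      for (i = 1; i <= nsearch; i++) print name "." search[i]
      if (dots < ndots) print name
    }
  ' "$2"
}

# Possible usage inside a container (not executed here):
#   candidate_names swarm-listener /etc/resolv.conf
```

With the resolv.conf shown above (search public.domain, options ndots:0), this prints swarm-listener followed by swarm-listener.public.domain. The second name is the one that a wildcard record for *.public.domain would answer while the first still fails.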
Solution: As already discovered by @mbranchnl, the solution is related to the search instruction in /etc/resolv.conf. We changed the Docker daemon's /etc/docker/daemon.json:
{
...
"dns-search": ["internal.domain"]
}
Switching to internal.domain, which lacks the wildcard DNS record, makes the DNS lookup inside the proxy container(s) fail and retry until swarm-listener is resolved by the Docker daemon's internal DNS.
In case anyone else is still facing this issue (as I was all week), be sure to check your firewall/router's DNS lookup, if it has the capability. After a couple of days of pulling my hair out, it turned out that my ISP had an incorrect A record for swarm-listener (automatically appending a top-level domain suffix), which sent it out to a publicly accessible and pingable address owned by a UK telco. I had to change my DNS servers from the ones provided by the ISP to a public one, Google or OpenDNS, and everything is working as expected. Great work on the books!
I have a swarm with three managers and tried to use docker-compose-stack.yml, but it seems that the proxy container is not resolving swarm-listener. I have tried on a Docker swarm on AWS and another swarm on my laptop. Both have the same problem.
Here is my docker version:
Here are the steps to reproduce the problem: