weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0
6.62k stars 668 forks source link

No connectivity when peers are in non-interactive mode #3240

Open dhet opened 6 years ago

dhet commented 6 years ago

Intent: I want to use weave to connect distributed containers that communicate via "DDS". DDS is a PubSub middleware that uses multicast to distribute messages.

Problem: Lets assume we have a publisher container and a subscriber container. When the containers are started they find each other via multicast service discovery and then the publisher starts publishing sample messages.

This works fine when one of the containers is in interactive mode (docker run -it) and the application is started via shell. But when both containers are started in non-interactive mode they fail to even find each other.

This problem only occurs when the containers are distributed on different hosts. When both containers are running on the same host the example works fine.

How to reproduce: First, start weave on both hosts and establish a connection.

On host A run this to start the subscriber:

docker run --rm --name sub kugele1337/messenger subscriber -DCPSConfigFile rtps.ini

On host B run this to start the publisher:

docker run --rm --name pub kugele1337/messenger publisher -DCPSConfigFile rtps.ini

Expected result: Log output from message exchange between the containers Actual result: No log outpu


However, when starting one of the containers in interactive mode it works: On host A run:

docker run --rm --name sub kugele1337/messenger subscriber -DCPSConfigFile rtps.ini

On host B run:

docker run --rm -ti --name pub kugele1337/messenger
root@pub:/opt/workspace# ./publisher -DCPSConfigFile rtps.ini 

Result: Expected log output

brb commented 6 years ago

Have you verified that weave peers are actually connected to each other? weave status connections should tell you that.

dhet commented 6 years ago

@brb The peers are connected via sleeve mode.

Also, I tried it with different hosts and it worked flawlessly with them (they were Raspberry Pis). So it's a host specific problem to my workstation, which runs Ubuntu 16.

Even though the stuff works with other hosts, I would still like to know why it doesn't work for that specific host. Any tips on how to troubleshoot this? UFW is disabled.

Environment: docker -v: Docker version 17.12.0-ce, build c97c6d6

weave version :

weave script 2.2.0
weave 2.2.0

ip route:

default via 192.168.178.1 dev wlp4s0  proto static  metric 600 
169.254.0.0/16 dev wlp4s0  scope link  metric 1000 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown 
172.18.0.0/16 dev br-1a12e6e47da4  proto kernel  scope link  src 172.18.0.1 linkdown 
172.19.0.0/16 dev docker_gwbridge  proto kernel  scope link  src 172.19.0.1 linkdown 
192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1 linkdown 
192.168.178.0/24 dev wlp4s0  proto kernel  scope link  src 192.168.178.100  metric 600 

docker logs weave except:

INFO: 2018/02/07 12:24:36.569901 Discovered remote MAC 46:13:1c:12:94:08 at ce:84:cd:8d:94:e6(do1)
INFO: 2018/02/07 12:24:42.364266 GET /_ping
INFO: 2018/02/07 12:24:42.367890 POST /v1.35/containers/create?name=sub
INFO: 2018/02/07 12:24:42.368125 Creating container with WEAVE_CIDR ""
INFO: 2018/02/07 12:24:42.492782 POST /v1.35/containers/d0469a2b3515aeedf3101ea149f7845c56eb4213c1596dbc2b3b6b8fc2b34a77/attach?stderr=1&stdout=1&stream=1
INFO: 2018/02/07 12:24:42.494305 POST /v1.35/containers/d0469a2b3515aeedf3101ea149f7845c56eb4213c1596dbc2b3b6b8fc2b34a77/wait?condition=removed
INFO: 2018/02/07 12:24:42.495980 POST /v1.35/containers/d0469a2b3515aeedf3101ea149f7845c56eb4213c1596dbc2b3b6b8fc2b34a77/start
INFO: 2018/02/07 12:24:42.879825 Attaching container d0469a2b3515aeedf3101ea149f7845c56eb4213c1596dbc2b3b6b8fc2b34a77 with WEAVE_CIDR "" to weave network
INFO: 2018/02/07 12:24:43.780549 [nameserver 8a:f5:52:3a:32:25] adding entry for d0469a2b3515aeedf3101ea149f7845c56eb4213c1596dbc2b3b6b8fc2b34a77: sub.weave.local. -> 10.46.0.0
INFO: 2018/02/07 12:25:43.928583 [nameserver 8a:f5:52:3a:32:25] container d0469a2b3515aeedf3101ea149f7845c56eb4213c1596dbc2b3b6b8fc2b34a77 died; tombstoning entry sub.weave.local. -> 10.46.0.0
INFO: 2018/02/07 12:26:17.735997 GET /_ping
INFO: 2018/02/07 12:26:17.736981 GET /v1.35/containers/weave/logs?stderr=1&stdout=1&tail=all
brb commented 6 years ago

The full weave log or all steps how you start weave would be helpful to see whether you might have passed the -no-multicast-route flag which disables multicast.

Also, please check that each container has the 244.0.0.0/4 route by docker exec $CONTAINER ip route.

Next thing, how do you verify the log output when starting containers in non-interactive mode? docker logs?

To test multicast, I used to use the bboreham/mballs image (https://github.com/bboreham/mballs). Does it work for you?

dhet commented 6 years ago

The full weave log or all steps how you start weave would be helpful to see whether you might have passed the -no-multicast-route flag which disables multicast.

Multicast is definitely enabled. I know because running the example in interactive mode works.

Also, please check that each container has the 244.0.0.0/4 route by docker exec $CONTAINER ip route.

I also checked ip route and the route was present in all containers.

Next thing, how do you verify the log output when starting containers in non-interactive mode? docker logs?

The logs are printed to stdout in any case and also, the containers shut down when all messages were transmitted. So you would definitely know when messages are transmitted.

To test multicast, I used to use the bboreham/mballs image (https://github.com/bboreham/mballs). Does it work for you?

Yes, the example works.


I think I need to restate the problem: The problem is not related to docker's -it flags. The thing that breaks the example is letting docker invoke the applications via entrypoint. When invoking either application "by hand", it works.

So here's exactly what I did (see below for detailed logs):

  1. I set up weave on both hosts ("local host" and "remote host")
  2. I run a failing example: I let docker run the commands via entrypoint -- this doesn't work
  3. I run a working example: I manually run the commands via shell -- this works

1. Weave Setup

On the remote host I ran

user@remote-host:~# weave launch
user@remote-host:~# eval $(weave env)

On the local host I ran

user@local-host:~$ weave launch do1
user@local-host:~$ eval $(weave env)

They're connected:

user@local-host:~$ weave status connections
-> REMOTE_HOST_IP:6783   established sleeve 7a:0c:76:06:37:d8(remote-host) mtu=1438

Logs on remote host:

user@remote-host:~# docker logs weave
INFO: 2018/02/10 18:39:25.181063 Command line options: map[datapath:datapath http-addr:127.0.0.1:6784 nickname:do1 weave-bridge:weave docker-bridge:docker0 ipalloc-range:10.32.0.0/12 port:6783 resolv-conf:/var/run/weave/etc/resolv.conf status-addr:127.0.0.1:6782 dns-listen-address:172.17.0.1:53 H:[unix:///var/run/weave/weave.sock] host-root:/host plugin:true proxy:true]
INFO: 2018/02/10 18:39:25.181295 weave  2.2.0
INFO: 2018/02/10 18:39:25.189527 Docker API on unix:///var/run/docker.sock: &[Platform={"Name":""} Components=[{"Details":{"ApiVersion":"1.35","Arch":"amd64","BuildTime":"2017-12-27T20:09:53.000000000+00:00","Experimental":"false","GitCommit":"c97c6d6","GoVersion":"go1.9.2","KernelVersion":"4.4.0-112-generic","MinAPIVersion":"1.12","Os":"linux"},"Name":"Engine","Version":"17.12.0-ce"}] MinAPIVersion=1.12 Version=17.12.0-ce ApiVersion=1.35 GitCommit=c97c6d6 GoVersion=go1.9.2 Os=linux Arch=amd64 KernelVersion=4.4.0-112-generic BuildTime=2017-12-27T20:09:53.000000000+00:00]
INFO: 2018/02/10 18:39:25.197678 proxy listening on unix:///var/run/weave/weave.sock
INFO: 2018/02/10 18:39:25.297226 Bridge type is bridged_fastdp
INFO: 2018/02/10 18:39:25.297252 Communication between peers is unencrypted.
INFO: 2018/02/10 18:39:25.302432 Our name is 7a:0c:76:06:37:d8(remote-host)
INFO: 2018/02/10 18:39:25.302476 Launch detected - using supplied peer list: []
INFO: 2018/02/10 18:39:25.312237 Docker API on unix:///var/run/docker.sock: &[Version=17.12.0-ce ApiVersion=1.35 GitCommit=c97c6d6 KernelVersion=4.4.0-112-generic Components=[{"Details":{"ApiVersion":"1.35","Arch":"amd64","BuildTime":"2017-12-27T20:09:53.000000000+00:00","Experimental":"false","GitCommit":"c97c6d6","GoVersion":"go1.9.2","KernelVersion":"4.4.0-112-generic","MinAPIVersion":"1.12","Os":"linux"},"Name":"Engine","Version":"17.12.0-ce"}] MinAPIVersion=1.12 GoVersion=go1.9.2 Os=linux Arch=amd64 BuildTime=2017-12-27T20:09:53.000000000+00:00 Platform={"Name":""}]
INFO: 2018/02/10 18:39:25.312976 Checking for pre-existing addresses on weave bridge
INFO: 2018/02/10 18:39:25.412581 [allocator 7a:0c:76:06:37:d8] No valid persisted data
INFO: 2018/02/10 18:39:25.424074 [allocator 7a:0c:76:06:37:d8] Initialising via deferred consensus
INFO: 2018/02/10 18:39:25.431123 Listening for DNS queries on 172.17.0.1:53
INFO: 2018/02/10 18:39:25.448757 Sniffing traffic on datapath (via ODP)
INFO: 2018/02/10 18:39:25.453246 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2018/02/10 18:39:25.484143 Listening for metrics requests on 127.0.0.1:6782
INFO: 2018/02/10 18:39:25.490521 Waiting for Weave API Server...
INFO: 2018/02/10 18:39:25.522639 Finished waiting for Weave API Server
INFO: 2018/02/10 18:39:25.527770 Listening on /run/docker/plugins/weave.sock for global scope
INFO: 2018/02/10 18:39:25.527840 Listening on /run/docker/plugins/weavemesh.sock for local scope
INFO: 2018/02/10 18:39:25.527858 Creating default "weave" network
INFO: 2018/02/10 18:39:25.544896 [ipam] RequestPool weavelocal 10.32.0.0/12  map[]
INFO: 2018/02/10 18:39:25.548248 [net] CreateNetwork aff74080101dd974d61411161381ff819628cdceca503aeb75bcb5593895b78b
INFO: 2018/02/10 18:39:51.727548 ->[LOCAL_HOST_IP:17957] connection accepted
INFO: 2018/02/10 18:39:51.755756 ->[LOCAL_HOST_IP:17957|8a:f5:52:3a:32:25(local-host)]: connection ready; using protocol version 2
INFO: 2018/02/10 18:39:51.755834 overlay_switch ->[8a:f5:52:3a:32:25(local-host)] using fastdp
INFO: 2018/02/10 18:39:51.755868 ->[LOCAL_HOST_IP:17957|8a:f5:52:3a:32:25(local-host)]: connection added (new peer)
INFO: 2018/02/10 18:39:51.816027 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/02/10 18:39:51.816325 overlay_switch ->[8a:f5:52:3a:32:25(local-host)] using sleeve
INFO: 2018/02/10 18:39:51.816442 ->[LOCAL_HOST_IP:17957|8a:f5:52:3a:32:25(local-host)]: connection fully established
INFO: 2018/02/10 18:39:51.855759 Discovered remote MAC 82:ee:17:0a:87:c5 at 8a:f5:52:3a:32:25(local-host)
INFO: 2018/02/10 18:39:51.856303 sleeve ->[LOCAL_HOST_IP:17939|8a:f5:52:3a:32:25(local-host)]: Effective MTU verified at 1438
INFO: 2018/02/10 18:39:52.171996 Discovered remote MAC a2:77:f4:10:e7:b0 at 8a:f5:52:3a:32:25(local-host)
INFO: 2018/02/10 18:39:52.416052 Discovered remote MAC 8a:f5:52:3a:32:25 at 8a:f5:52:3a:32:25(local-host)
INFO: 2018/02/10 18:39:55.052808 GET /_ping
INFO: 2018/02/10 18:39:55.054396 GET /v1.35/containers/weave/logs?stderr=1&stdout=1&tail=all

Logs on local host:

user@local-host:~$ docker logs weave
INFO: 2018/02/10 18:39:51.396407 Command line options: map[H:[unix:///var/run/weave/weave.sock] http-addr:127.0.0.1:6784 resolv-conf:/var/run/weave/etc/resolv.conf status-addr:127.0.0.1:6782 datapath:datapath dns-listen-address:172.17.0.1:53 docker-bridge:docker0 proxy:true host-root:/host nickname:thinkpad plugin:true weave-bridge:weave ipalloc-range:10.32.0.0/12 port:6783]
INFO: 2018/02/10 18:39:51.396537 weave  2.2.0
INFO: 2018/02/10 18:39:51.398765 Docker API on unix:///var/run/docker.sock: &[GitCommit=c97c6d6 GoVersion=go1.9.2 Os=linux Arch=amd64 KernelVersion=4.4.0-112-generic Platform={"Name":""} Version=17.12.0-ce MinAPIVersion=1.12 BuildTime=2017-12-27T20:09:53.000000000+00:00 Components=[{"Details":{"ApiVersion":"1.35","Arch":"amd64","BuildTime":"2017-12-27T20:09:53.000000000+00:00","Experimental":"false","GitCommit":"c97c6d6","GoVersion":"go1.9.2","KernelVersion":"4.4.0-112-generic","MinAPIVersion":"1.12","Os":"linux"},"Name":"Engine","Version":"17.12.0-ce"}] ApiVersion=1.35]
INFO: 2018/02/10 18:39:51.413186 proxy listening on unix:///var/run/weave/weave.sock
INFO: 2018/02/10 18:39:51.536366 Bridge type is bridged_fastdp
INFO: 2018/02/10 18:39:51.536391 Communication between peers is unencrypted.
INFO: 2018/02/10 18:39:51.563703 Our name is 8a:f5:52:3a:32:25(local-host)
INFO: 2018/02/10 18:39:51.563745 Launch detected - using supplied peer list: [do1]
INFO: 2018/02/10 18:39:51.565193 Docker API on unix:///var/run/docker.sock: &[GitCommit=c97c6d6 Os=linux KernelVersion=4.4.0-112-generic Components=[{"Details":{"ApiVersion":"1.35","Arch":"amd64","BuildTime":"2017-12-27T20:09:53.000000000+00:00","Experimental":"false","GitCommit":"c97c6d6","GoVersion":"go1.9.2","KernelVersion":"4.4.0-112-generic","MinAPIVersion":"1.12","Os":"linux"},"Name":"Engine","Version":"17.12.0-ce"}] Version=17.12.0-ce MinAPIVersion=1.12 Arch=amd64 BuildTime=2017-12-27T20:09:53.000000000+00:00 Platform={"Name":""} ApiVersion=1.35 GoVersion=go1.9.2]
INFO: 2018/02/10 18:39:51.566033 Checking for pre-existing addresses on weave bridge
INFO: 2018/02/10 18:39:51.583643 [allocator 8a:f5:52:3a:32:25] No valid persisted data
INFO: 2018/02/10 18:39:51.601804 [allocator 8a:f5:52:3a:32:25] Initialising via deferred consensus
INFO: 2018/02/10 18:39:51.603387 Listening for DNS queries on 172.17.0.1:53
INFO: 2018/02/10 18:39:51.608504 Sniffing traffic on datapath (via ODP)
INFO: 2018/02/10 18:39:51.690295 ->[REMOTE_HOST_IP:6783] attempting connection
INFO: 2018/02/10 18:39:51.698203 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2018/02/10 18:39:51.698536 Listening for metrics requests on 127.0.0.1:6782
INFO: 2018/02/10 18:39:51.698574 Waiting for Weave API Server...
INFO: 2018/02/10 18:39:51.699371 Finished waiting for Weave API Server
INFO: 2018/02/10 18:39:51.699417 Listening on /run/docker/plugins/weave.sock for global scope
INFO: 2018/02/10 18:39:51.699443 Listening on /run/docker/plugins/weavemesh.sock for local scope
INFO: 2018/02/10 18:39:51.699454 Creating default "weave" network
INFO: 2018/02/10 18:39:51.700538 [ipam] RequestPool weavelocal 10.32.0.0/12  map[]
INFO: 2018/02/10 18:39:51.700839 [net] CreateNetwork ca0d144670bc7ec86fe63738bbfe80ffbbb1175f5587566684f8748c2500dfcd
INFO: 2018/02/10 18:39:51.774011 ->[REMOTE_HOST_IP:6783|7a:0c:76:06:37:d8(remote-host)]: connection ready; using protocol version 2
INFO: 2018/02/10 18:39:51.774298 overlay_switch ->[7a:0c:76:06:37:d8(remote-host)] using fastdp
INFO: 2018/02/10 18:39:51.774465 ->[REMOTE_HOST_IP:6783|7a:0c:76:06:37:d8(remote-host)]: connection added (new peer)
INFO: 2018/02/10 18:39:51.801934 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2018/02/10 18:39:51.802085 overlay_switch ->[7a:0c:76:06:37:d8(remote-host)] using sleeve
INFO: 2018/02/10 18:39:51.802159 ->[REMOTE_HOST_IP:6783|7a:0c:76:06:37:d8(remote-host)]: connection fully established
INFO: 2018/02/10 18:39:51.804485 overlay_switch ->[7a:0c:76:06:37:d8(remote-host)] using fastdp
INFO: 2018/02/10 18:39:51.838130 sleeve ->[REMOTE_HOST_IP:6783|7a:0c:76:06:37:d8(remote-host)]: Effective MTU verified at 1438
INFO: 2018/02/10 18:40:51.775837 overlay_switch ->[7a:0c:76:06:37:d8(remote-host)] fastdp timed out waiting for vxlan heartbeat
INFO: 2018/02/10 18:40:51.775948 overlay_switch ->[7a:0c:76:06:37:d8(remote-host)] using sleeve
INFO: 2018/02/10 18:40:58.728891 GET /_ping
INFO: 2018/02/10 18:40:58.747560 GET /_ping
INFO: 2018/02/10 18:40:58.750779 POST /v1.35/containers/create
INFO: 2018/02/10 18:40:58.751038 Leaving container alone because the container has '--net=host'
INFO: 2018/02/10 18:40:58.880142 POST /v1.35/containers/1ef7c2a72b5060973439ccad6db8be64840ebbc51225cbdcb7bb9fa5ee9404ec/attach?stderr=1&stdout=1&stream=1
INFO: 2018/02/10 18:40:58.881999 POST /v1.35/containers/1ef7c2a72b5060973439ccad6db8be64840ebbc51225cbdcb7bb9fa5ee9404ec/wait?condition=removed
INFO: 2018/02/10 18:40:58.883920 POST /v1.35/containers/1ef7c2a72b5060973439ccad6db8be64840ebbc51225cbdcb7bb9fa5ee9404ec/start
INFO: 2018/02/10 18:41:30.022052 GET /_ping
INFO: 2018/02/10 18:41:30.023094 GET /v1.35/containers/weave/logs?stderr=1&stdout=1&tail=all

2. Running the applications via entrypoint -- this doesn't work

Local host:

docker run --rm -ti kugele1337/spammer-pub --samples 10

(internally, this runs ./publisher -DCPSConfigFile rtps.ini --samples 10)

Remote host:

docker run --rm -ti kugele1337/spammer-sub --debug

(internally, this runs ./subscriber -DCPSConfigFile rtps.ini --debug)

nothing happens

New logs on remote host:

INFO: 2018/02/10 18:39:55.059436 GET /v1.35/containers/weave/json
INFO: 2018/02/10 18:41:41.815790 overlay_switch ->[8a:f5:52:3a:32:25(local-host)] fastdp timed out waiting for vxlan heartbeat
INFO: 2018/02/10 18:44:28.363175 GET /_ping
INFO: 2018/02/10 18:44:28.365033 POST /v1.35/containers/create?name=do1
INFO: 2018/02/10 18:44:28.365199 Creating container with WEAVE_CIDR ""
INFO: 2018/02/10 18:44:28.391849 POST /v1.35/containers/53b6f6cf84bc92650233116befd64af7063bbff13e030ba1b86bc35412a6ad84/attach?stderr=1&stdin=1&stdout=1&stream=1
INFO: 2018/02/10 18:44:28.393850 POST /v1.35/containers/53b6f6cf84bc92650233116befd64af7063bbff13e030ba1b86bc35412a6ad84/wait?condition=removed
INFO: 2018/02/10 18:44:28.395758 POST /v1.35/containers/53b6f6cf84bc92650233116befd64af7063bbff13e030ba1b86bc35412a6ad84/start
INFO: 2018/02/10 18:44:28.630119 Attaching container 53b6f6cf84bc92650233116befd64af7063bbff13e030ba1b86bc35412a6ad84 with WEAVE_CIDR "" to weave network
INFO: 2018/02/10 18:44:28.634052 Assuming quorum size of 1
INFO: 2018/02/10 18:44:29.084588 [nameserver 7a:0c:76:06:37:d8] adding entry for 53b6f6cf84bc92650233116befd64af7063bbff13e030ba1b86bc35412a6ad84: do1.weave.local. -> 10.32.0.1
INFO: 2018/02/10 18:44:29.089668 POST /v1.35/containers/53b6f6cf84bc92650233116befd64af7063bbff13e030ba1b86bc35412a6ad84/resize?h=64&w=254
INFO: 2018/02/10 18:44:39.748713 Discovered remote MAC 5e:40:2a:af:74:a1 at 8a:f5:52:3a:32:25(local-host)
WARN: 2018/02/10 18:46:15.139268 read unix /var/run/weave/weave.sock->@: use of closed network connection
INFO: 2018/02/10 18:46:15.171675 [nameserver 7a:0c:76:06:37:d8] container 53b6f6cf84bc92650233116befd64af7063bbff13e030ba1b86bc35412a6ad84 died; tombstoning entry do1.weave.local. -> 10.32.0.1
INFO: 2018/02/10 18:49:19.691424 GET /_ping
INFO: 2018/02/10 18:49:19.692468 GET /v1.35/containers/weave/logs?stderr=1&stdout=1&tail=all

New logs on local host:

INFO: 2018/02/10 18:41:30.024475 GET /v1.35/containers/weave/json
INFO: 2018/02/10 18:44:29.120529 Discovered remote MAC 56:75:09:26:1e:1d at 7a:0c:76:06:37:d8(remote-host)
INFO: 2018/02/10 18:44:38.311571 GET /_ping
INFO: 2018/02/10 18:44:38.315096 POST /v1.35/containers/create
INFO: 2018/02/10 18:44:38.315258 Creating container with WEAVE_CIDR ""
INFO: 2018/02/10 18:44:38.429210 POST /v1.35/containers/b443fa0cee9f78b8573ebf7577c37bec9fdefd433a3f30c1e36078b34c66bd82/attach?stderr=1&stdin=1&stdout=1&stream=1
INFO: 2018/02/10 18:44:38.431022 POST /v1.35/containers/b443fa0cee9f78b8573ebf7577c37bec9fdefd433a3f30c1e36078b34c66bd82/wait?condition=removed
INFO: 2018/02/10 18:44:38.432830 POST /v1.35/containers/b443fa0cee9f78b8573ebf7577c37bec9fdefd433a3f30c1e36078b34c66bd82/start
INFO: 2018/02/10 18:44:38.834651 Attaching container b443fa0cee9f78b8573ebf7577c37bec9fdefd433a3f30c1e36078b34c66bd82 with WEAVE_CIDR "" to weave network
INFO: 2018/02/10 18:44:39.761614 [nameserver 8a:f5:52:3a:32:25] Ignoring registration b443fa0cee9f. 10.44.0.0 b443fa0cee9f78b8573ebf7577c37bec9fdefd433a3f30c1e36078b34c66bd82 (not a subdomain of weave.local.)
INFO: 2018/02/10 18:44:39.763977 POST /v1.35/containers/b443fa0cee9f78b8573ebf7577c37bec9fdefd433a3f30c1e36078b34c66bd82/resize?h=64&w=254
WARN: 2018/02/10 18:45:39.872174 read unix /var/run/weave/weave.sock->@: use of closed network connection
INFO: 2018/02/10 18:45:45.407463 GET /_ping
INFO: 2018/02/10 18:45:45.410358 GET /v1.35/containers/weave/logs?stderr=1&stdout=1&tail=all

3. Running the applications manually (this works)

Local host:

user@local-host:~$ docker run --rm -ti --entrypoint /bin/bash kugele1337/spammer-pub
root@23b522242c2c:/opt/workspace# ./publisher -DCPSConfigFile rtps.ini --samples 10
root@23b522242c2c:/opt/workspace# exit

Remote host:

user@remote-host:~# docker run --rm -ti --entrypoint /bin/bash kugele1337/spammer-sub
user@remote-host:/opt/workspace# ./subscriber -DCPSConfigFile rtps.ini --debug
subscription matched    current_count: 1, total_count: 1
liveliness changed  alive_count: 1, not_alive_count: 0
1518288840320167  1518288840313067  7100
1518288840423123  1518288840414169  8954
1518288840523056  1518288840515338  7718
1518288840627053  1518288840616791  10262
1518288840727029  1518288840718808  8221
1518288840827487  1518288840820389  7098
1518288840927976  1518288840921655  6321
1518288841030979  1518288841022889  8090
1518288841131048  1518288841123858  7190
1518288841231413  1518288841224949  6464
liveliness changed  alive_count: 0, not_alive_count: 0
subscription matched    current_count: 0, total_count: 1
user@remote-host:/opt/workspace# exit

The output displayed is the intended output

New logs on remote host:

INFO: 2018/02/10 18:49:19.696630 GET /v1.35/containers/weave/json
INFO: 2018/02/10 18:52:55.268265 GET /_ping
INFO: 2018/02/10 18:52:55.270156 POST /v1.35/containers/create?name=do1
INFO: 2018/02/10 18:52:55.270343 Creating container with WEAVE_CIDR ""
INFO: 2018/02/10 18:52:55.295744 POST /v1.35/containers/0c1ae86dcd23caf190f3f7873cd728d48eea044517f14d10bb62b73bf0a28107/attach?stderr=1&stdin=1&stdout=1&stream=1
INFO: 2018/02/10 18:52:55.297264 POST /v1.35/containers/0c1ae86dcd23caf190f3f7873cd728d48eea044517f14d10bb62b73bf0a28107/wait?condition=removed
INFO: 2018/02/10 18:52:55.298496 POST /v1.35/containers/0c1ae86dcd23caf190f3f7873cd728d48eea044517f14d10bb62b73bf0a28107/start
INFO: 2018/02/10 18:52:55.544620 Attaching container 0c1ae86dcd23caf190f3f7873cd728d48eea044517f14d10bb62b73bf0a28107 with WEAVE_CIDR "" to weave network
INFO: 2018/02/10 18:52:55.970492 [nameserver 7a:0c:76:06:37:d8] adding entry for 0c1ae86dcd23caf190f3f7873cd728d48eea044517f14d10bb62b73bf0a28107: do1.weave.local. -> 10.32.0.1
INFO: 2018/02/10 18:52:55.975111 POST /v1.35/containers/0c1ae86dcd23caf190f3f7873cd728d48eea044517f14d10bb62b73bf0a28107/resize?h=64&w=254
INFO: 2018/02/10 18:53:14.874770 Discovered remote MAC 56:ac:f4:68:23:96 at 8a:f5:52:3a:32:25(local-host)
INFO: 2018/02/10 18:53:19.366867 Discovered remote MAC e6:85:a4:c3:10:c5 at 8a:f5:52:3a:32:25(local-host)
WARN: 2018/02/10 18:54:08.267639 read unix /var/run/weave/weave.sock->@: use of closed network connection
INFO: 2018/02/10 18:54:08.301284 [nameserver 7a:0c:76:06:37:d8] container 0c1ae86dcd23caf190f3f7873cd728d48eea044517f14d10bb62b73bf0a28107 died; tombstoning entry do1.weave.local. -> 10.32.0.1
INFO: 2018/02/10 18:57:31.366211 GET /_ping
INFO: 2018/02/10 18:57:31.367302 GET /v1.35/containers/weave/logs?stderr=1&stdout=1&tail=all

New logs on local host:

INFO: 2018/02/10 18:45:45.412297 GET /v1.35/containers/weave/json
INFO: 2018/02/10 18:52:56.104108 Discovered remote MAC f2:cd:66:8c:e8:fc at 7a:0c:76:06:37:d8(remote-host)
INFO: 2018/02/10 18:53:13.371617 GET /_ping
INFO: 2018/02/10 18:53:13.376954 POST /v1.35/containers/create
INFO: 2018/02/10 18:53:13.377146 Creating container with WEAVE_CIDR ""
INFO: 2018/02/10 18:53:13.514240 POST /v1.35/containers/5c8e53b6a4b36efb4b6bcc45b2340bde9ca00395bdd4c95091d45e0dc2b47997/attach?stderr=1&stdin=1&stdout=1&stream=1
INFO: 2018/02/10 18:53:13.516710 POST /v1.35/containers/5c8e53b6a4b36efb4b6bcc45b2340bde9ca00395bdd4c95091d45e0dc2b47997/wait?condition=removed
INFO: 2018/02/10 18:53:13.519566 POST /v1.35/containers/5c8e53b6a4b36efb4b6bcc45b2340bde9ca00395bdd4c95091d45e0dc2b47997/start
INFO: 2018/02/10 18:53:14.038250 Attaching container 5c8e53b6a4b36efb4b6bcc45b2340bde9ca00395bdd4c95091d45e0dc2b47997 with WEAVE_CIDR "" to weave network
INFO: 2018/02/10 18:53:14.882700 [nameserver 8a:f5:52:3a:32:25] Ignoring registration 5c8e53b6a4b3. 10.44.0.0 5c8e53b6a4b36efb4b6bcc45b2340bde9ca00395bdd4c95091d45e0dc2b47997 (not a subdomain of weave.local.)
INFO: 2018/02/10 18:53:14.883879 POST /v1.35/containers/5c8e53b6a4b36efb4b6bcc45b2340bde9ca00395bdd4c95091d45e0dc2b47997/resize?h=64&w=254
WARN: 2018/02/10 18:53:14.978281 read unix /var/run/weave/weave.sock->@: use of closed network connection
INFO: 2018/02/10 18:53:17.820883 GET /_ping
INFO: 2018/02/10 18:53:17.824472 POST /v1.35/containers/create
INFO: 2018/02/10 18:53:17.824702 Creating container with WEAVE_CIDR ""
INFO: 2018/02/10 18:53:17.949445 POST /v1.35/containers/23b522242c2c7f1936d51cc9ba75b4278f862d608ec6d43d1ec58e5a4de9abd6/attach?stderr=1&stdin=1&stdout=1&stream=1
INFO: 2018/02/10 18:53:17.951037 POST /v1.35/containers/23b522242c2c7f1936d51cc9ba75b4278f862d608ec6d43d1ec58e5a4de9abd6/wait?condition=removed
INFO: 2018/02/10 18:53:17.953088 POST /v1.35/containers/23b522242c2c7f1936d51cc9ba75b4278f862d608ec6d43d1ec58e5a4de9abd6/start
INFO: 2018/02/10 18:53:18.463501 Attaching container 23b522242c2c7f1936d51cc9ba75b4278f862d608ec6d43d1ec58e5a4de9abd6 with WEAVE_CIDR "" to weave network
INFO: 2018/02/10 18:53:19.375247 [nameserver 8a:f5:52:3a:32:25] Ignoring registration 23b522242c2c. 10.44.0.0 23b522242c2c7f1936d51cc9ba75b4278f862d608ec6d43d1ec58e5a4de9abd6 (not a subdomain of weave.local.)
INFO: 2018/02/10 18:53:19.375632 POST /v1.35/containers/23b522242c2c7f1936d51cc9ba75b4278f862d608ec6d43d1ec58e5a4de9abd6/resize?h=64&w=254
WARN: 2018/02/10 18:54:15.217284 read unix /var/run/weave/weave.sock->@: use of closed network connection
INFO: 2018/02/10 18:54:28.623595 GET /_ping
INFO: 2018/02/10 18:54:28.626778 GET /v1.35/containers/weave/logs?stderr=1&stdout=1&tail=all
brb commented 6 years ago

Thanks for the logs.

Is /opt/workspace in $PATH of your image? If not, then you need to change the entry point from ENTRYPOINT [ "subscriber", "-DCPSConfigFile", "rtps.ini" ] to ENTRYPOINT [ "./subscriber", "-DCPSConfigFile", "rtps.ini" ] in your Dockerfile (for the publisher as well).

If it does not help, could you check whether adding a short sleep before running the publisher helps?

dhet commented 6 years ago

Thanks, adding a sleep before running the programs did the trick 👍

Btw, I found out that the problem isn't limited to one host as I previously assumed, but occurs with every host I tried it with (all Debian based though).

I'm leaving this issue open because I think this is a problem that you guys have to address, either by fixing something in weave or by adding a disclaimer in the docs.

If this misbehavior is indeed related to weave, and not docker, my guess is that the source of it is weave taking too long to initialize. Maybe there's a way to block the entry point execution for as long as weave isn't properly set up?

In summary, the problem occurs if