Race-condition in Weave network?

bergtwvd commented 8 years ago

I have the following environment set up in AWS:

Three Docker Hosts, located at: docker-1A@Dublin, Public IP X.X.X.X docker-2A@Dublin, Public IP Y.Y.Y.Y docker-1B@Frankfurt, Public IP Z.Z.Z.Z

OS is Ubuntu 15.04, Docker 1.9, latest versions of Weave, Compose, Swarm.

The initial set of containers per Docker Host is (see attached files on the details how they are started):

docker-1A: weave scope weave router (init-peer-count 3) weave proxy consul (server, bootstrap-expect 2) registrator swarm slave smarm master
docker-2A: weave scope weave router weave proxy consul (client) registrator swarm slave
docker-1B: weave scope weave router weave proxy consul (server) registrator swarm slave

The three Weave scope containers are interconnected. The three weave router containers are interconnected. The three consul containers are interconnected.

The user application consists of a set of containers that together make up a distributed simulation. There is a central component called the CRC that registers new-comers. When a containter joins the simulation it first registers itself at the CRC. The CRC will pass back information about the other containers already joined, so that the newly joined container can connect to all the others. On the end the containers all interconnect in a meshed network that they build up, where each container has a connection to each other container.

The problem is that I intermittently see an error that a new container successfully connected to the CRC, but that it cannot connect to another already running container somewhere else in the Weave network.

The errorlog looks like:

[ShipSim3] Connection successful. hla.rti1516e.exceptions.RTIinternalError: Joining federate 'ShipSim31' cannot reach existing federate 'ShipSim<4>' at tcp:10.40.0.1;172.17.0.6/6000,udp:10.40.0.1/5000,host:10.40.0.1,address:10.40.0.1;172.17.0.6/6000 (909056008) ... followed by stacktrace

Looking at Scope, I see that federate 'ShipSim<4>' runs on docker-1A The CRC runs on docker-1B. And the new federate 'ShipSim31' runs on docker-2A.

The container called federate 'ShipSim<4>' is already running on IP 10.40.0.1. It seems that the network routing from the new federate 'ShipSim31' to the existing federate 'ShipSim<4>' has not been set up yet. I.e. the CRC told the new federate about an existing federate, but the network route seems not established.

Since I run on a WAN, it may be more likely that race conditions show up (if any). Please advice if you need more infomation, or even access to the test environment in AWS.

Here a dump of more information:

ubuntu@docker-1A:~$ weave status

   Version: 1.3.0

   Service: router
  Protocol: weave 1..2
      Name: b6:e5:b8:81:59:27(ip-172-31-9-126)
Encryption: disabled

PeerDiscovery: enabled Targets: 0 Connections: 2 (2 established) Peers: 3 (with 6 established connections)

   Service: ipam
 Consensus: achieved
     Range: 10.32.0.0-10.47.255.255

DefaultSubnet: 10.32.0.0/12

   Service: dns
    Domain: weave.local.
       TTL: 1
   Entries: 7

   Service: proxy
   Address: tcp://:12375

ubuntu@docker-1A:~$ weave status connections <- Y.Y.Y.Y:39418 established fastdp 42:9d:fe:58:9a:2a(ip-172-31-9-127) <- Z.Z.Z.Z:44967 established fastdp 26:40:3a:3b:c7:16(ip-172-31-27-45)

ubuntu@docker-2A:~$ weave status dns crc 10.44.0.0 d661cf60ba74 26:40:3a:3b:c7:16 ge 10.40.0.2 555ddbb82c45 b6:e5:b8:81:59:27 ubuntu_shipsim2_1 10.32.0.2 a9c8b4d37de8 42:9d:fe:58:9a:2a ubuntu_shipsim3_1 10.40.0.0 942c2826364c b6:e5:b8:81:59:27 ubuntu_shipsim_1 10.40.0.1 9273d45827d7 b6:e5:b8:81:59:27 ubuntu_trafficsim_1 10.43.255.255 05606f5f6d89 26:40:3a:3b:c7:16 web 10.32.0.1 794e4328dd40 42:9d:fe:58:9a:2a

docker-1A.txt docker-1B.txt docker-2A.txt

rade commented 8 years ago

Do you get the same problem when fast datapath is disabled (set WEAVE_NO_FASTDP=1 when launching weave)?

bergtwvd commented 8 years ago

I have seen this problem before, in earlier versions of Weave (without the latest datapath addition). In order to get grip on this issue, what other information should I provide, or what data can I collect and how?

rade commented 8 years ago

When the problem occurs, can you access the problematic containers and manually check (e.g. with ping) connectivity?

bergtwvd commented 8 years ago

I ran the test until the issue popped up again:

[Shipsim] Connection successful. hla.rti1516e.exceptions.RTIinternalError: Joining federate 'Shipsim1' cannot rea ch existing federate 'Shipsim<6>' at tcp:10.40.0.1;172.17.0.6/6000,udp:10.40.0.1 /5000,host:10.40.0.1,address:10.40.0.1;172.17.0.6/6000 (909056008)

From another shell I looked up the IP addresses of the other containers in the simulation: ubuntu@docker-1B:~$ weave dns-lookup crc 10.32.0.1 ubuntu@docker-1B:~$ weave dns-lookup web 10.44.0.0 ubuntu@docker-1B:~$ weave dns-lookup ge 10.40.0.0 ubuntu@docker-1B:~$ weave dns-lookup ship1 10.40.0.1 ubuntu@docker-1B:~$ weave dns-lookup ubuntu_trafficsim_1 10.43.255.255 ubuntu@docker-1B:~$ weave dns-lookup ubuntu_shipsim_1 10.32.0.2 ubuntu@docker-1B:~$ weave dns-lookup ship2 10.44.0.1

And for each one I did: ubuntu@docker-1B:~$ docker exec b0cf69b63ca7 ping < address >

And all pings succeeded! I suspect that the network got settled by the time I did the ping.

I wonder wether the Weave network needs some time to settle the addition of new containers and IP addresses? I do not like to build in an arbitary sleep in each containter start script. Please advice what I should do next.

awh commented 8 years ago

Thanks @bergtwvd. I have some questions about these logs:

[ShipSim3] Connection successful.
hla.rti1516e.exceptions.RTIinternalError: Joining federate 'ShipSim31' cannot reach existing federate 'ShipSim<4>' at tcp:10.40.0.1;172.17.0.6/6000,udp:10.40.0.1/5000,host:10.40.0.1,address:10.40.0.1;172.17.0.6/6000 (909056008)
... followed by stacktrace

Can I see the timestamps? In particular I am interested to know the delay between ShipSim3 successfully connecting to the CRC (and hence being told the addresses of the other federates) and the exception being reported by ShipSim3 that it cannot connect to them
Can I see the stacktrace? It may help me understand the nature of the connection failure. I see that this software uses a mixture of UDP and TCP - is it a TCP connection that is failing? A timeout on a UDP response?

It's also interesting that federates appear to be registering both their weave (e.g. 10.40.0.1) and docker (e.g. 172.17.0.6) addresses with the CRC - presuming that you're not running the Docker overlay network, those 172.17.0.0/16 addresses will not be reachable cross host - in fact unless you've manually specified different subnets for Docker in each host then ShipSim3 might be trying to talk to 172.17.0.6 on its own host, where it may either be unallocated or allocated to some random container (including itself potentially!). Is the software open source by any chance? I would be interested to see how it manages connecting to federates when it is given more than one IP address to try - it's possible that it is attempting to use the docker IP sometimes and failing.

bergtwvd commented 8 years ago

W.r.t. timestamp and stacktrace, I forwarded the request to the developer (located elsewhere in the world). This may take some time to get.

I will check if I can restrict the network interface that each federate uses to the Weave interface. Each federate has embedded a "Local RTI Component", that can beconfigured to some extent.

The software is currently not open source, although the RTI interface specification is an IEEE standard (1516-2010).

bergtwvd commented 8 years ago

Logging has been added to the federates in question. I tried to re-produce the problem with logging enabled. No luck so far (quite inherent to race-conditions). Also I noticed that the WAN is much more responsive at the moment. So I probably need to wait until there is more traffic and things slow down. (test involves two VMs in IRL and one in GER).

W.r.t. the network adapter, the federate has an eth0 interface and an ethwe interface. I am not sure whether forcing the federate to use a specifc interface matters. The sequence of steps it roughly:

federate gets the name of the CRC via the commandline (name corresponds to the container name);
federate connects to the CRC (i.e. DNS will resolve it to some Weave IP address);
federate attempts to create the federation execution;
federate joins the federation execution (the Local RTI component embedded in the federate will establish connections to each other federate, using information from the CRC (basically the (Weave) IP address of each other federate);
simulation starts ...

In previous runs the step with the connect succeeds, but the step with the join fails.

bergtwvd commented 8 years ago

I managed to reproduce it . Here the logfile from the federate in question:

ubuntu@docker-2A:~$ docker exec 8757cd9aed6c more /root/prti1516e/trace/LRC-151119_18.35.29-0.log
::::::::::::::
/root/prti1516e/trace/LRC-151119_18.35.29-0.log
::::::::::::::
T: 18:35:29.276 fed0 << connect(HLA_IMMEDIATE, "crcAddress=crc:8989")
T: 18:35:29.484 fed0 << connect
T: 18:35:29.485 fed0 << createFederationExecution("TheWorld", file:/tmp/shipsim/RPRFOM.xml)
T: 18:35:29.489 fed0 << createFederationExecution => hla.rti1516e.exceptions.FederationExecutionAlreadyExists: Federation TheWorld already exists (909000000)
T: 18:35:29.489 fed0 << joinFederationExecution("ShipSim", "TheWorld")
T: 18:35:29.606 fed6 << joinFederationExecution => hla.rti1516e.exceptions.RTIinternalError: Joining federate 'ShipSim2' cannot reach existing federate 'ShipSim1<5>' at tcp:10.44.0.1;172.17.0.6/6000,udp:10.44.0.1/5000,host:10.44.0.1,address:10.44.0.1;172.17.0.6/6000 (909056008)
T: 18:35:29.643 fed6 << getObjectClassHandle("BaseEntity.PhysicalEntity.Platform.SurfaceVessel")
T: 18:35:29.643 fed6 << getObjectClassHandle => hla.rti1516e.exceptions.FederateNotExecutionMember: getObjectClassHandle (909001001)
T: 18:35:29.647 fed6 << registerObjectInstance(null)
T: 18:35:29.647 fed6 << registerObjectInstance => hla.rti1516e.exceptions.FederateNotExecutionMember: registerObjectInstance (909001001)

The log entries after the "cannot reach existing federate" entry are a bug in the code, as the application should not progress when it did not join the federation execution. In the joinFederationExecution the connection to all federates in the federation execution is setup.

The container entrypoint is set to start this java application (java -cp .... code.jar). The elapsed time from entrypoint to the connect is I suspect very small.

After I got the log file, I did a ping from the container to ShipSim1<5> with:

ubuntu@docker-2A:~$ docker exec 8757cd9aed6c ping 10.44.0.1
PING 10.44.0.1 (10.44.0.1) 56(84) bytes of data.
64 bytes from 10.44.0.1: icmp_seq=1 ttl=64 time=19.4 ms
64 bytes from 10.44.0.1: icmp_seq=2 ttl=64 time=19.3 ms
64 bytes from 10.44.0.1: icmp_seq=3 ttl=64 time=19.4 ms
...

Network interface information:

ubuntu@docker-2A:~$ docker exec 8757cd9aed6c ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:ac:11:00:06
          inet addr:172.17.0.6  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe11:6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:913 (913.0 B)  TX bytes:882 (882.0 B)

ethwe     Link encap:Ethernet  HWaddr fe:fb:5d:8e:17:c0
          inet addr:10.32.0.3  Bcast:0.0.0.0  Mask:255.240.0.0
          inet6 addr: fe80::fcfb:5dff:fe8e:17c0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1410  Metric:1
          RX packets:5751 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5753 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:566081 (566.0 KB)  TX bytes:584954 (584.9 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:554 errors:0 dropped:0 overruns:0 frame:0
          TX packets:554 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:27700 (27.7 KB)  TX bytes:27700 (27.7 KB)

And routing information:

ubuntu@docker-2A:~$ docker exec 8757cd9aed6c route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         ip-172-17-0-1.e 0.0.0.0         UG    0      0        0 eth0
10.32.0.0       *               255.240.0.0     U     0      0        0 ethwe
172.17.0.0      *               255.255.0.0     U     0      0        0 eth0
224.0.0.0       *               240.0.0.0       U     0      0        0 ethwe

awh commented 8 years ago

Thanks @bergtwvd. This is interesting - the delay between

T: 18:35:29.489 fed0 << joinFederationExecution("ShipSim", "TheWorld")

(which is presumably after the federate has received addresses from the CRC and before it is about to connect to them) and the failure

T: 18:35:29.606 fed6 << joinFederationExecution => hla.rti1516e.exceptions.RTIinternalError: Joining federate 'ShipSim2' cannot reach existing federate 'ShipSim1<5>' at tcp:10.44.0.1;172.17.0.6/6000,udp:10.44.0.1/5000,host:10.44.0.1,address:10.44.0.1;172.17.0.6/6000 (909056008)

is very short - about 100ms. This suggests to me that something is providing an immediate negative response - otherwise you would expect the delay to be longer due to retry attempts. Can you tell me what is involved in the connection process - is it TCP? Does it retry? Are you setting any socket options or a connection timeout? It would be really helpful to see the underlying cause Throwable of the RTIInternalError if possible.

bergtwvd commented 8 years ago

I have consulted with the supplier of the RTI to understand what is happening (network-wise) in the connect and joinFederationExecution methods.

In summary for a two federate federation with LRC1 and LRC2:

connect by any LRC:

LRC opens a listening socket on all its adapters or a specific adapter.
LRC connects to the CRC.
LRC provides its advertised addresses to the CRC.

joinFederationExecution by LRC1:

n/a

joinFederationExecution by LRC2:

The CRC provides LRC1's advertised addresses to LRC2.
LRC2 tries to connect to LRC1 at each of the addresses in LRC1's advertised address list.
If any one of the addresses successfully connects then everything is fine. If all addresses fail to produce a connection then an error message is produced and the join fails.

The key is that an LRC provides multiple advertised addresses by default, with a potential mix-up in IP connects as you mentioned in an earlier post. I did a test to force the LRC to use only one host interface and so far that seems to work.

I'll close this issue because I think the problem has been identified and solved.

Thanks for your help.

weaveworks / weave

Race-condition in Weave network? #1691