weaveworks / weave

Simple, resilient multi-host containers networking and more.
Apache License 2.0

fastdp rules set to two hosts for one mac #2428

Open bboreham opened 8 years ago

bboreham commented 8 years ago

As seen in weave report:

            "Flows": [
                {
                    "FlowKeys": [
                        "TunnelFlowKey{id: 0000000000bec38e, ipv4src:, ipv4dst:}",
                        "InPortFlowKey{vport: 2}",
                        "EthernetFlowKey{src: 22:59:1b:91:a8:32, dst: ca:a8:55:5a:39:14}"
                    ],
                    "Actions": [
                        "OutputAction{vport: 1}"
                    ],
                    "Packets": 0,
                    "Bytes": 0,
                    "Used": 0
                },
                {
                    "FlowKeys": [
                        "InPortFlowKey{vport: 1}",
                        "EthernetFlowKey{src: ca:a8:55:5a:39:14, dst: 22:59:1b:91:a8:32}"
                    ],
                    "Actions": [
                        "SetTunnelAction{id: 000000000038ebec, ipv4src:, ipv4dst:, ttl: 64, df: true}",
                        "OutputAction{vport: 2}",
                        "SetTunnelAction{id: 0000000000709bec, ipv4src:, ipv4dst:, ttl: 64, df: true}",
                        "OutputAction{vport: 2}"
                    ],
                    "Packets": 0,
                    "Bytes": 0,
                    "Used": 0
                }
            ]

Currently I have no idea why it decided to do this; in the logs the 22:59:1b:91:a8:32 address is only seen as remote from host2.
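A quick way to spot this kind of rule in a report is to look for flows whose actions set more than one distinct tunnel id. The following is a minimal sketch, assuming the `Flows` entries have been loaded as dictionaries with an `Actions` list of strings shaped like the excerpt above; the helper names `tunnel_ids` and `multi_tunnel_flows` are hypothetical and not part of weave.

```python
import re

# Captures the tunnel id from an action string such as
# "SetTunnelAction{id: 000000000038ebec, ipv4src:, ipv4dst:, ttl: 64, df: true}".
TUNNEL_ID = re.compile(r"SetTunnelAction\{id: ([0-9a-f]+)")


def tunnel_ids(flow):
    """Return the tunnel ids set by a flow's actions, in order."""
    ids = []
    for action in flow.get("Actions", []):
        match = TUNNEL_ID.match(action)
        if match:
            ids.append(match.group(1))
    return ids


def multi_tunnel_flows(flows):
    """Return the flows whose actions use more than one distinct tunnel id."""
    return [flow for flow in flows if len(set(tunnel_ids(flow))) > 1]


# The actions of the two flows from the report above (flow keys omitted).
flows = [
    {"Actions": ["OutputAction{vport: 1}"]},
    {"Actions": [
        "SetTunnelAction{id: 000000000038ebec, ipv4src:, ipv4dst:, ttl: 64, df: true}",
        "OutputAction{vport: 2}",
        "SetTunnelAction{id: 0000000000709bec, ipv4src:, ipv4dst:, ttl: 64, df: true}",
        "OutputAction{vport: 2}",
    ]},
]

suspects = multi_tunnel_flows(flows)
```

Only the second flow is flagged, since it is the one that sends the same packet down two tunnels.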

bboreham commented 8 years ago

It does this when we don't know the destination for a MAC; it relays the packet to every peer using the broadcast topology.

So now I don't know why it didn't remove the extra rules once the location of the MAC was learned.
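To make the behaviour concrete, here is a rough sketch of the forwarding decision described above. It is a simplified model, not weave's code: it assumes a fully connected mesh in which every other peer is a direct next hop in the broadcast topology, and the names `tunnel_targets` and `mac_owner` are made up for illustration.

```python
def tunnel_targets(dst_mac, mac_owner, peers, local_peer):
    """Pick the peers a locally captured packet should be tunnelled to.

    mac_owner maps a MAC address to the peer it was learned from.
    """
    owner = mac_owner.get(dst_mac)
    if owner is not None:
        # Known destination: deliver locally or unicast to the owning peer.
        return [] if owner == local_peer else [owner]
    # Unknown destination: relay to every other peer (simplified broadcast).
    return [peer for peer in peers if peer != local_peer]


peers = ["host1", "host2", "host3"]
dst = "22:59:1b:91:a8:32"

unknown = tunnel_targets(dst, {}, peers, "host1")
known = tunnel_targets(dst, {dst: "host2"}, peers, "host1")
```

With an empty table the packet goes to both host2 and host3, which matches the two `SetTunnelAction` entries in the reported flow; once the MAC is learned from host2 only one target remains.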

bboreham commented 8 years ago

I failed to re-create the symptom after several runs. Here are the detailed logs for a run where it doesn't end up with two rules:

DEBU: 2016/07/06 17:09:15.198073 fastdp: unknown dst{1 false} {ca:a8:55:5a:39:14 92:0e:a0:96:3c:5c}
INFO: 2016/07/06 17:09:15.198355 Captured ca:a8:55:5a:39:14 -> 92:0e:a0:96:3c:5c
INFO: 2016/07/06 17:09:15.198483 Discovered local MAC ca:a8:55:5a:39:14
INFO: 2016/07/06 17:09:15.198500 Broadcasting ca:a8:55:5a:39:14 -> 92:0e:a0:96:3c:5c
DEBU: 2016/07/06 17:09:15.198761 Creating forwarding flow SetTunnelAction{id: 000000000038e24d, ipv4src:, ipv4dst:, tos: 0, ttl: 64, df: true, csum: false} on port 2
DEBU: 2016/07/06 17:09:15.198801 Creating forwarding flow SetTunnelAction{id: 000000000057324d, ipv4src:, ipv4dst:, tos: 0, ttl: 64, df: true, csum: false} on port 2
DEBU: 2016/07/06 17:09:15.198848 ODP miss with action: &{broadcast:false ops:[0xc82265f5e0 {DiscardingFlowOp:{} key:{BlobFlowKey:{typ:3 keyMask:[1 0 0 0 255 255 255 255]}}}]}
DEBU: 2016/07/06 17:09:15.200229 ODP miss: map[7:BlobFlowKey{type: 7, key: 0a2000020a24000201004000, mask: ffffffffffffffffffffffff} 20:BlobFlowKey{type: 20, key: 00000000, mask: ffffffff} 16:TunnelFlowKey{id: 000000000024d38e, ipv4src:, ipv4dst:, ttl: 64, tpsrc: 46409, tpdst: 6784} 11:BlobFlowKey{type: 11, key: 0000, mask: ffff} 4:EthernetFlowKey{src: 92:0e:a0:96:3c:5c, dst: ca:a8:55:5a:39:14} 19:BlobFlowKey{type: 19, key: 00000000, mask: ffffffff} 2:BlobFlowKey{type: 2, key: 00000000, mask: ffffffff} 15:BlobFlowKey{type: 15, key: 00000000, mask: ffffffff} 6:BlobFlowKey{type: 6, key: 0800, mask: ffff} 3:InPortFlowKey{vport: 2}] on port 2
INFO: 2016/07/06 17:09:15.200341 Discovered remote MAC 92:0e:a0:96:3c:5c at 5a:e6:43:04:ae:40(host2)
INFO: 2016/07/06 17:09:15.200385 Injecting 5a:e6:43:04:ae:40(host2) 92:0e:a0:96:3c:5c -> 46:98:1c:22:6d:b7(host1) ca:a8:55:5a:39:14
DEBU: 2016/07/06 17:09:15.200435 ODP miss with action: &{broadcast:false ops:[0xc8228fc1c0 {DiscardingFlowOp:{} key:{BlobFlowKey:{typ:3 keyMask:[2 0 0 0 255 255 255 255]}}}]}
DEBU: 2016/07/06 17:09:15.200532 Creating ODP flow FlowSpec{keys: [TunnelFlowKey{id: 000000000024d38e, ipv4src:, ipv4dst:} EthernetFlowKey{src: 92:0e:a0:96:3c:5c, dst: ca:a8:55:5a:39:14} InPortFlowKey{vport: 2}], actions: [OutputAction{vport: 1}]}

awh commented 8 years ago

> It does this when we don't know the destination for a MAC; it relays the packet to every peer using the broadcast topology. So now I don't know why it didn't remove the extra rules once the location of the MAC was learned.

There are two kinds of broadcast:

  1. Broadcast bit of destination MAC is set
  2. Peer 'owning' the destination MAC is not known

In both cases we relay packets via the broadcast topology as you describe, but we only create flow rules in the first case.

Edit: this is for local bridging only (see here)
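The two cases can be told apart with a small check. This sketch reads "broadcast bit" as the group (I/G) bit, i.e. the least significant bit of the first octet of the MAC address; that reading, and the names `has_group_bit` and `broadcast_kind`, are assumptions made for illustration rather than weave's own code.

```python
def has_group_bit(mac):
    """True if the least significant bit of the first octet is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


def broadcast_kind(dst_mac, mac_owner):
    """Classify why a packet would be relayed via the broadcast topology.

    Returns "broadcast-bit", "unknown-owner", or None for a plain unicast.
    """
    if has_group_bit(dst_mac):
        return "broadcast-bit"
    if dst_mac not in mac_owner:
        return "unknown-owner"
    return None


kind_ff = broadcast_kind("ff:ff:ff:ff:ff:ff", {})
kind_unknown = broadcast_kind("ca:a8:55:5a:39:14", {})
kind_known = broadcast_kind("ca:a8:55:5a:39:14", {"ca:a8:55:5a:39:14": "host1"})
```

The all-ones address falls into the first case, while a unicast MAC such as `ca:a8:55:5a:39:14` falls into the second case only for as long as its owning peer is unknown.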

awh commented 8 years ago

First, some more background information from Bryan on what he was doing at the time this was observed: he had three connected peers, and was simply pinging between containers on two of them while watching for anything strange on the third. In other words, there was no attempt at forcing MAC collisions or moves as per the repro of #2436, or any other complicated shenanigans.

I have since reproduced the 'broadcast' flow rule seen above by following these steps:

  1. Begin with three connected peers host1, host2 & host3 (I used the Vagrantfile in weave/test)
  2. Start a container on host1 e.g. docker $(weave config) run -ti --name alpha --rm ubuntu:14.04
  3. Start a container on host2 e.g. docker $(weave config) run -ti --name omega --rm ubuntu:14.04
  4. Ping omega from alpha
  5. Wait for the router on host1 to expire the container MACs and flow rules (this takes ten and five minutes respectively by default; I reduced both of these to one minute for testing)
  6. Ping omega from alpha again

The ICMP request packet from the second ping causes host1 to install a broadcast flow rule (i.e. one that forwards to both host2 and host3) destined for omega's MAC: the router has forgotten omega's MAC, and so it broadcasts the packet.

Why does this not happen on the first ping request? On the first ping, alpha's netns does not have an ARP cache entry for omega, so the ICMP request packet is preceded by an ARP request/response exchange. The ARP response, unicast from omega to alpha, enables the router to learn the correct destination peer for omega before it has to forward the ICMP request from alpha. On the second ping, however, alpha's netns ARP cache still holds an entry for omega, which the kernel uses even though it is stale (it later follows up with an ARP exchange to refresh the cache, but the ICMP request goes first this time).
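The ordering can be illustrated with a toy model of the router's MAC table. This is only a sketch under stated assumptions: `ExpiringMacTable` is a hypothetical class, the MAC address is a placeholder, the one-minute TTL mirrors the reduced test setting mentioned in the steps, and entries here age from the moment they were last learned.

```python
class ExpiringMacTable:
    """Toy MAC table whose entries are forgotten after ttl seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._learned = {}  # mac -> (peer, time the entry was learned)

    def learn(self, mac, peer, now):
        self._learned[mac] = (peer, now)

    def lookup(self, mac, now):
        entry = self._learned.get(mac)
        if entry is None:
            return None
        peer, learned_at = entry
        if now - learned_at >= self.ttl:
            del self._learned[mac]  # entry has aged out
            return None
        return peer


OMEGA_MAC = "02:00:00:00:00:02"  # placeholder, not taken from the logs
router = ExpiringMacTable(ttl=60)

# First ping: the ARP response from omega arrives before the ICMP request,
# so the router already knows which peer owns omega's MAC.
router.learn(OMEGA_MAC, "host2", now=0)
first_ping_owner = router.lookup(OMEGA_MAC, now=1)

# Second ping, two minutes later: alpha's stale ARP entry lets the ICMP
# request go out first, but the router's own entry has expired by then.
second_ping_owner = router.lookup(OMEGA_MAC, now=120)
```

In this model the first lookup returns host2 and the second returns nothing, which is the point at which the packet is relayed to every peer.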

The only way that this (unnecessary) broadcast rule will go away is if it gets expired through lack of use - if traffic continues to flow to that MAC the flow will be kept alive.
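As a rough illustration of why continued traffic keeps the rule alive, the sketch below computes when an idle-timeout flow would be removed for a given series of packet times. It is a simplification: it assumes expiry happens exactly when the idle timeout elapses, whereas a real implementation would typically notice on a periodic sweep, and `flow_expiry_time` is a hypothetical helper.

```python
def flow_expiry_time(packet_times, idle_timeout):
    """Return when a flow is removed, given the times at which it was hit.

    packet_times must be sorted and start with the time the flow was created.
    The flow ends at the first gap of idle_timeout or more without a packet.
    """
    last_used = packet_times[0]
    for t in packet_times[1:]:
        if t - last_used >= idle_timeout:
            break  # the flow was already gone when this packet arrived
        last_used = t
    return last_used + idle_timeout


# A ping every 30 seconds keeps pushing the expiry back.
busy = flow_expiry_time([0, 30, 60, 90], idle_timeout=60)

# A single packet followed by a long silence lets the flow lapse.
quiet = flow_expiry_time([0, 100], idle_timeout=60)
```

Each packet that arrives within the timeout moves the expiry further out, so a steady ping can hold the unnecessary broadcast rule in place indefinitely.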

awh commented 8 years ago

Note that this is less important than the suspected flow rule problem with #2436, as that can blackhole traffic indefinitely whereas this only impacts performance.