Open abligh opened 9 years ago
> If a broadcast packet egresses from (e.g.) `W`, it will be transmitted (directly or indirectly) to `peerA`, which will emit the packet towards the VLAN so it can reach `Host1` and `Host2`. `peerB` will also receive the packet, and transmit the packet on, inter alia, to `peerA`.
Correction: The packet will reach `peerA` only once; weave's broadcast routing logic ensures that (except when the topology is in flux). However, both `peerA` and `peerB` will inject the packet, which is what is causing the duplication you are seeing.
> slave mode [...] the `pcap` interface would be 'switched off'
...for both capture and inject. That is already possible, by starting the router with a blank `-iface` parameter. And it would be quite straightforward to start the capture/inject later. What's missing is (a) a way to disable a running capture/inject (including clearing the MAC cache), and (b) hooks to do all this dynamically via the HTTP API.
Neither of which would be hard, though there are some challenges; e.g. packet capture is a blocking call (I suppose there is no harm in performing some check just after it returns, but all this is in the critical path performance-wise, so we want minimal locking and channel interaction).
As to whether it's a good idea overall... it does strike me as rather a niche feature which, unlike the rest of weave, requires some additional coordinator/health-checker to be of any use.
> Correction: The packet will reach `peerA` only once; weave's broadcast routing logic ensures that (except when the topology is in flux). However both `peerA` and `peerB` will inject the packet, which is what is causing the duplication you are seeing.
Weave's logic is not the issue. The packet reaches `peerA` first from `W`; then `peerA` sends it to the VLAN. `peerB` receives it from the VLAN, treats it as a completely new packet ingressing weave's network, and sends it, inter alia, to `peerA` again; hence there is a loop (not merely duplication). Weave's logic is not to blame here, as it doesn't expect one of its peers to be connected to another other than via weave. When the packet re-ingresses through `peerB` (from the VLAN), `peerB` has no way to distinguish it from any other packet ingressing from 'outside'; i.e. as far as weave is concerned, it's a second broadcast.
> ...for both capture and inject. That is already possible, by starting the router with a blank `-iface` parameter. And it would be quite straightforward to start the capture/inject later. What's missing is a) a way to disable a running capture/inject (including clearing the MAC cache), and b) hooks to do all this dynamically via the HTTP API.
> Neither of which would be hard, though there are some challenges, e.g. packet capture is a blocking call (I suppose there is no harm in performing some check just after that, but all this is in the critical path performance wise, so we want to do minimal locking and channel interaction).
> As to whether it's a good idea overall... it does strike me as rather a niche feature which, unlike the rest of weave, requires some additional coordinator / health-checker to be of any use.
Thanks. I'll think about that.
It gives me three further ideas:
1. Run `weaver` on the slave with a blank `-iface` parameter, and restart `weaver` when going from master to slave or vice versa. At least `W`, `X`, `Y` and `Z` won't believe the slave to be dead, hastening reconnection, although the topology would have to be rebuilt.
2. Leave both copies of `weaver` running normally but attempt to use some `ebtables` magic to cut off connectivity on the slave. Issues: MAC cache is not cleared; experimentation required to determine whether `ebtables` can be persuaded to block bidirectionally.
3. Off-the-top-of-my-head simple idea for an STP-like protocol (called `ALONE` - Avoid Loops On Networked Ethernet - as I can't immediately think of anything better):
   * An `ALONE` protocol message is an L2 packet (or UDP multicast to a well-known address if you prefer) consisting of a list of one or more 2-tuples, each being an integer priority value and a MAC address (e.g. `{{2, aa:bb:cc:dd:ee:ff}, {1, 11:22:33:44:55:66}}`); the list is always sorted in order of decreasing priority value, tie-broken on MAC address.
   * Each `weaver` instance has an integer `ALONE` priority value.
   * On each `weaver` instance, the `ALONE` protocol can be in state `SLAVE` or `MASTER`.
   * When `weaver` is initialised, it's in state `SLAVE`.
   * In state `SLAVE`, `weaver` will not forward to or from its `pcap` interface any packets save for `ALONE` packets, instead discarding them.
   * If a `weaver` instance ever seeks to forward an `ALONE` packet to or from its `pcap` interface, it first determines whether its MAC address is already in the list. If its own priority/MAC tuple would sort ahead of every tuple in the list, it keeps its current state (`MASTER` or `SLAVE`) and forwards the packet (the `ALONE` packet tells another participant it should shut down). If not, it enters `SLAVE` mode and drops the packet (this means there is a loop, but it should be slave), and resets a timer.
   * Periodically, each instance emits an `ALONE` packet consisting only of its priority and MAC address.
   * If the timer expires without a winning `ALONE` packet being seen, the instance enters `MASTER` mode, concluding any previously existing loop has gone.

   Of course, if there is already some shortest-path algorithm that could be piggybacked, you could simply assume (if the protocol is switched on) an adjacency between every `pcap` interface in the system until something like the above proves otherwise.
> there is a loop (not merely duplication)
Got it. There's duplication because both peers inject onto the VLAN, and there is a loop because both peers will capture the packets the other injected.
> restart `weaver` when going from master to slave or vice versa. [...] the topology would have to be rebuilt.
Right, and for the slave->master transition the restart, and associated topology recalculation, happens at the worst possible time.
Btw, I suspect that clearing weave's MAC cache when a peer is passivated isn't enough. Other peers will have MAC cache entries referencing that peer. Bouncing the peer would help here, since that will clear out those entries in most cases, especially in a fully connected mesh.
> STP
That's a lot of work.
> Btw, I suspect that clearing weave's MAC cache when a peer is passivated isn't enough.
Yeah I thought that might be the case. I sort of need to flood a 'forget this MAC' packet.
> STP / STP-a-like
> That's a lot of work.
STP-a-like - yes indeed. But not as much as doing the 'real' standards compliant fix. The 'real' fix would be treating the entire weave network as a bridge (i.e. a collection of half bridges), and running STP, then realising you need RSTP, deciding that didn't work because VLANs, then doing PVST/PVST+, throwing your hands up in horror as to the vendor interoperability and scaling issues, deciding TRILL is the solution, then finding the IS-IS component of that alone is 10 times the size of weave, and going back to a proprietary protocol. YMMV.
> I sort of need to flood a 'forget this MAC' packet.
Well, as I alluded to, peers do clear out entries from their MAC caches that refer to peers which are no longer part of the network. That's why bouncing a peer on the active->passive transition would work.
Perhaps this is less of an issue than I thought though. Given that what you are attempting to achieve here is redundancy, the active->passive transition would actually never occur. You'd have a peer failure, which does cause the MAC caches to clear as described above.
@abligh in your example, how are peerA and peerB connected to the VLAN? Could you have just one of them connected at a time?
@rade typically they'd be listening on the VLAN device or (more usefully here) a `veth` which shared a bridge with the VLAN device. Selectively disconnecting the slave from the VLAN is what I meant by:
> Leave both copies of `weaver` running normally but attempt to use some `ebtables` magic to cut off connectivity on the slave. Issues: MAC cache is not cleared; experimentation required to determine whether `ebtables` can be persuaded to block bidirectionally.
Apologies for being cryptic.
I have a situation where I am using `weaver` to bridge a weave network onto a real VLAN, and I want to do so redundantly. I already have two 'bastion hosts' which negotiate master/slave configuration. The network looks like this (with apologies for markdown graphics):
(For simplicity and to avoid markdown madness, I have not shown all links within the full mesh of peers)
The idea here is that the addresses of `peerA` and `peerB` are both given to peers `W`, `X`, `Y` and `Z`. Together these 6 peers form a weave network. `peerA` and `peerB`, however, are attached to another network (let's say for the sake of argument a VLAN) on which are also located `Host1` and `Host2`. The idea is to allow for L2 connectivity between containers `W`, `X`, `Y`, `Z` and physical hosts `Host1` and `Host2`, using a redundant pair of gateways `A` and `B` (hosting `peerA` and `peerB`).

The problem with this setup is as follows. If a broadcast packet egresses from (e.g.) `W`, it will be transmitted (directly or indirectly) to `peerA`, which will emit the packet towards the VLAN so it can reach `Host1` and `Host2`. `peerB` will also receive the packet, and transmit the packet on, inter alia, to `peerA`, which will repeat the process, causing a packet loop (remember this is L2, so no hop counts). Similarly, `peerB` will loop packets in the other direction. Removing the peering between `peerA` and `peerB` does not help, as weave will simply transmit the looping packets via `X`, `Y` or `Z`, since weave does not require a full mesh. Similarly, a broadcast packet received by `peerA` from `Host1` will be transmitted inter alia to `peerB`, where it will be retransmitted onto the VLAN, for it to be received again by `peerA`; again, removing the direct peering does not help.

As `peerA` and `peerB` already negotiate a master/slave relationship between them, one possibility is to only run `weaver` on the master (i.e. on `peerA` or `peerB`, but not both). Whilst this solves the packet loop problem, it is a poor solution in terms of failover. The most significant issue is time to failover. Assume `peerA` is the master and `peerB` the slave, and `peerA` fails, meaning `peerB` is elected master. As peers `W`, `X`, `Y` and `Z` will have had `peerB`'s peer data for a long while and failed to contact it (as it will not have been running `weaver` whilst a slave), `peerB` is unlikely to be contacted by `W`, `X`, `Y` or `Z` for a relatively long period of time; of course `peerB` may initiate contact, but if `W`, `X`, `Y` and `Z` are behind a NAT (as is likely in my scenario), such contact will fail; it will require `W`, `X`, `Y` or `Z` to initiate contact to `peerB`, which may take several minutes (if I've understood how timeouts work). When a peering is established, weave needs to update its internal topology, which may also take time. A second disadvantage is that in the normal condition (where `peerA` is master), to `W`, `X`, `Y` and `Z`, `peerB` appears dead, meaning there is no way to know whether failover will work unless and until `peerA` actually dies.

A better option would be to run the slave peer (`peerB` here) in a slave mode. In slave mode, the `weaver` process would listen on the `pcap` interface, but discard incoming packets (on the assumption the master would handle them). It would thus learn nothing through the incoming `pcap` interface, and it would transmit nothing (not even unlearnt traffic, e.g. broadcasts or unknown MACs). In effect, the `pcap` interface would be 'switched off'. When transitioning to master, the `pcap` interface would be switched on. When transitioning to slave, the `pcap` interface would be switched off again, and (ideally) all the learnt data in weave's equivalent of a distributed CAM table would be 'forgotten'. The master/slave status could be initiated through a command-line option and changed in real time through the JSON interface.

If this is a good idea, I am happy to code this up. The `pcap` bit seems easy enough. I'm not, however, sure how I might go about persuading weave to 'forget' where MAC addresses are on the transition from slave to master.