I believe there might be an issue in how vifi_t check_vif(struct uvif *v)
compares subnets.
I added some logs:
17:47:18.815 Getting vifs from kernel interfaces
17:47:18.815 Getting vifs from /home/ubuntu/mrouted/config/mrouted.conf
17:47:18.815 installing tunnel from 10.0.204.71 to 10.0.204.106 as vif #0 - rate=0
17:47:18.815 numvifs=1
17:47:18.815 Correlating interfaces and configuration ...
17:47:18.815 warning - ignoring ens5 lcl_addr 5ac8000a subnet e0ffffff mask 0x40c8000a (flags 0x400000), same subnet as eth1 subnet 00000000 mask 0x00000000 (flags 0x204001)
17:47:18.815 Skipping eth1, disabled
17:47:18.815 Skipping docker0, disabled
17:47:18.815 warning - ignoring eth1 lcl_addr 47cc000a subnet 00000000 mask 0x00000000 (flags 0x204001), same subnet as eth1 subnet 00000000 mask 0x00000000 (flags 0x204001)
17:47:18.815 Cannot forward: only one enabled vif
As eth1 is a tunnel, it has no subnet or mask, so the comparison is meaningless, IMO. I think the loop should skip tunnels when comparing subnets. This is the diff against 4.3:
--- a/src/config.c
+++ b/src/config.c
@@ -137,6 +137,10 @@ static vifi_t check_vif(struct uvif *v)
     vifi_t vifi;
     UVIF_FOREACH(vifi, uv) {
+        if (uv->uv_flags & VIFF_TUNNEL) {
+            continue;
+        }
+
         if (v->uv_flags & VIFF_DISABLED) {
             logit(LOG_DEBUG, 0, "Skipping %s, disabled", v->uv_name);
             return NO_VIF;
Now it actually starts:
17:59:22.447 mrouted version 4.3 starting
17:59:22.448 Got 262144 byte recv buffer size in 0 iterations
17:59:22.448 registering icmp socket fd 4
17:59:22.448 Getting vifs from kernel interfaces
17:59:22.448 Getting vifs from /home/ubuntu/mrouted/config/mrouted.conf
17:59:22.448 Found interface for 10.0.204.71: eth1 lcl_addr 0x47cc000a, subnet 0x40cc000a, subnet_mask 0xe0ffffff
17:59:22.448 installing tunnel from 10.0.204.71 to 10.0.204.106 as vif #0 - rate=0
17:59:22.448 numvifs=1
17:59:22.448 Correlating interfaces and configuration ...
17:59:22.448 Installing ens5 (10.0.200.90 on subnet 10.0.200.64/27) as VIF #1, rate 0 pps
17:59:22.448 Skipping eth1, disabled
17:59:22.448 Skipping docker0, disabled
17:59:22.448 warning - ignoring eth1 lcl_addr 47cc000a subnet 00000000 mask 0x00000000 (flags 0x204001), same subnet as ens5 subnet 40c8000a mask 0xe0ffffff (flags 0x400000)
17:59:22.448 Installing vifs in mrouted ...
17:59:22.448 vif #0, tunnel 10.0.204.71 -> 10.0.204.106
17:59:22.448 SENT neighbor probe from 10.0.204.71 to 10.0.204.106
17:59:22.448 vif #1, phyint 10.0.200.90
17:59:22.448 0.0.0.0 advertises new route 10.0.200.64/27
17:59:22.448 0.0.0.0 advertises 10.0.200.64/27 with adj_metric 1 (ours was 32)
17:59:22.448 Assuming querier duties on vif 1
17:59:22.448 Sending v2 query on ens5
17:59:22.448 SENT membership query from 10.0.200.90 to 224.0.0.1
17:59:22.448 SENT neighbor probe from 10.0.200.90 to 224.0.0.4
17:59:22.448 Binding IPC socket to /var/run/mrouted.sock
17:59:22.448 mrouted version 4.3
17:59:22.448 Installing vifs in kernel ...
17:59:22.448 vif #0, tunnel 10.0.204.71 -> 10.0.204.106
17:59:22.449 vif #1, phyint 10.0.200.90
17:59:22.449 RECV membership query from 10.0.200.90 to 224.0.0.1
17:59:22.449 Ignoring query from 10.0.200.90; querier on vif 1 is still me
17:59:22.459 RECV v2 member report from 10.0.200.90 to 224.0.0.4
17:59:22.459 224.0.0.4 LAN scoped group, skipping.
17:59:22.459 RECV v2 member report from 10.0.200.90 to 224.0.0.2
17:59:22.459 224.0.0.2 LAN scoped group, skipping.
17:59:22.459 RECV v2 member report from 10.0.200.90 to 224.0.0.22
17:59:22.459 224.0.0.22 LAN scoped group, skipping.
It's just 4 LOC, but LMK if this warrants a PR.
Never mind, I had more changes inside cfparse.y that were masked because the generated Makefile doesn't seem to rebuild cfparse.c when the .y file changes. The change is more complex than what I pasted above. I'll just open a PR when I'm sure.
Hi, and sorry for the late reply! It seems the ipip tunnel support broke somewhere in the 4.x series, don't know when exactly. I didn't really expect anyone to still be using that, and as you noticed, it's gone completely untested. Sorry about that.
I'd love to see what else you did to make it work. I started playing around today to see what needs fixing, adding a test case and importing the fix suggested by you. No luck yet, but it seems to be related to the DVMRP probes not getting through the tunnel.
Hi, thanks for the follow-up. I suspected it had to be an untested path. I'll send you a PR of what I had to do to get it to run. But even so, I still couldn't get it to relay IGMPv2 and multicast between two subnets, even when I set up a multicast-capable GRE tunnel between the two routers (and thus avoided the use of ipip).
I mean the use case appears to be the textbook scenario: two multicast capable subnets that can speak through unicast-only routers. But I could never get the group memberships from one side (visible in that mrouted) to appear in the other mrouted, so of course the kernel never forwarded the relevant multicast. I suspect I'm missing something.
Re: PR. Thanks, much appreciated!
Yeah, there's something else going on that's broken. DVMRP is a bit clunky; it carries RIP built in, but I can't see my two routers exchanging routes for their respective LANs over the tunnel. That, and the fact that I almost forgot to disable rp_filter, causes the following in my log (notice parent-vif:-1); the multicast is properly tunneled to the LHR:
NS2: 00:16:38.756 Add cache entry (10.0.0.10 225.1.2.3) gm:2, parent-vif:0
NS4: 00:16:38.756 Skipping mfc entry for (10.0.0.10, 225.1.2.3), no inbound vif (no reverse path).
NS4: 00:16:38.756 Add cache entry (10.0.0.10 225.1.2.3) gm:0, parent-vif:-1
Turns out the LHR doesn't know the reverse path back to 10.0.0.10, so it cannot figure out the inbound VIF and thus cannot set up the correct routing entry.
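For reference, a quick way to check on the LHR whether a reverse path to the source exists at all, and whether rp_filter could be interfering (plain iproute2/sysctl, source address taken from the log above):

# Is there a unicast route back to the multicast source?
ip route get 10.0.0.10

# Is strict reverse-path filtering enabled?
sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.default.rp_filter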
I'll debug this more tomorrow after $DAYJOB.
There, did a minor follow-up in 3ac830a to your changes, followed by the actual fix in 57326f9, which turned out to be a 20 year old regression. Hope everything works on your end (remember rp_filter and TTLs), good luck and please let me know how it goes! :)
Here are my results using ipip; on the "B" side of the tunnel I run iperf in server mode and join 226.94.1.10, and I see this:
ubuntu@ip-10-0-201-215:~$ docker exec -it mrouted /bin/bash -c "mroutectl sh -d && mroutectl sh igmp -d"
Interface Table
Address       Interface  State  Cost  TTL  Uptime   Flags
10.0.201.215  ens5       Up     1     1    0:00:00  QL
10.0.204.122  eth1       Up     1     1    0:00:00

Neighbor Table
Neighbor     Interface  Version  Flags  Uptime   Expire
10.0.204.88  eth1       3.255    G      0:00:27  16s

DVMRP Routing Table
Origin           Neighbor     Interface  Cost  Expire
10.0.200.64/27   10.0.204.88  eth1       2     48s
10.0.201.128/25  Local        ens5       1     50s

Multicast Forwarding Cache Table
Origin           Group        Inbound  <>  Uptime   Expire   Outbound
10.0.201.128/25  226.94.1.10  ens5         0:00:09  0:04:55  eth1
Source           Group        Inbound      Uptime   Packets  Bytes
10.0.201.215     226.94.1.10  ens5         0:00:09  2        64

IGMP Interface Table
Interface  Querier  Version  Groups  Expire
ens5       Local    2        1       Never
eth1       Local    3        0       Never

IGMP Group Table
Interface  Group        Last Reporter  Expire  Flags
ens5       226.94.1.10  10.0.201.215   244s    v2
On the "A" side I'd expect to see the replicated group membership, but I don't:
ubuntu@ip-10-0-200-76:~$ docker exec -it mrouted /bin/bash -c "mroutectl sh -d && mroutectl sh igmp -d"
Interface Table
Address      Interface  State  Cost  TTL  Uptime   Flags
10.0.200.76  ens5       Up     1     1    0:00:00  L
10.0.204.88  eth1       Up     1     1    0:00:00

Neighbor Table
Neighbor      Interface  Version  Flags  Uptime   Expire
10.0.204.122  eth1       3.255    G      0:01:09  20s

DVMRP Routing Table
Origin           Neighbor      Interface  Cost  Expire
10.0.200.64/27   Local         ens5       1     120s
10.0.201.128/25  10.0.204.122  eth1       2     78s

IGMP Interface Table
Interface  Querier  Version  Groups  Expire
ens5       0.0.0.0  2        0       161s
eth1       Local    3        0       Never
At the moment I join the IGMP group, I see this on B:
mrouted | mrouted: 19:18:35.341 Aging forwarding cache entries
mrouted | mrouted: 19:18:35.522 RECV v2 member report from 10.0.201.215 to 226.94.1.10
mrouted | mrouted: 19:18:35.522 Accepting group membership report: src 10.0.201.215, dst 226.94.1.10, grp 226.94.1.10
mrouted | mrouted: 19:18:35.522 IGMP v2 compatibility mode for group 226.94.1.10
mrouted | mrouted: 19:18:35.522 Group 226.94.1.10 joined on vif 0
mrouted | mrouted: 19:18:35.522 Add cache entry (10.0.201.215 226.94.1.10) gm:2, parent-vif:0
mrouted | mrouted: 19:18:36.523 Aging forwarding cache entries
But nothing on A side. I'd expect the group memberships to be visible on the A side somehow.
I get similar behaviour if I swap the ipip tunnel for a GRE tunnel. So I guess this is a partial win, in that I seem to get the same behaviour for ipip and phyint. But I still can't get the group memberships to be visible on the remote side.
Here are my sysctls. And of course the ipip module is loaded where needed.
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.all.force_igmp_version=2
net.core.wmem_max=67108864
net.core.rmem_max=67108864
net.ipv4.udp_mem="262144 327680 393216"
net.core.netdev_max_backlog=2000
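If it helps, the same settings can be made persistent with a sysctl drop-in and reloaded in one go (the file name below is just an example):

cat <<'EOF' | sudo tee /etc/sysctl.d/90-mcast.conf
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.all.force_igmp_version = 2
net.core.wmem_max = 67108864
net.core.rmem_max = 67108864
net.ipv4.udp_mem = 262144 327680 393216
net.core.netdev_max_backlog = 2000
EOF
sudo sysctl --system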
I also attach my logs from when I ran with gre.
mcast-receiver.txt
mcast-sender.txt
And the interface details mcast-receiver-ifs.txt mcast-sender-ifs.txt
Thanks again.
Hmm, OK, there are a few things that need straightening out here. IGMP is the link-local protocol on each LAN; it is not routed. DVMRP is the mrouted (OSI layer-3) protocol, a "flood and prune" multicast routing protocol. Hence, on each LAN the respective mrouted sends IGMP queries, and end devices send IGMP joins (membership reports) to tell the router what they'd like to receive. When a router has incoming multicast data (non-control traffic), it floods it to all routers it has heard DVMRP probe messages from (and successfully peered with), including over tunnel interfaces. On reception at the last-hop router, the kernel checks its rp_filter setting and, if that's OK, informs the mrouted process of the available ingressing multicast. mrouted then installs the kernel route based on the (S,G) plus the inbound VIF (derived from the source IP (S) of the multicast data). I.e., you will not see IGMP reports being "proxied" to the other end of the tunnel.
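A couple of standard kernel views that can help confirm this on the last-hop router (plain iproute2/procfs, nothing mrouted-specific):

# Kernel multicast forwarding cache: (S,G) entries with inbound and outbound interfaces
ip mroute show

# Same data in raw form, including per-entry packet/byte counters
cat /proc/net/ip_mr_cache

# The interfaces the kernel currently knows as multicast VIFs
cat /proc/net/ip_mr_vif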
The test/tunnel.sh script uses my own mping tool instead of iperf to test things. I'll have to set things up manually tomorrow to see if I can replicate your result, but from the limited logs it looks like mrouted is started before you have IP addresses on the interfaces. The IP address 0.0.0.0 is repeated in places ... maybe you can try with the --startup-delay=SEC option to see if that helps?
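For example, something like this (the config path is taken from your logs; I'm assuming -f is how you point mrouted at the non-default config file, and the 10-second delay is arbitrary):

# Wait for interfaces to get their addresses before enumerating VIFs
mrouted --startup-delay=10 -f /home/ubuntu/mrouted/config/mrouted.conf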
Thanks, that helps. Also, I'm not an SME, so I might be using the wrong terms. What I was expecting was that if multicast traffic is visible on the A side, and the B side has a group membership matching that traffic, then mrouted, along with the kernel, would forward the multicast to the B side.
Specific to my scenario, the interfaces were already up/up long before I started mrouted. But I'll use that 0.0.0.0 as a starting point. Thanks.
Sorry if it came across as cranky, I just wanted to give you the cliff notes. It should work as you describe it.
Could you post the exact iperf commands you used?
Also, the sender needs to use a TTL value >1; it defaults to 1.
Yep, here's the sender, TTL should be 32:
docker run --rm -it --network=host msherman42/iperf2 -p 12000 -c 226.94.1.1 -u -T 32 -t 300 -i 1
And the receiver
docker run --rm -it --network=host msherman42/iperf2 -p 12000 -s -u -B 226.94.1.1 -i 1
With network=host it should behave pretty much as if it's running on the host OS. If I run these commands on the same LAN segment, or on a routed network that supports multicast, I see traffic as expected.
Good morning!
Took me a while to set up a test bench, and I think I've reproduced your issue. In the below image I run mrouted only on n1 and n3.
Weirdly enough it works fine with mping, but when I run iperf (native install, not docker) it doesn't work. Still have to analyze what's going on here, but unfortunately I've got to get back to $DAYJOB now.
I've double-checked normal operation, with mrouted also running on n2, i.e. without the tunnel setup. In that scenario iperf works fine. Which is all the more confusing.
Ah, there it is -- MTU -- the packets sent by iperf are much bigger¹ than the ones sent by mping! :smile:
If I reduce the size, using -l 1400 on the sender/client, it works fine.
¹ with the Don't Fragment bit set in the IP header. My interfaces have the default MTU (1500) and the total frame size was 1498, so with the extra IP header added by the IPIP tunnel we get frames that are too large, which we (the kernel) are also not allowed to fragment.
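The arithmetic, roughly, and one way to work around it (tunnel device name is just an example):

# IPIP adds a 20-byte outer IPv4 header, GRE adds 24 bytes (20 IP + 4 GRE),
# so with a 1500-byte physical MTU the inner packet must stay below ~1480/1476 bytes.
# Either keep the payload small (iperf -l 1400) or lower the tunnel MTU accordingly:
ip link set dev tunl0 mtu 1480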
Good morning from Canada. Let me check that my issue isn't something as simple as MTU as well.
Whoa, that CORE tool looks really interesting.
Core is super awesome!😎👍 Something of a hidden gem.
Humor me and share your mping command.
Sure thing!
Sender:
mping -s -i eth0 -t 32 225.1.2.3
Receiver:
mping -r -i eth0 -t 32 225.1.2.3
Note: mping uses the receiver end to send the packets back to the sender. Otherwise you can use plain ping, socat, and tcpdump to achieve basically the same thing. I also have a visual tool called mcjoin, which can be useful when testing multicast connectivity.
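For instance, something along these lines (interface and group are just placeholders):

# Sender: ping the group with a TTL high enough to cross the routers
ping -t 32 -I eth0 225.1.2.3

# Receiver LAN: watch for the ICMP echo requests arriving
tcpdump -ni eth0 icmp and dst 225.1.2.3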
Any progress? Don't want to rush you or anything, just want to see if I should start up the release machinery :)
Hi again. I have several different applications for mrouted. In my immediate test, after much testing, I believe that the B side switching fabric doesn't pass broadcast/multicast, so we're kinda SOL. But in the other scenario, which I need to address anyways, multicast is supposed to work fine.
I will switch to that scenario right now and report back. Thanks again.
Aha, that's too bad. If you have access to configure the switches, many vendors have multicast router ports or similar that can be set up to flood (unknown/unregistered) multicast on certain ports. But if that's not available, and the switches have IGMP snooping enabled, you could try SMCRoute instead and set up static multicast routes, including "joining" the groups so the switches open up layer-2 forwarding towards the router. Provided you know beforehand the groups you want to forward. It doesn't have tunneling support built in, but you can use GRE for that.
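A minimal sketch of that idea in smcroute.conf terms (interface names and group are placeholders, adjust to your setup):

# Join the group on the LAN side so IGMP-snooping switches forward it to this box
mgroup from eth0 group 226.94.1.1
# Statically forward the group from the LAN into the GRE tunnel
mroute from eth0 group 226.94.1.1 to gre1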
I am still working on it by the way.
B1 joins the multicast group using iperf (tcpdump on B1 shows the IGMPv2 report). But B2 does not see the IGMP report (tcpdump -nn -i any igmp). mroutectl similarly does not see the group membership. Is it fair to say that for this scenario to work, the switching fabric must pass IGMP so mrouted can see it?
EDIT - re-read your above comment more carefully, and I see it's a definite yes.
Anyways, I'm working on setting up GRE tunnels between the hosts so I can eliminate the switching fabric as a factor. I don't know whether this solution would work at scale, but it might in select scenarios.
OK, so even with the GRE tunnels my results are not promising. In short, I feel this won't work on this target network due to the switching fabric getting in the way. As I doubt they (AWS) will change it, hehe, I think I'm better off going with SMCRoute to manipulate the multicast routing table. I have a unicast fallback plan, but I still haven't completely given up on multicast.
In closing I hope this wasn't a complete waste as we (mostly you) fixed the ipip tunnel support. Let me know if there is anything you need from me at this point. I thank you once more for your assistance.
FYI, I published this https://hub.docker.com/repository/docker/summittech/mping using this Dockerfile:
FROM ubuntu:bionic as builder
RUN apt-get update
RUN apt-get install -y build-essential flex bison autoconf
RUN apt-get install -y git
COPY mrouted ./mrouted
RUN cd mrouted && ./autogen.sh && ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
RUN cd mrouted && make && make DESTDIR=/tmp/mrouted install
FROM ubuntu:bionic
COPY --from=builder /tmp/mrouted /
Useful if you're in a pinch and need mping without installing a full toolchain.
Hi, first of all, this has definitely not been a waste of time for me! Fixing the ipip support, and getting first-hand information about a very interesting use-case has been great, thank you!
Looking at the topology image, I agree with you. SMCRoute probably fits your needs better here. You need to:
mrouted can actually also fake a join, but only on B2. What you need is to get the multicast data in LAN A, and for that there's no mechanism currently in mrouted that can cheat the switch in LAN A.
The root cause here is that mrouted is really built for running on the routers; that's because the IGMP querier election algorithm will choose the .1 in each LAN as the multicast distribution node. So even though A2 and B2 send queries, signaling "Hey, I can route multicast", they will both lose the election, and the IGMP snooping/filtering mechanism in the switches will just ignore A2 and B2.
Thanks for the idea of a docker image! I assume it was a cut-n-paste error above, so I made my own using alpine and published it on ghcr.io instead, https://github.com/troglobit/mping/pkgs/container/mping :)
I typically go the alpine route too. But I ran into an issue doing so with mrouted when the containerized OS didn't strictly match the host (e.g. Debian buster container, Ubuntu bionic host = segfault). Probably the kernel headers are different. Didn't really have time to build -g and gcc into it. I guess it's to be expected?
About smcroute, if I may pick your brain one more time: I set up the relevant scenario, and I had indeed set up the config you suggested above. On A2:
mgroup from ens5 group 226.94.1.1
mroute from ens5 group 226.94.1.1 to gre1
On B2:
mroute from gre1 group 226.94.1.1 to gre2
Now, when I run mping -s -t 32 -p 12000 from A1 (no mping -r ... receiver running yet), tcpdump on gre1 shows:
$ sudo tcpdump -nn -i gre1 -vv -X
...
14:50:52.758757 IP (tos 0x0, ttl 31, id 30475, offset 0, flags [DF], proto UDP (17), length 84)
10.0.200.6.12000 > 226.94.1.1.12000: [udp sum ok] UDP, length 56
0x0000: 4500 0054 770b 4000 1f11 2f28 0a00 c806 E..Tw.@.../(....
0x0010: e25e 0101 2ee0 2ee0 0040 55e3 312e 3500 .^.......@U.1.5.
0x0020: 7320 0000 0a00 c806 e25e 0101 0000 0019 s........^......
0x0030: 0100 0000 617c 0a4c 0000 0000 000b 9ac2 ....a|.L........
0x0040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0050: 0000 0000 ....
I ran mping -r -i gre1 -p 12000 226.94.1.1 on B2 as a quick test, and although the multicast is visible on gre1, mping on B2 doesn't seem to receive it. Also, the traffic isn't forwarded on gre2.
On A2, mroutectl:
(*,G) Template Rules
ROUTE (S,G)                IIF   OIFS
(*, 226.94.1.1)            ens5  gre1

Kernel MFC Table
ROUTE (S,G)                IIF   OIFS
(10.0.200.6, 226.94.1.1)   ens5  gre1
(10.0.200.13, 226.94.1.1)  ens5  gre1
and ip mr sh:
(10.0.200.6, 226.94.1.1) Iif: ens5 Oifs: gre1 State: resolved
(10.0.200.13, 226.94.1.1) Iif: ens5 Oifs: gre1 State: resolved
On B2:
(*,G) Template Rules
ROUTE (S,G)      IIF   OIFS
(*, 226.94.1.1)  gre1  gre2
ip mr sh is empty, which explains why the traffic isn't forwarded. The debug output of either smcroute shows nothing of interest. Attached both.
smcroute-A2.txt smcroute-B2.txt
PS - I use a gre tunnel between B1 and B2 because the switching fabric blocks multicast of any kind.
First of all, the ABI in Linux is fixed, so there shouldn't be any segfaults running an mrouted container based on Alpine Linux on a Debian/Ubuntu host. Thank you for mentioning this, I'll look into it!
Regarding the other issue, I'm gonna go out on a limb and say it's caused by rp_filter on B2. Try sudo sysctl -w net.ipv4.conf.all.rp_filter=0 on B2.
That's probably also why running the mping receiver/reflector on B2 doesn't work.
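One general Linux detail worth adding here: the effective rp_filter value is the stricter of the "all" and per-interface settings, so it may need clearing on the tunnel interface as well:

# The kernel uses max(all, per-interface) when validating the source, so clear both
sudo sysctl -w net.ipv4.conf.all.rp_filter=0
sudo sysctl -w net.ipv4.conf.gre1.rp_filter=0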
(I should think before writing ... Docker containers use network namespaces, and an interface in Linux cannot be in two places at the same time, so running mrouted in a container will not work, unless you move the interfaces it should operate on into the container, which I guess is never what you really want ...)
Closing, v4.4 was just released with fixes for this issue. Remaining work, to solve the actual use-case detailed here, is tracked in #54.
Thank you @ericb-summit for reporting this and helping out resolving it!
Thank you for the support!
I have 3 mcast-enabled interfaces on my system: eth0, eth1, and, for the sake of accuracy, docker0 (I do not intend to use docker0 for anything).
When I run with no config, it starts as expected.
As I am attempting to use mrouted to tunnel multicast between two networks where the routers can speak to each other unicast but not multicast, I add a tunnel statement. The config is completely empty except for this one line:
tunnel 10.0.204.68 10.0.204.119 metric 1 threshold 1 rate-limit 0
I run it, and I see this:
And it exits.
To be sure it wasn't something wonky with my binary, I rebuilt 4.3 from scratch inside alpine:latest per your instructions. Then I tried Debian stretch, then I dropped Docker and just built on an Amazon Linux 2 instance. Same exact behaviour. My next step is to mark up the code to try to understand what is going on.
In the meantime, could you kindly comment on whether I am doing something very obviously wrong? Why do I see VIF #0 repeatedly? Why would installing the interface fail silently? Thanks.