pali / igmpproxy

IGMP multicast routing daemon

igmpproxy sometimes removes the mroute if "Membership Report" message is received close to "Max Resp Time" value specified in the "Membership Query" message #95

Open tonusoo opened 1 year ago

tonusoo commented 1 year ago

I have noticed that occasionally igmpproxy removes the multicast route if a "Membership Report" message is received close to the "Max Resp Time" value specified in the "Membership Query" message. For example, please see the packet capture below:

root@r1:~# tshark -Y 'frame.number in {25..27}' -r br0_igmp.pcap
Running as user "root" and group "root". This could be dangerous.
   25 190.655057  192.168.0.1 → 224.0.0.1    IGMPv2 46 Membership Query, general
   26 196.659055  192.168.0.1 → 224.0.0.1    IGMPv2 46 Membership Query, general
   27 200.665434 192.168.0.12 → 239.3.1.106  IGMPv2 60 Membership Report group 239.3.1.106
root@r1:~#

This packet capture was made on a LAN-facing interface named br0 on a router running igmpproxy. As seen above, the "Membership Report" message was received 10.010377 seconds after the initial "Membership Query" message, while the "Max Resp Time" value in the "Membership Query" messages was 10.0 seconds. This caused igmpproxy to remove the multicast route:

Sent membership query from 192.168.0.1 ifIndex 11 to 224.0.0.1. Delay: 10
Created timeout 11 (#0) - delay 10 secs
(Id:11, Time:10)
Created timeout 12 (#1) - delay 115 secs
(Id:11, Time:10)
(Id:12, Time:115)
About to call timeout 11 (#0)
Aging routes in table.
Removing group 239.3.1.106. Died of old age.
Removed route entry for 239.3.1.106 from table.
Vif bits : 0x00000001
Setting TTL for Vif 0 to 1
Removing MFC: 10.0.2.28 -> 239.3.1.106, InpVIf: 1

One might think that this is expected behavior, as the "Membership Report" message was received later than the "Max Resp Time" of 10.0 seconds. However, according to RFC 2236 section 3, a host receiving the "General Membership Query" is allowed to answer at the 10.0th second, and once network and processing delays are added, it is IMHO expected that "Membership Report" messages will occasionally arrive a tiny bit later than 10.0 seconds.
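To make the timing argument concrete, here is a toy model of the RFC 2236 report timer (a sketch, not igmpproxy code; the extra delay value is an assumption):

import random

# Toy model of RFC 2236 host behavior: on receiving a general query
# with a "Max Resp Time" of 10.0 s, a host arms a timer at a random
# value in [0, 10.0] and sends its report when the timer expires
# (unless it saw another host's report for the group first).
max_resp_time = 10.0                 # from the "Membership Query"
report_delay = random.uniform(0.0, max_resp_time)
extra_delay = 0.015                  # assumed network/processing delay, seconds
print(f"report arrives {report_delay + extra_delay:.3f} s after the query")
# A delay drawn near 10.0 s plus the extra delay lands past 10.0 s,
# which is exactly when igmpproxy has already aged the route.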

In order to troubleshoot this better, I built a small lab with the following topology:

h3[ens3] <-> [ens3]gw1[ens4] <-> [ens3]mr1

h3 is a virtual machine in the multicast receiver role, running socat -u UDP4-RECV:10,ip-add-membership=239.3.5.3:ens3 -. gw1 is a multicast router running igmpproxy version 0.4 with the configuration below:

quickleave
phyint ens4 upstream  ratelimit 0  threshold 1
    altnet 10.0.0.0/8
phyint ens3 downstream  ratelimit 0  threshold 1

mr1 is in the multicast sender role, running send(IP(src='10.0.2.28', dst='239.3.5.3')/UDP(sport=10, dport=10), iface='ens3', inter=1./2, loop=1) in scapy, which sends UDP datagrams addressed to 239.3.5.3 twice a second via the ens3 interface in an endless loop. All three virtual machines are running Debian 10.

I have observed a similar issue on this lab setup. One occurrence can be seen in the screenshot below: (screenshot: igmpproxy_mroute_died_of_old_age)

The upper tmux panes show h3 running tcpdump and socat, respectively. The middle panes show src/igmpproxy -d -vv /etc/igmpproxy.conf 2>&1 piped through a timestamping utility, and a custom packet capture utility. The lower right pane runs the scapy command mentioned above on mr1, and the lower left pane prints a timestamp on gw1 whenever the multicast route is missing:

while true; do ip mroute | grep -q '(10.0.2.28,239.3.5.3) *Iif: ens4 *Oifs: ens3  State: resolved' || date +'%T.%3N'; done

As highlighted in the screenshot, the multicast route was removed 0.9 seconds before the "Membership Report" message was received. The "Membership Report" message was received 10.153 seconds after the initial "Membership Query". However, I have seen this happen even when the delay between the query and the report was 9.6 seconds. In addition, the multicast route is not removed every time the delay is >10.0 seconds.
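For reference, delays like these can be extracted from a capture with a few lines of scapy (a sketch; it reuses the br0_igmp.pcap file name from above and assumes the capture contains IGMPv2 queries and reports):

from scapy.all import rdpcap
from scapy.contrib.igmp import IGMP

# Print, for every IGMPv2 "Membership Report", its delay relative to
# the most recent "Membership Query" seen in the capture.
last_query = None
for pkt in rdpcap("br0_igmp.pcap"):
    if IGMP not in pkt:
        continue
    if pkt[IGMP].type == 0x11:                     # Membership Query
        last_query = float(pkt.time)
    elif pkt[IGMP].type == 0x16 and last_query is not None:
        delta = float(pkt.time) - last_query       # v2 Membership Report
        print(f"{pkt[IGMP].gaddr}: {delta:.6f} s after the last query")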

In order to avoid the removal of the multicast route, I relaxed the pselect() timeout:

root@gw1:~/igmpproxy# diff -u src/igmpproxy.c{~,}
--- src/igmpproxy.c~    2023-05-05 01:09:08.551415847 +0300
+++ src/igmpproxy.c     2023-05-05 01:09:27.418989595 +0300
@@ -296,7 +296,7 @@
     int     MaxFD, Rt, secs;
     fd_set  ReadFDS;
     socklen_t dummy = 0;
-    struct  timespec  curtime, lasttime, difftime, tv;
+    struct  timespec  curtime, lasttime, difftime, tv, pselect_timeout;
     // The timeout is a pointer in order to set it to NULL if nessecary.
     struct  timespec  *timeout = &tv;

@@ -339,8 +339,12 @@
         FD_ZERO( &ReadFDS );
         FD_SET( MRouterFD, &ReadFDS );

+       memcpy(&pselect_timeout, timeout, sizeof(struct timespec));
+       pselect_timeout.tv_sec *= 2;
+       my_log(LOG_DEBUG, 0, "Calling pselect() with %ld.%09ld seconds timeout", pselect_timeout.tv_sec, pselect_timeout.tv_nsec);
+
         // wait for input
-        Rt = pselect( MaxFD +1, &ReadFDS, NULL, NULL, timeout, NULL );
+        Rt = pselect( MaxFD +1, &ReadFDS, NULL, NULL, &pselect_timeout, NULL );

         // log and ignore failures
         if( Rt < 0 ) {
root@gw1:~/igmpproxy#

Alternatively, at least in a lab environment, one could send frequent "Membership Report" messages from the multicast receiver host, e.g. send(IP(dst="239.3.5.3")/IGMP(type=0x16, gaddr="239.3.5.3"), iface="ens3", inter=1./1, loop=1) in scapy.

Has anyone else observed this behavior? Please let me know if I can provide additional details.

pali commented 1 year ago

This is an interesting issue. It looks like you are right that we should wait more than 10s for a reply.

But your fix takes the wrong approach. The pselect timeout is always at most 3s, and it is the maximum sleep timeout while waiting for some packet. So if some other host sends an IGMP packet after the 10th second but before the host for which we are waiting, then we still age the route.

What is needed is to increase the 10s wait timeout for a reply to some higher value, not the sleep timeout used while waiting for any packet.

tonusoo commented 1 year ago

So if some other host sends an IGMP packet after the 10th second but before the host for which we are waiting, then we still age the route.

I overlooked the fact that there can be more than one multicast receiver at the same time. However, out of curiosity, I tried to create a situation where some other host sent an IGMP "Membership Report" message after the 10th second, but before the host for which we are waiting. I ran a small script on h3 (the multicast receiver host) which started listening for IGMP "Membership Report" messages addressed to 239.3.5.3 when an IGMPv2 "General Membership Query" message from gw1 (the multicast router) was received. If the OS network stack had not sent the IGMP "Membership Report" message to 239.3.5.3 within 10 seconds, then the script sent an IGMP "Membership Report" message to the 239.3.5.1 address instead, acting as some other host.
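The script was roughly along the lines of the following scapy sketch (a simplified reconstruction, not the exact script; the spoofed 192.168.0.50 source address matches the log below and the ens3 interface name matches the lab setup):

from scapy.all import sniff, send, IP
from scapy.contrib.igmp import IGMP

def on_query(query):
    # Wait up to 10 s for the local stack's report to 239.3.5.3 ...
    seen = sniff(iface="ens3", timeout=10,
                 lfilter=lambda p: IGMP in p and p[IGMP].type == 0x16
                                   and p[IGMP].gaddr == "239.3.5.3")
    if not seen:
        # ... and if none was sent, act as "some other host" by
        # reporting a different group from a spoofed source address.
        send(IP(src="192.168.0.50", dst="239.3.5.1") /
             IGMP(type=0x16, gaddr="239.3.5.1"), iface="ens3")

# React to every IGMPv2 "General Membership Query" from gw1.
sniff(iface="ens3", prn=on_query,
      lfilter=lambda p: IGMP in p and p[IGMP].type == 0x11)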

The custom packet capture script mentioned in my previous post was also updated so that it can show IGMPv2 "Membership Report" messages to the 239.3.5.1 address.

Now, during a 22-hour period, there were 42 occurrences where igmpproxy with the relaxed pselect() timeout received an IGMP "Membership Report" message from another host. Example:

21:45:26.753845 - 0.000 - IGMPv2 General Membership Query: 192.168.0.1 > 224.0.0.1: Max Resp Time: 10.00 s
21:45:36.782432 - NA - IGMPv2 Membership Report: 192.168.0.50 > 239.3.5.1: Group Address: 239.3.5.1
21:45:36.853614 - 10.100 - IGMPv2 Membership Report: 192.168.0.100 > 239.3.5.3: Group Address: 239.3.5.3

As seen above, the "Membership Report" message to 239.3.5.1 from 192.168.0.50 sent by the script was received 10.028587 seconds after the "General Membership Query" message was sent. The "Membership Report" message to 239.3.5.3 from 192.168.0.100 was received 0.071182 seconds later.

None of those 42 occurrences removed the multicast route. A screenshot with another such occurrence can be seen below: (screenshot: IGMPv2_MR_to_239-3-5-1)

I'm not arguing that it's a good idea to keep using the relaxed pselect() timeout patch. However, at least based on my tests, the multicast route never aged out.

What is needed is to increase the 10s wait timeout for a reply to some higher value, not the sleep timeout used while waiting for any packet.

I think it's better if you create a patch yourself as you understand the source code much better than I do. I'm happy to test it out.