troglobit / pimd

PIM-SM/SSM multicast routing for UNIX and Linux
http://troglobit.com/projects/pimd/
BSD 3-Clause "New" or "Revised" License

pimd does not receive PIMv2 Hello #49

Closed: greenpau closed this issue 7 years ago

greenpau commented 9 years ago

I have a point-to-point TUN/TAP interface.

While sniffing, I see that my two hosts, 10.252.93.0 and 10.252.63.0, receive each other's PIMv2 Hello packets.

[root@ip-192-168-16-146 ~]# tshark -i flannel0
Running as user "root" and group "root". This could be dangerous.
Capturing on 'flannel0'
  1   0.000000  10.252.63.0 -> 224.0.0.13   IGMPv2 32 Membership Report group 224.0.0.13
  2   0.480001  10.252.63.0 -> 224.0.0.22   IGMPv2 32 Membership Report group 224.0.0.22
  3   5.055051  10.252.93.0 -> 224.0.0.13   PIMv2 30 Hello
  4   9.494298  10.252.63.0 -> 224.0.0.13   PIMv2 30 Hello
  5  35.085053  10.252.93.0 -> 224.0.0.13   PIMv2 30 Hello
  6  35.085065  10.252.93.0 -> 224.0.0.1    IGMPv2 32 Membership Query, general
  7  36.865825  10.252.93.0 -> 224.0.0.22   IGMPv2 32 Membership Report group 224.0.0.22
  8  39.329709  10.252.93.0 -> 224.0.0.2    IGMPv2 32 Membership Report group 224.0.0.2
  9  39.525122  10.252.63.0 -> 224.0.0.13   PIMv2 30 Hello
 10  39.525188  10.252.63.0 -> 224.0.0.1    IGMPv2 32 Membership Query, general
 11  44.593635  10.252.93.0 -> 224.0.0.13   IGMPv2 32 Membership Report group 224.0.0.13
 12  47.552000  10.252.63.0 -> 224.0.0.2    IGMPv2 32 Membership Report group 224.0.0.2
 13  48.448001  10.252.63.0 -> 224.0.0.22   IGMPv2 32 Membership Report group 224.0.0.22
 14  48.543999  10.252.63.0 -> 224.0.0.13   IGMPv2 32 Membership Report group 224.0.0.13
 15  65.615259  10.252.93.0 -> 224.0.0.13   PIMv2 30 Hello

Relevant interfaces:

[root@ip-192-168-16-146 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast state UP qlen 1000
    link/ether 0a:14:58:92:4d:6f brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.146/24 brd 192.168.16.255 scope global dynamic eth0
       valid_lft 3566sec preferred_lft 3566sec
    inet6 fe80::814:58ff:fe92:4d6f/64 scope link
       valid_lft forever preferred_lft forever
392: flannel0: <POINTOPOINT,MULTICAST,UP,LOWER_UP> mtu 8973 qdisc pfifo_fast state UNKNOWN qlen 500
    link/none
    inet 10.252.63.0/16 scope global flannel0
       valid_lft forever preferred_lft forever
393: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 56:84:7a:fe:97:99 brd ff:ff:ff:ff:ff:ff
    inet 10.252.63.1/24 scope global docker0
       valid_lft forever preferred_lft forever
396: pimreg@NONE: <NOARP,UP,LOWER_UP> mtu 1472 qdisc noqueue state UNKNOWN
    link/pimreg
[root@ip-192-168-16-146 ~]# ip route
default via 192.168.16.1 dev eth0
10.252.0.0/16 dev flannel0  proto kernel  scope link  src 10.252.63.0
10.252.63.0/24 dev docker0  proto kernel  scope link  src 10.252.63.1
192.168.16.0/24 dev eth0  proto kernel  scope link  src 192.168.16.146
[root@ip-192-168-16-146 ~]# ip mroute
[root@ip-192-168-16-146 ~]#

However, at the same time, pimd debug messages do not show PIMv2 Hellos received from the other end of the tunnel.

Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: Virtual Interface Table ======================================================
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: Vif  Local Address    Subnet              Thresh  Flags      Neighbors
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: ---  ---------------  ------------------  ------  ---------  -----------------
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: 0  192.168.16.146   192.168.16               1  DISABLED
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: 1  10.252.63.0      10.252/16                1  PIM        10.252.63.1
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: 2  10.252.63.1      10.252.63/24             1  DR NO-NBR
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: 3  10.252.63.0      register_vif0            1
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: Multicast Routing Table ======================================================
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: --------------------------------- (*,*,G) ------------------------------------
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: Number of Groups: 0
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: Number of Cache MIRRORs: 0
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: ------------------------------------------------------------------------------
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: Candidate Rendezvous-Point Set ===============================================
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: RP address       Incoming  Group Prefix        Priority  Holdtime
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: ---------------  --------  ------------------  --------  ---------------------
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: ------------------------------------------------------------------------------
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: Current BSR address: 0.0.0.0
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: 03:53:11.849 RECV    30 bytes PIM v2 Hello              from 10.252.63.0     to 224.0.0.13
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: 03:53:11.849 RECV    30 bytes PIM v2 Hello              from 10.252.63.1     to 224.0.0.13
Jun 08 03:53:11 ip-192-168-16-146.inf.ise.com pimd[13002]: 03:53:11.849 PIM HELLO holdtime from 10.252.63.1 is 105

For some reason the packets do not reach the pimd daemon (although they reach the flannel0 interface). Any ideas?

greenpau commented 9 years ago

Not sure if this is relevant, but the peering interface flannel0 is not listed in /proc/net/dev_mcast.

# cat /proc/net/dev_mcast
2    eth0            1     0     333300000001
2    eth0            1     0     01005e000001
2    eth0            1     0     3333ff924d6f
2    eth0            1     0     01005e0000fb
4    docker0         1     0     333300000001
4    docker0         1     0     01005e000001
4    docker0         1     0     01005e0000fb
4    docker0         1     0     01005e00000d
4    docker0         1     0     01005e000002
4    docker0         1     0     01005e000016
#

However, it is in the list of VIFs:

# cat /proc/net/ip_mr_vif
Interface      BytesIn  PktsIn  BytesOut PktsOut Flags Local    Remote
 1 docker0           0       0         0       0 00000 013FFC0A 00000000
 2 flannel0          0       0         0       0 00000 003FFC0A 003FFC0A
 3 pimreg            0       0         0       0 00004 013FFC0A 00000000
#
greenpau commented 9 years ago

Interface information:

# ip addr show dev flannel0
11: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 8973 qdisc pfifo_fast state UNKNOWN qlen 500
    link/none
    inet 10.252.63.0/16 scope global flannel0
       valid_lft forever preferred_lft forever
#
greenpau commented 9 years ago

interesting reading about TUN/TAP devices:

https://www.kernel.org/doc/Documentation/networking/tuntap.txt

Perhaps the way flanneld creates its interfaces influences whether an interface is multicast-enabled or not.

greenpau commented 9 years ago

https://github.com/coreos/flannel implements its interface as a TUN device with the syscall.IFF_TUN | syscall.IFF_NO_PI flags, i.e. layer 3 only.

https://github.com/coreos/flannel/blob/master/pkg/ip/tun.go#L48 https://github.com/coreos/flannel/blob/master/pkg/ip/tun.go#L55
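
For reference, a minimal C sketch (flannel itself does this in Go) of how such a layer-3 TUN interface is created with those flags, and how one could check and force IFF_MULTICAST on it; the function names here are just for illustration:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/if_tun.h>

/* Create a layer-3 TUN interface: IFF_TUN (no Ethernet header),
 * IFF_NO_PI (no packet-info prefix), the same flags flanneld passes. */
int tun_open(const char *name)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Check (and, if missing, set) IFF_MULTICAST on the interface; this is
 * the flag that tells the kernel the interface is multicast-capable. */
int tun_enable_multicast(const char *name)
{
    struct ifreq ifr;
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    if (sock < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    if (ioctl(sock, SIOCGIFFLAGS, &ifr) < 0) {
        close(sock);
        return -1;
    }

    if (!(ifr.ifr_flags & IFF_MULTICAST)) {
        ifr.ifr_flags |= IFF_MULTICAST;
        if (ioctl(sock, SIOCSIFFLAGS, &ifr) < 0) {
            close(sock);
            return -1;
        }
    }

    close(sock);
    return 0;
}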

When starting pimd, I see the following message:

Jun 11 14:50:44 ip-192-168-16-146.inf.ise.com pimd[20052]: 14:50:44.585 Installing flannel0 (10.252.63.0 -> 10.252.63.0) as vif #2-43 - rate 0

The above message is triggered by config_vifs_from_kernel() https://github.com/troglobit/pimd/blob/master/config.c#L294

I see a similar message for docker0 interface:

Installing docker0 (10.252.63.1 on subnet 10.252.63/24) as vif #1-4 - rate 0

However, the difference between the two messages is that the former is triggered only if VIFF_POINT_TO_POINT is set on an interface.

In the code, there is a place where it has the following check:

    /*
     * If the interface is not yet up, set the vifs_down flag to
     * remind us to check again later.
     */
    if (!(flags & IFF_UP)) {
        v->uv_flags |= VIFF_DOWN;
        vifs_down = TRUE;
    }

My guess is that the flannel0 interface does not have the IFF_UP flag set. I will add a log message to check.
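
Something along these lines, using pimd's logit() and the flags variable from the snippet above (the exact placement is just my guess):

    /* hypothetical debug line, dropped in right before the IFF_UP test */
    logit(LOG_DEBUG, 0, "interface flags 0x%04x (IFF_UP %s)",
          flags, (flags & IFF_UP) ? "set" : "NOT set");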

greenpau commented 9 years ago

flannel0's IFF_UP flag is set:

Interface flannel0 comes up; vif #2 now in service

docker0 interface is also up:

Interface docker0 comes up; vif #1 now in service
greenpau commented 9 years ago

After "in service", a router:

    if (!(v->uv_flags & VIFF_REGISTER)) {
        /* Join the PIM multicast group on the interface. */
        k_join(pim_socket, allpimrouters_group, v);

        /* Join the ALL-ROUTERS multicast group on the interface.  This
         * allows mtrace requests to loop back if they are run on the
         * multicast router. */
        k_join(igmp_socket, allrouters_group, v);

After adding a log line, I see:

Jun 11 17:36:42 ip-192-168-16-146 pimd[22867]: 17:36:42.523 Interface flannel0 joins PIM multicast group

The k_join() from kern.c is successful, because we do not see the "Cannot join group" message.
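
For anyone following along, the join boils down to an IP_ADD_MEMBERSHIP setsockopt on the interface's local address. A rough, self-contained sketch of what k_join() does, not the literal kern.c code:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Join 224.0.0.13 (ALL-PIM-ROUTERS) on the interface owning local_ip,
 * roughly what pimd's k_join() does for the flannel0 vif. */
static int join_all_pim_routers(int sock, const char *local_ip)
{
    struct ip_mreq mreq;

    memset(&mreq, 0, sizeof(mreq));
    mreq.imr_multiaddr.s_addr = inet_addr("224.0.0.13");
    mreq.imr_interface.s_addr = inet_addr(local_ip);  /* e.g. "10.252.63.0" */

    return setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}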

greenpau commented 9 years ago

I am expecting to see 17:48:10.527 PIM HELLO holdtime from 10.252.93.0, but I do not.

The reason is likely that pim_socket is not getting it.

The socket is initialized by init_pim() in pim.c.

The PIM HELLO message is dispatched by accept_pim() in pim.c to receive_pim_hello() in pim_proto.c.

greenpau commented 9 years ago

pim_socket is created by socket(AF_INET, SOCK_RAW, IPPROTO_PIM).

https://github.com/troglobit/pimd/blob/master/pim.c#L73

The socket() function takes the address domain (AF_INET), the socket type (SOCK_RAW), and the protocol (IPPROTO_PIM).

If the address domain and socket type of two processes do not match, the processes cannot communicate with each other.

It looks like flannel needs a socket in the same domain and of the same type as pim_socket.

Perhaps ...

greenpau commented 9 years ago

in flanneld, I want to identify PIM packets:

        log_error("received packet for %s ",
                inaddr_str(iph->daddr, daddr, sizeof(daddr)));
        if ( ( ntohl(iph->daddr) & 0xe000000d ) == 0xe000000d ) {        /* 224.0.0.13, ALL-PIM-ROUTERS */
                log_error("(ALL_PIM_ROUTERS) ");
        } else if ( ( ntohl(iph->daddr) & 0xe0000002 ) == 0xe0000002 ) { /* 224.0.0.2, ALL-ROUTERS */
                log_error("(ALL_ROUTERS) ");
        } else if ( ( ntohl(iph->daddr) & 0xe0000001 ) == 0xe0000001 ) { /* 224.0.0.1, ALL-HOSTS */
                log_error("(ALL_HOSTS) ");
        } else {
                log_error("(LOCAL_CTRL) ");
        }
        log_error("from %s\n", inaddr_str(iph->saddr, saddr, sizeof(saddr)));

Those matching 0xe000000d (224.0.0.13, ALL-PIM-ROUTERS) are the ones we would want to send to the AF_INET, SOCK_RAW, IPPROTO_PIM socket.

greenpau commented 9 years ago

The theory about socket domain and type works:

After adding the following code to flanneld:

        if ( ( ntohl(iph->daddr) & 0xe000000d ) == 0xe000000d ) {
                int pim_socket;
                if ((pim_socket = socket(AF_INET, SOCK_RAW, IPPROTO_PIM)) < 0) {
                        log_error("failed to create PIM socket\n");
                }
                struct sockaddr_in sin;
                memset(&sin, 0, sizeof(sin));
                sin.sin_family = AF_INET;
                if (sendto(pim_socket, buf, buflen, 0, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
                        log_error("failed to send PIM packett\n");
                }
                close(pim_socket);
        } else {
                tun_send_packet(tun, buf, pktlen);
        }

I am seeing the following messages in the pimd daemon:

Jun 12 03:40:00 ip-192-168-16-146 pimd[26641]: 03:40:00.418 Ignoring PIM_ASSERT from non-neighbor router 127.0.0.1
Jun 12 03:40:24 ip-192-168-16-146 pimd[26641]: 03:40:24.442 PIM HELLO holdtime from 10.252.63.0 is 105
Jun 12 03:40:30 ip-192-168-16-146 pimd[26641]: 03:40:30.449 Ignoring PIM_ASSERT from non-neighbor router 127.0.0.1
Jun 12 03:40:31 ip-192-168-16-146 pimd[26641]: 03:40:31.388 ignore PIM-DM v2 Graft           from 127.0.0.1 to 127.0.0.1
Jun 12 03:40:54 ip-192-168-16-146 pimd[26641]: 03:40:54.411 PIM HELLO holdtime from 10.252.63.0 is 105
Jun 12 03:41:00 ip-192-168-16-146 pimd[26641]: 03:41:00.289 Ignoring PIM_ASSERT from non-neighbor router 127.0.0.1
Jun 12 03:41:15 ip-192-168-16-146 pimd[26641]: 03:41:15.902 ignore PIM-DM v2 Graft           from 127.0.0.1 to 127.0.0.1

I did not put a source address in my sin; hence the 127.0.0.1 (i.e. 0.0.0.0).
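
One way to carry the original addresses through a raw socket would be IP_HDRINCL, i.e. handing the kernel the complete IP packet as read from the tun fd. Just a sketch of that idea (reusing the names from the snippet above), not what flanneld ended up doing:

        int on = 1;

        /* keep the IP header we read from the tun fd instead of letting the
         * kernel build a fresh one with a local source address */
        if (setsockopt(pim_socket, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) < 0)
                log_error("failed to set IP_HDRINCL\n");

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = iph->daddr;   /* 224.0.0.13, taken from the packet */

        if (sendto(pim_socket, buf, buflen, 0,
                   (struct sockaddr *)&sin, sizeof(sin)) < 0)
                log_error("failed to send PIM packet\n");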

greenpau commented 9 years ago

I did the same thing with the IGMP protocol:

Jun 12 05:07:29 ip-192-168-16-146 pimd[29001]: 05:07:29.787 RECV  8993 bytes UNKNOWN IGMP message: type = 0x46, code = 0xc0  from 127.0.0.1     ...27.0.0.1
Jun 12 05:07:29 ip-192-168-16-146 pimd[29001]: 05:07:29.787 Ignoring unknown IGMP message type 46 from 127.0.0.1 to 127.0.0.1
Jun 12 05:07:29 ip-192-168-16-146 pimd[29001]: 05:07:29.787 Ignoring PIM_ASSERT from non-neighbor router 127.0.0.1
Jun 12 05:07:31 ip-192-168-16-146 pimd[29001]: 05:07:31.085 RECV  8993 bytes UNKNOWN IGMP message: type = 0x46, code = 0xc0  from 127.0.0.1     ...27.0.0.1
Jun 12 05:07:31 ip-192-168-16-146 pimd[29001]: 05:07:31.085 Ignoring unknown IGMP message type 46 from 127.0.0.1 to 127.0.0.1
Jun 12 05:07:36 ip-192-168-16-146 pimd[29001]: 05:07:36.493 ignore PIM-DM v2 Graft           from 127.0.0.1 to 127.0.0.1
Jun 12 05:07:37 ip-192-168-16-146 pimd[29001]: 05:07:37.933 RECV  8993 bytes UNKNOWN IGMP message: type = 0x46, code = 0xc0  from 127.0.0.1     ...27.0.0.1
Jun 12 05:07:37 ip-192-168-16-146 pimd[29001]: 05:07:37.933 Ignoring unknown IGMP message type 46 from 127.0.0.1 to 127.0.0.1

The UNKNOWN IGMP message leads me to believe that the buffer I am passing to the PIM/IGMP sockets is invalid and I need to strip ... aahh ... headers? :smiling_imp:

greenpau commented 9 years ago

Now, the remote endpoints (10.252.63.0 and 10.252.93.0) see each other:

Jun 12 15:26:39 ip-192-168-16-147 pimd[2317]: 15:26:39.326 RECV  8993 bytes UNKNOWN IGMP message: type = 0x46, code = 0xc0  fro...2.93.0
Jun 12 15:26:39 ip-192-168-16-147 pimd[2317]: 15:26:39.326 Ignoring unknown IGMP message type 46 from 10.252.63.0 to 10.252.93.0

but they don't like the message format that is being passed.

greenpau commented 9 years ago

The PIM Register gets to the daemon, but fails with a "short packet" error.

Jun 12 17:52:54 ip-192-168-16-147.inf.ise.com pimd[2520]: 17:52:54.838 Received PIM register: len = 8 from 10.252.63.0
Jun 12 17:52:54 ip-192-168-16-147.inf.ise.com pimd[2520]: 17:52:54.838 PIM register: short packet (len = 8) from 10.252.63.0
greenpau commented 9 years ago

It happens because I am passing the IGMP payload to the PIM receiver. I need to match the protocol.
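
A sketch of the dispatch I have in mind, reusing the variable names from the earlier snippets (iph_len = iph->ihl * 4 and the separate igmp_socket are assumptions here):

        size_t iph_len = iph->ihl * 4;   /* strip the outer IP header */

        switch (iph->protocol) {
        case IPPROTO_PIM:
                if (sendto(pim_socket, &buf[iph_len], pktlen - iph_len, 0,
                           (struct sockaddr *)&sin, sizeof(sin)) < 0)
                        log_error("failed to send PIM payload\n");
                break;
        case IPPROTO_IGMP:
                if (sendto(igmp_socket, &buf[iph_len], pktlen - iph_len, 0,
                           (struct sockaddr *)&sin, sizeof(sin)) < 0)
                        log_error("failed to send IGMP payload\n");
                break;
        default:
                /* everything else keeps going out the tun device as before */
                tun_send_packet(tun, buf, pktlen);
                break;
        }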

troglobit commented 9 years ago

Glad to see you're working things out for yourself! :smile:

greenpau commented 9 years ago

@troglobit : Glad to see you're working things out for yourself! :smile:

@troglobit, I love it. Especially when things work out well :smile:

Jun 13 19:27:35 ip-192-168-16-146 pimd[17606]: Virtual Interface Table ======================================================
Jun 13 19:27:35 ip-192-168-16-146 pimd[17606]: Vif  Local Address    Subnet              Thresh  Flags      Neighbors
Jun 13 19:27:35 ip-192-168-16-146 pimd[17606]: ---  ---------------  ------------------  ------  ---------  -----------------
Jun 13 19:27:35 ip-192-168-16-146 pimd[17606]: 0  192.168.16.146   192.168.16               1  DISABLED
Jun 13 19:27:35 ip-192-168-16-146 pimd[17606]: 1  10.252.63.1      10.252.63/24             1  DR PIM     10.252.63.0
Jun 13 19:27:35 ip-192-168-16-146 pimd[17606]: 2  10.252.63.0      10.252/16                1  PIM        10.252.93.0
Jun 13 19:27:35 ip-192-168-16-146 pimd[17606]: 3  10.252.63.1      register_vif0            1
Jun 13 19:27:35 ip-192-168-16-146 pimd[17606]: 19:27:35.142 PIM HELLO holdtime from 10.252.63.0 is 105
Jun 13 19:27:35 ip-192-168-16-146 pimd[17606]: 19:27:35.144 PIM HELLO holdtime from 10.252.93.0 is 105

In short, I was able to achieve neighbor adjacency :smiling_imp:

greenpau commented 9 years ago

https://github.com/greenpau/flannel/commit/669db137af9d7c5ce455816792dece0fcfc52ea7

+                if (sendto(pimd_socket, &buf[iph_len], pimd_len, 0, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
+                        log_error("failed (errno %d) to send data over pimd socket\n", errno);
+                }

made it work! :smile:

troglobit commented 9 years ago

Great stuff! Multicast is the odd piece of the Internet puzzle, guess this was your baptizing? ;-)

greenpau commented 9 years ago

@troglobit : Great stuff! Multicast is the odd piece of the Internet puzzle, guess this was your baptizing? ;-)

@troglobit, I guess I am still in progress. At this point, I am under water :smile:

The adjacency was formed, but multicast is still not flowing :disappointed:

My wife is calling me to help with our :girl: and :girl: ; I will get back to the arena on Monday :dart:

It was fun to map out the pimd and flannel daemons and see how the two technologies (tun/tap & multicast) may play together. The flanneld daemon is written in Go and C. In the process, I learned Go :smile: and a few good tricks in C :smile:

troglobit commented 9 years ago

Good luck, and have a nice weekend!

greenpau commented 9 years ago

I am seeing the following messages:

Jun 16 15:47:56 ip-192-168-16-147 pimd[17883]: 15:47:56.401 For src 10.252.63.1, iif is 2, next hop router is 10.252.63.1: NOT A PIM ROUTER
Jun 16 15:48:16 ip-192-168-16-147 pimd[17883]: 15:48:16.924 For src 10.252.63.1, iif is 2, next hop router is 10.252.63.1: NOT A PIM ROUTER

10.252.63.1 is the IP address of a docker bridge on the other side of the tunnel.

That IP address is the RP.

troglobit commented 9 years ago

Is that a static rendez-vous point? If pimd cannot hear "PIM Hello" from a rendez-vous point it won't be able to forward the data. This is because pimd forwards the multicast to the RP in a register tunnel.

greenpau commented 9 years ago

@troglobit, a diagram was long overdue:

[screenshot: network diagram, 2015-06-18]

greenpau commented 9 years ago

pi-np01:

# pimd PIM-SM Daemon Configuration File
phyint eth0 disable
phyint docker0 enable
phyint flannel0 enable
rp_address 10.252.63.1 224.0.0.0/4

pi-np02:

# pimd PIM-SM Daemon Configuration File
phyint eth0 disable
phyint docker0 enable
phyint flannel0 enable
rp_address 10.252.63.1 224.0.0.0/4
greenpau commented 9 years ago

@troglobit : Is that a static rendez-vous point?

Yes, it is. It is the IP address of a docker bridge.

greenpau commented 9 years ago

On the docker container I subscribe to a group:

[root@pi-np02-app1 /]# ~/dev/mtools_2.2/mreceive -g 239.1.1.1 -p 5001
Now receiving from multicast group: 239.1.1.1

At that same moment the host, i.e. pi-np02, has the following entries in the pimd log:

Jun 21 03:36:38 ip-192-168-16-147 pimd[692]: 03:36:38.259 RECV    32 bytes IGMP v2 Member Report     from 10.252.93.3     to 239.1.1.1
Jun 21 03:36:40 ip-192-168-16-147 pimd[692]: 03:36:40.261 NETLINK: ask path to 10.252.63.0
Jun 21 03:36:40 ip-192-168-16-147 pimd[692]: 03:36:40.261 NETLINK: vif 2, ifindex=98
Jun 21 03:36:40 ip-192-168-16-147 pimd[692]: 03:36:40.263 NETLINK: ask path to 10.252.63.0
Jun 21 03:36:40 ip-192-168-16-147 pimd[692]: 03:36:40.263 NETLINK: vif 2, ifindex=98

This happened after I changed the logic of handling PIM non-multicast packets.

On the sender's side, I see the following:

[root@ip-192-168-16-146 ~]# ip mroute
(10.252.63.3, 239.1.1.1)         Iif: docker0    Oifs: pimreg
[root@ip-192-168-16-146 ~]#
greenpau commented 9 years ago

The device with ifindex=98 is the flannel0 interface.

98: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 8973 qdisc pfifo_fast state UNKNOWN qlen 500
    link/none
    inet 10.252.93.0/16 scope global flannel0
       valid_lft forever preferred_lft forever

That seems to be correct!

greenpau commented 9 years ago

Verified that docker0 of pi-np02 gets the Membership Report for the 239.1.1.1 group.

13  53.329863000    10.252.93.3 239.1.1.1   IGMPv2  46  Membership Report group 239.1.1.1
greenpau commented 9 years ago

I also see that pimd received that packet:

Jun 22 16:26:07 ip-192-168-16-147 pimd[692]: 16:26:07.159 RECV    32 bytes IGMP v2 Member Report     from 10.252.93.3     to 239.1.1.1
greenpau commented 9 years ago

On pi-np02:

Virtual Interface Table ======================================================
Vif  Local Address    Subnet              Thresh  Flags      Neighbors
---  ---------------  ------------------  ------  ---------  -----------------
  0  192.168.16.147   192.168.16               1  DISABLED
  1  10.252.93.1      10.252.93/24             1  DR PIM     10.252.93.0
  2  10.252.93.0      10.252/16                1  DR PIM     10.252.63.0
  3  10.252.93.1      register_vif0            1

Multicast Routing Table ======================================================
----------------------------------- (*,G) ------------------------------------
Source           Group            RP Address       Flags
---------------  ---------------  ---------------  ---------------------------
INADDR_ANY       239.1.1.1        10.252.63.0      WC RP
Joined   oifs: ....
Pruned   oifs: ....
Leaves   oifs: .l..
Asserted oifs: ....
Outgoing oifs: .o..
Incoming     : ..I.

TIMERS:  Entry    JP    RS  Assert VIFS:  0  1  2  3
             0    22     0       0        0  0  0  0
----------------------------------- (S,G) ------------------------------------
--------------------------------- (*,*,G) ------------------------------------
Number of Groups: 1
Number of Cache MIRRORs: 0
------------------------------------------------------------------------------

Per RFC 2362, the WC-bit indicates that any source may match and be forwarded according to this entry if there is no longer (more specific) match; in other words, it is a wildcard entry, and packets from sources without a more specific match will be forwarded according to it.

greenpau commented 9 years ago

Q: Why did the router pi-np02 create the (*,G) entry?

Because pi-np02-app2 (10.252.93.3) is the first host to join the multicast group, pi-np02 had to create that entry. Then, pi-np02 should place the interface on which it received the packet (the docker0 interface) into the outgoing interface list of the (*,G) entry.

:warning: this did not happen.

Then, pi-np02 must also send a PIM (*,G) Join toward the RP (10.252.63.0) to join the shared tree. The router should use the unicast routing table to determine the interface toward the RP.

:warning: this did not happen.

:disappointed:

troglobit commented 9 years ago

Hmm, the RP for that group is listed as 10.252.63.0 ... but .0 is the network address, not an actual router's address? Maybe some mismatch in CIDR mask calculation? (Guessing now)

troglobit commented 9 years ago

I've been reading up some more and found this old gem I'd completely forgotten about. It mentions disabling the Linux rp_filter setting. Maybe that could be something to look into?

greenpau commented 9 years ago

@troglobit , thank you for looking into it. I need to revisit this project in a few weeks... being a father to a 2-month-old is a tough job :smile:

troglobit commented 9 years ago

@greenpau Oh, I guess congratulations are in order then! :smiley:

greenpau commented 9 years ago

@troglobit , I suspect the issue with NOT A PIM ROUTER is caused by PIM REGISTER packets either not reaching the RP router or not being processed correctly when received from a socket.

troglobit commented 9 years ago

Hmm, no, I think that originates from when pimd tries to set up a multicast route but fails, since it cannot find any record of having received a PIM Hello from a router with that src IP.

You should be able to see the PIM traffic (both HELLO's and any REGISTER messages) on the flannel0 interface ... if I understand your drawing.

troglobit commented 9 years ago

Ever so slowly I'm beginning to understand your setup. The error message from pimd, that 10.252.63.1 is not a PIM router, is clearly an indication that something's afoot!

Reading back in this thread with rested eyes, I see you mentioning that 10.252.63.1 is the IP address of a docker bridge? But a Rendezvous-Point must be a PIM router, not a bridge! I'm assuming here you only have the two PIM routers, right? Then all you need is for IGMP to work on the left-hand side (for layer-2 multicast) and PIM to work on the link between the PIM routers.

I tried figuring out the IP addresses of the two PIM routers, pi-np01 and pi-np02, to no avail, so to me it looks like a configuration error at the moment. _Try removing the rp_address statement from your pimd.conf files on both PIM routers._

When no static RP address is configured, PIM figures out the RP by itself. The static RP setting is mostly for people who want to fine-tune their networks.

(When the RP for a multicast group has been agreed upon, a PIM router will forward all multicast for that group to that RP (distribution point) in a so-called register tunnel. Basically the multicast data is encapsulated in a PIM header and directed towards the RP. A PIM join is then required to receive the un-encapsulated data from the RP.)

If you want to set up static RP's for your network, I'd recommend first figuring out what multicast groups are available on the left and right networks. The left-hand side multicast should be distributed by pi-np01 and the right-hand side multicast by pi-np02, but if the multicast groups are not allocated properly you may have overlap -- in that case there is probably no real win for you to set up static RP's. The whole point is to protect your shared link so it isn't unnecessarily flooded with multicast.

So, remove the rp_address setting completely, that should do the trick! (Unless I've misread something else :)

troglobit commented 9 years ago

Also, when playing around with multicast in my own test bed using virt-manager and KVM machines, I've seen that the Linux bridge really does not like pimd taking over as IGMP "master" on a LAN. So I have to disable multicast_snooping on the bridge in my host for it to properly forward the multicast.

I've written a small HowTo on this http://troglobit.com/multicast-howto.html but the main point is this:

host# echo 0 > /sys/devices/virtual/net/virbr1/bridge/multicast_snooping

Clearly sub-optimal for the layer-2 network since all multicast will be forwarded like broadcast, but I've yet to find another way around it. Hopefully this can be of further help in getting the multicast data from your *-appN to the pi-np0M PIM routers.

greenpau commented 9 years ago

@troglobit : Ever so slowly I'm beginning to understand your setup.

haha :+1: thank you for looking into it. I did not get a chance to go back and revisit what was done.

@troglobit : Reading back in this thread with rested eyes, I see you mentioning that 10.252.63.1 is the IP address of a docker bridge?

Corrected. See below.

@troglobit : I'm assuming here you only have the two PIM routers, right?

Correct.

@troglobit : Then all you need is for IGMP to work on the left-hand side (for layer-2 multicast) and PIM to work on the link between the PIM routers.

IGMP was working fine.

@troglobit : I tried figuring out the IP addresses of the two PIM routers, pi-np01 and pi-np02, to no avail, so to me it looks like a configuration error at the moment.

pi-np01 is 192-168-16-146 and pi-np02 is 192-168-16-147.

@troglobit : Try removing the rp_address statement from your pimd.conf files on both PIM routers.

will do and report back.

greenpau commented 9 years ago

@troglobit: I've written a small HowTo on this http://troglobit.com/multicast-howto.html but the main point is this:

great read!

greenpau commented 9 years ago

@troglobit : 10.252.63.1 is the IP address of a docker bridge?

This is actually the flanneld interface, which is a TUN interface.

rburkholder commented 8 years ago

curious: was some sort of resolution reached?

greenpau commented 8 years ago

@rburkholder, it worked. multicast was received :-)