troglobit / pimd

PIM-SM/SSM multicast routing for UNIX and Linux
http://troglobit.com/projects/pimd/
BSD 3-Clause "New" or "Revised" License

Pimd as static RP router in PIM-SM #166

Closed phoenix9047 closed 2 years ago

phoenix9047 commented 3 years ago

Hi,

My name is Leonid. Below I'll describe the environment I'm trying to build, the configuration, the issue I'm seeing, and what I've found so far.

Topology

The idea is to build a simple topology with three routers running PIM-SM (sparse mode), with one static RP in the middle: R2. I want pimd to manage multicast on each of the routers. Originally I was going to build this topology with Alpine containers inside Docker, but then I found that even with Ubuntu 18.04 VMs in a VMware environment, multicast does not reach the receiver. It's important to note that if I replace only R2 in this topology with a Cisco router, multicast starts working (I re-confirmed this just now). So the issue might be in pimd's static RP functionality.

The sender uses the following iperf command to generate multicast traffic:

# iperf -c 224.0.2.6 -u -T 32 -t 600 -i 1 -b 10k -l 200

The receiver uses the following command to listen for the same multicast group:

# iperf -s -u -B 224.0.2.6 -i 1
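
For readers without iperf at hand, the two commands above correspond roughly to the socket operations below. This is a minimal Python sketch, not a line-for-line copy of what iperf does internally. The port 5001 is an assumption (the iperf 2 default), and the function names are made up for illustration.

```python
import socket

GROUP = "224.0.2.6"   # group used in the iperf commands above
PORT = 5001           # assumption: iperf 2 default port
TTL = 32              # matches "-T 32" on the sender


def build_mreq(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def open_sender() -> socket.socket:
    """UDP socket whose multicast packets leave with TTL 32, so they can cross routers."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, TTL)
    return s


def open_receiver() -> socket.socket:
    """UDP socket that joins GROUP; the join makes the host send an IGMP membership report."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", PORT))
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(GROUP))
    return s
```

The receiver-side join is the part that matters for PIM: the IGMP report it triggers is what the last-hop router needs to see before it sends a Join toward the RP.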

The three Ubuntu routers R1, R2 and R4 run the Quagga daemons zebra and ospfd to establish OSPF adjacencies and share routes, so ping between sender and receiver works and traceroute shows the correct hops in between. The pimd configurations:

= R1:

rp-address 10.12.12.2 224.0.0.0/4
phyint 172.16.0.254 enable igmpv3
spt-threshold packets 0 interval 0

= R2:

rp-address 10.12.12.2 224.0.0.0/4
spt-threshold packets 0 interval 0

= R4:

rp-address 10.12.12.2 224.0.0.0/4
phyint 192.168.4.254 enable igmpv3
spt-threshold packets 0 interval 0

I checked the steps in the 'Debug Help' section of the wiki here on GitHub, and everything checks out.

I collected tcpdump captures on all three routers, plus the output of the daemon in debug mode, started with /usr/sbin/pimd --foreground --debug=all. Here is what I see when R2 is Ubuntu:

  1. R1 receives the multicast traffic and sends it to R2 encapsulated in a unicast Register message, 172.16.0.254 > 10.12.12.2.
  2. R2 receives this packet, here is the debug from it:
21:17:02.211 RECV   256 bytes PIM v2 Register           from 172.16.0.254    to 10.12.12.2 
21:17:02.211 Received PIM register: len = 236 from 172.16.0.254
21:17:02.211 find_route: Group not found. Return the (*,*,RP) entry
21:17:02.211 No routing entry for source 172.16.0.1 and/or group 224.0.2.6
21:17:02.211 Send PIM REGISTER STOP from 10.12.12.2 to router 172.16.0.254 for src = 172.16.0.1 and group = 224.0.2.6
21:17:02.211 SEND    38 bytes PIM v2 Register_Stop      from 10.12.12.2      to 172.16.0.254 ...
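
The sizes in this log appear consistent with the iperf command. Assuming a 20-byte outer IP header without options, 256 bytes leaves the 236-byte PIM message that pimd reports. Subtracting the 8-byte Register header gives a 228-byte inner packet, which is 20 (IP) + 8 (UDP) + 200 (the "-l 200" payload). The Python sketch below decodes that 8-byte header following the Register format in RFC 7761, section 4.9.3. The sample bytes are synthetic, not taken from the capture, and the function name is hypothetical.

```python
import struct

PIM_TYPE_REGISTER = 1  # PIM message type for Register


def parse_register_header(pim_payload: bytes) -> dict:
    """Decode the first 8 bytes of a PIM Register message."""
    ver_type, _reserved, checksum, flags = struct.unpack("!BBHI", pim_payload[:8])
    return {
        "version": ver_type >> 4,                    # upper nibble
        "type": ver_type & 0x0F,                     # lower nibble
        "checksum": checksum,
        "border": bool(flags & 0x80000000),          # B bit
        "null_register": bool(flags & 0x40000000),   # N bit
        "inner_len": len(pim_payload) - 8,           # encapsulated packet size
    }


# Synthetic 236-byte Register: 8-byte header plus a zero-filled 228-byte inner packet.
sample = struct.pack("!BBHI", (2 << 4) | PIM_TYPE_REGISTER, 0, 0, 0) + bytes(228)
info = parse_register_header(sample)
```

A Register with the N bit set carries no payload to forward. The one in the log is a full data Register, judging by its length.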

So R2 just sends a Register-Stop message right away. I collected the same debug output on R1 and R4 with a Cisco router as R2, and here is the flow:

  1. R1 sends Register messages with the encapsulated multicast.
  2. R2 sends back a PIM Join/Prune message.
  3. R1, on receiving the Join, starts forwarding native (unencapsulated) multicast to R2, alongside the encapsulated traffic.
  4. R2 sends back a Register-Stop.
  5. R1 stops forwarding encapsulated traffic. This is the expected flow.

So the problem when R2 runs pimd is that it does not send back a Join/Prune, but a Register-Stop instead.
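
One possible reading of the difference, offered as a hedged sketch rather than a diagnosis: in the RP-side Register handling described in RFC 7761, section 4.4.2, an immediate Register-Stop is what an RP sends when it has no downstream interfaces for the group. The Python below is a simplified paraphrase of that pseudocode. It is not pimd's code, the function name and action strings are invented, and the Join step compresses what the RFC expresses through the (S,G) upstream state machine.

```python
def rp_on_register(i_am_rp: bool, spt_bit: bool, switch_to_spt_desired: bool,
                   inherited_olist: set, null_register: bool = False) -> list:
    """Return the actions an RP takes for one Register (simplified from RFC 7761)."""
    if not i_am_rp:
        # A router that is not the RP for this group just stops the sender.
        return ["send Register-Stop"]
    actions = []
    # Stop the Registers once native traffic arrives (SPTbit) or nobody is listening.
    if spt_bit or (switch_to_spt_desired and not inherited_olist):
        actions.append("send Register-Stop")
    # Otherwise keep forwarding the decapsulated data down the shared tree.
    if not spt_bit and not null_register and inherited_olist:
        actions.append("decapsulate and forward")
    # Simplification: with receivers present, the RP also joins toward the source.
    if switch_to_spt_desired and inherited_olist and not spt_bit:
        actions.append("send Join(S,G) toward source")
    return actions


no_receivers = rp_on_register(True, False, True, set())
with_receivers = rp_on_register(True, False, True, {"ens192"})
native_arrived = rp_on_register(True, True, True, {"ens192"})
```

Under these assumptions, `with_receivers` and `native_arrived` mirror the Cisco flow above, while `no_receivers` mirrors the pimd log. The "No routing entry" line would fit an RP that had no (*,G) state from R4 when the Register arrived, although the log alone cannot confirm that.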

==========================

I'm not really good with C, so I just tried to understand the flow from this debug output. The debug message "Received PIM register" comes from the function receive_pim_register in pim_proto.c. The first checks in this function pass, and it gets inner_src and inner_grp in lines 672-673. Then, in line 694, we call find_route from mrt.c with the constant 'DONT_CREATE' as an argument. From this point on I don't quite understand why we never reach the code that issues a Join message. More generally, I'm wondering why the 'DONT_CREATE' constant is used here, if R2 (as I understand PIM-SM) should build a tree toward the source 172.16.0.1 and thus create this route based on the unicast routing table, no?

Could you please help me with this issue?

Thank you in advance, Leonid

phoenix9047 commented 3 years ago

I don't know what's wrong with my setup. I disabled the static RP configuration line 'rp-address ...' on all routers, and added the following lines to /etc/pimd.conf on R2 only:

bsr-candidate priority 5 interval 5
rp-candidate priority 5 interval 5

Then I restarted the daemons on all three routers, and the situation looks the same. Sometimes on R2 I see this line in the output of 'ip mroute':

(172.16.0.1, 224.0.2.6) Iif: pimreg Oifs: ens192 State: resolved

In tcpdump, R2 receives the Register messages (encapsulated multicast traffic) but does not forward them, and that's it: the receiver does not get the traffic, because even R4 does not get it.
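
The 'ip mroute' entry quoted above can be split into its fields with a few lines of Python when comparing routers. This is only a sketch: the regular expression is fitted to the single line shown in this comment, not to the full output format of the tool, and the function name is hypothetical.

```python
import re

# Pattern fitted to the one line quoted in this thread.
MROUTE_RE = re.compile(
    r"\((?P<source>[^,]+),\s*(?P<group>[^)]+)\)\s+"
    r"Iif:\s+(?P<iif>\S+)\s+"
    r"Oifs:\s*(?P<oifs>.*?)\s*"
    r"State:\s+(?P<state>\S+)"
)


def parse_mroute(line: str):
    """Return the fields of one 'ip mroute' entry, or None if the line does not match."""
    m = MROUTE_RE.search(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["oifs"] = entry["oifs"].split()  # outgoing interfaces as a list
    return entry


entry = parse_mroute("(172.16.0.1, 224.0.2.6) Iif: pimreg Oifs: ens192 State: resolved")
```

An input interface of pimreg is the kernel's PIM register interface, so this entry describes decapsulated Register payloads going out ens192. If R4 still sees nothing, comparing this entry with a capture on ens192 would be a reasonable next check.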

It's quite possible that my setup is simply misconfigured, but I've been trying for a long time to figure out what's wrong, and I have never managed to make Ubuntu with pimd work as an RP.

best wishes, Leonid..

phoenix9047 commented 3 years ago

I tried setting rp_filter to 0 on both interfaces on all three routers using sysctl, with no effect. The pimd version is 2.3.2.
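
One detail worth checking with rp_filter: the Linux kernel documentation states that the value actually applied on an interface is the maximum of conf/all/rp_filter and conf/<interface>/rp_filter, so setting only the per-interface values to 0 has no effect while "all" is still 1. A minimal Python sketch of that rule follows. The function names are made up, and ens192 is just the interface name seen earlier in this thread.

```python
from pathlib import Path


def effective_rp_filter(all_value: int, iface_value: int) -> int:
    """The kernel applies the larger of the 'all' and per-interface settings."""
    return max(all_value, iface_value)


def read_rp_filter(iface: str, base: str = "/proc/sys/net/ipv4/conf") -> int:
    """Read both sysctl values from procfs and return the effective one."""
    all_value = int(Path(base, "all", "rp_filter").read_text())
    iface_value = int(Path(base, iface, "rp_filter").read_text())
    return effective_rp_filter(all_value, iface_value)
```

On a router, `read_rp_filter("ens192")` returning 1 or 2 would mean reverse-path filtering is still active on that interface despite the per-interface 0.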

eugk123 commented 3 years ago

Hey man,

It's been a while since I worked with multicast. From what I remember, I switched to FRR (FRRouting) and was able to solve all my multicast issues!

troglobit commented 3 years ago

@phoenix9047 Sorry for the late reply, pimd v2.3.2 is really old. There are fixes on master and a v3 planned since years back. I've not had time to finalize it yet, and nobody else has stepped up to help with maintenance tasks (like testing on different operating systems, writing changelogs, etc.) and finalizing the pimctl work.

You're welcome to try and help out with the master branch. Over the coming months I hope to spend some of my evenings and weekends working on pimd (and its cousin pimd-dense), but since I'm alone in this, I cannot make any promises. Sorry.

If you want something working soonish you're probably best off trying FRR, like @eugk123 said. They have a huge team of paid developers working on developing their PIM implementation.

I still see a value of having a stand-alone PIM, but probably not for regular users expecting (free) support.

phoenix9047 commented 3 years ago

@troglobit thank you for the reply! First, I'd like to say that you've put an incredible amount of effort into this daemon. I'm really shocked to see that you do it alone, because I installed pimd with 'apt-get install pimd' on Ubuntu and was sure it was an official Ubuntu daemon for PIM, written and tested by a lot of people. I absolutely did not expect a single person working on it in his free evenings, for years. I hope you're a well-paid C developer with this experience.

I do indeed need it to work ASAP, and today I have already tried to get my task done with FRR under Ubuntu and Docker.

In fact, a standalone pimd is a good idea. Though (mostly because I'm clearly no good at C) I cannot imagine writing all this alone, especially when FRR has it written and working, and their code is available. But if you're going to continue with this project, I can help with testing the daemon on several OSes and comparing it to PIM handling on a Cisco router. If that would help, feel free to contact me at phoenix9047@gmail.com.

Good luck!

best wishes, Leonid

troglobit commented 3 years ago

@phoenix9047 This is the original implementation, not by me. I'm just the last maintainer :)

Would be awesome to get someone to test against Cisco! I'll bookmark this issue and get back to you later.

troglobit commented 3 years ago

Hi again, just to let you know: I got up early this Saturday morning (CET) and was curious about your setup. I tried it out in Core with the latest pimd sources, and it works :)

[Image: core-ospf-pimd]

So maybe I should try to finalize the v3 release then ... :grinning:

(Btw: in IGMP, which is used on the edge networks towards clients, the lowest IP address wins the querier election. Thus, giving the router the highest IP address on the segment might not be the best idea. Many IGMP snooping capable devices (switches) on the network can act as querier, so any malfunction in such equipment can cause problems.)
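
The election rule mentioned above fits in a few lines of Python. This is an illustrative sketch only: 172.16.0.254 is R1's address from the configuration earlier in the thread, while 172.16.0.10 is a hypothetical snooping switch on the same segment. Addresses are compared numerically, since a plain string comparison would order 172.16.0.9 after 172.16.0.10.

```python
import ipaddress


def elect_querier(candidates):
    """Return the address that wins the IGMP querier election: the numerically lowest."""
    return str(min(ipaddress.IPv4Address(addr) for addr in candidates))


# R1's LAN address versus a hypothetical switch that can also act as querier.
winner = elect_querier(["172.16.0.254", "172.16.0.10"])
```

With the router at .254, any other querier-capable device on the LAN wins the election, which is the situation the note above warns about.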

troglobit commented 2 years ago

Reopening for upcoming v3.0, might be useful to others to see what issues are solved with the new version.