troglobit / pimd

PIM-SM/SSM multicast routing for UNIX and Linux
http://troglobit.com/projects/pimd/
BSD 3-Clause "New" or "Revised" License

10 seconds delay to start forwarding multicast traffic #192

Closed Borjand closed 2 years ago

Borjand commented 3 years ago

Hi folks,

scenario-issue-pimd

I am running some experiments with the scenario shown in the capture above, to check whether multicast traffic can be integrated into a network service using pimd. The scenario has two Ubuntu 16.04.5 machines acting as routers, each running pimd 2.3.1, which provide layer 3 connectivity between two further machines that we call Server and Client.

With this setup, we have managed to send a UDP flow to the multicast address 239.0.0.4 from the Server to the Client. The only drawback we have found is that, once the Client reports that it wants to receive the flow, the router on the server side takes about 10 seconds to start forwarding traffic. (Note that the server was already transmitting before the client requested the flow.) To find the cause of this delay, we ran pimd in debug mode and saved the output to a log file. As the screenshot from router 1 shows, 10 seconds pass between the Join/Prune arriving from router 2 (at 20:07:10), which was triggered by the client's IGMP report, and router 1 adding the kernel MFC entry (at 20:07:20). After that, the client receives the traffic without problems.

Could this behavior be due to a bug, or to some erroneous configuration when executing pimd?

Thanks in advance!

router-1-pimd-log
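
For anyone reproducing this, the gap between the two events can be computed from the debug log with a short script. This is only a minimal sketch and not part of pimd: it assumes each log line carries an HH:MM:SS timestamp as in the screenshot, and the helper names `first_timestamp` and `gap_seconds` are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumption: every log line contains a wall-clock timestamp like 20:07:10.
TIME_RE = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


def first_timestamp(lines, marker):
    """Return the HH:MM:SS timestamp of the first line containing `marker`, or None."""
    for line in lines:
        if marker in line:
            match = TIME_RE.search(line)
            if match:
                return match.group(1)
    return None


def gap_seconds(start, end):
    """Seconds from `start` to `end` (both HH:MM:SS), allowing one midnight wrap."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return int(delta.total_seconds())


# Timestamps reported in this thread.
print(gap_seconds("20:07:10", "20:07:20"))  # Join/Prune -> MFC entry: 10
print(gap_seconds("20:07:09", "20:07:20"))  # IGMP report -> MFC entry: 11
```

With a saved log, `first_timestamp(log_lines, "Added kernel MFC entry")` gives the end point (that message is quoted later in this thread). The marker for the Join/Prune line has to be copied from your own log, since its exact wording is not reproduced here.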

troglobit commented 3 years ago

Wow, only 10 sec delay!

There are likely many reasons, but one of them lies in the nature of the PIM-SM protocol. Unlike dense-mode protocols, which use a flood-and-prune strategy, it needs to 1) establish the rendezvous point for the group(s), 2) listen for the IGMP join, and 3) send a PIM Join towards the rendezvous point before it starts forwarding traffic.

You could try the latest git master of the upcoming v3.0. It has many bug fixes and performance improvements that may be interesting to you.

Borjand commented 3 years ago

Thanks for the prompt reply Joachim!

The procedure you mention is the first thing we checked, to see whether it accounted for the ten-second delay. Although it is not visible in the previous screenshot, the client's IGMP report subscribing to the multicast group (239.0.0.4) arrives one second before the first Join/Prune shown in the screenshot (i.e. at 20:07:09). From there, the screenshot in my previous post shows pimd starting to build the routing table, with router 1 designated as the RP. This seems to take a couple of seconds, but after that there is a delay whose cause we cannot identify. It ends when pimd reports "Added kernel MFC entry", and only from then on is the traffic forwarded. Are there any other clues that might help us find the reason for this delay?

Also, thank you very much for the suggestion. We are going to install the new version and continue our experiments. If the delay turns out to be shorter with it, we will report back in this thread.

troglobit commented 2 years ago

Sorry for the late reply (this time). As you've probably realized, pimd isn't my top-10 project, in part because $DAYJOB doesn't sponsor it. So it takes me time sometimes to circle back to it.

I'm not sure you're there anymore, @Borjand, but for posterity at least, I've added a string of tests for the autobuilder (now using GitHub Actions, finally). The test is called two.sh and is intended to verify this test case. Currently there is no verification of the time between join and actual traffic, but this can at least be inspected from the logs.
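
If someone wants to measure the join-to-traffic time on the receiver instead of reading it from the logs, something along these lines might do. It is a rough sketch and not what two.sh does: the UDP port 5000 is a placeholder (the port was never given in this thread), the function names are invented, and joining on 0.0.0.0 leaves the choice of interface to the kernel.

```python
import socket
import time


def membership_request(group, iface="0.0.0.0"):
    """Build the ip_mreq payload: group address followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def time_to_first_packet(group, port, timeout=60.0):
    """Join `group` and return the seconds until the first datagram, or None on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.settimeout(timeout)
        joined_at = time.monotonic()
        # Joining the group is what makes the host send its IGMP membership report.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        try:
            sock.recvfrom(65535)
        except socket.timeout:
            return None
        return time.monotonic() - joined_at
    finally:
        sock.close()


if __name__ == "__main__":
    # 239.0.0.4 is the group from this issue; port 5000 is an assumption.
    delay = time_to_first_packet("239.0.0.4", 5000)
    if delay is None:
        print("no traffic before timeout")
    else:
        print(f"first packet after {delay:.2f} s")
```

Run it on the client while the sender is already transmitting, as in the original report, so the number reflects the join path only.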

Also, I've added the 3.0 milestone to this, and will close it when that is finally released.

troglobit commented 2 years ago

Here's a sample run: https://github.com/troglobit/pimd/actions/runs/1255356383

The tests are in the new test/ subdirectory.

troglobit commented 2 years ago

Closing, preparing for v3.0 release.