troglobit / pimd

PIM-SM/SSM multicast routing for UNIX and Linux
http://troglobit.com/projects/pimd/
BSD 3-Clause "New" or "Revised" License
194 stars 86 forks source link

Fix an issue on RP hash calculations. #93

Closed xdxu closed 7 years ago

xdxu commented 7 years ago

As per RFC4601, all multicast routers in a domain should use the same algorithm for RP elections, and all the IP addresses including the mask address used in the algorithm should be in host byte order. The change fixes the problem about RP hash calculation so that the results will be consistent across different routers.

Signed-off-by: Xiaodong Xu stid.smth@gmail.com

troglobit commented 7 years ago

Interesting! This looks like it could very well be the root cause to issue #81

The code also looks good. Only real problem I see, apart from introducing an incompatibility with earlier versions of pimd, is that I cannot find where in section 4.7.2 of RFC4601 it states the address/mask/group should be in host byte order? Sure, the code seems to suggest it by the use of _h suffix to the local variables, but I cannot find passage in the text to corroborate this.

If you could help me shed some light on this detail I'd be most grateful, because I have to write something in the ChangeLog to explain the possible incompatibility with older pimd versions we introduce.

xdxu commented 7 years ago

It doesn't explicitly state that the address/mask has to be in host byte order. But if you look at the following description:

Value(G,M,C(i))= (1103515245 ((1103515245 (G&M)+12345) XOR C(i)) + 12345) mod 2^31

where C(i) is the RP address and M is a hash-mask. If BSR is being used, the hash-mask is given in the Bootstrap messages. If BSR is not being used, the alternative mechanism that supplies the group-range-to-RP mappings may supply the value, or else it defaults to a mask with the most significant 30 bits being one for IPv4 and the most significant 126 bits being one for IPv6. The hash-mask allows a small number of consecutive groups (e.g., 4) to always hash to the same RP.

only if the hash mask is in host byte order, can the algorithm hash to the same result for the 4 consecutive group IP addresses within the same subnet of /30. And yes it is going to break the compatibility with older pimd version. But it will be compatible with Cisco PIM-SM :)

troglobit commented 7 years ago

Ah, thanks for clearing that up! Wasn't obvious to me at all, although I did read the description several times.

Hmm, maybe we should add a command line compatibility switch, or configure option to support users with mixed installations, or even bump the major version (2.4.0 :point_right: 3.0) to denote incompatible change? :cold_sweat:

Good to know this change at least makes it compatible with Cisco! :v:

xdxu commented 7 years ago

I agree we need bump the major version and document this new behavior in the release notes. However, I kinda feel that it is a BUG rather than a behavior change. I think you may notice two issues in the hash function, which actually cause unexpected hash value and the result could be unpredictable from the user's perspective. Based on this fact, I would prefer not adding a configuration option for backward compatibility. So we may require the user to upgrade pimd deployed on all the routers in the same domain. Does it make sense?

troglobit commented 7 years ago

Agreed, let's bump major to 3.0 and document in ChangeLog and also recommend users to upgrade all routers in the same domain. It's both reasonable and the only sane option I think. Thanks for the rubber ducking!

Yeah, I noticed the ordering bug with curr_address_h as well, good catch!

OK. I'll start prepping the repo and all meta data for 3.0 and then I'll merge this.

troglobit commented 7 years ago

(I've been thinking about changing command line options and possibly also some .conf settings, so upping to 3.0 is really not a bad idea.)

troglobit commented 7 years ago

@xdxu Thank you again for your contribution! :smiley: :+1: