troglobit / pimd

PIM-SM/SSM multicast routing for UNIX and Linux
http://troglobit.com/projects/pimd/
BSD 3-Clause "New" or "Revised" License

PIMD working, nevertheless some doubts, e.g. because of lots(!) of alarms related to e.g. 169.254.0.1 #236

Open LouisAtGH opened 1 year ago

LouisAtGH commented 1 year ago

Joachim,

pimd is working as expected; however, there are two things to note:

I attached my pimd config file and a file with some data I collected while trying to understand these issues. It contains pimctl output and a Wireshark trace.

Sincerely,

Louis

I notice the following alarms

<28>1 2022-11-17T20:54:13.978780+01:00 pfSense.lan pimd 61739 - - Timeout waiting for reply from routing socket for 169.254.0.1
<28>1 2022-11-17T20:54:54.935547+01:00 pfSense.lan pimd 61739 - - Timeout waiting for reply from routing socket for 169.254.0.1
<28>1 2022-11-17T20:55:22.174577+01:00 pfSense.lan pimd 61739 - - Timeout waiting for reply from routing socket for 10.60.142.62
<28>1 2022-11-17T20:56:14.808738+01:00 pfSense.lan pimd 61739 - - Timeout waiting for reply from routing socket for 169.254.0.1
<28>1 2022-11-17T20:56:54.159538+01:00 pfSense.lan pimd 61739 - - Timeout waiting for reply from routing socket for 169.254.0.1
<28>1 2022-11-17T20:57:14.637808+01:00 pfSense.lan pimd 61739 - - Timeout waiting for reply from routing socket for 169.254.0.1

What is strange is that 169.254.x.y is of course a special range, so why oh why is pimd using that range!? The other alarm is also strange, since 10.60.142.62 is perhaps in a range used by my provider, but it is not mine.

**Strange!? All interfaces show in the pimd interface table**

Probably OK, but I am not sure about the PIM interface table shown: it lists nearly all interfaces, whereas I defined all interfaces as off except the ones below (default bind: bind to none).

**Used interfaces**

lagg0.16 Up 192.168.1.1 1 30 0 192.168.1.1 (normal PC-LAN, that is where the hi-fi receivers are)
lagg0.26 Up 192.168.2.1 1 30 0 192.168.2.1 1 (PC-zone-2)
lagg1.11 Up 192.168.11.1 1 30 0 192.168.11.1 1 (PC-zone-3)
lagg0.13 Up 192.168.13.1 1 30 0 192.168.13.1 1 (IOT zone, not yet used)
lagg1.14 Up 192.168.14.1 1 30 0 192.168.14.1 (redzone, that is where the media player (TWONKY) is)
em0.4 Up 10.236.170.200 1 30 0 10.236.170.200 1 (IP-TV, not used, not enabled)

The media player (TWONKY) is situated in the redzone on IP 192.168.14.15.

[pimd.conf.txt](https://github.com/troglobit/pimd/files/10047849/pimd.conf.txt) [pimd_issues_maybe.txt](https://github.com/troglobit/pimd/files/10047852/pimd_issues_maybe.txt)

troglobit commented 1 year ago

I'll try to respond. I'd very much appreciate, however, if you could post one problem per issue in GitHub, and keep it as concise as possible.

  1. What (git) version of pimd is used?
  2. How is pimd started?
    • In particular, the --disable-vifs command line flag is key to answering the second question -- the enable and disable keywords in pimd.conf change behavior slightly depending on that option.
    • The amount of logs, and type of logs, is also highly dependent on the command line options. The log level and subsystem decide what gets logged.

Regarding the 169.254 issue you bring up, lots of old UNIX daemons, and in particular the mrouted+pimd family, often denote the multicast VIF or base interface using the first IP address they find when scanning for interfaces. The routing socket backend of pimd could definitely use some help here to be cleaned up and made to follow the log syntax used by the Linux netlink backend. I hope that answers that question.
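To check which interface carries a 169.254.x.y address on a given system, something like the following sketch may help. It is not pimd's interface scanning code: it only uses the standard `getifaddrs()` call to list every IPv4 address the system reports and marks those in 169.254.0.0/16. The function names `is_ipv4_link_local` and `list_ipv4_addresses` are hypothetical and exist only for this illustration.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: 1 if addr is inside 169.254.0.0/16, else 0. */
int is_ipv4_link_local(struct in_addr addr)
{
    uint32_t host = ntohl(addr.s_addr);   /* convert to host byte order */

    return (host & 0xffff0000u) == 0xa9fe0000u;   /* 0xa9fe == 169.254 */
}

/*
 * Hypothetical helper: print one line per IPv4 address reported by
 * getifaddrs(), tagging link-local ones. Returns the number of
 * addresses printed, or -1 if getifaddrs() fails.
 */
int list_ipv4_addresses(FILE *out)
{
    struct ifaddrs *list, *ifa;
    int count = 0;

    if (getifaddrs(&list) == -1)
        return -1;

    for (ifa = list; ifa; ifa = ifa->ifa_next) {
        char buf[INET_ADDRSTRLEN];
        struct sockaddr_in *sin;

        /* Skip entries without an address and non-IPv4 entries. */
        if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_INET)
            continue;

        sin = (struct sockaddr_in *)ifa->ifa_addr;
        if (!inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)))
            continue;

        fprintf(out, "%-12s %-15s%s\n", ifa->ifa_name, buf,
                is_ipv4_link_local(sin->sin_addr) ? "  (link-local)" : "");
        count++;
    }

    freeifaddrs(list);
    return count;
}
```

Calling `list_ipv4_addresses(stdout)` from a small `main()` on the router should show which interface, if any, owns 169.254.0.1. Whether that is the address pimd picks for a VIF is an assumption that would still need to be confirmed against the pimd source.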

Regarding the question of "Strange!? All interfaces show up in the pimd interface table", this too is related to the second point addressed above, command line options.

LouisAtGH commented 1 year ago

Joachim,

ad 1) I built it a couple of days ago from the latest master branch.
ad 2) pimd is not started with any options as far as I can see; it just uses the config file.

The alarms occur in the pfSense system.log file:

<28>1 2022-11-21T13:02:17.036665+01:00 pfSense.lan pimd 67308 - - Timeout waiting for reply from routing socket for 169.254.0.1
<28>1 2022-11-21T13:02:36.786660+01:00 pfSense.lan pimd 67308 - - Timeout waiting for reply from routing socket for 169.254.0.1
<28>1 2022-11-21T13:03:16.602660+01:00 pfSense.lan pimd 67308 - - Timeout waiting for reply from routing socket for 169.254.0.1

Thousands!!!! I really must get rid of those!

Regarding the second point, I am just wondering about it: I explicitly selected ^default do not include^ and nevertheless those interfaces appear in that output.

- I am not sure if that is correct
- I am not sure if pimd is processing those ^not selected^ interfaces correctly.

That is why I attached the extra info.

If I should have made two issues, one for the '169.254 issue' and one for the 'vague interface selection behavior', my apologies.

troglobit commented 1 year ago

The Timeout log message seems to come from 3e7fb03, introduced by @stormshield-damiend in 2021. It's possible it should be a LOG_DEBUG level log message instead. As the code is constructed it looks like it was set to LOG_WARNING during development, only to verify the refactor worked as intended. Since I don't have any BSD system up and running it'll take me a while to verify this theory.
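The `<28>` prefix on the log lines quoted in this thread is consistent with that: in the RFC 5424 syslog format the leading number is the PRI value, defined as facility * 8 + severity. The short sketch below decodes it. The helper names are hypothetical; only the `LOG_*` constants come from `<syslog.h>`.

```c
#include <stdio.h>
#include <syslog.h>

/* PRI = facility * 8 + severity, so the low three bits are the severity. */
int pri_severity(int pri)
{
    return pri & 7;
}

/* The remaining high bits are the facility number. */
int pri_facility(int pri)
{
    return pri >> 3;
}

/* Map a severity number (0..7) to its conventional name. */
const char *severity_name(int severity)
{
    static const char *names[] = {
        "emerg", "alert", "crit", "err",
        "warning", "notice", "info", "debug"
    };

    if (severity < 0 || severity > 7)
        return "unknown";
    return names[severity];
}

/* Print the decoded facility and severity for one PRI value. */
void describe_pri(FILE *out, int pri)
{
    fprintf(out, "PRI %d: facility %d, severity %d (%s)\n",
            pri, pri_facility(pri), pri_severity(pri),
            severity_name(pri_severity(pri)));
}
```

For PRI 28 this gives facility 3 (the daemon facility) and severity 4, which is `LOG_WARNING`. Moving the message to `LOG_DEBUG` (severity 7) would keep it out of the log at default levels, assuming the syslog configuration on the system filters debug messages.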

I'll have to look into the issue of enable/disable of interfaces separately. Hopefully I don't need a FreeBSD system for that. Let me get back to you on that.

LouisAtGH commented 1 year ago

Joachim, I have a FreeBSD14-current system here, because I needed it to compile your code for pfSense ....

Also note that my feeling is that not only the alarm is an issue, but also all the real messages on the network to which it refers. I mean, what is the use of sending an endless number of messages to an IPv4 link-local address like 169.254.x.y?

troglobit commented 1 year ago

Yes, if the message should be kept (maybe @stormshield-damiend can weigh in on that), it should probably not refer to the interface by IP address, but maybe by its name instead. I don't know, I'm not that familiar with the BSD routing sockets, frankly.

LouisAtGH commented 1 year ago

Joachim, I found a problem with the (automatically created) config file. When starting with a modified file:

I will of course take care of the config file configuration issues myself. Sorry that I did not detect that issue earlier.

troglobit commented 1 year ago

Great to hear! No problem :hand::smiley:

Let's leave this issue open to give @stormshield-damiend some time to answer our question about log level above.

LouisAtGH commented 1 year ago

Yep, it should stay open; the 169.254.x.y is still there: "pimd[44512]: Timeout waiting for reply from routing socket for 169.254.0.1". To my feeling (I am not a high-level expert on this) that is complete nonsense, since 169.254.x.y is by no means a normal range. Let me add for @stormshield-damiend that at this moment I am running pimd on the pfSense development release based on FreeBSD14-current (which works fine, as far as I know at the moment, apart from this issue).

LouisAtGH commented 1 year ago

@stormshield-damiend Assuming there is no fundamental problem behind this issue, it is probably not a big thing to fix. I would appreciate it.

stormshield-damiend commented 1 year ago

Hi,

I no longer work on pimd; development on our side is now done by other people.

What I can say is that the printed message is an error message that should only occur when you fail to get a route for an unknown address or when the routing part of the kernel fails to answer for whatever reason. About the flooding part, maybe this log could be protected by some kind of rate limit to prevent it from happening too often (once per second, maybe?). What could also be done is changing the log type so that it is not printed by default, and maybe adding a debug counter that could be printed elsewhere, for example "# amount of request blocked due to kernel timeout".
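The rate limit plus counter idea could be sketched roughly as follows. This is an illustration only, not code from pimd: the `log_limiter` struct and `log_limiter_allow()` function are hypothetical names, and the one-second interval is just the value suggested above.

```c
#include <time.h>

/* Hypothetical rate limiter state for one kind of log message. */
struct log_limiter {
    int           interval;    /* minimum seconds between emitted messages */
    int           has_logged;  /* 0 until the first message is emitted */
    time_t        last;        /* time of the last emitted message */
    unsigned long suppressed;  /* messages dropped since the last emission */
};

/*
 * Decide whether a message may be logged at time 'now'.
 *
 * Returns 1 if the caller should log; in that case *dropped receives the
 * number of messages suppressed since the previous emission and the
 * counter is reset. Returns 0 if the message should be suppressed; the
 * counter is incremented and *dropped is left untouched.
 */
int log_limiter_allow(struct log_limiter *l, time_t now, unsigned long *dropped)
{
    if (l->has_logged && difftime(now, l->last) < (double)l->interval) {
        l->suppressed++;
        return 0;
    }

    if (dropped)
        *dropped = l->suppressed;

    l->suppressed = 0;
    l->has_logged = 1;
    l->last = now;
    return 1;
}
```

A caller would keep one `struct log_limiter` with `interval` set to 1, call `log_limiter_allow(&limiter, time(NULL), &dropped)` before emitting the timeout message, and append the `dropped` count to the message when it is non-zero. That keeps the warning visible while bounding it to roughly one line per second.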