Closed LouisAtGH closed 3 years ago
I did select some more alarm ^types^ from the log (Below).
Note that:
Here the selected syslog messages
Nov 5 08:23:26 pfSense pimd[23546]: Failed adding MFC entry src 192.168.116.33 grp 239.255.255.250 from ix1.116 to register_vif0: Invalid argument
Nov 5 08:23:26 pfSense pimd[23546]: Failed adding MFC entry src 192.168.116.33 grp 239.255.255.250 from ix1.116 to lagg0.26, register_vif0: Invalid argument
Nov 5 08:23:29 pfSense pimd[23546]: Failed adding MFC entry src 192.168.116.33 grp 239.255.255.250 from ix1.116 to lagg0.26, lagg0.16, ix0.14, register_vif0: Invalid argument
Nov 5 08:23:31 pfSense pimd[23546]: pimd version 3.0-beta1 exiting.
Nov 5 08:23:31 pfSense pimd[23546]: Failed removing MFC entry src 192.168.1.2, grp 239.255.255.250: Invalid argument
Nov 5 08:23:31 pfSense pimd[23546]: Failed removing MFC entry src 192.168.1.3, grp 239.255.255.250: Invalid argument
Nov 5 08:23:31 pfSense pimd[23546]: Failed removing MFC entry src 192.168.116.33, grp 239.255.255.250: Invalid argument
Nov 5 08:23:51 pfSense pimd[35519]: Failed adding MFC entry src 192.168.1.3 grp 239.255.255.250 from lagg0.16 to lagg0.26, ix0.14, ix1.116, register_vif0: Invalid argument
Nov 5 08:23:49 pfSense pimd[35519]: Failed adding MFC entry src 192.168.1.2 grp 239.255.255.250 from lagg0.16 to lagg0.26, ix0.14, ix1.116, register_vif0: Invalid argument
Nov 5 08:25:15 pfSense pimd[35519]: Failed reading kernel count (SIOCGETSGCNT) for (S,G) on (192.168.1.2, 239.255.255.250): Can't assign requested address
Nov 5 08:25:15 pfSense pimd[35519]: Failed reading kernel count (SIOCGETSGCNT) for (S,G) on (192.168.1.36, 239.255.255.250): Can't assign requested address
Nov 5 08:27:10 pfSense pimd[35519]: Failed removing MFC entry src 192.168.116.33, grp 239.255.255.250: Invalid argument
Nov 5 08:33:05 pfSense pimd[35519]: Failed adding MFC entry src 192.168.14.15 grp 239.255.255.250 from ix0.14 to lagg0.26, lagg0.16, ix1.116, register_vif0: Invalid argument
Have really no idea what's going on here. Maybe pimd starts too soon before interfaces are up properly? I noticed that pfSense root[38078]: Bootup complete
is printed after a set of pimd log entries, possibly an ordering problem in pfSense. In mrouted
we added a command line options for cases like this; -w, --startup-delay=SEC Startup delay before forwarding
, which prevents the daemon from attempting to create VIFs before a certain (system/site specific) delay has passed. I've been meaning to add something like that to pimd
as well, but haven't gotten around to it yet.
Could you please share your pimd.conf
(if you have one) and the output from ifconfig
when pimd starts. I'd like to rule out problems with configuration or system first.
There seems to be a path issue:
I did copy the correct pimd.conf to /usr/local/etc and started pimd again. Does not seem to solve the problem
Have to think about this.
I did not change the interface configuration or the pimd.conf for month. But of course
Updating pfSense release from a couple of weeks back to the actual version, did provoke this problem. Not saying it caused this problem.
I will rebuild the pimd related pfSense package just to be sure (I do not expect that to help, but just to be sure) Update: As expected that did not change/solve anything Update: PIMD has been running all day ("working" however generating errors all the time). and not working properly I think
Here the requested files.
Louis 20201106 pimd.conf.txt 20201106 ifconfig.txt
Adding part of syslog
For info actual pimctl "default" report. Not sure the multicast routes are really there
PIMD Routes
VIF Local Address Subnet Thresh Flags Neighbors
===============================================================================
0 192.168.2.1 192.168.2 1 DR NO-NBR
1 192.168.13.1 192.168.13 1 DR NO-NBR
2 192.168.1.1 192.168.1 1 DR NO-NBR
3 192.168.14.1 192.168.14 1 DR NO-NBR
4 192.168.116.1 192.168.116 1 DR NO-NBR
5 192.168.2.1 register_vif0 1
Source Group RP Address Flags (*,G)
===============================================================================
* 239.255.255.250 192.168.14.15 WC RP
Joined oifs: ......
Pruned oifs: ......
Leaves oifs: l.lll.
Asserted oifs: ......
Outgoing oifs: o.ooo.
Incoming : ...I..
TIMERS: Entry JP RS Assert VIFS: 0 1 2 3 4 5
0 35 0 0 0 0 0 0 0 0
Source Group RP Address Flags (S,G)
===============================================================================
192.168.1.3 239.255.255.250 192.168.14.15 SPT CACHE SG
Joined oifs: .....j
Pruned oifs: ......
Leaves oifs: l.lll.
Asserted oifs: ......
Outgoing oifs: o..ooo
Incoming : ..I...
TIMERS: Entry JP RS Assert VIFS: 0 1 2 3 4 5
185 60 0 0 0 0 0 0 0 0
Source Group RP Address Flags (S,G)
===============================================================================
192.168.1.36 239.255.255.250 192.168.14.15 SPT CACHE SG
Joined oifs: .....j
Pruned oifs: ......
Leaves oifs: l.lll.
Asserted oifs: ......
Outgoing oifs: o..ooo
Incoming : ..I...
TIMERS: Entry JP RS Assert VIFS: 0 1 2 3 4 5
170 25 0 0 0 0 0 0 0 0
Source Group RP Address Flags (S,G)
===============================================================================
192.168.1.42 239.255.255.250 192.168.14.15 SPT CACHE SG
Joined oifs: .....j
Pruned oifs: ......
Leaves oifs: l.lll.
Asserted oifs: ......
Outgoing oifs: o..ooo
Incoming : ..I...
TIMERS: Entry JP RS Assert VIFS: 0 1 2 3 4 5
185 45 0 0 0 0 0 0 0 0
Source Group RP Address Flags (S,G)
===============================================================================
192.168.14.15 239.255.255.250 192.168.14.15 CACHE SG
Joined oifs: .....j
Pruned oifs: ......
Leaves oifs: l.lll.
Asserted oifs: ......
Outgoing oifs: o.o.oo
Incoming : ...I..
TIMERS: Entry JP RS Assert VIFS: 0 1 2 3 4 5
185 50 0 0 0 0 0 0 0 0
Number of Groups: 1
Number of Cache MIRRORs: 4
RP address Incoming Group Prefix Priority Holdtime
===============================================================================
192.168.14.15 3 Static
239/8 1 Forever
169.254.0.1 32 Static
232/8 1 Forever
Current BSR address: 0.0.0.0
Perhaps I am terribly wrong, but I noticed that all alarms are refering to vif0, and vif0 is special I think, but the involved devices are not related to vif0 at all .....
As said perhaps very stupid remark ....
Louis
No clue really what's going on. I'll have to see if I can set up something similar to see if I can reproduce. Just for information: the vif0 you mention is actually register_vif0
, which is an interface pimd
creates. It is used for the register tunnel towards the Rendez-vous Point (RP), so something fails when pimd
tries to create routing entries to forward multicast to the RP.
If help full I can run a pimd instance with extra debug lines added ....
Thanks, maybe later. Meanwhile, could you verify a theory for me? I suspect at least parts of the log messages you see are bogus due to the incoming multicast traffic having TTL=1. (No routing entry is actually set in kernel, and when trying to remove it we get these bogus messages trying to remove something that isn't there). You can check the TTL of the 239.255.255.250 group, on the incoming interface, using tcpdump with the -vvv
flag, e.g.,
tcpdump -lni ix0.14 -vvv
The 239.255.255.250 group is used by Windows machines for their Simple Service Discovery Protocol (SSDP), which is a link-local (only) protocol similar to (but worse than) mDNS.
Here the dump (captured form commandline) 20201107 tcpdmup_lni_ix014_vvv.txt
Note that the connection towards media-server are not working, despite that pimd is doing something
Weird, there was nothing about 239.255.255.250 in that dump. Could you run a tcpdump while starting up pimd? When it starts up it sends out a lot of packets, including an IGMP query, that should trigger responses from end devices.
Joachim,
I made two traces
For info
[2.5.0-DEVELOPMENT][admin@pfSense.lan]/root: netstat -g
IPv4 Virtual Interface Table Vif Thresh Local-Address Remote-Address Pkts-In Pkts-Out 0 240 0.248.255.255 453448649065698 67108859 2 188 0.248.255.255 1310722 18446741875912781824 3 125 0.254.255.255 18446735277684210752 18446744071578268000 5 184 0.254.255.255 18446741874691258560 18446741874691258544 7 212 255.255.255.255 0.0.0.0 207876617823 0
IPv4 Multicast Forwarding Table is empty
IPv6 Multicast Interface Table is empty
IPv6 Multicast Forwarding Table is empty
OK, now there's relevant information in the tcpdump logfiles. Unfortunately my theory was wrong, the TTL is 4 ... :-/
As you can imagine, it's close to impossible to debug something as complex as this over email/github. So I set out to create a setup of my own, which looks something like this:
.--------.
| switch |
'--+--+--'
.-------' '-------.
| 192.168.175.0/24 |
.----+----. .----+-----.
| FreeBSD | | Alpine |
| 12.1 | | Linux |
'----+----' '-----+----'
10.0.0.0/24 | | 10.0.1.0/24
.----+----. .----+----.
| PC0 | | PC1 |
'---------' '---------'
I use my own tool, mcjoin, to send and receive plain UDP multicast. I've tried sending multicast from PC0 --> PC1 and also switching the sender to verify that it works in the other direction as well, from PC1 --> PC0, and everything works fine. (Only had some initial problems with the Alpine Linux box, until i remembered the classic rp_filter
setting in sysctl.)
Don't know what more I can do now to help you. Sorry.
Earlier today I raised the TTL from 1 (default) to 3 just to test, that did not change any thing. I just revert to 1.
If needed I can provide wireshark traces, if usefull Apart form pfSense I have other computers as well,
I do not have an opinion about the intended test setup, apart from the fact that it is 12.1 where pfSense and my development machine is on 12.2
Note that pfSense show [2.5.0-DEVELOPMENT][admin@pfSense.lan]/root: freebsd-version -kru 12.2-STABLE 12.2-STABLE 12.2-STABLE No Idea which release that is :)
My development machine is three times 12.2-RELEASE
To route anything (unicast/multicast) the TTL must be >1, otherwise the kernel stack will not forward it. I was interested in seeing if the incoming multicast was eligible for routing or not, wanted to rule out some fault scenarios.
OK, I'll upgrade my FreeBSD machine to 12.2 and test again. That's a useful exercise anyway since I use if for other purposes as well. But if I still cannot reproduce, then it's not much I can do.
The TTL in the gui was three you see 4 so I think that 1 is two :) further on I agree
I noted something strange in the logging ..... <<<
- pimd starting
- "two" messages
- pim exiting
- many many messages following ..... hue it had been exiting !!??
Have a look also in the attach log system.zip
Nov 7 12:18:21 pfSense pimd[67371]: Failed adding MFC entry src 192.168.14.15 grp 239.255.255.250 from ix0.14 to lagg0.26, lagg0.16, ix1.116, register_vif0: Invalid argument Nov 7 12:18:21 pfSense pimd[67371]: Failed adding MFC entry src 192.168.14.15 grp 239.255.255.250 from ix0.14 to lagg0.26, lagg0.16, ix1.116, register_vif0: Invalid argument Nov 7 12:18:22 pfSense php-fpm[74587]: /index.php: Successful login for user 'admin' from: 192.168.116.33 (Local Database) Nov 7 12:18:29 pfSense pimd[67371]: Failed adding MFC entry src 192.168.1.3 grp 239.255.255.250 from lagg0.16 to lagg0.26, ix0.14, ix1.116, register_vif0: Invalid argument Nov 7 12:18:29 pfSense pimd[67371]: Failed adding MFC entry src 192.168.1.3 grp 239.255.255.250 from lagg0.16 to lagg0.26, ix0.14, ix1.116, register_vif0: Invalid argument Nov 7 12:18:30 pfSense pkg-static[68637]: pfSense-repo upgraded: 2.5.0.a.20201106.0650 -> 2.5.0.a.20201106.2150 Nov 7 12:18:34 pfSense pimd[67371]: Failed adding MFC entry src 192.168.1.3 grp 239.255.255.250 from lagg0.16 to lagg0.26, ix0.14, ix1.116, register_vif0: Invalid argument Nov 7 12:18:34 pfSense pimd[67371]: Failed adding MFC entry src 192.168.1.3 grp 239.255.255.250 from lagg0.16 to lagg0.26, ix0.14, ix1.116, register_vif0: Invalid argument Nov 7 12:18:38 pfSense check_reload_status[551]: Syncing firewall Nov 7 12:18:38 pfSense pimd[67371]: pimd version 3.0-beta1 exiting. Nov 7 12:18:38 pfSense pimd[67371]: Failed removing MFC entry src 192.168.1.3, grp 239.255.255.250: Invalid argument Nov 7 12:18:38 pfSense pimd[67371]: Failed removing MFC entry src 192.168.14.15, grp 239.255.255.250: Invalid argument Nov 7 12:18:41 pfSense pimd[86252]: pimd version 3.0-beta1 starting ... system.zip
Have upgraded my FreeBSD machine to 12.2 now, and everything still works as expected. I'm suspecting the problems you see are related to the integration with pfSense, somehow. I have no clue as to what or why.
What do you think of my finding above?
I would like to know:
I will add debug statements to check why it is exiting
The log message "pimd version 3.0-beta1 exiting" is not the last message, it is printed when pimd has received SIGTERM. So very normal shutdown sequence. It's likely pfSense that sends SIGTERM to restart pimd.
Perhaps I know, I did change the TTL, and I asume that pfSense is restarting after each configuration change :)
We had an other short discussion on max no of VIFS
--- just copied from the pfSense activity list ---- Hi Louis,
We applied the change on FreeBSD-src repository changing #define MAXVIFS 32 to 64. Did it break pimd? Xavier mentioned a new PR would be submitted for pimd part
Updated by Renato Botelho 12 days ago
Could that causing this ? Any how I will try to force max vif no to 64 and have a look what happens
I forced pimd to support 64 VIFS by changing configure.ac line 137/138 to 64. That works for git and ports.
However ..... that did not change the behavior / nor solve the problem
That is very possible. I'd like to remind you of our discussion in #172 that increasing MAXVIFS
to something >32 has only ever been tested on Linux.
Also, after trying a few times more, I realized I get a kernel crash with FreeBSD 12.2, which I never saw on 12.1:
Nov 7 18:40:24 beasdie kernel: Fatal trap 12: page fault while in kernel mode
Nov 7 18:40:24 beasdie kernel: cpuid = 0; apic id = 00
Nov 7 18:40:24 beasdie kernel: fault virtual address = 0x0
Nov 7 18:40:24 beasdie kernel: fault code = supervisor write data, page not present
Nov 7 18:40:24 beasdie kernel: instruction pointer = 0x20:0xffffffff80cba775
Nov 7 18:40:24 beasdie kernel: stack pointer = 0x28:0xfffffe0000487700
Nov 7 18:40:24 beasdie kernel: frame pointer = 0x28:0xfffffe0000487700
Nov 7 18:40:24 beasdie kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Nov 7 18:40:24 beasdie kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Nov 7 18:40:24 beasdie kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Nov 7 18:40:24 beasdie kernel: current process = 12 (irq257: virtio_pci0)
Nov 7 18:40:24 beasdie kernel: trap number = 12
Nov 7 18:40:24 beasdie kernel: panic: page fault
Nov 7 18:40:24 beasdie kernel: cpuid = 0
Nov 7 18:40:24 beasdie kernel: time = 1604770786
Nov 7 18:40:24 beasdie kernel: KDB: stack backtrace:
Nov 7 18:40:24 beasdie kernel: #0 0xffffffff80c0a8f5 at kdb_backtrace+0x65
Nov 7 18:40:24 beasdie kernel: #1 0xffffffff80bbeb1b at vpanic+0x17b
Nov 7 18:40:24 beasdie kernel: #2 0xffffffff80bbe993 at panic+0x43
Nov 7 18:40:24 beasdie kernel: #3 0xffffffff8108f911 at trap_fatal+0x391
Nov 7 18:40:24 beasdie kernel: #4 0xffffffff8108f96f at trap_pfault+0x4f
Nov 7 18:40:24 beasdie kernel: #5 0xffffffff8108efb6 at trap+0x286
Nov 7 18:40:24 beasdie kernel: #6 0xffffffff81066f38 at calltrap+0x8
Nov 7 18:40:24 beasdie kernel: #7 0xffffffff80cc6abb at if_simloop+0xdb
Nov 7 18:40:24 beasdie kernel: #8 0xffffffff8247f8e9 at pim_input+0x409
Nov 7 18:40:24 beasdie kernel: #9 0xffffffff80d461c1 at encap_input+0xd1
Nov 7 18:40:24 beasdie kernel: #10 0xffffffff80d460b8 at encap4_input+0x28
Nov 7 18:40:24 beasdie kernel: #11 0xffffffff80d48ab8 at ip_input+0x138
Nov 7 18:40:24 beasdie kernel: #12 0xffffffff80cdfbba at netisr_dispatch_src+0xca
Nov 7 18:40:24 beasdie kernel: #13 0xffffffff80cc2d98 at ether_demux+0x138
Nov 7 18:40:24 beasdie kernel: #14 0xffffffff80cc404b at ether_nh_input+0x33b
Nov 7 18:40:24 beasdie kernel: #15 0xffffffff80cdfbba at netisr_dispatch_src+0xca
Nov 7 18:40:24 beasdie kernel: #16 0xffffffff80cc31bb at ether_input+0x4b
Nov 7 18:40:24 beasdie kernel: #17 0xffffffff80a13a85 at vtnet_rxq_eof+0x7a5
Nov 7 18:40:24 beasdie kernel: Uptime: 40s
Nov 7 18:40:24 beasdie kernel: Dumping 83 out of 474 MB:..20%..39%..58%..77%..96%
Nov 7 18:40:24 beasdie kernel: Dump complete
Nov 7 18:40:24 beasdie kernel: Automatic reboot in 15 seconds - press a key on the console to abort
Nov 7 18:40:24 beasdie kernel: Rebooting...
Can you explain this? Shouldn't 12.2 be a stable release? (Rhetoric, don't answer that.)
Counting only today, I've spent 10h of my spare time over this weekend working for free on your issue. I've gotten nowhere, and my home network is now in worse shape than before. I've actually tried to reinstall FreeBSD, using12.2, since I thought the freebsd-update tool was somehow to blame for the instabilities, to no avail.
My takeaway from all this is that I should not waste more of my precious spare time on platforms I have no personal interest in. Unless someone steps up to pay me, of course, but that seems very unlikely given the 10+ years I've spent on this project so far. So unfortunately for you, and the rest of the UNIX world. I'm closing this bug report and will not be accepting new reports from non-Linux operating systems unless they show, without a doubt, the problem they have is directly related to pimd.
I just installed the latest build on pFsense (FreeBSD).
PIMD working, however I immediately noticed an abnormal high memory uses. So I looked at the logs and saw thousands of vif argument related ("identical") messages.
A couple of examples below.
I checked syslog. The messages are occuring since I did updated half an hour ago to latest pfSense build using the same PIMD-build (yesterdays master branch) as yesterday.
So something is wrong, do not know what.
Let me know if I have to check or test something.
Louis
Example messages: