Closed realdream closed 2 years ago
I think I'm seeing your problem. Again, I'm testing this in CORE, so not exactly the same setup as you.
It is veeeeery slow to start, so there is definitely still some problem with the latest pimd code. Possibly related to #184, but that patch doesn't really help. As I suspected, not ready for release yet :-/
It's a bit quicker to get going if I let receiver 1 start first, and wait a couple of seconds before I start sender 1. In my case the first router (R1) sends to eth0, not pimreg, but the second router is stuck waiting for data on pimreg, while data is coming in on its eth0. So they seem to be out of sync.
Interestingly, this problem seems to be isolated to PIM-SM; with PIM-SSM, i.e. an (S,G) join in the 232/8 range, it works almost perfectly. At least from what I can see.
Edit: because only PIM-SM uses the register tunnel ... :roll_eyes: ... sorry for the obvious comment. Good, however, that something works. There is still a lot to do (graft on the mrouted changes, read up on the RFCs) before I'll get around to diving into this issue more hands-on. So anyone who can help debug this particular one is more than welcome!
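For anyone reproducing this, a quick way to tell which code path a test group exercises is to check whether it falls in the 232/8 SSM range mentioned above. This is only a minimal sketch; the function name is made up for illustration and is not part of pimd.

```python
import ipaddress

# 232.0.0.0/8 is the IPv4 range reserved for source-specific multicast.
SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")


def is_ssm_group(group):
    """Return True if the IPv4 group address lies in 232.0.0.0/8.

    Hypothetical helper for test scripts; groups outside this range
    are handled as PIM-SM (any-source) groups in the discussion above.
    """
    return ipaddress.ip_address(group) in SSM_RANGE


if __name__ == "__main__":
    for g in ("232.1.1.1", "225.1.2.3"):
        print(g, "SSM" if is_ssm_group(g) else "ASM (PIM-SM, needs an RP)")
```

Groups outside 232/8 go through the RP and therefore the register tunnel, which is where the problem in this issue shows up.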
In my case the data flow (or mroute) looks like this:
BSR | | Not BSR
sender 1 ---> br0 ---> pimreg ----| |----->pimreg--->br0---> receiver 1 not working
sender 2 ---> br0 ---> enp1s0 ----| |----->enp1s0--->br0---> receiver 2 working
So in my case there is no out-of-sync issue, while pimreg is like a broken tunnel that loses everything.
In addition to disabling multicast_snooping, did you enable multicast_querier on the bridges? I've found that both are needed to route multicast through a bridge interface.
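To double-check both bridge settings at once, something like the following could be used. It is a rough sketch that assumes the standard Linux sysfs layout (/sys/class/net/<bridge>/bridge/...); the function names are invented for this example, and br0 is simply the bridge name used in this thread.

```python
from pathlib import Path


def bridge_multicast_settings(bridge, sysfs_root="/sys/class/net"):
    """Read multicast_snooping and multicast_querier for one bridge.

    Returns a dict with int values, or None for an attribute that is
    missing or unreadable (e.g. the interface is not a bridge).
    """
    base = Path(sysfs_root) / bridge / "bridge"
    settings = {}
    for name in ("multicast_snooping", "multicast_querier"):
        try:
            settings[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def matches_thread_advice(settings):
    """True when snooping is disabled (0) and the querier is enabled (1)."""
    return (settings["multicast_snooping"] == 0
            and settings["multicast_querier"] == 1)


if __name__ == "__main__":
    s = bridge_multicast_settings("br0")
    print(s, "ok" if matches_thread_advice(s) else "check bridge settings")
```

The sysfs root is a parameter only so the check can be tried against a scratch directory without touching a real bridge.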
I'm not sure if I fixed this bug, but I just pushed a set of changes to the master branch that at least seems to work better for me. I can now see multicast data coming in over the pimreg interface on the RP. Maybe you can give it a spin when you have the time?
I ran my test case again with the latest master (9f758e): still the same result. I also tried enabling/disabling the querier on the bridge.
sendto from xxxxx to xxxxxx: Operation not permitted
Hmm, OK. I just tried swapping the sender and receiver (should've tested that before), and now I don't get any data at all. Like you. I'll investigate this further, but cannot make any commitments since this is the last day of my vacation.
The issue with "Operation not permitted" is definitely caused by the firewall/NAT. It's the kernel responding with EPERM on a sendto() syscall. This usually happens when a (usually implicit) block rule is hit in the firewall. I recommend that pimd users not try to run it with NAT; it wasn't designed for that. See for example issue #126 for the troubles that can ensue. Instead, use a GRE tunnel to connect sites, or GRE over IPsec, or a plain OpenVPN tunnel.
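To see the same symptom outside pimd, a small test sender can check the errno on a failed sendto(). This is an illustrative sketch, not pimd code: the helper names are hypothetical and the loopback destination in the demo is arbitrary.

```python
import errno
import socket


def diagnose_send_error(exc):
    """Turn an OSError from sendto() into a short diagnosis string."""
    if exc.errno == errno.EPERM:
        # Same errno pimd logs as "Operation not permitted".
        return "EPERM: sendto() refused by the kernel; check firewall/NAT rules"
    name = errno.errorcode.get(exc.errno, str(exc.errno))
    return f"{name}: {exc.strerror}"


def try_send(sock, payload, dest):
    """Send one datagram; return None on success, else a diagnosis."""
    try:
        sock.sendto(payload, dest)
        return None
    except OSError as exc:
        return diagnose_send_error(exc)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # Arbitrary destination, used only to exercise the helper.
        print(try_send(s, b"ping", ("127.0.0.1", 9)) or "sent")
```

Checking the errno rather than the exception class matters here, because Python raises PermissionError for both EPERM and EACCES.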
Yes, "Operation not permitted" is caused by NAT. I also tried disabling NAT and got pimd[106960]: find_route: Not a valid host (0.0.0.0) ... at commit b41fb72, but it seems to no longer occur at commit 9f758e. However, the broken pimreg issue is still there.
I'll make an effort during the weekend to hunt this one down.
This took a lot longer to get back to than I anticipated. I've now set up a few automated tests to more easily reproduce issues like this, and in the first tests to actually require an RP I ran into the same issue. Then it struck me like a ton of bricks: rp_filter!
The reason pimd worked better for me in the past, it seems, is that I ran an older version of Ubuntu back then, and they have since changed their defaults in /etc/sysctl.d/10-network-security.conf to enable rp_filter=2, i.e. "loose" mode. Even though "loose" is better than "strict" mode, it doesn't really help decapsulated traffic that comes in on pimreg when there's no reverse path to the (encapsulated) source IP. Linux happily drops the packet in skb_tunnel_rx() ...
The only way around it that I can see with pimd is to disable rp_filter on all interfaces used for multicast routing. In my tests (will push later tonight CET), this is what I've done and had successful results with.
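A quick way to check whether rp_filter is still active on the multicast interfaces is to read the per-interface values under /proc/sys/net/ipv4/conf. The sketch below is not part of pimd or its test suite, and the function names are invented for illustration. It relies on the kernel using the larger of conf/all/rp_filter and conf/<iface>/rp_filter when validating sources, so the per-interface value alone can be misleading.

```python
from pathlib import Path


def effective_rp_filter(iface, conf_root="/proc/sys/net/ipv4/conf"):
    """Return max(all, iface) rp_filter, or None if neither is readable."""
    root = Path(conf_root)

    def read(name):
        try:
            return int((root / name / "rp_filter").read_text().strip())
        except (OSError, ValueError):
            return None

    values = [v for v in (read(iface), read("all")) if v is not None]
    return max(values) if values else None


def interfaces_with_rp_filter(ifaces, conf_root="/proc/sys/net/ipv4/conf"):
    """List interfaces whose effective rp_filter is 1 (strict) or 2 (loose).

    Interfaces whose value cannot be read are skipped, not reported.
    """
    return [i for i in ifaces
            if effective_rp_filter(i, conf_root) not in (0, None)]


if __name__ == "__main__":
    # Interface names taken from this thread; adjust to the local setup.
    print(interfaces_with_rp_filter(["br0", "enp1s0", "pimreg"]))
```

An empty list means the effective value is 0 on every interface that could be read; note that a non-zero "all" value makes every interface show up.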
There, it finally works on someone else's computer as well: https://github.com/troglobit/pimd/actions/runs/1255356383. Tests are available in the new test/ subdirectory. The shell script may not be entirely readable, apologies.
Closing issue. There is now a Troubleshooting Checklist that mentions rp_filter.
test env
host system: ubuntu 20.04
vhost system: ubuntu 20.04
pimd version: build from src b41fb72156d (latest master by now)
minimum test topology
modified pimd configure
abnormal phenomena
receiver 1 did not receive any message from sender 1 while
clues
when only run sender 1 & receiver 1
if enable snat for enp1s0 will get another warning
from host2