Hi @dpetzel, interesting use-case for `pimd`! I'm a bit lost in the Docker world, and in your setup as well; an (ascii) image might have helped. I'm trying one out here:
```
                                        eth0
 >--- MC sender ----{ Network cloud }-------> [ Server host ] <--- router
                                                    |
                                            ________|________
                                           /     docker0     \  <--- bridge
                                          /         |         \        ______
               ___________________________/  Container ship   \______|      |__  <--- MC receiver
               \                                    |                |______|  /
                \                                   `-------->----------'     /
                 \___________________________________________________________/
```
Now, there are many levels of multicast where things go wrong. You seem humble enough and knowledgeable enough about the basics, so we should be good to go :wink:
For `pimd` to even consider installing a multicast route from `eth0` to `docker0` it needs:

a) To hear the client respond to an IGMP query it sent out (layer-2), and
b) To actually have the desired multicast on `eth0`, or know of an upstream PIM-SM router (rendez-vous point) that has it to send a PIM join to (layer-3).

However, multicast routing takes place not on regular interfaces, but on "VIFs" ... virtual interfaces. These are unique to multicast and are enumerated when the multicast routing daemon (smcroute/pimd/mrouted) starts up. `pimd` is a bit picky about which interfaces it allows to be enumerated as VIFs; it's not just the MULTICAST interface flag, as one might believe. Check the output of `cat /proc/net/ip_mr_vif` to make sure a VIF has been created for `docker0` as well as `eth0`.
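Since the `/proc/net/ip_mr_vif` check comes up a lot, here is a small sketch (my own helper, not part of pimd) that parses the table; the layout is assumed to be one header line followed by whitespace-separated fields starting with the VIF index:

```python
def parse_vif_table(text: str) -> dict[int, str]:
    """Map VIF index -> interface name from /proc/net/ip_mr_vif.
    The first line is a header; each data line starts with the index."""
    vifs = {}
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 2:
            vifs[int(fields[0])] = fields[1]
    return vifs

# Sample text modeled on the kernel's format; real use would read the file:
sample = """\
Interface      BytesIn  PktsIn  BytesOut PktsOut Flags Local    Remote
 0 eth0         4041586   34814        0       0 00000 6840C70A 00000000
 1 docker0            0       0  1789217   14660 00000 010017AC 00000000
"""
print(parse_vif_table(sample))  # -> {0: 'eth0', 1: 'docker0'}
```

If `docker0` is missing from the result, pimd never enumerated it as a VIF and no route can be installed toward it.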
You are much more skilled in ascii art than myself!! Your diagram is spot on. Thank you so much for taking the time to talk through this.
I apologize for not including this earlier: I am using a TTL of 64.
Here is the output of /proc/net/ip_mr_vif:

```
# cat /proc/net/ip_mr_vif
Interface      BytesIn  PktsIn  BytesOut PktsOut Flags Local    Remote
 0 eth0         4041586   34814        0       0 00000 6840C70A 00000000
 1 docker0            0       0  1789217   14660 00000 010017AC 00000000
 2 pimreg             0       0        0       0 00004 6840C70A 00000000
```
Additionally, I lifted what I believe are the relevant log entries showing it setting up the VIFs. This is during startup:
```
14:21:18.208 Installing eth0 (10.0.64.104 on subnet 10.0.64/22) as vif #0-2 - rate 0
14:21:18.208 Installing docker0 (172.10.0.1 on subnet 172.10) as vif #1-3 - rate 0
14:21:18.208 Getting vifs from /etc/pimd.conf
14:21:18.208 Local Cand-BSR address 172.10.0.1, priority 5
14:21:18.208 Local Cand-RP address 172.10.0.1, priority 20, interval 30 sec
```
I see these at random times while `pimd` is running:

```
23:26:59.782 accept_group_report(): igmp_src 172.17.0.128 ssm_src 0.0.0.0 group 239.192.12.200 report_type 34
23:26:59.782 Set delete timer for group: 239.192.12.200
23:26:59.782 Adding vif 1 for group 239.192.12.200
```
> a) To hear the client respond to an IGMP query it sent out (layer-2),
I think that is what I'm seeing in the second log snippet? If not, what's the best way for me to confirm or deny that is happening?
> b) To actually have the desired multicast on eth0, or know of an upstream PIM-SM router (rendez-vous point) that has it to send a PIM join to (layer-3)
I feel like I've confirmed this is happening via the observed behavior using tcpdump on `eth0`, since I can see the packets coming in when `pimd` is running, and nothing when it's not.
Since you've mentioned a lack of familiarity with Docker, I'll toss out that it sets up the following iptables rules. I haven't seen anything to suggest these are at fault, but I just wanted to get you the information in case it matters. I am no iptables wizard by any stretch, so it's entirely possible something could be wrong there and I just don't see it. That said, given that smcroute and igmpproxy work, I'm inclined to think the issue is not in the iptables configuration.
```
# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N DOCKER
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j DROP
```

```
# iptables -S -t nat
-P PREROUTING ACCEPT
-P POSTROUTING ACCEPT
-P OUTPUT ACCEPT
-N DOCKER
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
```
Side question: the main driver for me to pick pimd over igmpproxy (or smcroute) was the limit of 20 groups; however, I just stumbled across http://unix.stackexchange.com/questions/23832/is-there-a-way-to-increase-the-20-multicast-group-limit-per-socket which suggests 20 is the default, but that it's configurable.

From smcroute:

> Only 20 mgroup lines can be configured, this is a HARD kernel maximum. If you need more, you probably need to find another way of forwarding multicast to your router.

So I'm confused as to whether this is really a hard kernel max, or if I'm simply misreading the limitation here. I also found https://groups.google.com/forum/#!topic/linux.kernel/QeiadoMEdWY which implies it may be configurable.
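For what it's worth, the 20-group default that StackExchange link discusses appears to be the kernel's per-socket `net.ipv4.igmp_max_memberships` sysctl, which can be inspected and raised at runtime (a sketch; the example value of 100 is arbitrary):

```
# Default per-socket limit on IGMP group memberships (usually 20)
sysctl net.ipv4.igmp_max_memberships

# Raise it at runtime, e.g. to 100 -- pick a value for your deployment
sysctl -w net.ipv4.igmp_max_memberships=100

# Persist across reboots
echo 'net.ipv4.igmp_max_memberships = 100' >> /etc/sysctl.conf
```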
Hi again, does the command `pimd -r` show the routing tables for you? It should show a route being set up, or at least some useful info. You can also verify what routes are actually written to the kernel in the file /proc/net/ip_mr_cache ... the more readable version can be seen using the `ip mroute` tool. This latter part (two steps) is also shared with SMCRoute.
Verifying IGMP join from your docker client/receiver is easier to do with Wireshark or tcpdump on the docker0 interface, I guess.
I'm a bit involved in SMCRoute as well, and I can tell you that the 20 group MAX only applies when the daemon acts as a layer-2 client, sending IGMP join on behalf of the client on "the other side". If you can direct all multicast to your router using other means, e.g. setting "router port" or similar on switches or routers on the eth0 side, then you won't need the SMCRoute `mgroup` rows.

Thanks for the heads-up on that /proc variable, I didn't know about it, so I've just updated the SMCRoute sources a bit! :smiley: :+1:
I don't really know what to make of it, but I suspect you do. Here is the section for the group in question from `pimd -r`:
```
----------------------------------- (S,G) ------------------------------------
----------------------------------- (*,G) ------------------------------------
Source           Group            RP Address       Flags
---------------  ---------------  ---------------  ---------------------------
INADDR_ANY       239.192.12.200   172.17.0.1       WC RP
Joined   oifs: ...
Pruned   oifs: ...
Leaves   oifs: .l.
Asserted oifs: ...
Outgoing oifs: .o.
Incoming     : ..I
TIMERS:  Entry    JP    RS  Assert  VIFS:  0  1  2
             0    20     0       0         0  0  0
----------------------------------- (S,G)
```
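To help read the per-VIF flag strings in dumps like this, here is a hypothetical little helper. It assumes, as the dump suggests, that each character position corresponds to a VIF index (0, 1, 2 above for eth0, docker0, pimreg) and that a letter marks the flag being set while '.' means unset:

```python
def vifs_with_flag(flag_string: str, flag: str) -> list[int]:
    """Return the VIF indexes whose column in a pimd -r flag
    string carries the given flag character."""
    return [i for i, c in enumerate(flag_string) if c == flag]

# Reading the (*,G) entry above under that assumption:
print(vifs_with_flag(".o.", "o"))  # Outgoing -> [1], i.e. vif 1 (docker0)
print(vifs_with_flag(".l.", "l"))  # Leaves   -> [1]
print(vifs_with_flag("..I", "I"))  # Incoming -> [2], i.e. vif 2 (pimreg)
```

Read that way, docker0 is listed as an outgoing leaf interface for the group, which matches the receiver's IGMP join being heard.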
I believe I do see the IGMP join occur in tcpdump:

```
# tcpdump -s0 -i docker0 -vv -XX igmp
11:11:41.527227 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
    172.17.0.139 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 239.192.12.200 to_ex { }]
```
`ip mroute | grep 12.200` comes up dry.
Here is the full table from `ip mroute` (cleaned up a little to remove internal info):

```
(REMOTE_SENDER1, GROUP_IP1)      Iif: eth0   Oifs: docker0
(REMOTE_SENDER1, GROUP_IP2)      Iif: eth0   Oifs: docker0
(REMOTE_SENDER2, GROUP_IP3)      Iif: eth0   Oifs: docker0
(REMOTE_SENDER3, GROUP_IP4)      Iif: eth0   Oifs: docker0
(REMOTE_SENDER3, GROUP_IP5)      Iif: eth0   Oifs: docker0
(REMOTE_SENDER1, GROUP_IP6)      Iif: eth0   Oifs: docker0
(REMOTE_SENDER4, GROUP_IP7)      Iif: unresolved
(REMOTE_SENDER5, GROUP_IP8)      Iif: unresolved
(REMOTE_SENDER6, GROUP_IP9)      Iif: unresolved
(REMOTE_SENDER7, GROUP_IP10)     Iif: unresolved
(REMOTE_SENDER8, GROUP_IP11)     Iif: unresolved
(REMOTE_SENDER9, GROUP_IP8)      Iif: unresolved
(REMOTE_SENDER10, GROUP_IP9)     Iif: unresolved
(REMOTE_SENDER11, GROUP_IP7)     Iif: unresolved
(REMOTE_SENDER3, GROUP_IP12)     Iif: unresolved
(REMOTE_SENDER12, GROUP_IP9)     Iif: unresolved
```
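An `Iif: unresolved` entry means the kernel has seen traffic for that (source, group) pair but no daemon has resolved a route for it yet. As a hypothetical aid for sifting through larger tables, a parser for this output format might look like this (the sample addresses below are made up):

```python
import re

def parse_mroute(line: str):
    """Parse one line of `ip mroute` output into (source, group, iif, oifs).
    oifs is an empty list for unresolved entries; returns None on no match."""
    m = re.match(r"\(([^,]+),\s*([^)]+)\)\s+Iif:\s+(\S+)(?:\s+Oifs:\s+(.*))?", line)
    if not m:
        return None
    src, grp, iif, oifs = m.groups()
    return src, grp, iif, oifs.split() if oifs else []

table = """\
(10.0.65.1, 239.192.12.1) Iif: eth0 Oifs: docker0
(10.0.65.2, 239.192.12.2) Iif: unresolved
"""
entries = [parse_mroute(line) for line in table.splitlines()]
unresolved = [e for e in entries if e and e[2] == "unresolved"]
print(len(unresolved))  # -> 1
```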
If it's at all helpful:

```
# sysctl -a | grep mc_forward
net.ipv4.conf.all.mc_forwarding = 1
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.eth0.mc_forwarding = 1
net.ipv4.conf.docker0.mc_forwarding = 1
net.ipv4.conf.veth#####.mc_forwarding = 0
net.ipv4.conf.pimreg.mc_forwarding = 1
net.ipv4.conf.vethfd#####.mc_forwarding = 0
```
Hmm, that's just weird ... there should be a routing rule for the 12.200 group. There's one thing that may screw up things, and that's conntrack. When you use smcroute it knows pre-runtime what rules you want, so before traffic enters the router it has a multicast route set up. With pimd it takes a while to figure out what receivers exist before it installs a route, so the firewall may drop incoming traffic. Try flushing conntrack after starting pimd, a few times ...
It's really difficult to debug issues like this remotely, so I'm sorry that I cannot help you better. If I were in your shoes I'd double check the TTL on the inbound multicast using tcpdump on eth0.
Maybe I should try out this new fancy docky thingy, there may be something with the docker0 interface that pimd does differently from smcroute, which needs adaptation, dunno ...
I totally understand how hard these things can be remotely, and I appreciate the time you have already spent.
Confirmed the incoming TTL:

```
# tcpdump -s0 -i eth0 -vv -XX host 239.192.12.200
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
11:46:21.079540 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 31)
    REMOTE_SENDER.6000 > 239.192.12.200.6000: [udp sum ok] UDP, length 3
```
Flushing conntrack with `conntrack -F` a few times (waiting about 60 seconds between flushes) doesn't seem to do the trick. As a test, I did a `service iptables stop` and then restarted `pimd` and my listener. Still no entry listed in `ip mroute`.

Unsure if any of this is useful, but in case it is, I did a grep on 12.200 in the output of `pimd -d`:
```
19:50:33.359 accept_group_report(): igmp_src 172.17.0.142 ssm_src 0.0.0.0 group 239.192.12.200 report_type 34
19:50:33.359 Set delete timer for group: 239.192.12.200
19:50:33.359 SM group order from 172.17.0.142 (*,239.192.12.200)
19:50:33.359 create group entry, group 239.192.12.200
19:50:46.779 accept_group_report(): igmp_src 172.17.0.142 ssm_src 0.0.0.0 group 239.192.12.200 report_type 34
19:50:46.779 Set delete timer for group: 239.192.12.200
19:50:46.779 create group entry, group 239.192.12.200
19:50:46.779 Adding vif 1 for group 239.192.12.200
INADDR_ANY       239.192.12.200   172.17.0.1       WC RP
INADDR_ANY       239.192.12.200   172.17.0.1       WC RP
INADDR_ANY       239.192.12.200   172.17.0.1       WC RP
19:51:02.225 accept_group_report(): igmp_src 172.17.0.142 ssm_src 0.0.0.0 group 239.192.12.200 report_type 34
19:51:02.225 Set delete timer for group: 239.192.12.200
19:51:02.225 Adding vif 1 for group 239.192.12.200
INADDR_ANY       239.192.12.200   172.17.0.1       WC RP
INADDR_ANY       239.192.12.200   172.17.0.1       WC RP
INADDR_ANY       239.192.12.200   172.17.0.1       WC RP
19:51:20.136 accept_group_report(): igmp_src 172.17.0.142 ssm_src 0.0.0.0 group 239.192.12.200 report_type 34
19:51:20.136 Set delete timer for group: 239.192.12.200
19:51:20.136 Adding vif 1 for group 239.192.12.200
INADDR_ANY       239.192.12.200   172.17.0.1       WC RP
```
Is there a particular logging statement around creation of the route?
OK, TTL looks fine ... never hurts to double check that :smirk:
No firewall problems and no conntrack issues; I'm at a loss. I just tested a setup on my laptop, using a multicast sender in Qemu connected to my laptop host on virbr0. I used `ping -I eth0 -t 5 225.1.2.3` inside my Qemu guest and then started my own `mcjoin -i eth0 225.1.2.3` tool on my host's eth0 (sorry for confusing with the same name!). When I then started pimd on my host I could see the ICMP frames with tcpdump on my host's eth0 after a short while. The log says:
```
Added kernel MFC entry src 192.168.123.110 grp 225.1.2.3 from virbr0 to eth0
```
My mcjoin tool is a simple IP multicast receiver: https://github.com/troglobit/toolbox/tree/master/mcjoin. The setup looks like this:
```
   Laptop host           Qemu sender
  __________________ _______________
 |  tcpdump         |               |
 |  mcjoin   pimd   |               |
 |    ^       |     |               |
 |    |       V     |               |
 |  eth0    virbr0===eth0 <-- ping  |
 |__________________|_______________|
```
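For reference, the core of any receiver like mcjoin is the IP_ADD_MEMBERSHIP socket option, whose `struct ip_mreq` argument packs the group address followed by the local interface address. A minimal Python sketch (not the actual mcjoin code; the group and port are arbitrary examples):

```python
import socket

def make_membership_request(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq for IP_ADD_MEMBERSHIP: multicast group
    address, then local interface address (INADDR_ANY lets the
    kernel pick the interface)."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)

def open_receiver(group: str = "225.1.2.3", port: int = 1234) -> socket.socket:
    """Create a UDP socket bound to `port` and joined to `group`.
    Reading from it then yields datagrams sent to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group))
    return sock

# The mreq for 225.1.2.3 on INADDR_ANY is 8 bytes: group, then interface
print(make_membership_request("225.1.2.3").hex())  # -> e101020300000000
```

Joining the group is what triggers the IGMP membership report that pimd must hear on docker0 before it will route traffic toward the receiver.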
Very strange indeed. It can clearly create routes, as it did for some of the groups, but not all (specifically this one I've been testing with). It never logs the MFC entry for this group, for some reason... I've been running strace for a little while now, hoping something might jump out, but so far nothing useful.
There are some fixes in the pipeline for the next release, already on master. If you're a brave soul you could try building the GIT sources:

```
git clone https://github.com/troglobit/pimd.git
cd pimd
git submodule update --init
./configure && make
```
I'm terribly sorry I cannot be of any more help! :disappointed:
Well... I was brave enough (`pimd version 2.3.2-rc2 starting`). Sadly it's the same behavior: it's simply not adding the route entry. It's baffling how some groups are getting routes but this one is not.
No need to apologize, you have been extremely helpful. Even if nothing else comes of this I have learned a ton in the process.
The good news is that, since learning I can do more than 20 groups, `igmpproxy` works for my use cases, but `pimd` seems to be a much more actively maintained project.
Very unfortunate, thank you for giving it a go anyway! I tried 239.192.12.200 in my setup and there it works, so I really don't know what could be the problem ... unless ...
... maybe the receiver does send a join, but then quickly sends a leave? Dunno, that's a loooong shot, but analyzing the tcpdump log on the host for the IGMP traffic of that group might show something?
You could give `mrouted` a try as well, if you've got any strength left, that is. I maintain that too, along with smcroute, and DVMRP (the protocol mrouted uses) is waay more simple (built-in RIP-like routing) and uses a flood-and-prune method instead, so it might be better for "world --> docker" deployments. It's not as polished and capable as pimd, but at least you won't have to mess with static routes. (Much the same build system as I've set up for pimd.)
I had similar issues when I tried mrouted as well, but I'm probably gonna give it another go with what I've learned from this discussion.
@dpetzel Hi again! Just couldn't let this one go, kept on nagging me that we couldn't get it working ...
So I fixed up my own tool, verified it outside of Docker and then used it as sink for 250 groups in a container. Went without a hitch.
I don't know if you gave up, or went with `igmpproxy` instead. Anyway, this may be too late, and I'll likely close this issue before the next release. Just wanted to let you know.
Cheers!
Hey @troglobit Thanks for the follow-up, and apologies for any lost sleep I have caused :(. For right now, igmpproxy is fulfilling 95% of our use cases, but I don't rule out that we'll need to circle back and revisit pimd, so I really appreciate your write-up. It's good to know that it can work, and that we just have something off in our configuration somewhere. I don't see any reason to keep this open.
Great to hear back from you, @dpetzel, hope you circle back one day and good luck! :-)
Sorry if this is the wrong place for this type of question; I didn't see anything more specific in the README or at http://troglobit.github.io/pimd.html.

We have an existing network of applications that leverage multicast. Our network is maintained by another team and I don't have much visibility into it, but I do know we historically relied on IGMP snooping. I'll be the first to admit I'm out of my normal element in this space, so if it helps, assume I have a basic understanding of multicast and IGMP, and no knowledge of PIM aside from what I've read over the last couple of days trying to get this to work.
We are moving some of these multicast-dependent applications into Docker. Our docker hosts consist of a single `eth0` attached to upstream switches, and the local `docker0` bridge. We are not doing anything with overlay networks at this time. As a byproduct of installing docker, ip_forwarding is enabled (in case it matters).

We'd like to use `pimd` to route multicast packets from the upstream network into the local network on the docker bridge. We've got this mostly working, but I'm hung up at the very end. I've been using jgroups to test, as outlined here: http://linuxproblems.org/wiki/How_to_check_Multicasting.

tcpdump on `eth0` shows the packets coming in; they just don't seem to be getting forwarded/routed to the container. I believe it's a good sign that I see the packets on `eth0`, as that suggests pimd has negotiated (peered?) with the upstream switch and packets are getting routed across the network.

I did find the note in http://troglobit.github.io/multicast-howto.html about disabling `multicast_snooping` on the `docker0` bridge; however, that doesn't appear to have helped.

To test some basic assumptions I tried smcroute as well as igmpproxy, and I had success with both of those tools; however, since they are both limited to 20 groups, and I have a small subset of use cases which might require more than that, I'd really like to get pimd working.
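For anyone landing here later, the snooping knob from the multicast-howto can be toggled like this (a sketch; the sysfs path assumes a standard Linux bridge named docker0):

```
# Check the current snooping setting (1 = snooping on)
cat /sys/class/net/docker0/bridge/multicast_snooping

# Disable snooping so all multicast floods to every bridge port
echo 0 > /sys/class/net/docker0/bridge/multicast_snooping
```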
It feels like I may be one minor configuration tweak away from success, but I've hit a brick wall, and I'm hoping someone may have done this already or has some suggestions.
Thanks, and if there is a better place for this type of question I'm happy to post something there.