troglobit / mdnsd

Jeremie Miller's original mdnsd
BSD 3-Clause "New" or "Revised" License

Support multiple IP addresses per interface #77

Open jorgensigvardsson opened 9 months ago

jorgensigvardsson commented 9 months ago

The setup:

mdnsd sends out a query for its own services, presumably to avoid conflicts. The queries have the source address 169.254.29.54:5353 and destination 224.0.0.251:5353. mdnsd somehow responds to its own query, and the response says it has the IP address 192.168.2.200. The response, however, has the source address 169.254.29.54:5353.

The log files are filled with messages like these:

Jan  1 00:10:56 ptwos-coronet-61-9f-20 mdnsd[1424]: vlan1: conlicting name detected ptwos-coronet-61-9f-20-167._ssh._tcp.local. for type 16, reloading config ...
Jan  1 00:10:56 ptwos-coronet-61-9f-20 mdnsd[1424]: vlan1: conlicting name detected ptwos-coronet-61-9f-20-168._ssh._tcp.local. for type 33, reloading config ...
Jan  1 00:10:56 ptwos-coronet-61-9f-20 mdnsd[1424]: vlan1: conlicting name detected ptwos-coronet-61-9f-20-168.local. for type 1, reloading config ...
Jan  1 00:10:56 ptwos-coronet-61-9f-20 mdnsd[1424]: vlan1: conlicting name detected ptwos-coronet-61-9f-20-168._ssh._tcp.local. for type 16, reloading config ...
Jan  1 00:10:57 ptwos-coronet-61-9f-20 mdnsd[1424]: vlan1: conlicting name detected ptwos-coronet-61-9f-20-169._ssh._tcp.local. for type 33, reloading config ...
Jan  1 00:10:57 ptwos-coronet-61-9f-20 mdnsd[1424]: vlan1: conlicting name detected ptwos-coronet-61-9f-20-169.local. for type 1, reloading config ...
Jan  1 00:10:57 ptwos-coronet-61-9f-20 mdnsd[1424]: vlan1: conlicting name detected ptwos-coronet-61-9f-20-169._ssh._tcp.local. for type 16, reloading config ...
Jan  1 00:10:58 ptwos-coronet-61-9f-20 mdnsd[1424]: vlan1: conlicting name detected ptwos-coronet-61-9f-20-170._ssh._tcp.local. for type 33, reloading config ...
Jan  1 00:10:58 ptwos-coronet-61-9f-20 mdnsd[1424]: vlan1: conlicting name detected ptwos-coronet-61-9f-20-170.local. for type 1, reloading config ...
Jan  1 00:10:58 ptwos-coronet-61-9f-20 mdnsd[1424]: vlan1: conlicting name detected ptwos-coronet-61-9f-20-170._ssh._tcp.local. for type 16, reloading config ...

If I drop either address from the interface, it has no problems.

jorgensigvardsson commented 9 months ago

This is really mind boggling. Tried the same setup on my PC, and mdnsd could handle multiple addresses just fine.

It's as if the multicast packets are bouncing around in my network. 🫣😵‍💫

jorgensigvardsson commented 9 months ago

Or rather, the mDNS packets are sent to the network from the interface, as well as back onto itself! But only on my embedded device. It's as if IP_MULTICAST_LOOP is enabled and mdnsd does not recognize the sender address (its own address).
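One way to check the loopback theory is to read and toggle IP_MULTICAST_LOOP on a UDP socket. The sketch below is not taken from mdnsd. The helper names `mcast_loop_enabled()` and `set_mcast_loop()` are made up for illustration, and passing the option as an `int` is an assumption that holds on Linux (some other systems expect an `unsigned char`).

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return 1 if IP_MULTICAST_LOOP is enabled on fd,
 * 0 if it is disabled, and -1 if getsockopt() fails.
 * Assumption: the option value is an int, as accepted on Linux.
 */
int mcast_loop_enabled(int fd)
{
    int loop = 0;
    socklen_t len = sizeof(loop);

    if (getsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, &len) == -1)
        return -1;

    return loop != 0;
}

/*
 * Hypothetical helper: enable (on != 0) or disable (on == 0) looping of
 * outgoing multicast packets back to sockets on the sending host.
 * Returns 0 on success, -1 on error.
 */
int set_mcast_loop(int fd, int on)
{
    int loop = on ? 1 : 0;

    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop));
}
```

Turning the option off is only useful as a diagnostic here: with loopback disabled, other mDNS programs on the same host no longer see the packets either.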

Right now I feel like I'm rubber ducking myself on your issue page, sorry about that. You might have a clue what's going on. Either way I think I have a lead to what I should focus my debugging efforts on on Monday. 😬

troglobit commented 9 months ago

I have no idea what could be going on here. The hostname in your logs rings familiar though, knowing it's a switch-based embedded platform, are you sure the frames are not looped back from the switchcore? You can test the theory by running tcpdump on the lower-level port interfaces or inspecting the RMON counters for multicast in/out.

jorgensigvardsson commented 9 months ago

Thanks for the tip! It will be fun figuring this out!

jorgensigvardsson commented 9 months ago

So, now that it's not Friday and my brain is a bit less fried, it seems like my issue has been fixed in 0.12.

troglobit commented 9 months ago

> So, now that it's not Friday and my brain is a bit less fried, it seems like my issue has been fixed in 0.12.

He he, OK 😏

jorgensigvardsson commented 9 months ago

I think I may have to reopen this issue. I think in fact there is a bug in mdnsd with respect to multiple IP addresses per interface. Here's my reasoning:

  1. At startup all interfaces are enumerated using getifaddrs() in iface_init()
    1. Any interface X is iterated for each address on the interface
    2. The first address for interface X is recorded, all other addresses are in effect ignored
  2. Found addresses in step 1 are attached to the iface->mdns instance (one address per interface only)
  3. In mdnsd_in() answers are processed at the end. The source address is checked against the address for the current interface's mdns instance. Because only one of the interface's addresses is recorded as "ours", any packet received from one of the interface's other addresses is treated as a conflict, causing mdnsd to reconfigure itself
  4. After reconfiguration, the entire probe and conflict loop is started again

It's odd that the source address of the packet going out from mdnsd is the first address of the interface, but the bound address of the multicast socket seems to be the last address of the interface. I can't really comment on this behavior, or on whether it's proper behavior for the system.

One solution to this particular problem would be to keep a list of all interface addresses and compare the source address against that list in step 3. I'm not sure that's a proper solution, though, if my system is the one behaving badly.
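The list-based check proposed above could look roughly like the following. This is a minimal sketch, not mdnsd's code: `struct addr_list`, `addr_list_collect()`, `addr_list_contains()`, and the `MAX_ADDRS` cap are all hypothetical names, and it handles IPv4 only.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_ADDRS 16 /* arbitrary cap chosen for this sketch */

/* Hypothetical container; mdnsd's own interface struct looks different. */
struct addr_list {
    struct in_addr addr[MAX_ADDRS];
    size_t count;
};

/*
 * Collect every IPv4 address configured on ifname, not just the first.
 * Returns 0 on success (the list may be empty), -1 if getifaddrs() fails.
 */
int addr_list_collect(const char *ifname, struct addr_list *list)
{
    struct ifaddrs *head, *ifa;

    list->count = 0;
    if (getifaddrs(&head) == -1)
        return -1;

    for (ifa = head; ifa; ifa = ifa->ifa_next) {
        /* ifa_addr may be NULL for interfaces without an address */
        if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (strcmp(ifa->ifa_name, ifname) != 0)
            continue;
        if (list->count == MAX_ADDRS)
            break;

        list->addr[list->count++] =
            ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr;
    }

    freeifaddrs(head);
    return 0;
}

/* Return 1 if src is one of the collected addresses, 0 otherwise. */
int addr_list_contains(const struct addr_list *list, struct in_addr src)
{
    for (size_t i = 0; i < list->count; i++) {
        if (list->addr[i].s_addr == src.s_addr)
            return 1;
    }

    return 0;
}
```

With the two addresses from the report in the list, a packet sourced from 169.254.29.54 would be recognized as our own even if 192.168.2.200 was the address found first. The conflict check would then only trigger for source addresses where `addr_list_contains()` returns 0.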

Do you have any thoughts about this?

jorgensigvardsson commented 9 months ago

(I've tested and debugged this on 0.12 by the way)

jorgensigvardsson commented 9 months ago

I think that what I'm saying is that:

  1. When mdnsd starts up, it picks one address as "this is us"
  2. bind() picks one address from the interface and uses that as sender address

These addresses don't always match, and that causes mdnsd to think there are conflicts when it sees its own packets.
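To see whether the two addresses really differ on a given system, the socket's local address can be read back with getsockname() and compared with the address recorded at startup. The sketch below assumes IPv4, and the helper names `socket_local_addr()` and `bound_addr_matches()` are hypothetical, not part of mdnsd.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: fetch the local IPv4 address that fd is bound to.
 * Returns 0 on success, -1 on error or if the socket is not AF_INET.
 */
int socket_local_addr(int fd, struct in_addr *out)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    memset(&sin, 0, sizeof(sin));
    if (getsockname(fd, (struct sockaddr *)&sin, &len) == -1)
        return -1;
    if (sin.sin_family != AF_INET)
        return -1;

    *out = sin.sin_addr;
    return 0;
}

/*
 * Hypothetical helper: compare the socket's bound address with the
 * address recorded as "this is us".
 * Returns 1 if they match, 0 if they differ, -1 on error.
 */
int bound_addr_matches(int fd, struct in_addr recorded)
{
    struct in_addr local;

    if (socket_local_addr(fd, &local) == -1)
        return -1;

    return local.s_addr == recorded.s_addr;
}
```

A socket bound to INADDR_ANY reports 0.0.0.0 here, because the kernel then chooses the source address per packet. In that case the check says nothing about which address ends up on the wire, and a packet capture is the more reliable tool.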

troglobit commented 9 months ago

Hmm, yeah that makes sense. We could do a better job of that.

Unfortunately I don't have the time (or energy) at the moment to help out. Maybe someone else who has been active recently in the project is interested?

jorgensigvardsson commented 9 months ago

I'm actually trying to fix it right now, but with limited success. Limitations set by too much caffeine. 🥴

I thought I had a solution for it, but I fail to grok the mark and sweep stuff. In my eyes, it has no effect, but I'm sure I've missed something. The mark and sweep algorithm is now also a different beast, as the changes I'm working on have changed the interface:address ratio from 1:1 to 1:n.

I'll give it a go again tomorrow once this caffeine overload has simmered...