keepsimple1 / mdns-sd

Rust library for mDNS based Service Discovery
Apache License 2.0

Services published using python-zeroconf or systemd-resolved are not resolved #182

Closed hrzlgnm closed 6 months ago

hrzlgnm commented 6 months ago

Example python code to publish a service:

from zeroconf import Zeroconf, ServiceInfo
from socket import gethostname

zeroconf = Zeroconf()
service_type = "_workstation._tcp.local."

zeroconf.register_service(
    ServiceInfo(
        service_type,
        f"worky-station.{service_type}",
        4848,
        server=f"{gethostname()}.local",
    )
)

try:
    input("Press enter to exit...\n\n")
finally:
    zeroconf.close()

Running avahi-browse -tpr _workstation._tcp on the same Linux machine yields resolved results:

+;eno33554984;IPv4;worky-station;Workstation;local
+;eno16777736;IPv4;worky-station;Workstation;local
+;lo;IPv4;worky-station;Workstation;local
=;eno33554984;IPv4;worky-station;Workstation;local;void-vm.local;192.168.178.76;4848;
=;eno16777736;IPv4;worky-station;Workstation;local;void-vm.local;192.168.73.130;4848;
=;lo;IPv4;worky-station;Workstation;local;void-vm.local;127.0.0.1;4848;

Example program I used to test resolving with mdns-sd:

use mdns_sd::{ServiceDaemon, ServiceEvent};

fn main() {
    let mdns = ServiceDaemon::new().expect("Failed to create daemon");
    let receiver = mdns.browse("_workstation._tcp.local.").expect("Failed to browse");
    let mut search_done = false;
    while let Ok(event) = receiver.recv() {
        match event {
            ServiceEvent::ServiceResolved(info) => {
                println!(
                    "Resolved a new service: {} host: {} port: {} IP: {:?} TXT properties: {:?}",
                    info.get_fullname(),
                    info.get_hostname(),
                    info.get_port(),
                    info.get_addresses(),
                    info.get_properties(),
                );
            }
            ServiceEvent::SearchStarted(_service) => {
                if search_done {
                    mdns.stop_browse("_workstation._tcp.local.").expect("Failed to stop browsing");
                }
                search_done = true;
            }
            ServiceEvent::SearchStopped(_service) => {
                break;
            }
            _ => {}
        }
    }
    mdns.shutdown().unwrap();
}

I also noticed that when I use avahi-publish-service worky-station _workstation._tcp. 4848, the service can be resolved successfully using the above Rust example program.

keepsimple1 commented 6 months ago

I tried it locally. It seems that Python zeroconf did not send or respond with address records (TYPE_A or TYPE_AAAA). I haven't had a chance to find out whether, or how, avahi-browse uses other means to resolve the address for the host name.

The example query program in mdns-sd shows that it found the instance but couldn't resolve it fully (due to the missing address records):

$ cargo run --example query _workstation._tcp
   Compiling mdns-sd v0.10.4 (/Users/hanxu/work/mdns-sd)
    Finished dev [unoptimized + debuginfo] target(s) in 3.29s
     Running `target/debug/examples/query _workstation._tcp`
At 191.994µs : SearchStarted("_workstation._tcp.local. on addrs [192.168.0.108, fe80::1, fe80::f884:fdff:fe05:b1ff, fe80::e20:39ca:3827:464, fe80::f071:231b:9e7:14fa, fe80::8646:3c7b:acfb:2d5c, fe80::10c7:e8af:585a:448f, fe80::8949:ca7e:9b05:10c4, fe80::ce81:b1c:bd2c:69e]")
At 110.160432ms : ServiceFound("_workstation._tcp.local.", "worky-station._workstation._tcp.local.")
<snip>

keepsimple1 commented 6 months ago

I opened PR #183 with some debugging code and found that Python zeroconf actually includes an NSEC record indicating the absence of IPv4 and IPv6 addresses.
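To illustrate what such an NSEC record conveys, here is a minimal std-only sketch that checks an NSEC type bitmap (the wire format from RFC 4034, section 4.1.2) for the A and AAAA types. The function names are hypothetical and are not part of mdns-sd or of PR #183; the input is assumed to be the raw type-bitmap portion of the record, after the next-domain-name field.

```rust
/// IANA-assigned DNS record type codes.
const TYPE_A: u16 = 1;
const TYPE_AAAA: u16 = 28;

/// Returns true if `rr_type` is marked as present in an NSEC type bitmap.
/// Hypothetical helper: `bitmaps` is a sequence of
/// (window number, bitmap length, bitmap bytes) blocks.
fn nsec_has_type(bitmaps: &[u8], rr_type: u16) -> bool {
    let want_window = (rr_type >> 8) as u8;
    let bit = (rr_type & 0xff) as usize;
    let mut rest = bitmaps;
    while rest.len() >= 2 {
        let window = rest[0];
        let len = rest[1] as usize;
        if rest.len() < 2 + len {
            return false; // truncated block: treat as "not present"
        }
        if window == want_window {
            // Bits are numbered from the most significant bit of each byte.
            return rest[2..2 + len]
                .get(bit / 8)
                .map_or(false, |b| (b & (0x80u8 >> (bit % 8))) != 0);
        }
        rest = &rest[2 + len..];
    }
    false
}

/// True when the bitmap lists neither A nor AAAA, i.e. the responder
/// asserts that it has no address records for this name.
fn nsec_denies_addresses(bitmaps: &[u8]) -> bool {
    !nsec_has_type(bitmaps, TYPE_A) && !nsec_has_type(bitmaps, TYPE_AAAA)
}
```

For example, a bitmap that lists only TXT (16) and SRV (33) makes nsec_denies_addresses return true, which corresponds to the situation described here.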

hrzlgnm commented 6 months ago

My guess is, avahi is using the associated server name from the SRV record to resolve addresses when A and AAAA records are not present.

hrzlgnm commented 6 months ago

Out of interest, I grepped a bit through the avahi code, and it seems that the name in the TYPE_SRV record is used to resolve the TYPE_A or TYPE_AAAA address. See https://github.com/avahi/avahi/blob/v0.8/avahi-core/resolve-service.c#L221 and following.
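A follow-up lookup of this kind amounts to asking for the A and AAAA records of the SRV target host. The sketch below encodes such a query in DNS wire format using only the standard library. It is an illustration under assumptions, not avahi's or mdns-sd's actual code: the function names are hypothetical, and sending the packet to the mDNS multicast group is left out.

```rust
/// IANA-assigned DNS record type and class codes.
const TYPE_A: u16 = 1;
const TYPE_AAAA: u16 = 28;
const CLASS_IN: u16 = 1;

/// Appends `name` as length-prefixed DNS labels, ending with the root label.
/// Hypothetical helper; no name compression is attempted.
fn encode_name(name: &str, out: &mut Vec<u8>) -> Result<(), String> {
    for label in name.trim_end_matches('.').split('.') {
        if label.is_empty() || label.len() > 63 {
            return Err(format!("invalid label in {name:?}"));
        }
        out.push(label.len() as u8);
        out.extend_from_slice(label.as_bytes());
    }
    out.push(0);
    Ok(())
}

/// Builds one query packet with two questions (A and AAAA) for `host`,
/// e.g. the target name taken from an SRV record.
fn build_address_query(host: &str) -> Result<Vec<u8>, String> {
    let mut pkt = Vec::new();
    // Header: ID 0, flags 0 (standard query), QDCOUNT 2, other counts 0.
    pkt.extend_from_slice(&[0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0]);
    for qtype in [TYPE_A, TYPE_AAAA] {
        encode_name(host, &mut pkt)?;
        pkt.extend_from_slice(&qtype.to_be_bytes());
        pkt.extend_from_slice(&CLASS_IN.to_be_bytes());
    }
    Ok(pkt)
}
```

Calling build_address_query("void-vm.local.") yields the bytes that a resolver could send once it has seen an SRV record pointing at void-vm.local. without accompanying address records.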

keepsimple1 commented 6 months ago

Yes, I suspected the same. I updated PR #183 to use regular lookups (std::net) to resolve the address when we detect no address records and the NSEC record shows that the instance explicitly declares it has none.

In my testing, the PR's patch is able to resolve your original python zeroconf instance:

Running $ cargo run --example query _workstation._tcp:

At 238.854773ms: Resolved a new service: worky-station._workstation._tcp.local. host: MBP-9.local. port: 4848 IP: {127.0.0.1, fe80::10c7:e8af:585a:448f, fe80::1, ::1, 192.168.0.108} TXT properties: TxtProperties { properties: [] }

P.S. there is one potential issue with my current patch: to_socket_addrs is a blocking call, hence causing delays if the hostname lookup fails. I'm trying to find optimizations. But let me know if the current patch works for you or not. Thanks.
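One possible way to keep a blocking to_socket_addrs call from stalling the caller is to run it on a helper thread and wait with a timeout. The following is only a sketch of that idea using the standard library; resolve_with_timeout is a hypothetical name and is not what the patch in PR #183 does.

```rust
use std::net::{IpAddr, ToSocketAddrs};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Resolves `host` via the system resolver on a helper thread and gives up
/// after `timeout`. Hypothetical helper, not part of mdns-sd.
fn resolve_with_timeout(host: &str, port: u16, timeout: Duration) -> Option<Vec<IpAddr>> {
    let (tx, rx) = mpsc::channel();
    let host = host.to_string();
    thread::spawn(move || {
        // This call may block for a long time if the lookup fails.
        let result = (host.as_str(), port)
            .to_socket_addrs()
            .map(|addrs| addrs.map(|a| a.ip()).collect::<Vec<_>>());
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(result);
    });
    match rx.recv_timeout(timeout) {
        Ok(Ok(addrs)) if !addrs.is_empty() => Some(addrs),
        _ => None, // lookup error, empty result, or timeout
    }
}
```

Note the trade-off: after a timeout the helper thread is not cancelled and keeps running until the underlying lookup returns, so this bounds the caller's delay but not the total work.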

hrzlgnm commented 6 months ago

Thanks for looking into this. I tried out the debug-resolve branch; it seems to work only on Windows for me. Unfortunately, it does not work on Linux. I guess we also need to send those queries via mDNS.

PS: I don't have a Mac.

hrzlgnm commented 6 months ago

I can provide network traces of avahi resolving this, if you like.

hrzlgnm commented 6 months ago

avahi-resolve.zip Here is a network trace of the case where avahi resolves the service running on Linux

hrzlgnm commented 6 months ago

I've created a minor PR against your #183 branch that makes it work on Linux; testing on Windows to be done soon (tm).

hrzlgnm commented 6 months ago

When I run this on Linux, with the Python program on the same host, I get:

At 120.686µs : SearchStarted("_workstation._tcp.local. on addrs [fe80::f141:2ded:3ab3:9970, 192.168.122.79]")
At 112.859681ms : ServiceFound("_workstation._tcp.local.", "worky-station._workstation._tcp.local.")
At 113.153913ms: Resolved a new service: worky-station._workstation._tcp.local. host: void-vm.local. port: 4848 IP: {192.168.122.79} TXT properties: TxtProperties { properties: [] }
At 113.167118ms: Resolved a new service: worky-station._workstation._tcp.local. host: void-vm.local. port: 4848 IP: {fe80::f141:2ded:3ab3:9970, 192.168.122.79} TXT properties: TxtProperties { properties: [] }

hrzlgnm commented 6 months ago

I also got results on Windows while the program was running in a Linux VM:

At 186µs : SearchStarted("_workstation._tcp.local. on addrs [192.168.178.25, 192.168.49.1, fe80::2b19:d118:3bdc:d9e8, 2003:e8:bf3e:3f00:e073:5eb2:2493:93a, fe80::e5b8:2dca:ee23:5df4, 2003:e8:bf3e:3f00:17e2:ac9b:ef52:7729, fe80::f900:996d:50d1:8349, 192.168.73.1]")
At 1.4355ms : ServiceFound("_workstation._tcp.local.", "homeassistant [07ec3e0c8c864037bbd53d1ef63a9d3c]._workstation._tcp.local.")
At 66.9015ms : ServiceFound("_workstation._tcp.local.", "vm-worky-station._workstation._tcp.local.")
At 68.5139ms: Resolved a new service: vm-worky-station._workstation._tcp.local. host: void-vm.local. port: 4848 IP: {192.168.178.76} TXT properties: TxtProperties { properties: [] }
At 68.6539ms: Resolved a new service: vm-worky-station._workstation._tcp.local. host: void-vm.local. port: 4848 IP: {192.168.178.76, 192.168.73.130} TXT properties: TxtProperties { properties: [] }
At 69.0566ms: Resolved a new service: vm-worky-station._workstation._tcp.local. host: void-vm.local. port: 4848 IP: {192.168.73.130, 192.168.178.76, 2003:e8:bf3e:3f00:938d:8ea8:42c0:f758} TXT properties: TxtProperties { properties: [] }

PS: ignore the homeassistant [...] thing there; that's probably a bug in a Home Assistant add-on advertising internal Docker addresses, which cannot be reached anyway...

hrzlgnm commented 6 months ago

Actually, I was wrong about this "homeassistant [...]" entry: the Home Assistant operating system has published this record using systemd-resolved since https://github.com/home-assistant/operating-system/commit/25a0dd30823e5863b4d90f2556d05149099be864 and sending only multicast queries doesn't seem to be enough to resolve it. Avahi is able to resolve those entries as well; it sends both a multicast and a unicast query simultaneously, as can be seen in the network trace above for the python-zeroconf test case.

hrzlgnm commented 6 months ago

I've updated my PR #184 once again; it now skips the NSEC check entirely, and I'm also able to resolve the homeassistant instance:

At 503.610675ms: Resolved a new service: homeassistant [07ec3e0c8c864037bbd53d1ef63a9d3c]._workstation._tcp.local. host: homeassistant.local. port: 0 IP: {192.168.178.70} TXT properties: TxtProperties { properties: [] }

hrzlgnm commented 6 months ago

At first I wondered why it didn't work: your branch was missing the fix from #181.

hrzlgnm commented 6 months ago

I tested my fix on Windows, and it also works there:

At 2.6834ms : ServiceFound("_workstation._tcp.local.", "homeassistant [07ec3e0c8c864037bbd53d1ef63a9d3c]._workstation._tcp.local.")
At 510.0461ms: Resolved a new service: homeassistant [07ec3e0c8c864037bbd53d1ef63a9d3c]._workstation._tcp.local. host: homeassistant.local. port: 0 IP: {192.168.178.70} TXT properties: TxtProperties { properties: [] }

hrzlgnm commented 6 months ago

@keepsimple1 If you like, I can submit a PR containing only the actual fix that works for me, without your NSEC debugging changes.

keepsimple1 commented 6 months ago

> @keepsimple1 If you like, I can submit a PR containing only the actual fix that works for me, without your NSEC debugging changes.

Yes, that would be great! Thanks!