keepsimple1 / mdns-sd

Rust library for mDNS based Service Discovery
Apache License 2.0

Refresh A and AAAA records of active `.browse` queriers #240

Closed · hrzlgnm closed 3 months ago

hrzlgnm commented 3 months ago

I observed some resolving issues when one stops a browse for a service after its TTL expires and then starts the same browse again. The info is never resolved, as the query_unresolved process believes there are addresses cached, but the records are actually empty.

Without this change, and with a responder that doesn't eagerly publish its A / AAAA records, I always get a sad 🐼 at the end when using the patched example query.rs shown below. With the changes, the 🐼 is always happy.

diff --git a/examples/query.rs b/examples/query.rs
index 30339e6..8fc1425 100644
--- a/examples/query.rs
+++ b/examples/query.rs
@@ -13,6 +13,8 @@
 //!
 //! Keeps listening for new events.

+use std::time::Duration;
+
 use mdns_sd::{ServiceDaemon, ServiceEvent};

 fn main() {
@@ -51,8 +53,48 @@ fn main() {
                     println!(" Property: {}", prop);
                 }
             }
+            ServiceEvent::SearchStopped(_) => {
+                break;
+            }
             other_event => {
                 println!("At {:?} : {:?}", now.elapsed(), &other_event);
+                if now.elapsed() > Duration::from_secs(122) {
+                    mdns.stop_browse(&service_type).expect("To stop browsing");
+                }
+            }
+        }
+    }
+
+    let receiver2 = mdns.browse(&service_type).expect("Failed to browse 2");
+
+    while let Ok(event) = receiver2.recv() {
+        match event {
+            ServiceEvent::ServiceResolved(info) => {
+                println!(
+                    "2 At {:?}: Resolved a new service: {}\n host: {}\n port: {}",
+                    now.elapsed(),
+                    info.get_fullname(),
+                    info.get_hostname(),
+                    info.get_port(),
+                );
+                for addr in info.get_addresses().iter() {
+                    println!(" Address: {}", addr);
+                }
+                for prop in info.get_properties().iter() {
+                    println!(" Property: {}", prop);
+                }
+                println!("Did resolve, happy 🐼");
+                mdns.stop_browse(&service_type).expect("To stop browsing 2");
+            }
+            ServiceEvent::SearchStopped(_) => {
+                break;
+            }
+            other_event => {
+                if now.elapsed() > Duration::from_secs(130) {
+                    eprintln!("Did not resolve, sad 🐼");
+                    break;
+                }
+                println!("At {:?} : {:?}", now.elapsed(), &other_event);
             }
         }
     }
hrzlgnm commented 3 months ago

An alternative approach: we could instead remove the cache entry entirely when its records become empty, to make the is_none() check happy.
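
For illustration, here is a minimal sketch of that idea using a plain HashMap and hypothetical names rather than the crate's actual cache types: once an instance's address set becomes empty, the entry is dropped so a later lookup sees a cache miss instead of an empty entry.

```rust
use std::collections::{HashMap, HashSet};
use std::net::IpAddr;

/// Hypothetical helper, not the crate's internals: drop a cached instance once
/// its address set is empty, so a later `cache.get(instance).is_none()` check
/// reports a miss again.
fn prune_if_empty(cache: &mut HashMap<String, HashSet<IpAddr>>, instance: &str) {
    if cache.get(instance).map_or(false, |addrs| addrs.is_empty()) {
        cache.remove(instance);
    }
}
```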

hrzlgnm commented 3 months ago

Alternative approach number 2: we could also refresh address records before they expire. Otherwise we would end up publishing a new ServiceResolved to the receiver every 120s, assuming the default TTLs are used.
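
As a rough illustration of the scheduling involved (not the daemon's actual code): RFC 6762 suggests that a continuing querier re-query at roughly 80% of a record's lifetime, so with a 120s TTL the refresh query would go out around the 96 second mark, well before the record expires.

```rust
use std::time::{Duration, Instant};

/// Illustrative only: when to send a refresh query for a cached record,
/// following the RFC 6762 guideline of re-querying at 80% of its TTL.
fn refresh_due_at(received: Instant, ttl: Duration) -> Instant {
    received + ttl.mul_f64(0.8)
}

fn main() {
    // With the default 120s TTL, a record received now would be refreshed
    // roughly 96 seconds later.
    let due = refresh_due_at(Instant::now(), Duration::from_secs(120));
    println!("refresh due at {:?}", due);
}
```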

keepsimple1 commented 3 months ago

Good catch! We have actually implemented cache refresh logic that should handle such a case. I think the problem is that we sent the wrong qtype for the instance SRV refresh query. We have logic in place that automatically attaches TYPE_A/TYPE_AAAA records for an SRV query.

Could you try out this patch:

diff --git a/src/service_daemon.rs b/src/service_daemon.rs
index 2b332a6..6414f30 100644
--- a/src/service_daemon.rs
+++ b/src/service_daemon.rs
@@ -2230,7 +2230,7 @@ impl Zeroconf {
             let (instances, timers) = self.cache.refresh_due_srv(ty_domain);
             for instance in instances.iter() {
                 debug!("sending refresh query for SRV: {}", instance);
-                self.send_query(instance, TYPE_ANY);
+                self.send_query(instance, TYPE_SRV);
                 query_srv_count += 1;
             }
             new_timers.extend(timers);

I used your example code diff for a quick test, and it seems to help. Let me know if it works for you. Cheers!

hrzlgnm commented 3 months ago

> Good catch! We have actually implemented cache refresh logic that should handle such a case. I think the problem is that we sent the wrong qtype for the instance SRV refresh query. We have logic in place that automatically attaches TYPE_A/TYPE_AAAA records for an SRV query.

Yes we do, but only for queries we are replying to, not the ones we are requesting.

> Could you try out this patch:

> diff --git a/src/service_daemon.rs b/src/service_daemon.rs
> index 2b332a6..6414f30 100644
> --- a/src/service_daemon.rs
> +++ b/src/service_daemon.rs
> @@ -2230,7 +2230,7 @@ impl Zeroconf {
>              let (instances, timers) = self.cache.refresh_due_srv(ty_domain);
>              for instance in instances.iter() {
>                  debug!("sending refresh query for SRV: {}", instance);
> -                self.send_query(instance, TYPE_ANY);
> +                self.send_query(instance, TYPE_SRV);
>                  query_srv_count += 1;
>              }
>              new_timers.extend(timers);

> I used your example code diff for a quick test, and it seems to help. Let me know if it works for you. Cheers!

I've tried your patch and it doesn't seem to help. I found my old example from the original issue we solved earlier, and it exhibits exactly the same non-working behavior: with your suggested change, the Python snippet from https://github.com/keepsimple1/mdns-sd/issues/182 still does not get resolved the second time we browse.

@keepsimple1 does it make sense, when refreshing SRV records, to also check whether the TYPE_A and TYPE_AAAA records for the matching host also require a refresh?

hrzlgnm commented 3 months ago

With an updated, more picky example:

diff --git a/examples/query.rs b/examples/query.rs
index 30339e6..ed97787 100644
--- a/examples/query.rs
+++ b/examples/query.rs
@@ -13,6 +13,8 @@
 //!
 //! Keeps listening for new events.

+use std::time::Duration;
+
 use mdns_sd::{ServiceDaemon, ServiceEvent};

 fn main() {
@@ -32,13 +34,13 @@ fn main() {
     // Browse for a service type.
     service_type.push_str(".local.");
     let receiver = mdns.browse(&service_type).expect("Failed to browse");
-
+    let mut resolve_count = 0;
     let now = std::time::Instant::now();
     while let Ok(event) = receiver.recv() {
         match event {
             ServiceEvent::ServiceResolved(info) => {
                 println!(
-                    "At {:?}: Resolved a new service: {}\n host: {}\n port: {}",
+                    "1 At {:?}: Resolved a new service: {}\n host: {}\n port: {}",
                     now.elapsed(),
                     info.get_fullname(),
                     info.get_hostname(),
@@ -50,8 +52,57 @@ fn main() {
                 for prop in info.get_properties().iter() {
                     println!(" Property: {}", prop);
                 }
+                resolve_count += 1;
+            }
+            ServiceEvent::SearchStopped(_) => {
+                break;
             }
             other_event => {
+                println!("1 At {:?} : {:?}", now.elapsed(), &other_event);
+                if now.elapsed() > Duration::from_secs(122) {
+                    mdns.stop_browse(&service_type).expect("To stop browsing");
+                }
+            }
+        }
+    }
+
+    if resolve_count > 1 {
+        eprintln!(
+            "First browse resolved {} times, expected once, sad 🐼",
+            resolve_count
+        );
+        std::process::exit(resolve_count);
+    }
+
+    let receiver2 = mdns.browse(&service_type).expect("Failed to browse 2");
+
+    while let Ok(event) = receiver2.recv() {
+        match event {
+            ServiceEvent::ServiceResolved(info) => {
+                println!(
+                    "2 At {:?}: Resolved a new service: {}\n host: {}\n port: {}",
+                    now.elapsed(),
+                    info.get_fullname(),
+                    info.get_hostname(),
+                    info.get_port(),
+                );
+                for addr in info.get_addresses().iter() {
+                    println!(" Address: {}", addr);
+                }
+                for prop in info.get_properties().iter() {
+                    println!(" Property: {}", prop);
+                }
+                println!("Did resolve, happy 🐼");
+                mdns.stop_browse(&service_type).expect("To stop browsing 2");
+            }
+            ServiceEvent::SearchStopped(_) => {
+                break;
+            }
+            other_event => {
+                if now.elapsed() > Duration::from_secs(130) {
+                    eprintln!("Did not resolve, sad 🐼");
+                    std::process::exit(1)
+                }
                 println!("At {:?} : {:?}", now.elapsed(), &other_event);
             }
         }

And with the latest commit I got it to work without the extra is_empty check; we now also remove the cache entry entirely if the cached addresses become empty.

Perhaps we should make an integration test case out of it.
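
As a rough sketch of what such a test could look like (untested; the API calls mirror this repo's examples, while the helper, service names, addresses, and timings are illustrative, and it assumes the event receiver's recv_timeout from the bundled channel implementation): register a service, browse until it resolves, stop browsing, wait past the default 120s TTL, then browse again and expect a second resolution instead of a hang.

```rust
use std::time::{Duration, Instant};

use mdns_sd::{ServiceDaemon, ServiceEvent, ServiceInfo};

// Illustrative helper: browse until the first ServiceResolved event or a timeout.
fn browse_until_resolved(daemon: &ServiceDaemon, ty: &str, timeout: Duration) -> bool {
    let receiver = daemon.browse(ty).expect("Failed to browse");
    let deadline = Instant::now() + timeout;
    let mut resolved = false;
    while Instant::now() < deadline {
        match receiver.recv_timeout(Duration::from_secs(1)) {
            Ok(ServiceEvent::ServiceResolved(_)) => {
                resolved = true;
                break;
            }
            Ok(_) | Err(_) => continue,
        }
    }
    daemon.stop_browse(ty).expect("Failed to stop browsing");
    resolved
}

fn main() {
    let responder = ServiceDaemon::new().expect("Failed to create responder daemon");
    let querier = ServiceDaemon::new().expect("Failed to create querier daemon");

    // Placeholder service data, in the style of the repo's register example.
    let ty = "_refresh-test._udp.local.";
    let info = ServiceInfo::new(
        ty,
        "refresh_test_instance",
        "refresh_test_host.local.",
        "192.168.1.42",
        5200,
        &[("key", "value")][..],
    )
    .expect("Valid service info");
    responder.register(info).expect("Failed to register");

    assert!(browse_until_resolved(&querier, ty, Duration::from_secs(10)));

    // Let the cached records' default 120s TTL expire before browsing again.
    std::thread::sleep(Duration::from_secs(125));

    assert!(browse_until_resolved(&querier, ty, Duration::from_secs(10)));
    println!("Did resolve twice, happy 🐼");
}
```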

keepsimple1 commented 3 months ago

> @keepsimple1 does it make sense, when refreshing SRV records, to also check whether the TYPE_A and TYPE_AAAA records for the matching host also require a refresh?

Yes, I think it's good to refresh address records based on their own refresh timers (which are mostly the same as the SRV records' timers, since they have the same TTL). Currently the responder includes the address records as well when replying to a TYPE_SRV refresh, but it looks like that doesn't always work. I will try it out using your example.
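
A self-contained sketch of that idea, using hypothetical types rather than the daemon's real internals: track each cached A/AAAA record with its own refresh deadline and collect the hostnames whose records are due, so the querier can refresh them directly instead of relying on the responder to attach them to an SRV answer.

```rust
use std::time::Instant;

// Hypothetical record bookkeeping, not the crate's actual cache structures.
struct CachedAddrRecord {
    hostname: String,
    refresh_at: Instant, // e.g. 80% of the record's TTL after it was received
}

/// Collect hostnames whose address records are due for a refresh query.
fn hostnames_due_for_refresh(records: &[CachedAddrRecord], now: Instant) -> Vec<&str> {
    records
        .iter()
        .filter(|r| now >= r.refresh_at)
        .map(|r| r.hostname.as_str())
        .collect()
}
```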

hrzlgnm commented 3 months ago

> @keepsimple1 does it make sense, when refreshing SRV records, to also check whether the TYPE_A and TYPE_AAAA records for the matching host also require a refresh?

> Yes, I think it's good to refresh address records based on their own refresh timers (which are mostly the same as the SRV records' timers, since they have the same TTL). Currently the responder includes the address records as well when replying to a TYPE_SRV refresh, but it looks like that doesn't always work. I will try it out using your example.

I already added the refresh with my latest changes, and it works fine for me so far, even on Windows and also on Android nowadays :)