project-chip / rs-matter

Rust implementation of the Matter protocol. Status: Experimental
Apache License 2.0
323 stars 45 forks source link

mDNS: update domain lib; fix various issues #186

Closed ivmarkov closed 3 months ago

ivmarkov commented 3 months ago

I started looking at the mDNS and DNS-SD RFCs recently w.r.t. what it would take to implement the "query" portion of mDNS in our built-in pure-Rust responder. (So that we can initiate brand-new sessions for subscriptions, or update existing sessions to their new peer IP address, in case such a change is detected.)

What a can of worms I opened...

While the mDNS and DNS-SD specs look deceivingly simple (great work by the author BTW as in the specs are super easy to read), and while the domain lib does the heavy-lifting of DNS record parsing / creation for us... these specs are actually complex in that there is a lot of complexity hidden in there not in terms of DNS record parsing/creation, but how an mDNS responder/queryer should react in various circumstances (as in - when to accept unicast requests; when to generate delays before broadcasting; what additional data to put in the mDNS record; and the list goes on...)

Anyway, to cut the long story short, this PR addresses the following deficiencies we have in our current mDNS impl (in severity order):

Since the PR was ending up large-ish, I also upgraded domain to the next version (0.10). This increased our Rust MSRV from 1.77 to 1.78. Upgrading got rid of a potential future large refactoring (domain lib authors renamed the notion of "dname" to just "name" everywhere) and also we could get rid of the heapless 0.7 "appendix" dependency that we no longer need. And from the "octets" crate which is now re-exported by the new "domain" lib.

So yes, I think this should go in.

Are we 100% compliant yet after this PR?

Far from it. I also think we might never get fully compliant with mDNS and DNS-SD. But... we just need to be "compliant enough" and not generate too much "network noise". I have my doubts that other MCU-based mDNS implementations are very compliant either (due to code complexity yet limited flash size).

ivmarkov commented 3 months ago

@kedars If that's not clear ^^^ does not yet implement the "query" portion. But it tries to fix the pre-existing "responder" logic to operate well-enough, so that I hopefully finally get rid of the mDNS lookup timeouts I see from time to time in my own setup over here - in the Commissioner...

ivmarkov commented 3 months ago

Are you considering the built-in implementation for just quick-developer use cases or production options too?

Great question. We must bring it to a state where it can be used in production. Reason being, "true" baremetal top-to-bottom Rust MCU stacks like Embassy, and the stuff the baremetal Rust folks at Espressif do with esp-hal (which is essentially integrating with Embassy, in the lack of any other task scheduler) do not have any suitable mDNS implementation in-place.

So if rs-matter is to be used in future on these stacks, we need to have an mDNS implementation that runs there.

If you intend to continue to evolve the built-in implementation:

  • I wonder if we should spit it out into a separate project since it will get complicated, and also so that it has a wider applicability, and a better chance of surfacing and fixing issues.

Also a great question. What I'm doing in the meantime, is to maintain a "copy" of our "built-in" mDNS implementation as a stand-alone library here, as part of the edge-net suite of protocols.

The implementation in edge-mdns is almost a verbatim copy of the rs-matter one. In a way, if you need - on your device - a more generic mDNS responder (and queryer in future) - as you would be running also other mDNS workloads besides the Mater ones, you should be using that library or a similar one with MdnsType::UserProvided in rs-matter.

Reason why I'm not actively pushing for inclusion of this library as a dependency in rs-matter:

So for the short to mid-term - as painful as it is for me - I think we should keep our own mDNS implementation in-tree in rs-matter. Once (or if) the edge-net crates and edge-mdns becomes somewhat popular, we might switch.

  • There is a Bonjour Conformance Test (BCT), available for download on Apple's site (require's Apple developer account) https://developer.apple.com/bonjour/. It also has chattiness tests across a few hours to ensure we aren't creating large bursts. I doubt if many of the current implementations even pass this though.

Great idea! Let me look into it (but possibly after merging the BTP PR, as it is next on my list).

ivmarkov commented 3 months ago

BTW one more reason why we need e2e tests - after my changes, the code was supposed to send to the multicast addresses. Yet - due to a single-line bug - it was still sending to the unicast address.

Fixed now (and re-tested), but yes I think we do need e2e - it would be not impossible but difficult to catch such type of errors with unit tests.