Open AlanLCollins opened 1 year ago
I would suggest specifying a shorter SRP lease time. I believe the current Matter implementation does not specify a value and relies on the default SRP lease time.
including that BR is advertising both _udp and _tcp records over wifi.
Why are there _udp records? Does the device support extended commissionable discovery? What do those records look like, and in particular, do they have CM=1 or CM=0? @AlanLCollins
close the commisioning window as soon as the CommissioningComplete command is received using General Commissioning Cluster.
Yes, we do that. I thought we were supposed to withdraw the now-stale DNS-SD records when that happens. @Damian-Nordic ?
and/or modify the TTL for _udp record to match the time for the commisioning window.
See also https://github.com/project-chip/connectedhomeip/issues/16858.
Hello @bzbarsky-apple , the record has CM=0. See sniffer log attached. Nwk key = f294a6ccd5e88a0b0df947dec263a385 02-09-Matter_provissioning-Golden.pcapng.zip What is the expected behavior of the CM flag? I was expecting to see it CM=1, and after CommissioningComplete command is received, light-bulb to send DNS update with CM=0. Does the CM flag best behavior requires specific commit from the CHIP SDK ?
The _udp record is withdrawn after 15m, the light-bulb sends DNS update with TTL = 0s.
Here's another idea for a solution. When a Thread devices dis-associates from the network, it sends a "I am leaving the network" broadcast. Any Thread devices on the network that have cached data associated with this device, listen to this broadcast and proactively purge their data. BRs can purge SRP services registered for this device and any entries in neighbor/routing tables. The Thread devices can send this broadcast when:
I am a bit reluctant to the solution that involves sending SRP updates on the factory reset as I think the factory reset should include as few steps as possible to increase chances of successful recovery of a device that has entered a faulty state. On the other hand, we already generate Leave events in such a scenario, so maybe that's not a big deal after all... I can't make up my mind :)
Anyway, reducing the lease time sounds like a good idea to me as it should be easy to implement, and it would also help for the case when the device has lost the connectivity with the border router. Note that OpenThread uses a single lease time for the entire SRP update, so we can't set a shorter lease time just for the _matterc._udp
service. It shouldn't be a big problem though as we might do sth like this:
DnssdServer::Instance().StartServer()
is called._matterc._udp
service with CM=0
is added, reduce the lease time to the duration of the extended discovery._matterc._udp
service with CM=1
is added, reduce the lease time to the maximum duration of the commissioning window.I am a bit reluctant to the solution that involves sending SRP updates on the factory reset as I think the factory reset should include as few steps as possible to increase chances of successful recovery of a device that has entered a faulty state. On the other hand, we already generate Leave events in such a scenario,
Ah so a leave event already exists and is sent by Thread device logically exiting a Thread network. That is good to know.
For shorter lease based solution, referencing: https://github.com/project-chip/connectedhomeip/issues/16858#issuecomment-1422109811 , it seems like Thread BR decides what lease time to set. Can the Thread device request/suggest a suitable lease time for a given SRP update?
Hello @bzbarsky-apple , the record has CM=0. See sniffer log attached.
OK, so that's an extended commissioning advertisement. But the original issue report said:
Any Discovery over wifi will provide an IPv6 address to a device that no longer exists. So provisioning attempts will fail.
which should not really be the case. Commissioning discovery should (and does, afaik) ignore CM=0 DNS-SD results, precisely because those do not represent commissionable devices.
What is the expected behavior of the CM flag?
CM=1 means "commissioning window is open". CM=0 means "commissioning window is not open, but I am advertising so you will have the pairing hints to try to open a commissioning window if you decide you want to do that".
Commissioners that are not trying to do discovery of things that are not commissionable but could be should ignore CM=0 results.
Commissionees that are not trying to implement the extended discovery behavior (e.g. not providing useful pairing hint TXT records) should turn off extended discovery altogether.
Commissionees that do want to do extended discovery right now switch from advertising with CM=1 to advertising with CM=0 when the commissioning window closes.
The _udp record is withdrawn after 15m
Yes, when the extended discovery times out...
Here's another idea for a solution.
A solution to what exact problem?
Thanks @bzbarsky-apple , very clear description:
CM=1 means "commissioning window is open". CM=0 means "commissioning window is not open, but I am advertising so you will have the pairing hints to try to open a commissioning window if you decide you want to do that".
is this behavior included in CHIP SDK1.0, so all existing Matter-certified devices should be complaint ?
is this behavior included in CHIP SDK1.0
Yes.
so all existing Matter-certified devices should be complaint ?
I don't know whether we have specific certification tests for this... But we did do some manual developer testing with both minimal mdns and some of the platform mdns backends (definitely the darwin one).
Awesome. Well, I can confirm that Light-bulb example (min-mdns) running on Nordic's-dk (nrf tag: v2.2.0) with default settings is giving me matterc._udp CM=0. My new understanding of the behavior solves my original concern. However there is still an area of improvement in the scenario which light-bulb gets factory reset during 2nd admin flow, and we end up with matterc._udp record CM=1 stale in the border router for 2 hrs. In that case, we need CHIP SDK to implement @Damian-Nordic suggestions.
Yes, "factory reset while commissioning window is open" will absolutely cause problems. If we can do what @Damian-Nordic suggests, that would be very nice.
I agree with @Damian-Nordic 's proposal.
SRP and OpenThread already support the client specifying lease intervals on a given service instance.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Reproduction steps
The _udp record will continue stale in the SRP server for 2 hrs Any Discovery over wifi will provide an IPv6 address to a device that no longer exists. So provisioning attempts will fail.
proposal:
Bug prevalence
always
GitHub hash of the SDK that was being used
561d23d0db215a99705ff0696e73853c8edf11b2
Platform
android
Platform Version(s)
No response
Anything else?
No response