project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.48k stars 2k forks source link

[1.2] If SRP registration for any service fails, it may never be readvertised until power cycle or SRP server change #32727

Open ndyck14 opened 7 months ago

ndyck14 commented 7 months ago

Reproduction steps

Steps originally reported in https://github.com/openthread/openthread/issues/9909

Code snippet from repo:

CHIP_ERROR DnssdServer::AdvertiseOperational()
{
    VerifyOrDie(mFabricTable != nullptr);

    for (const FabricInfo & fabricInfo : *mFabricTable)
    {
...
// Should we keep trying to advertise the other operational
        // identities on failure?
        ReturnErrorOnFailure(mdnsAdvertiser.Advertise(advertiseParameters));

If Advertise(..) fails, the service in question will not be registered until the node restarts, or the SRP server restarts (change in reachability).

We've otherwise worked around this, so unclear if others will run into it.

Bug prevalence

rarely

GitHub hash of the SDK that was being used

1.2

Platform

efr32

Platform Version(s)

No response

Type

Platform Issue

Anything else?

No response

Damian-Nordic commented 7 months ago

But what's the reason that Advertise fails other than SRP server is not available in which case the service refresh will be triggered when the SRP server shows up? Could you elaborate what exactly is wrong with the current behavior and what's the proposal?

ndyck14 commented 7 months ago

in our scenario, unfortunately we were never able to observe it directly on a device with logging. But the observation is consistent with the code: the mdnsAdvertiser.Advertise(advertiseParameters) call fails for some reason (for some, but not all services), and when it does, it has no recovery mechanism.

Its possible that our scenario is not valid (we've refactored away from how we originally added our own SRP registration). But the code is written as "we assume this call will never fail", which may be a problem in the future.