project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.33k stars 1.97k forks source link

[BUG] Long latency of commissioning parameter advertisement #25363

Open kpark-apple opened 1 year ago

kpark-apple commented 1 year ago

Reproduction steps

  1. Commission a Thread device
  2. Open commissioning window

SRP update is expected immediately after opening commissioning window but there was a delay of over 30 seconds.

In the attached log, commissioning window was open at the following:

I: 24481 [ZCL]Commissioning window is now open

But the SRP update started at the following:

I: 58033 [DIS]Advertise commission parameter vendorID=65521 productID=32773 discriminator=3372/13 cm=2
I: 58042 [DL]advertising srp service: 0324EB36788498E2._matterc._udp

Nordic-pairing-failed-B620-resident-odeon-config.txt

Bug prevalence

Observed just once

GitHub hash of the SDK that was being used

df4ad9521b1dac98d7965f455fec74ffa270429c

Platform

nrf

Platform Version(s)

No response

Anything else?

No response

Damian-Nordic commented 1 year ago

These SRP logs are not related to opening the commissioning window. Just before the commissioning window was opened, the following log messages were printed, which indicates the SRP services were updated correctly (also, there was no "SRP update error:" message after that so the SRP server most likely returned a success).

I: 24446 [DIS]Updating services using commissioning mode 2
I: 24451 [DIS]Advertise operational node 2BB6B3334D89CF46-000000007716EC6F
I: 24459 [DIS]Advertise commission parameter vendorID=65521 productID=32773 discriminator=3372/13 cm=2
I: 24469 [DL]advertising srp service: 0324EB36788498E2._matterc._udp
I: 24475 [DL]removing srp service: E0A488F0DB5A3B0D._matterc._udp
I: 24481 [ZCL]Commissioning window is now open

The SRP logs that you pasted were printed because then RemoveFabric command was received so DNS-SD services must have been updated.

What is exactly the issue?

kpark-apple commented 1 year ago

Let me double check. I only found SRP update happened at the later timeframe from the border router.

kpark-apple commented 1 year ago

The SRP update for _L3372 was received by SRP server more like 5 seconds after timestamp 58033. No SRP update for _L3372 or for instance 0324EB36788498E2 was ever received on the SRP server side prior to that. SRP update for instance E0A488F0DB5A3B0D was the last one and the gap between the update for E0A488F0DB5A3B0D and update for 0324EB36788498E2 was about 35 seconds.

kpark-apple commented 1 year ago

If SRP client doesn't receive SRP update response, how long would it take before it prints timeout message?

Damian-Nordic commented 1 year ago

What is your network topology? Can it be very noisy environement etc? Would it be possible that you capture the traffic using a 15.4 sniffer that would allow us to understand where the problem is? I can't see anything wrong in the logs on the device side at the time you refer to, but in one of the previous runs the following error occurred:

E: 30823 [DL]SRP update error: timed out waiting on server response

That may indicate some connectivity issues.

If I understand the code correctly, the SRP update timeout changes between 1.8s and 1 hour (an exponential backoff is used) so it is hard to tell exactly, but it is a bit weird that the above error occurred only once...

kpark-apple commented 1 year ago

If our tester reproduces the same issue with a sniffer log, I'll update this issue.

stale[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.