project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0

[BUG] [Thread implementation] Should not depend on Thread MLE Discovery / Scan when commissioning #31752

Open smides-nest opened 8 months ago

smides-nest commented 8 months ago

Reproduction steps

This issue was originally reported within Thread Group. Following discussion in the technical team there, I am filing the issue here with the Matter SDK implementation.

It appears that some commissioners issue a Thread scan as part of their commissioning flow and will only provide Thread credentials to the commissionee if the scan returns a (the?) known / "target" Thread PAN. Per Thomas Cuyckens @ Qorvo, the Matter implementation calls the OpenThread API to scan with a parameter set to only transmit 1 MLE Discovery frame. This frame is neither retransmitted immediately (while tuned to the same channel) nor is it retransmitted later (as part of a subsequent scan attempt). In congested environments, or if the device is just unlucky, that MLE Discovery transmission may be clobbered by another user of the 2.4 GHz unlicensed spectrum and, if that occurs, the "target" PAN will not hear it and will, therefore, not send an MLE Discovery Response -- i.e., the "target" PAN will not be returned in the scan results.
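To put a rough number on the "just unlucky" case, here is a minimal sketch. It assumes each Discovery Request/Response exchange on the target channel is lost independently with the same probability, which is a simplification of real RF behaviour, and the 20% loss figure is purely illustrative rather than measured.

```python
def miss_probability(p_loss: float, attempts: int) -> float:
    """Probability that all `attempts` discovery exchanges are lost.

    Assumes (hypothetically) that losses are independent and identically
    distributed across attempts.
    """
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be in [0, 1]")
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    return p_loss ** attempts


# Illustrative only: a 20% per-exchange loss rate is an assumed value.
for n in (1, 2, 3):
    print(n, round(miss_probability(0.2, n), 4))
```

Under these assumptions, a single transmission leaves the full per-exchange loss rate as the chance of the target PAN being absent from the results, while each extra transmission multiplies that chance by the loss rate again.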

Ideally a Thread Scan would not need to be performed as part of commissioning, as doing so causes the commissionee to transmit an MLE Discovery frame on each Thread channel, soliciting responses from all Thread Routers on those channels. This causes added congestion in the shared, unlicensed 2.4 GHz spectrum, albeit only when commissioning a Matter device.

That said, a commissioner that requires the Thread PAN be returned as part of a commissionee-initiated scan should probably re-attempt the scan if the results are not to its liking. Either that or the Matter logic calling the OpenThread scan primitive should call it with the parameter set to issue > 1 MLE Discovery.
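The commissioner-side re-attempt could look roughly like the sketch below. The `scan` callable, the result dictionary keys, and the default attempt count and wait time are all hypothetical placeholders; this is not the Matter SDK or OpenThread API.

```python
import time
from typing import Callable, Iterable, Optional

# Hypothetical: a scan returns one dict per network that answered.
ScanFn = Callable[[], Iterable[dict]]


def scan_until_found(scan: ScanFn,
                     target_xpan_id: bytes,
                     max_attempts: int = 3,
                     wait_s: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Repeat the scan until the target network appears or attempts run out."""
    for attempt in range(max_attempts):
        for network in scan():
            if network.get("extended_pan_id") == target_xpan_id:
                return network
        # Pause between attempts, but not after the last one.
        if attempt + 1 < max_attempts:
            sleep(wait_s)
    return None


# Simulated radio: the first scan misses the target, the second finds it.
target = bytes.fromhex("dead00beef00cafe")
results = iter([[], [{"extended_pan_id": target, "channel": 15}]])
found = scan_until_found(lambda: next(results), target, sleep=lambda s: None)
print(found is not None)
```

Bounding the number of attempts keeps the added congestion limited, which matters because every scan solicits responses from all Routers on the scanned channels.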

Bug prevalence

Exhibited more frequently in congested RF environs

GitHub hash of the SDK that was being used

N/A

Platform

other

Platform Version(s)

No response

Anything else?

No response

bzbarsky-apple commented 8 months ago

@Damian-Nordic

EskoDijk commented 8 months ago

As mentioned in the description (just to clarify this): the "scan" is a transmitted multicast MLE message "Discovery Request".

If the Thread MLE discovery request needs to be sent, then it should ideally be initiated with a parameter specifying the Extended PAN ID of the Thread Network to look for. In case of multiple networks, this already potentially reduces the volume of responses considerably.

Thread supports this MLE message with zero or more Extended PAN ID TLVs as parameters, to filter on these XPAN IDs only: a response is given if at least one of these TLVs matches the recipient's XPAN ID.
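The matching rule described above can be sketched as a small predicate. The function name is hypothetical, and treating an empty TLV list as "no filter, everyone responds" is my reading of the non-directed case rather than something checked against the Thread spec.

```python
from typing import Sequence


def should_respond(own_xpan_id: bytes, requested_xpan_ids: Sequence[bytes]) -> bool:
    """Decide whether a Router answers a Discovery Request.

    `requested_xpan_ids` models the Extended PAN ID TLVs carried in the request.
    """
    if not requested_xpan_ids:
        # Assumption: zero TLVs means an undirected discovery with no filtering.
        return True
    # Respond if at least one requested XPAN ID matches our own.
    return own_xpan_id in requested_xpan_ids


mine = bytes.fromhex("dead00beef00cafe")
other = bytes.fromhex("1111111111111111")
print(should_respond(mine, []))
print(should_respond(mine, [other]))
print(should_respond(mine, [other, mine]))
```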

If the parameter is set to send multiple MLE Discovery requests, a second parameter would also be needed to define the waiting time between the transmissions, since in principle that is up to the higher-layer application using Thread. (There may be some requirements for minimum wait time in the Thread spec; I didn't check that.)

alan-eero commented 7 months ago

Our commissioners use the Matter command "scanNetwork" to trigger an MLE Discovery to confirm that the commissionee has line of sight to the commissioner's target Thread network. The commissioner does not retry scanNetwork; it issues it only once per commissioning attempt. The settings follow the Thread spec: max scan duration is 300 ms (on each channel) and Discovery Jitter is 250 ms (maximum response delay). There is only 1 broadcast per channel, and each Router will answer. We are aware that in congested scenarios the MLE Discovery Response is a "best effort" from the available devices.

If the target Thread network is not included in the ScanNetworks results, our Matter commissioning uses other mechanisms to continue with the commissioning:

Example 1. If the commissioner is also a TBR, and we have confirmed the BLE connection with the commissionee, then the flow continues with the commissioner/TBR network credentials.
Example 2. We ask the user to type the Thread network credentials.
Etc.

If you have found failing scenarios, don't hesitate to contact our support team.
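The fallback order described in this comment can be summarised as a small decision function. This is only a sketch of the two examples given; the names are hypothetical, and the "etc." in the comment implies further mechanisms that are not modelled here.

```python
from enum import Enum


class CredentialSource(Enum):
    SCAN_CONFIRMED = "target network seen in ScanNetworks results"
    COMMISSIONER_TBR = "commissioner is also a Thread Border Router"
    USER_ENTRY = "user types the Thread network credentials"


def choose_credential_source(target_in_scan: bool,
                             commissioner_is_tbr: bool,
                             ble_connected: bool) -> CredentialSource:
    """Pick where the Thread credentials come from after a single scan."""
    if target_in_scan:
        return CredentialSource.SCAN_CONFIRMED
    # Example 1: the commissioner is a TBR and the BLE link is confirmed.
    if commissioner_is_tbr and ble_connected:
        return CredentialSource.COMMISSIONER_TBR
    # Example 2: fall back to asking the user.
    return CredentialSource.USER_ENTRY


print(choose_credential_source(False, True, True).name)
```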

jwhui commented 7 months ago

"If the target Thread network is not included in the ScanNetworks results"

@alan-eero, Thread supports targeted network scans, which would seem to benefit your use case. Additionally, MLE Discovery can be repeated multiple times to improve reliability. It may be that Matter requires an enhancement (spec and/or SDK) to expose these capabilities.

EskoDijk commented 7 months ago

Adding some detail to what @jwhui suggested:

The Matter "scanNetworks" command has a clear API issue for this use case. It supports both non-directed scanning ('all networks') and directed scanning (a specific network identifier), and for this use case we need the latter.

However, the only parameter that can be provided for directed scanning is the "SSID" parameter, which (in the 1.0 core spec) is used only for Wi-Fi networks, not Thread. For Thread, ideally the following parameters would be used:

The rationale for this is that if a party (like the Commissioner) already knows the Extended PAN ID, it's highly likely that it also knows other network information, such as the current channel. By supplying both pieces of information, the scan becomes much faster and also much more reliable (because 'scanNetworks' could easily make repeated attempts if it's only searching on a single channel).
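A back-of-the-envelope comparison of the two cases, as a sketch: it uses the 300 ms per-channel scan duration quoted earlier in this thread and the 16 IEEE 802.15.4 channels (11 to 26) in the 2.4 GHz band. The choice of three attempts on the known channel is an assumed value for illustration.

```python
THREAD_CHANNELS = range(11, 27)   # IEEE 802.15.4 channels 11..26 (2.4 GHz)
SCAN_DURATION_MS = 300            # per-channel figure quoted in this thread


def scan_time_ms(channels: int,
                 attempts_per_channel: int = 1,
                 dwell_ms: int = SCAN_DURATION_MS) -> int:
    """Total listening time for a scan, ignoring retuning overhead."""
    return channels * attempts_per_channel * dwell_ms


full_scan = scan_time_ms(len(THREAD_CHANNELS))           # 4800 ms
directed_scan = scan_time_ms(1, attempts_per_channel=3)  # 900 ms (3 attempts assumed)
print(full_scan, directed_scan)
```

Even with three attempts, the single-channel scan finishes in under a fifth of the time of one pass over all channels, and it transmits on only one channel instead of sixteen.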

In addition, the Thread 8-byte Extended PAN ID could easily fit in the "SSID" parameter field (maybe that's already done today, I don't know), since that field is already an octet string. But the Matter 1.0 spec, at least, does not say so yet.
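As a sketch of why the size is not a problem: a Wi-Fi SSID is at most 32 octets, so an 8-byte Extended PAN ID fits with room to spare. The helper name below is hypothetical, and carrying the XPAN ID in the "SSID" field is only the suggestion made here, not behaviour confirmed by the Matter spec or SDK.

```python
WIFI_SSID_MAX_LEN = 32   # maximum SSID length in octets (IEEE 802.11)
XPAN_ID_LEN = 8          # Thread Extended PAN ID length in octets


def xpan_id_as_ssid_field(xpan_id_hex: str) -> bytes:
    """Encode an Extended PAN ID as an octet string for a directed scan.

    Hypothetical helper: accepts hex with or without ':' separators.
    """
    raw = bytes.fromhex(xpan_id_hex.replace(":", ""))
    if len(raw) != XPAN_ID_LEN:
        raise ValueError(f"Extended PAN ID must be {XPAN_ID_LEN} bytes, got {len(raw)}")
    # Always true for a valid XPAN ID; kept to make the size argument explicit.
    assert len(raw) <= WIFI_SSID_MAX_LEN
    return raw


field = xpan_id_as_ssid_field("de:ad:00:be:ef:00:ca:fe")
print(len(field), field.hex())
```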