project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.49k stars 2.01k forks source link

Autocommissioner ignores ReadCommissioningInfo failure #17943

Open msandstedt opened 2 years ago

msandstedt commented 2 years ago

Problem

At this rev:

commit 4967c0e0a51e3e0f0216c551d35b4cbde5aea7e4
Author: Yufeng Wang <yufengwang@google.com>
Date:   Fri Apr 15 10:24:06 2022 -0700

    [FanControl cluster] Add missing definitions for Fan Control cluster (#17416)

And with these platforms:

Observed while trying to perform multi-admin commissioning.

From the second commissioner:

[1651417385857] [24740:5813070] CHIP: [CTL] Performing next commissioning step 'ReadCommissioningInfo'
[1651417385857] [24740:5813070] CHIP: [CTL] Sending request for commissioning information
[1651417385857] [24740:5813070] CHIP: [DMG] SendReadRequest ReadClient[0x7f88174b3470]: Sending Read Request
[1651417385858] [24740:5813070] CHIP: [IN] Prepared secure message 0x7f88178e3c78 to 0xFFFFFFFB00000000 (0)  of type 0x2 and protocolId (0, 1) on exchange 56879i with MessageCounter:11184038.
[1651417385858] [24740:5813070] CHIP: [IN] Sending encrypted msg 0x7f88178e3c78 with MessageCounter:11184038 to 0xFFFFFFFB00000000 (0) at monotonic time: 0000000012662124 msec
[1651417385858] [24740:5813070] CHIP: [DMG] MoveToState ReadClient[0x7f88174b3470]: Moving to [AwaitingIn]
[1651417385858] [24740:5813070] CHIP: [EM] Sending Standalone Ack for MessageCounter:982426612 on exchange 56878i
[1651417385858] [24740:5813070] CHIP: [IN] Prepared unauthenticated message 0x700001c09478 to 0x0000000000000000 (0)  of type 0x10 and protocolId (0, 0) on exchange 56878i with MessageCounter:4249923951.
[1651417385858] [24740:5813070] CHIP: [IN] Sending unauthenticated msg 0x700001c09478 with MessageCounter:4249923951 to 0x0000000000000000 at monotonic time: 0000000012662125 msec
[1651417385858] [24740:5813070] CHIP: [EM] Flushed pending ack for MessageCounter:982426612 on exchange 56878i
[1651417386009] [24740:5813070] CHIP: [EM] Received message of type 0x1 with protocolId (0, 1) and MessageCounter:639584 on exchange 56879i
[1651417386009] [24740:5813070] CHIP: [EM] Found matching exchange: 56879i, Delegate: 0x7f88174b3470
[1651417386009] [24740:5813070] CHIP: [EM] Rxd Ack; Removing MessageCounter:11184038 from Retrans Table on exchange 56879i
[1651417386009] [24740:5813070] CHIP: [EM] Removed CHIP MessageCounter:11184038 from RetransTable on exchange 56879i
[1651417386010] [24740:5813070] CHIP: [DMG] StatusResponseMessage =
[1651417386010] [24740:5813070] CHIP: [DMG] {
[1651417386010] [24740:5813070] CHIP: [DMG]     Status = 0x01 (FAILURE),
[1651417386010] [24740:5813070] CHIP: [DMG]     InteractionModelRevision = 1
[1651417386010] [24740:5813070] CHIP: [DMG] }
[1651417386010] [24740:5813070] CHIP: [IM] Received status response, status is 0x01 (FAILURE)
[1651417386010] [24740:5813070] CHIP: [DMG] mResubscribePolicy is null
[1651417386010] [24740:5813070] CHIP: [CTL] Successfully finished commissioning step 'ReadCommissioningInfo'

The interaction failed, but the commissioner ignores the failure. The failure is caused by this on commissionee side:

 E (2704323) chip[IM]: AttributePath pool full

It appears that the scatter/gather from the commissioner is no longer viable because too few path segments are available in the commissionee. In this scenario, the first-ecosystem admin had only one single subscription path allocated. The second ecosystem commissioner attempted to consume 8 paths in a single read and the target could not support this with the code at that revision.

This leads to this spurious Attestation failure in the commissioner:

[1651417386657] [24740:5813071] CHIP: [CTL] Performing next commissioning step 'AttestationVerification'
[1651417386657] [24740:5813071] CHIP: [CTL] Verifying attestation
[1651417386661] [24740:5813071] CHIP: [CTL] Failed in verifying 'Attestation Information' command received from the device: err 604. Look at AttestationVerificationResult enum to understand the errors
[1651417386661] [24740:5813071] CHIP: [CTL] Error on commissioning step 'AttestationVerification': '../src/controller/CHIPDeviceController.cpp:962: CHIP Error 0x000000AC: Internal error'
[1651417386661] [24740:5813071] CHIP: [CTL] Failed to perform commissioning step 8

Proposed Solution

Note that at this rev, targets should be able to support the number of read paths the commissioner is attempting to consume:

Note that we have much more paths available at this rev:

commit 077e44eadf46f349916b3f4f358da6126be2ed48
Author: Song GUO <songguo@google.com>
Date:   Mon Apr 25 10:16:30 2022 +0800

    [IM] Update resource minimas for IM as per spec (#16987 u

So the only problem is that the read failure is ignored.

bzbarsky-apple commented 2 years ago

@tehampson @cecille It doesn't look like #17778 addresses this, right?

In particular, DeviceCommissioner needs to implement more ClusterStateCache::Callback methods so it will be notified of failures, instead of just implementing OnDone, as far as I can see...

@mrjerryjohns

tehampson commented 2 years ago

Just providing initial comments, I will take a closer look later today

I don't think that https://github.com/project-chip/connectedhomeip/pull/17778 addresses this issue since it seems to be an error in commissioning step 'ReadCommissioningInfo' which is not something that is addressed in that PR.

I need to look into this a little more, but maybe we can use a similar mechanism. But also as I see inside DeviceCommissioner::OnDone() we are not getting the error log Error parsing commissioning information, so it could just be making sure that return_err is actually set to an error.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.