project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.43k stars 1.99k forks source link

[Pairing Reliability] #4 Failure: Commissioning failed with CHIP Error 0x000000C9: No shared trusted root #33998

Closed Saravana-kr22 closed 3 months ago

Saravana-kr22 commented 4 months ago

Description: I have used python controller to run a stress test on my RPI sample dut (all-clusters-app) on Pair-Unpair test scenario for 10,000 iterations. The test passed in iteration 2608 without any error but the iteration 2609 is failed to commission the dut during the case session with the below traceback error

Traceback (most recent call last): File "/Users/chaitanyajandhyala/code/python/matter-qa/src/matter_qa/library/helper_libs/matter_testing_support.py", line 1177, in pair_dut raise CommissionFailure(f"Pairing with DUT failed Reason {commission_device_result.str()} ") chip.commissioning.CommissionFailure: CommissionFailure(Pairing with DUT failed Reason src/protocols/secure_channel/CASESession.cpp:1984: CHIP Error 0x000000C9: No shared trusted root )

GitHub hash of the SDK that was being used: https://github.com/project-chip/connectedhomeip/commit/3718e996b56f36a553c1b4571f77728396590776

Platform controller - python controller running on raspi, DUT - all-cluster-app running on raspi

Provisioned using on-network

Platform Version(s) RPI-4, 8GB RAM

Test Scenario

Pair/Provision the DUT (All-cluster-app running on raspi) using on-network Read device information after the device is provisioned sleep for 2 secs Unpair the DUT (i.e. removing the fabric) factory reset the DUT repeat step1 - 6 for 10,000 iterations Script reference: https://github.com/CHIP-Specifications/matter-qa/blob/main/src/matter_qa/scripts/reliability_scripts/TC_Pair.py

Reference log: Example of Passed iteration details:

The iteration number 2608 is passed, below log files for the same TH log: TH_log_2024-04-26_04-31-38.log DUT log: Dut_log_2608_2024-04-26T04_31_47_630533.log

Example of Failed iteration details:

The iteration number 2609 is failed with this error below the logfile TH log : TH_log.log DUT log: Dut_log_2609_2024-04-26T04_37_18_972898.log32.log)

For the detailed summary please reference this json: summary.json

bzbarsky-apple commented 3 months ago

From the TH log, while it was doing the operational resolve:

2024-04-26 04:32:00,120 - chip.native.DIS - INFO - __init__.py:_RedirectToPythonLogging:40 - UDP:[2600:1700:5390:be60:dea6:32ff:feab:25e4%en0]:5540: new best score: 6
2024-04-26 04:32:00,121 - chip.native.ATM - INFO - __init__.py:_RedirectToPythonLogging:40 - {
2024-04-26 04:37:18,718 - chip.native.ATM - INFO - __init__.py:_RedirectToPythonLogging:40 -    "compressed_fabric_id" : 12596051433602548728,
...
2024-04-26 04:37:18,725 - chip.native.DIS - INFO - __init__.py:_RedirectToPythonLogging:40 - Checking node lookup status after 319582 ms

There's that 5-minute losing of timeslice, just like in https://github.com/project-chip/connectedhomeip/issues/33993

While we sat there for 5 minutes, the DUT failsafe expired, and when we sent the Sigma1 the DUT responded with "I have no idea what fabric this is, because I've deleted it already".

bzbarsky-apple commented 3 months ago

Oh, and the above issue says:

controller - python controller running on raspi,

but the TH log clearly says:

2024-04-26 04:31:58,559 - chip.native.DIS - ERROR - __init__.py:_RedirectToPythonLogging:38 - Mdns: Browse failure (src/platform/Darwin/DnssdImpl.cpp:412: CHIP Error 0x00000074: The operation has been cancelled)

so that would be a very odd raspi.

bzbarsky-apple commented 3 months ago

Duplicate of https://github.com/project-chip/connectedhomeip/issues/33993