project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0

[CERT-TEST-FAILURE] Back-to-back runs of TC-CCTRL-2.2, MCORE_FS_1_1, and MCORE_FS_1_2 fail due to TH_SERVER termination issue across Fabric Synchronization and Commissioner Control clusters #35252

Closed Rajashreekalmane closed 2 months ago

Rajashreekalmane commented 2 months ago

Feature Area

Other

Test Case

TC-CCTRL-2.2, TC-MCORE_FS_1_1 and TC-MCORE_FS_1_2

Reproduction steps

Issue Summary: Back-to-back runs of the test cases TC-CCTRL-2.2, MCORE_FS_1_1, and MCORE_FS_1_2 fail because the TH does not properly terminate the TH_SERVER application. This issue affects both the Fabric Synchronization and Commissioner Control clusters, leading to timeout errors during secure session establishment in subsequent test executions.

Steps to Reproduce:

sudo rm -rf /tmp/chip_* && ./fabric-admin ./fabric-bridge-app

Execute the following command:

python3 TC_CCTRL_2_2.py --commissioning-method on-network --discriminator 3840 --passcode 20202021 --paa-trust-store-path ../../credentials/development/paa-root-certs/ --storage-path admin_storage.json --string-arg th_server_app_path:/home/ubuntu/Aug7/connectedhomeip/examples/all-clusters-app/linux/out/all-clusters-app/chip-all-clusters-app

Note: The reference app path used is /home/ubuntu/Aug7/connectedhomeip/examples/all-clusters-app/linux/out/all-clusters-app/chip-all-clusters-app.

After successfully validating TC_CCTRL_2_2, attempt to validate TC_MCORE_FS_1_1:

python3 TC_MCORE_FS_1_1.py --commissioning-method on-network --discriminator 3840 --passcode 20202021 --paa-trust-store-path ../../credentials/development/paa-root-certs/ --storage-path admin_storage.json --string-arg th_server_app_path:/home/ubuntu/Aug7/connectedhomeip/examples/all-clusters-app/linux/out/all-clusters-app/chip-all-clusters-app

Bug prevalence

Every time

GitHub hash of the SDK that was being used

9c2d570f7852438c832622a8c1b6ba395ffb1711

Platform

raspi

Anything else?

Expected Behavior: The TH_SERVER application should terminate properly after the completion of each test case, allowing subsequent runs of TC-CCTRL-2.2, MCORE_FS_1_1, and MCORE_FS_1_2 without issues.

Actual Behavior: The TH_SERVER fails to terminate, causing the following error during subsequent test runs:

[1724839531.972781][2884:2884] CHIP:DL: bleAdv Timeout : Start slow advertisement
[1724839531.973243][2884:2884] CHIP:DL: SET service data to {'0xFFF6': <[byte 0x00, 0x27, 0x03, 0xf1, 0xff, 0x01, 0x80, 0x00]>}
[MatterTest] 08-28 10:05:35.035 ERROR Mdns discovery timed out
[MatterTest] 08-28 10:05:35.037 WARNING Failed to establish secure session to device: src/controller/python/ChipDeviceController-ScriptPairingDeviceDiscoveryDelegate.h:59: CHIP Error 0x00000032: Timeout
[MatterTest] 08-28 10:05:35.041 ERROR Error in TC_MCORE_FS_1_1#setup_class.
Traceback (most recent call last):
  File "/home/ubuntu/Aug7/connectedhomeip/no/lib/python3.12/site-packages/mobly/base_test.py", line 428, in _setup_class
    self.setup_class()
  File "/home/ubuntu/Aug7/connectedhomeip/src/python_testing/matter_testing_support.py", line 1819, in async_runner
    return _async_runner(body, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/Aug7/connectedhomeip/src/python_testing/matter_testing_support.py", line 1807, in _async_runner
    return asyncio.run(runner_with_timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/ubuntu/Aug7/connectedhomeip/src/python_testing/TC_MCORE_FS_1_1.py", line 63, in setup_class
    await self.TH_server_controller.CommissionOnNetwork(nodeId=self.server_nodeid, setupPinCode=passcode, filterType=ChipDeviceCtrl.DiscoveryFilterType.LONG_DISCRIMINATOR, filter=discriminator)
  File "/home/ubuntu/Aug7/connectedhomeip/no/lib/python3.12/site-packages/chip/ChipDeviceCtrl.py", line 2164, in CommissionOnNetwork
    return await asyncio.futures.wrap_future(ctx.future)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chip.exceptions.ChipStackError: src/controller/python/ChipDeviceController-ScriptPairingDeviceDiscoveryDelegate.h:59: CHIP Error 0x00000032: Timeout
[MatterTest] 08-28 10:05:35.068 INFO ***** Test Failure :
[MatterTest] 08-28 10:05:35.068 INFO Finished test in 0ms
[MatterTest] 08-28 10:05:35.069 ERROR



  • Test setup_class failed for the following reason:
  • src/controller/python/ChipDeviceController-ScriptPairingDeviceDiscoveryDelegate.h:59: CHIP Error 0x00000032: Timeout
  • File "/home/ubuntu/Aug7/connectedhomeip/no/lib/python3.12/site-packages/chip/ChipDeviceCtrl.py", line 2164, in CommissionOnNetwork
  • chip.exceptions.ChipStackError: src/controller/python/ChipDeviceController-ScriptPairingDeviceDiscoveryDelegate.h:59: CHIP Error 0x00000032: Timeout

Please find the attached logs: Fabric admin.txt, Fabric-bridge-app.txt, python script validation log.txt

tehampson commented 2 months ago

Only TC-CCTRL-2.2 and TC-MCORE_FS_1_1 have issues with running back to back.

The issue is that they leave the TH_SERVER application running, because they start it with shell=True.

TC-MCORE_FS_1_2 doesn't have issues running back to back, but it has issues running after TC-CCTRL-2.2 and TC-MCORE_FS_1_1, as the port number used by the test (5543) is occupied by the application that never properly terminated.
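For illustration, here is a minimal sketch of the termination pattern described above. It assumes the test launches the server through Python's subprocess module; the helper names start_server and stop_server are hypothetical and are not taken from the test scripts. With shell=True the Popen handle refers to the intermediate shell, so terminate() may leave the server itself running; passing an argument list avoids the shell.

```python
import subprocess
import sys


def start_server(argv):
    """Start a server process from an argument list (hypothetical helper).

    With shell=False (the default) the Popen handle refers to the server
    itself rather than to an intermediate shell process.
    """
    return subprocess.Popen(argv)


def stop_server(proc, timeout=5.0):
    """Ask the process to exit, then force-kill it if it does not."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for the TH_SERVER binary: a Python process that just sleeps.
    server = start_server([sys.executable, "-c", "import time; time.sleep(60)"])
    stop_server(server)
    print("server still running:", server.poll() is None)
```

In a test class, the stop call would typically go in the teardown step so that it also runs when the test body fails.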

Rajashreekalmane commented 2 months ago

I've noticed that after rebooting the RPI, the TCs pass successfully. However, if I run the TCs back-to-back, even the same TC that initially passed will fail.
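As a quick check before a rerun, the sketch below tests whether a port (such as the 5543 mentioned above) is still held by a leftover process. It is an assumption-laden sketch: the helper name udp_port_in_use is hypothetical, and it assumes the server binds a UDP socket on an address family that conflicts with the one probed here. Any bind failure is treated as "in use".

```python
import socket


def udp_port_in_use(port, host="127.0.0.1", family=socket.AF_INET):
    """Return True if binding a UDP socket to (host, port) fails.

    Hypothetical helper: any OSError from bind() is reported as "in use",
    which also covers cases such as insufficient permissions.
    """
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return True
    return False


if __name__ == "__main__":
    # 5543 is the port number named in the comment above.
    print("port 5543 in use:", udp_port_in_use(5543))
```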

tehampson commented 2 months ago

Today's live debug session between Rajashreekalmane and myself determined that the issue of running back to back is only for

We noticed some flake with

This flake is not something I can look into before SVE, but I can fix the back-to-back issues, since they require a fix similar to https://github.com/project-chip/connectedhomeip/pull/35257