Open angelus-liviu opened 2 weeks ago
I cannot confirm this. For me the test has virtually no waiting time, except maybe the 5 seconds or so when showing Parsing SystemCommands...
What might be is that the whole commissioning stuff does happen without any display in the backend log or the TH UI. So from the User point of view it looks like everything is stalling but in reality the backend communicates quite much with the DUT.
Please check the log of the DUT as well to see if there really is idle time involved on both sides.
[1727367517.982296][5882:5882] CHIP:SVR: Commissioning completed successfully
[1727367517.982349][5882:5882] CHIP:DIS: Updating services using commissioning mode 0
[1727367517.992885][5882:5882] CHIP:DIS: CHIP minimal mDNS started advertising.
[1727367518.033533][5882:5882] CHIP:DL: Using WiFi MAC for hostname
[1727367518.033591][5882:5882] CHIP:DIS: Advertise operational node 2139E3573089B50B-0000000012344321
[1727367518.033642][5882:5882] CHIP:DIS: Responding with _matter._tcp.local
[1727367518.033668][5882:5882] CHIP:DIS: Responding with 2139E3573089B50B-0000000012344321._matter._tcp.local
[1727367518.033692][5882:5882] CHIP:DIS: Responding with 2139E3573089B50B-0000000012344321._matter._tcp.local
[1727367518.033708][5882:5882] CHIP:DIS: Responding with D83ADD383AE20000.local
[1727367518.033723][5882:5882] CHIP:DIS: Responding with D83ADD383AE20000.local
[1727367518.033745][5882:5882] CHIP:DIS: Responding with _I2139E3573089B50B._sub._matter._tcp.local
[1727367518.033761][5882:5882] CHIP:DIS: CHIP minimal mDNS configured as 'Operational device'; instance name: 2139E3573089B50B-0000000012344321.
[1727367518.053793][5882:5882] CHIP:DIS: mDNS service published: _matter._tcp
[1727367518.053881][5882:5882] CHIP:IN: Expiring all PASE sessions
[1727367518.053902][5882:5882] CHIP:IN: SecureSession[0xaaaafb30df40]: MarkForEviction Type:1 LSID:7508
[1727367518.053922][5882:5882] CHIP:SC: SecureSession[0xaaaafb30df40, LSID:7508]: State change 'kActive' --> 'kPendingEviction'
[1727367518.053947][5882:5882] CHIP:IN: SecureSession[0xaaaafb30df40]: Released - Type:1 LSID:7508
[1727367518.054180][5882:5882] CHIP:EM: >>> [E:9565r S:0 M:157971648 (Ack:202372015)] (U) Msg RX from 0:AD43A96D275D1FB4 [0000] --- Type 0000:10 (SecureChannel:StandaloneAck) (B:26)
[1727367518.054377][5882:5882] CHIP:EM: Found matching exchange: 9565r, Delegate: (nil)
[1727367518.054438][5882:5882] CHIP:EM: Rxd Ack; Removing MessageCounter:202372015 from Retrans Table on exchange 9565r
[1727367518.055265][5882:5882] CHIP:EM: >>> [E:9566r S:7509 M:188517023 (Ack:258400641)] (S) Msg RX from 1:000000000001B669 [B50B] --- Type 0000:10 (SecureChannel:StandaloneAck) (B:34)
[1727367518.055307][5882:5882] CHIP:EM: Found matching exchange: 9566r, Delegate: (nil)
[1727367518.055337][5882:5882] CHIP:EM: Rxd Ack; Removing MessageCounter:258400641 from Retrans Table on exchange 9566r
[1727367574.826802][5882:5882] CHIP:EM: >>> [E:22679r S:0 M:35984329] (U) Msg RX from 0:A90E2895DB5E3F7C [0000] --- Type 0000:30 (SecureChannel:CASE_Sigma1) (B:196)
[1727367574.826880][5882:5882] CHIP:EM: Handling via exchange: 22679r, Delegate: 0xaaaad79c5048
[1727367574.826928][5882:5882] CHIP:IN: CASE Server received Sigma1 message . Starting handshake. EC 0xaaaafb30a940
[1727367574.826985][5882:5882] CHIP:SC: Received Sigma1 msg
[1727367574.827046][5882:5882] CHIP:SC: Found MRP parameters in the message
[1727367574.827086][5882:5882] CHIP:SC: Peer assigned session key ID 46897
[1727367574.827741][5882:5882] CHIP:SC: CASE matched destination ID: fabricIndex 1, NodeID 0x0000000012344321
there is break of 56 secondes where nothing happens. The commissioning is complete.
The same break is also present in the TH log
INFO | 2024-09-26 16:18:38.691746 | Executing Test Step: Start Python test
INFO | 2024-09-26 16:18:38.692760 | Running Python Test: TC_ACE_1_3
INFO | 2024-09-26 16:19:25.985052 | Sending command to SDK container: python3 /root/python_testing/scripts/sdk/test_harness_client.py sdk/TC_ACE_1_3 TC_ACE_1_3 --tests test_TC_ACE_1_3 --trace-to json:log --discriminator 3840 --passcode 20202021
INFO | 2024-09-26 16:19:33.946057 | Test Step Completed [PASSED]: Start Python test
Please have a look at the logs here.
TH log: th.log
DUT log: dut.log
(please note that the times in the TH log are in GMT+0, the times in the DUT log are in GMT+2. So there is a time difference of 2 hours between them)
The only gaps I see in the TH log are
08:16:32.569 ... 08:16:48.603 (16 seconds): here the commissioning of the DUT happens (10:16:33.500 ... 10:16:48.875 in DUT log)
PYTHON_TEST | 2024-09-27 08:16:32.569 | test_collections.matter.sdk_tests.support.python_testing.models.utils:handle_logs:97 | Didn't find any CertificateAuthorities in storage -- creating a new CertificateAuthority + FabricAdmin... PYTHON_TEST | 2024-09-27 08:16:48.603 | test_collections.matter.sdk_tests.support.python_testing.models.utils:handle_logs:97 | Shutting down the stack... PYTHON_TEST | 2024-09-27 08:16:48.611 | test_collections.matter.sdk_tests.support.python_testing.models.utils:handle_logs:97 | INFO:root:Final result: PASS !
08:16:50.459 ... 08:16:55.741 (5 seconds): here the parsing of the XML files happen
INFO | 2024-09-27 08:16:50.123 | matter_idl.zapxml:ParseXmls:115 | Parsing /app/test_collections/matter/sdk_tests/sdk_checkout/sdk_runner/specifications/chip/global-structs.xml...
...
INFO | 2024-09-27 08:16:50.459 | matter_idl.zapxml:ParseXmls:115 | Parsing SystemCommands... INFO | 2024-09-27 08:16:55.741 | test_collections.matter.sdk_tests.support.sdk_container:send_command:207 | Sending command to SDK container: python3 /root/python_testing/scripts/sdk/test_harness_client.py sdk/TC_ACE_1_3 TC_ACE_1_3 --tests test_TC_ACE_1_3 --trace-to json:log --thread-dataset-hex 000300000f02081111111122222222051000112233445566778899aabbccddeeff01021234 --endpoint 0 --nodeId 10 --manual-code 20391736404 --int-arg PIXIT_ENDPOINT:0 --bool-arg allow_provisional:true --PICS /var/tmp/pics
I guess the difference between what you see and what I see is the fact that I am using the TH running on an Mac with an Ubuntu VM. So my machine is pretty fast whereas the RPi can face slowdowns when lots of Flash access happens. And this probably happens in my second gap.
Maybe someone should look into the performance of these steps on a RPi because I do agree that a delay over 5 seconds for each test case is not ideal.
I'm taking a look into this issue for a couple of days. I wasn't able to reproduce on my environment at first, but I know why now.
Since I was using the development container for the backend (on the Rasp4), the long delay seems not to be happening for some reason. The wait time was only for 12 seconds, and that's far from the 56 seconds wait pointed out on the logs above. When I switched back to the "regular" mode instead of development, I finally managed to reproduce the problems 100%.
I'll continue to take a look on why that's happening and maybe this difference between "dev" and "regular" modes will help find the a source of this.
Regular Mode: log-all-clusters-regular-mode.txt th_regular_mode.log (added some prints to the log to verify the delay point)
Dev Mode: th_dev_mode.log log-all-clusters-dev-mode.txt (added some prints to the log to verify the delay point)
Describe the bug
During Matter1.4 SVE testing we discovered that the waiting time for UI Python Tests Script is too long, nothing is happening in the backed, we checked the logs. One suggestion would be to have an argument that can be configured related to waiting time.
Steps to reproduce the behavior
Run UI Python Tests Script, and check the time, it will take around 2 minutes until the test starts.
Expected behavior
Test to start more quickly for improving test efficiency
Log files
No response
PICS file
No response
Screenshots
Environment
TH v2.11-beta3.1+fall2024 sha: 9c1ef58
Additional Information
No response