project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.46k stars 1.99k forks source link

[SQA] [Thread]TC-CNET-4.12 Getting Run Command Failure after connect-network with THREAD_2ND_OPERATIONALDATASET #20803

Open Survensa opened 2 years ago

Survensa commented 2 years ago

Problem

Steps 2 - 7: remove th_xpan network, add THREAD_2ND_OPERATIONALDATASET, Connect to THREAD_2ND_OPERATIONALDATASET After Step 7 : ./chip-tool networkcommissioning connect-network hex:1161111128222222 1 0 --Breadcrumb 2 not able to proceed further due to run command failure.

Actual behavior after connect-network with THREAD_2ND_OPERATIONALDATASET

[1657887831.463607][3486:3486] CHIP:TOO: Run command failure: ../../commands/common/CHIPCommand.cpp:423: CHIP Error 0x00000032: Timeout

Steps to reproduce

  1. Remove th_xpan network { ./chip-tool networkcommissioning remove-network hex:1111111152222222 22 0 }
  2. Add THREAD_2ND_OPERATIONALDATASET { ./chip-tool networkcommissioning add-or-update-thread-network hex:0e080000000000010000000300000f35060004001fffe0020811111111282222220708fd5930b7d8b62f58051000112233445562778899aabbccddeeff030e4f70656e54687265746444656d6f010212340410005ed405199188e1a5c0dd607282d7480c0402a0fff8 22 0 (second network dataset value) }
  3. Read the network { ./chip-tool networkcommissioning read networks 22 0 }

System configuration

Log : CNET Thread.txt

bzbarsky-apple commented 2 years ago

After the device connects to the second network, it can't communicate with the chip-tool, which is on the original network, right? So the behavior here seems completely expected... @Survensa

cecille commented 2 years ago

Agree - the omission of the verification step here in the test plan is intentional. chip-tool won't receive a response on its network if the network has been changed.

doublemis1 commented 2 years ago

I can look at that also but, I can see that extended and doesn't match ./chip-tool networkcommissioning connect-network hex:1161111128222222 1 0 --Breadcrumb 2 - extended panid 1161111128222222

./chip-tool networkcommissioning add-or-update-thread-network hex:0e080000000000010000000300000f35060004001fffe0020811111111282222220708fd5930b7d8b62f58051000112233445562778899aabbccddeeff030e4f70656e54687265746444656d6f010212340410005ed405199188e1a5c0dd607282d7480c0402a0fff8 22 0 - extended panid 1111111128222222

woody-apple commented 2 years ago

Cert Blocker Review: It appears that we may need to change the test plan. @cecille can you please file an issue against the test plan and close this one?

Thanks!

Survensa commented 2 years ago

@doublemis1 Checked for extended panid but facing same issue :

  1. ./chip-tool networkcommissioning connect-network hex:1111111107222222 48 0 --Breadcrumb 2
  2. ./chip-tool networkcommissioning add-or-update-thread-network hex:0e080000000000010000000300000f35060004001fffe0020811111111072222220708fd9957a92dadc7aa051000112233445966778899aabbccddeeff030f4f70656e54687265616444656d6f6a0102123404106ad73489ca556d19f8c47fd2150e58e30c0402a0fff8 48 0

Full log : cnet-4.12 error.txt

doublemis1 commented 2 years ago

Hi, I was able to reproduce it. The Border Router 1 and Border Router 2 are within the same network (not Thread Network) and may discover its services by DNS. Thats way after connect DUT to the Border Router 2 network, the DUT informs that SRP services already existis (existing on Border Router 1).

I: 375759 [DL]SRP Client was stopped, because current server is no longer detected.
I: 375767 [DL]SRP Client was started, detected server: fd2f:a875:aab9:3927:a4b1:7250:5381:4135
D: 375776 [DL]OpenThread State Changed (Flags: 0x200002a4)
D: 375819 [DL]   Device Role: CHILD
D: 375822 [DL]   Partition Id: 0xD192944
D: 375831 [DL]OpenThread State Changed (Flags: 0x00000003)
D: 375837 [DL]   Thread Unicast Addresses:
D: 375881 [DL]        fd11:db::275:5101:6228:531d/64 valid preferred
D: 375887 [DL]        fd2f:a875:aab9:3927:0:ff:fe00:4c01/64 valid rloc
D: 375894 [DL]        fd2f:a875:aab9:3927:5a87:1ca3:6d30:7f7e/64 valid
D: 375901 [DL]        fe80::3481:909f:b6e4:e6d6/64 valid preferred
D: 375912 [DMG]ICR moving to [ Preparing]
D: 375916 [DMG]ICR moving to [AddingComm]
D: 375919 [DMG]ICR moving to [AddedComma]
D: 375925 [DMG]Endpoint 0, Cluster 0x0000_0030 update version to 192d259f 
D: 375931 [DMG]Decreasing reference count for CommandHandler, remaining 0 
I: 375940 [IN]Prepared secure message 0x20004ce4 to 0x000000000001B669 (1)  of type 0x9 and protocolId (0, 1) on exchange 12594r with MessageCounter:108659917.
I: 375954 [IN]Sending encrypted msg 0x20004ce4 with MessageCounter:108659917 to 0x000000000001B669 (1) at monotonic time: 000000000005BC92 msec
D: 375969 [DMG]ICR moving to [CommandSen]
D: 375973 [DMG]ICR moving to [AwaitingDe]
D: 375977 [DL]Using Thread extended MAC for hostname.
I: 375983 [DIS]Advertise operational node 529BA477D61319E8-0000000000000001
I: 375992 [SVR]Operational advertising enabled
I: 376146 [EM]Received message of type 0x10 with protocolId (0, 0) and MessageCounter:246502512 on exchange 12594r
D: 376156 [EM]Found matching exchange: 12594r, Delegate: (nil)
D: 376163 [EM]Rxd Ack; Removing MessageCounter:108659917 from Retrans Table on exchange 12594r
D: 376172 [EM]Removed CHIP MessageCounter:108659917 from RetransTable on exchange 12594r
I: 376181 [DL]OpenThread SED interval set to 1000ms
**E: 377284 [DL]SRP update error: domain name or RRset is duplicated**

I was able to pass this test using only docker networks. In this solution, border routers are on separate networks. Equipment:

Creating docker networks:

Run Border routers

*Note web GUI for otbr1 will be avaliable on 0.0.0.0:8080, for otbr2 0.0.0.0:8081

Form networks: Form networks on both Border Routers with different datasets.

Proceed the test.

doublemis1 commented 2 years ago

Currently, the SRP Replication feature is not supported on the current Thread 1.3, hence the border routers from two separate Thread Networks can discover its services within a local network (using backbone interface) but synchronization and replication handling are not implemented.

Survensa commented 2 years ago

@doublemis1 I am able to connect to second network through second OTBR but the issue is I am not able to proceed further, I was forced to restart the whole setup.

doublemis1 commented 2 years ago

What is the issue? Please provide logs and steps how to reproduce

Survensa commented 2 years ago

@doublemis1 Once after the second network is connected, I am no able to execute any command as it end with Timeout error

Attached both chiptool and DUT log for referance :

Survensa commented 2 years ago

Thanks @doublemis1

Followed the steps to get connected and able to run the test case but not able to connect with first network once after disarming the Armfailsafe and ends with Timeout error

Error log :

I: 583601 [TS]Previous Last Known Good Time: 2022-07-29T05:46:32 I: 583607 [TS]Reverted Last Known Good Time to previous value D: 583613 [DMG]Endpoint 0, Cluster 0x0000_0030 update version to 9380461a D: 583621 [ZCL]Failsafe timeout, tell platform driver to revert network credentials. E: 583629 [NP]Found Thread configuration backup: reverting configuration E: 583723 [DL]Long dispatch time: 256 ms, for event type 32782 I: 583729 [FS]Fail-safe cleanly disarmed D: 583732 [DL]OpenThread State Changed (Flags: 0x11002046) D: 583738 [DL] Device Role: DETACHED D: 583741 [DL] Thread Unicast Addresses: D: 583745 [DL] fdd0:6b61:12a4:6da6:38a8:de10:b0ba:7e89/64 valid D: 583751 [DL] fe80::c831:832e:c849:8deb/64 valid preferred D: 583757 [DL] fd11:92::a1b:f16d:d7f3:d2f2/64 valid preferred D: 583766 [DL]OpenThread State Changed (Flags: 0x10034000) D: 583771 [DL] Network Name: OpenThreadDemo D: 583775 [DL] PAN Id: 0x1234 D: 583780 [DL] Extended PAN Id: 0x2111111122222222 D: 583785 [DL] Channel: 11 D: 583788 [DL] Mesh Prefix: fdd0:6b61:12a4:6da6::/64 I: 583796 [DL]CHIPoBLE advertising disabled because device is fully provisioned D: 583803 [DL]OpenThread State Changed (Flags: 0x1100107d) D: 583809 [DL] Device Role: DETACHED D: 583812 [DL] Thread Unicast Addresses: D: 583816 [DL] fdd0:6b61:12a4:6da6:38a8:de10:b0ba:7e89/64 valid D: 583822 [DL] fe80::c831:832e:c849:8deb/64 valid preferred D: 583828 [DL] fd11:92::a1b:f16d:d7f3:d2f2/64 valid preferred I: 584675 [DL]SRP Client was stopped, because current server is no longer detected. I: 584682 [DL]SRP Client was started, detected server: fdd0:6b61:12a4:6da6:3720:f78e:b659:724e D: 584692 [DL]OpenThread State Changed (Flags: 0x200002a4) D: 584734 [DL] Device Role: CHILD D: 584737 [DL] Partition Id: 0x17775288 D: 584746 [DL]OpenThread State Changed (Flags: 0x00000003) D: 584752 [DL] Thread Unicast Addresses: D: 584796 [DL] fd11:13::a8e0:a386:f9a4:fde0/64 valid preferred D: 584804 [DL] fdd0:6b61:12a4:6da6:0:ff:fe00:1003/64 valid rloc D: 584810 [DL] fdd0:6b61:12a4:6da6:38a8:de10:b0ba:7e89/64 valid D: 584817 [DL] fe80::c831:832e:c849:8deb/64 valid preferred D: 585777 [DL]SRP update succeeded

cecille commented 2 years ago

@woody-apple - can you clarify what test plan changes are needed here?

bzbarsky-apple commented 2 years ago

@cecille After a ConnectNetwork, the test plan expects to still receive the response to the original command on the old network that the device is no longer connected to.

doublemis1 commented 2 years ago

@bzbarsky-apple, currently (at least nrfconnect platform) transmits a response to the old network, and right after sending the response DUT, switches to the new network.

cecille commented 2 years ago

Connect network command is sent in steps 7 and 16. Neither one of those steps has any expected outcome specified. I suppose it might have been better to add an explicit note that no response is required, but I don't think it's wrong.

cecille commented 2 years ago

I think the fact that the chip tool command is failing is scaring people, but the test plan doesn't specify that the response needs to come - that's a chip-tool assumption. So I think this can be fixed in the verification steps.

doublemis1 commented 2 years ago

I test it right now, and the device without any issue connects to the second Thread Network and after the arm-fail-safe timeout, the device goes back to the first Thread Network. The issue is with advertised operation services.

= otbr20 IPv6 E5770245A0D6B77C-0000000000000001             _matter._tcp         local
   hostname = [0A82FF282505E2CC.local]
   address = [fd11:ad::a2ae:53a0:7c81:5aac]
   port = [5540]
   txt = ["T=0" "SAI=500" "SII=6000"]
=  otbr0 IPv6 E5770245A0D6B77C-0000000000000001             _matter._tcp         local
   hostname = [0A82FF282505E2CC.local]
   address = [fd11:4b::6414:7609:b69a:b6e0]
   port = [5540]
   txt = ["T=0" "SAI=500" "SII=6000"]

The device doesn't remove the SRP service from the first OTBR (Thread Network) while connecting to the second OTBR. And while the device is going back to the first Thread Network, DUT doesn't remove the SRP service, thus from the controller's perspective, it needs to choose one out of two existing operational services.

The controller wants to connect the device

[1660056811.552199][59852:59857] CHIP:DL: Avahi resolve found
[1660056811.552282][59852:59857] CHIP:DIS: Node ID resolved for E5770245A0D6B77C:0000000000000001
[1660056811.552301][59852:59857] CHIP:DIS:  Hostname: 0A82FF282505E2CC
[1660056811.552321][59852:59857] CHIP:DIS:  IP Address #1: fd11:ad::a2ae:53a0:7c81:5aac
[1660056811.552333][59852:59857] CHIP:DIS:  Port: 5540
[1660056811.552345][59852:59857] CHIP:DIS:  Mrp Interval idle: 6000 ms
[1660056811.552355][59852:59857] CHIP:DIS:  Mrp Interval active: 500 ms
[1660056811.552362][59852:59857] CHIP:DIS:  TCP Supported: 0
[1660056811.553013][59852:59857] CHIP:DIS: Lookup clearing interface for non LL address

While the current device dataset is

uart:~$ ot dataset active
Active Timestamp: 1
Channel: 17
Channel Mask: 0x07fff800
Ext PAN ID: 1111111122222254
Mesh Local Prefix: fd32:1a2f:ba8c:69df::/64
Network Key: 00112233445566778899aabbccddeeff
Network Name: BR_1
PAN ID: 0xabbd
PSKc: e7d33a45227eb3ea460566cac9c96a79
Security Policy: 672 onrc
Done
uart:~$ ot ipaddr
fd11:4b:0:0:6414:7609:b69a:b6e0
fd32:1a2f:ba8c:69df:0:ff:fe00:c001
fd32:1a2f:ba8c:69df:6879:1371:79f1:989a
fe80:0:0:0:882:ff28:2505:e2cc
Done
uart:~$ 
bzbarsky-apple commented 2 years ago

The stale SRP record thing sounds like https://github.com/project-chip/connectedhomeip/issues/21101

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] commented 1 year ago

This stale issue has been automatically closed. Thank you for your contributions.