project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.56k stars 2.04k forks source link

[1.3] Can't commission device: CHIP Error 0x00000032: Timeout #28883

Open Peugot206 opened 1 year ago

Peugot206 commented 1 year ago

Reproduction steps

Hello everyone,

I have the following setup:

I have the devices succesfully connected in a thread network, I can check this by the HEX datasets matching and the state 'leader' and 'child' (or 'router'). When starting commission, I make sure ble adv is enabled on the EFR32 Matter device using a command (ble adv start).

Here is the following command I run on CHIP-tool where the Timeout error occurs: ./chip-tool pairing ble-thread 1301 hex:[LONGHEX] 20202021 3840

I have only once succeeded in this exact setup to commission a device. I really have no clue what I could have done different that time, I do not understand why it keeps failing. Once I send the command I can see in the Matter logs of the EFR32 device that it is receiving stuff. See attached logs below.

I must admit I'm not sure whether I am using 1.1 or 1.3. I am using the latest CHIP-tool version from a few days ago, for the EFR32 I am running the Light Example from the Silicon Labs Matter SDK.

(Furthermore (maybe this should be in a seperate issue) I can't commission the device through my Google Home app. I can see in the EFR32 logs that it is making a connection and the Google Home app can actually find the device, but it fails to commission it. Can't the Google Home app work with OTBR on a Raspberry Pi 4? My phone and the OTBR are connected to the same network).

Bug prevalence

All the time (commission did work only ONCE while using this exact same setup).

GitHub hash of the SDK that was being used

a40963d7d571c18f4da18205b918e0c1b4e788f5

Platform

efr32, nrf, raspi

Platform Version(s)

No response

Type

Platform Issue

Anything else?

Logs OTBR (CHIP-tool, RPI 4)

pi@nomatterwhat:~/CHIP_TOOL_BUILD$ ./chip-tool pairing ble-thread 1301 hex:[LONGHEX] 20202021 3840
[1692947111.491219][1770:1770] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_kvs
[1692947111.496655][1770:1770] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_factory.ini
[1692947111.496909][1770:1770] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_config.ini
[1692947111.497050][1770:1770] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_counters.ini
[1692947111.497505][1770:1770] CHIP:DL: writing settings to file (/tmp/chip_counters.ini-WwWv4B)
[1692947111.498190][1770:1770] CHIP:DL: renamed tmp file to file (/tmp/chip_counters.ini)
[1692947111.498253][1770:1770] CHIP:DL: NVS set: chip-counters/reboot-count = 4 (0x4)
[1692947111.499322][1770:1770] CHIP:DL: Failed to get Ethernet interface
[1692947111.500108][1770:1770] CHIP:DL: Failed to reset Ethernet statistic counts
[1692947111.500464][1770:1770] CHIP:DL: Got WiFi interface: wlan0
[1692947111.502321][1770:1770] CHIP:DL: Found the primary WiFi interface:wlan0
[1692947111.502388][1770:1770] CHIP:IN: UDP::Init bind&listen port=0
[1692947111.502511][1770:1770] CHIP:IN: UDP::Init bound to port=57783
[1692947111.502534][1770:1770] CHIP:IN: UDP::Init bind&listen port=0
[1692947111.502610][1770:1770] CHIP:IN: UDP::Init bound to port=37367
[1692947111.502629][1770:1770] CHIP:IN: BLEBase::Init - setting/overriding transport
[1692947111.502646][1770:1770] CHIP:IN: TransportMgr initialized
[1692947111.502675][1770:1770] CHIP:FP: Initializing FabricTable from persistent storage
[1692947111.502861][1770:1770] CHIP:TS: Last Known Good Time: 2023-08-17T11:41:18
[1692947111.504573][1770:1770] CHIP:FP: Fabric index 0x1 was retrieved from storage. Compressed FabricId 0x248092C7195391ED, FabricId 0x0000000000000001, NodeId 0x000000000001B669, VendorId 0xFFF1
[1692947111.506970][1770:1770] CHIP:ZCL: Using ZAP configuration...
[1692947111.510247][1770:1770] CHIP:CTL: System State Initialized...
[1692947111.510354][1770:1770] CHIP:CTL: Stopping commissioning discovery over DNS-SD
[1692947111.510399][1770:1770] CHIP:CTL: Setting attestation nonce to random value
[1692947111.510449][1770:1770] CHIP:CTL: Setting CSR nonce to random value
[1692947111.510523][1770:1770] CHIP:IN: UDP::Init bind&listen port=5550
[1692947111.510636][1770:1770] CHIP:IN: UDP::Init bound to port=5550
[1692947111.510659][1770:1770] CHIP:IN: UDP::Init bind&listen port=5550
[1692947111.510750][1770:1770] CHIP:IN: UDP::Init bound to port=5550
[1692947111.510794][1770:1770] CHIP:IN: TransportMgr initialized
[1692947111.511122][1770:1772] CHIP:DL: CHIP task running
[1692947111.511226][1770:1772] CHIP:DL: HandlePlatformSpecificBLEEvent 32786
[1692947111.511572][1770:1772] CHIP:CTL: Stopping commissioning discovery over DNS-SD
[1692947111.511624][1770:1772] CHIP:CTL: Setting attestation nonce to random value
[1692947111.511782][1770:1772] CHIP:CTL: Setting CSR nonce to random value
[1692947111.512660][1770:1772] CHIP:CTL: Generating NOC
[1692947111.513574][1770:1772] CHIP:FP: Validating NOC chain
[1692947111.515429][1770:1772] CHIP:FP: NOC chain validation successful
[1692947111.515593][1770:1772] CHIP:FP: Updated fabric at index: 0x1, Node ID: 0x000000000001B669
[1692947111.515624][1770:1772] CHIP:TS: Last Known Good Time: 2023-08-17T11:41:18
[1692947111.515644][1770:1772] CHIP:TS: New proposed Last Known Good Time: 2021-01-01T00:00:00
[1692947111.515664][1770:1772] CHIP:TS: Retaining current Last Known Good Time
[1692947111.517720][1770:1772] CHIP:FP: Metadata for Fabric 0x1 persisted to storage.
[1692947111.519526][1770:1772] CHIP:TS: Committing Last Known Good Time to storage: 2023-08-17T11:41:18
[1692947111.521164][1770:1772] CHIP:CTL: Joined the fabric at index 1. Fabric ID is 0x0000000000000001 (Compressed Fabric ID: 248092C7195391ED)
[1692947111.521216][1770:1772] CHIP:IN: UDP::Init bind&listen port=5550
[1692947111.521334][1770:1772] CHIP:IN: UDP::Init bound to port=5550
[1692947111.521358][1770:1772] CHIP:IN: UDP::Init bind&listen port=5550
[1692947111.521468][1770:1772] CHIP:IN: UDP::Init bound to port=5550
[1692947111.521492][1770:1772] CHIP:IN: TransportMgr initialized
[1692947111.535302][1770:1772] CHIP:CTL: Setting thread operational dataset from parameters
[1692947111.535343][1770:1772] CHIP:CTL: Setting attestation nonce to random value
[1692947111.535393][1770:1772] CHIP:CTL: Setting CSR nonce to random value
[1692947111.535427][1770:1772] CHIP:CTL: Commission called for node ID 0x0000000000000515
[1692947111.539903][1770:1771] CHIP:DL: TRACE: Bus acquired for name C-06ea
[1692947111.547043][1770:1772] CHIP:DL: PlatformBlueZInit init success
[1692947111.554586][1770:1771] CHIP:BLE: BLE removing known devices.
[1692947111.556372][1770:1771] CHIP:BLE: BLE initiating scan.
[1692947111.567152][1770:1771] CHIP:BLE: Device F4:93:1C:09:XX:XX does not look like a CHIP device.
[1692947111.571446][1770:1771] CHIP:BLE: Device 19:92:20:17:XX:03 does not look like a CHIP device.
[1692947111.578808][1770:1771] CHIP:BLE: Device 20:E7:B6:97:XX:EB does not look like a CHIP device.
[1692947111.586438][1770:1771] CHIP:BLE: Device 0B:20:85:16:XX:XX does not look like a CHIP device.
[1692947111.594071][1770:1771] CHIP:BLE: Device 20:E7:B6:96:XX:XX does not look like a CHIP device.
[1692947111.619388][1770:1771] CHIP:BLE: Device F4:93:1C:0A:XX:XX does not look like a CHIP device.
[1692947111.633802][1770:1771] CHIP:BLE: Device B8:F6:53:19:XX:XX does not look like a CHIP device.
[1692947111.643081][1770:1771] CHIP:BLE: Device 3B:66:66:BA:XX:XX does not look like a CHIP device.
[1692947111.651936][1770:1771] CHIP:BLE: Device F4:93:1C:0A:XX:XX does not look like a CHIP device.
[1692947111.653391][1770:1771] CHIP:BLE: Device 17:25:DC:6A:XX:XX does not look like a CHIP device.
[1692947111.656075][1770:1771] CHIP:BLE: New device scanned: 14:B4:57:5F:XX:XX
[1692947111.656132][1770:1771] CHIP:BLE: Device discriminator match. Attempting to connect.
[1692947112.174401][1770:1771] CHIP:DL: ConnectDevice complete
[1692947113.301181][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301232][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301260][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301277][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301300][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service000e
[1692947113.301317][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301336][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service000e
[1692947113.301353][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301371][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service000e
[1692947113.301388][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301411][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0001
[1692947113.301428][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301492][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0001
[1692947113.301510][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301528][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0001
[1692947113.301545][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301570][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service000e
[1692947113.301587][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301605][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service000e
[1692947113.301622][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301640][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service000e
[1692947113.301656][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301685][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301701][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301721][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301737][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301757][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301773][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301795][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301812][1770:1771] CHIP:DL: Char1 /org/bluez/hci0/dev_14_B4_57_5F_9E_56/service0015
[1692947113.301836][1770:1771] CHIP:DL: New BLE connection: conn 0xffff8805f060, device 14:B4:57:5F:9E:56, path /org/bluez/hci0/dev_14_B4_57_5F_9E_56
[1692947113.302060][1770:1772] CHIP:DL: HandlePlatformSpecificBLEEvent 16385
[1692947113.302116][1770:1772] CHIP:DIS: Closing all BLE connections
[1692947113.302219][1770:1772] CHIP:IN: BleConnectionComplete: endPoint 0xaaaab9e18418
[1692947113.302681][1770:1772] CHIP:IN: SecureSession[0xffff80010b00]: Allocated Type:1 LSID:1347
[1692947113.302726][1770:1772] CHIP:SC: Assigned local session key ID 1347
[1692947113.302824][1770:1772] CHIP:EM: <<< [E:34373i S:0 M:218336557] (U) Msg TX to 0:0000000000000000 [0000] --- Type 0000:20 (SecureChannel:PBKDFParamRequest)
[1692947113.302885][1770:1772] CHIP:IN: (U) Sending msg 218336557 to IP address 'BLE'
[1692947113.302911][1770:1772] CHIP:IN: Message appended to BLE send queue
[1692947113.302930][1770:1772] CHIP:SC: Sent PBKDF param request
[1692947113.732425][1770:1772] CHIP:DL: HandlePlatformSpecificBLEEvent 16387
[1692947113.831944][1770:1772] CHIP:DL: HandlePlatformSpecificBLEEvent 16388
[1692947113.832147][1770:1772] CHIP:BLE: subscribe complete, ep = 0xaaaab9e18418
[1692947113.833643][1770:1771] CHIP:DL: Indication received, conn = 0xffff8805f060
[1692947113.833942][1770:1772] CHIP:DL: HandlePlatformSpecificBLEEvent 16389
[1692947113.834018][1770:1772] CHIP:BLE: peripheral chose BTP version 4; central expected between 4 and 4
[1692947113.834070][1770:1772] CHIP:BLE: using BTP fragment sizes rx 244 / tx 244.
[1692947113.834119][1770:1772] CHIP:BLE: local and remote recv window size = 5
[1692947113.835029][1770:1772] CHIP:IN: BLE EndPoint 0xaaaab9e18418 Connection Complete
[1692947113.927560][1770:1772] CHIP:DL: HandlePlatformSpecificBLEEvent 16387
[1692947116.414819][1770:1771] CHIP:DL: Indication received, conn = 0xffff8805f060
[1692947116.415150][1770:1772] CHIP:DL: HandlePlatformSpecificBLEEvent 16389
[1692947118.997398][1770:1772] CHIP:DL: HandlePlatformSpecificBLEEvent 16387
[1692947121.484965][1770:1771] CHIP:DL: Indication received, conn = 0xffff8805f060
[1692947121.485604][1770:1772] CHIP:DL: HandlePlatformSpecificBLEEvent 16389
[1692947121.559060][1770:1772] CHIP:DIS: Closing all BLE connections
[1692947121.559143][1770:1772] CHIP:IN: Clearing BLE pending packets.
[1692947121.559860][1770:1772] CHIP:BLE: Auto-closing end point's BLE connection.
[1692947121.559932][1770:1772] CHIP:DL: Closing BLE GATT connection (con 0xffff8805f060)
[1692947121.560163][1770:1771] CHIP:DL: BluezDisconnect peer=14:B4:57:5F:9E:56
[1692947124.167691][1770:1772] CHIP:IN: SecureSession[0xffff80010b00]: Released - Type:1 LSID:1347
[1692947124.167839][1770:1772] CHIP:TOO: Pairing Failure: src/platform/Linux/bluez/ChipDeviceScanner.cpp:173: CHIP Error 0x00000032: Timeout
[1692947124.167954][1770:1772] CHIP:BLE: BLE scan error: src/platform/Linux/bluez/ChipDeviceScanner.cpp:173: CHIP Error 0x00000032: Timeout
[1692947124.168037][1770:1772] CHIP:DL: HandlePlatformSpecificBLEEvent 16388
[1692947124.168098][1770:1772] CHIP:BLE: no endpoint for unsub complete
[1692947124.168546][1770:1770] CHIP:CTL: Shutting down the commissioner
[1692947124.168634][1770:1770] CHIP:CTL: Stopping commissioning discovery over DNS-SD
[1692947124.168889][1770:1770] CHIP:CTL: Shutting down the controller
[1692947124.169096][1770:1770] CHIP:IN: Expiring all sessions for fabric 0x1!!
[1692947124.169235][1770:1770] CHIP:FP: Forgetting fabric 0x1
[1692947124.169386][1770:1770] CHIP:TS: Pending Last Known Good Time: 2023-08-17T11:41:18
[1692947124.169846][1770:1770] CHIP:TS: Previous Last Known Good Time: 2023-08-17T11:41:18
[1692947124.170004][1770:1770] CHIP:TS: Reverted Last Known Good Time to previous value
[1692947124.170165][1770:1770] CHIP:CTL: Shutting down the commissioner
[1692947124.170224][1770:1770] CHIP:CTL: Stopping commissioning discovery over DNS-SD
[1692947124.170481][1770:1770] CHIP:CTL: Shutting down the controller
[1692947124.170640][1770:1770] CHIP:CTL: Shutting down the System State, this will teardown the CHIP Stack
[1692947124.170598][1770:1771] CHIP:DL: Bluez disconnected
[1692947124.170813][1770:1771] CHIP:DL: Bluez notify CHIPoBluez connection disconnected
[1692947124.171239][1770:1770] CHIP:DMG: All ReadHandler-s are clean, clear GlobalDirtySet
[1692947124.171522][1770:1770] CHIP:FP: Shutting down FabricTable
[1692947124.171594][1770:1770] CHIP:TS: Pending Last Known Good Time: 2023-08-17T11:41:18
[1692947124.171820][1770:1770] CHIP:TS: Previous Last Known Good Time: 2023-08-17T11:41:18
[1692947124.171878][1770:1770] CHIP:TS: Reverted Last Known Good Time to previous value
[1692947124.172370][1770:1770] CHIP:DL: writing settings to file (/tmp/chip_counters.ini-0ibPGG)
[1692947124.173665][1770:1770] CHIP:DL: renamed tmp file to file (/tmp/chip_counters.ini)
[1692947124.173827][1770:1770] CHIP:DL: NVS set: chip-counters/total-operational-hours = 0 (0x0)
[1692947124.173888][1770:1770] CHIP:DL: Inet Layer shutdown
[1692947124.173937][1770:1770] CHIP:DL: BLE shutdown
[1692947124.177665][1770:1770] CHIP:DL: System Layer shutdown
[1692947124.178112][1770:1770] CHIP:TOO: Run command failure: src/platform/Linux/bluez/ChipDeviceScanner.cpp:173: CHIP Error 0x00000032: Timeout

Logs EFR32 (Matter device)

[00:28:41.292][info  ][DL] Connect Event for handle : 4
[00:28:41.440][info  ][DL] evt_UNKNOWN id = 090600a0
[00:28:42.902][info  ][DL] Char Write Req, char : 23
[00:28:42.902][detail][DL] Write request/command received for CHIPoBLE RX characteristic (con 4, len 9)
[00:28:42.903][info  ][DL] _OnPlatformEvent kCHIPoBLEWriteReceived
[00:28:42.903][info  ][BLE] local and remote recv window sizes = 5
[00:28:42.903][info  ][BLE] selected BTP version 4
[00:28:42.903][info  ][BLE] using BTP fragment sizes rx 244 / tx 244.
[00:28:43.000][info  ][DL] HandleTXcharCCCDWrite - Config Flags value : 2
[00:28:43.000][info  ][DL] CHIPoBLE subscribe received
[00:28:43.000][info  ][DL] _OnPlatformEvent kCHIPoBLESubscribe
[00:28:43.000][detail][IN] BLE EndPoint 0x200099e0 Connection Complete
[00:28:43.000][info  ][DL] _OnPlatformEvent default:  event->Type = 32775
[00:28:43.096][info  ][DL] Tx Confirmation received
[00:28:43.097][info  ][DL]  stop soft timer
[00:28:43.097][info  ][DL] _OnPlatformEvent kCHIPoBLEIndicateConfirm
[00:28:43.098][info  ][DL] Char Write Req, char : 23
[00:28:43.098][detail][DL] Write request/command received for CHIPoBLE RX characteristic (con 4, len 73)
[00:28:43.099][info  ][DL] _OnPlatformEvent kCHIPoBLEWriteReceived
[00:28:43.099][info  ][EM] >>> [E:46123r S:0 M:252510018] (U) Msg RX from 0:D43656EF5AC382A9 [0000] --- Type 0000:20 (SecureChannel:PBKDFParamRequest)
[00:28:45.680][info  ][DL] Tx Confirmation received
[00:28:45.681][info  ][DL]  stop soft timer
[00:28:45.681][info  ][DL] _OnPlatformEvent kCHIPoBLEIndicateConfirm
[00:28:48.166][info  ][DL] Char Write Req, char : 23
[00:28:48.166][detail][DL] Write request/command received for CHIPoBLE RX characteristic (con 4, len 3)
[00:28:48.167][info  ][DL] _OnPlatformEvent kCHIPoBLEWriteReceived
[00:28:50.750][info  ][DL] Tx Confirmation received
[00:28:50.750][info  ][DL]  stop soft timer
[00:28:50.751][info  ][DL] _OnPlatformEvent kCHIPoBLEIndicateConfirm
[00:28:50.995][info  ][DL] HandleTXcharCCCDWrite - Config Flags value : 0
[00:28:50.995][info  ][DL] CHIPoBLE unsubscribe received
[00:28:50.995][info  ][DL] _OnPlatformEvent kCHIPoBLEUnsubscribe
[00:28:50.995][detail][IN] Clearing BLE pending packets.
[00:28:50.996][info  ][BLE] Releasing end point's BLE connection back to application.
[00:28:53.530][info  ][DL] Disconnect Event for handle : 4
[00:28:53.530][info  ][DL] BLE GATT connection closed (con 4, reason 4115)
[00:28:53.530][info  ][DL] _OnPlatformEvent kCHIPoBLEConnectionError
[00:28:53.530][detail][DL] Start BLE advertissement
bzbarsky-apple commented 1 year ago

From the Matter device:

[00:28:43.099][info  ][EM] >>> [E:46123r S:0 M:252510018] (U) Msg RX from 0:D43656EF5AC382A9 [0000] --- Type 0000:20 (SecureChannel:PBKDFParamRequest)

but then it never sends a response. You should look into why it's not doing that.

Peugot206 commented 1 year ago

Thanks for your reply @bzbarsky-apple !

I have some interestings finds, but I am not sure whether this behaviour is expected. Since I was able to commission the device once, I was in the impression that I could either just re-commission it, or it would throw out an error that it was already commissioned. After a few days I was not able to send a cluster command nor unpair the Matter device, I think this is because I changed the Thread network credentials on both devices (even though the Thread credentials on both devices still match (ot-ctl dataset active -x and ot-ctl dataset set active and they are still connected ot-ctl state!).

Then I tried to factory reset (with the OT CLI) the Matter device and OTBR, still no luck. Then I completely erased the MCU memory flash of the Matter device, re-uploaded the Matter program and tried commissioning again, and it worked right away! I then changed the Thread credentials again on both devices, while they are still in the same Thread network(!), and I was unable to unpair the device nor send cluster commands, thus the device would get 'stuck' (since I can't unpair, nor commission).

Is this behaviour expected/known? Or is there going something wrong on my side?

ReneJosefsen commented 1 year ago

Do you reuse the same node id when doing the next commissioning attempts? Could you provide a full log from the EFR32?

bzbarsky-apple commented 1 year ago

I then changed the Thread credentials again on both devices, while they are still in the same Thread network(!), and I was unable to unpair the device nor send cluster commands

So it I understand correctly:

  1. You commission a Matter device onto a Thread network.
  2. You modify the Thread credentials on the Matter device so that it is now on a different Thread network?
  3. You change your controller to also be on the new Thread network.
  4. You try to send a command to the Matter device and it fails.

Is that a correct description of what you are observing? I am assuming that operational discovery fails when doing step 4? Does the new Thread network have an SRP server? Does the Matter device manage to register its operational advertisement with that SRP server?

Peugot206 commented 1 year ago

@bzbarsky-apple correct. thanks for your reply. This is why I assume it fails to send Matter commands after. I'm only changing the Thread credentials by creating a new dataset on both devices (although they match). So for the SRP server, I haven't heard of that before (but I just did some reading about it), and I leave that all to the default.

ReneJosefsen commented 1 year ago

If it is possible it would be great to get a full log from the EFR32.

After the credential changes I would assume it will need to reregister the SRP entry, so the log should state something like SRP update succeeded. In case this fails and the DUT is not able to register the SRP service the controller will not be able to resolve the node id and communication will fail.

My initial though was also related to SRP.

Peugot206 commented 1 year ago

@ReneJosefsen as I'm trying to gather logs from the EFR32, I suddenly run into this error on the RPI OTBR which runs CHIP-tool. I am not sure what is causing this.. I can remember that I did give it the command INFRA_IF_NAME=wlan0 WEB_GUI=1 ./script/setup in the hope this may solve things, maybe I broke something with this command...? I have never had this critical error before.

Error logs CHIP-tool:

(logs shortened for readability, see edit for full logs. Not relevant to the original issue).
(process:1606): GLib-GObject-CRITICAL **: 10:36:33.433: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
[1693557393.434109][1606:1606] CHIP:DL: System Layer shutdown
[1693557393.434531][1606:1606] CHIP:TOO: Run command failure: src/platform/Linux/BLEManagerImpl.cpp:688: CHIP Error 0x00000003: Incorrect state
ReneJosefsen commented 1 year ago

Hmm, not sure but it does look like bluez is complaining 🤔

Peugot206 commented 1 year ago

@ReneJosefsen my bad, I forgot to re-enable sudo systemctl enable hciuart, it works now :). Here are the full EFR32 while I try another commission. The onoff toggle commands are not received at all in the logs.

00> [00:00:00.000][silabs ]==================================================
00> [00:00:00.000][silabs ]SiLabs-Light starting
00> [00:00:00.000][silabs ]==================================================
00> [00:00:00.000][silabs ]Init CHIP Stack
00> [00:00:00.000][info  ][DL] Setting device name to : "SiLabs-Light"
00> [00:00:00.000][silabs ]Initializing OpenThread stack
00> [00:00:00.000][info  ][DL] OpenThread ifconfig up and thread start
00> [00:00:00.000][info  ][DL] OpenThread started: OK
00> [00:00:00.000][info  ][DL] Setting OpenThread device type to ROUTER
00> [00:00:00.000][silabs ]Starting OpenThread task
00> [00:00:00.000][info  ][SVR] Subscription persistence not supported
00> [00:00:00.000][info  ][SVR] Server initializing...
00> [00:00:00.000][detail][FP] Initializing FabricTable [00:00:25.512][info  ][DL] _OnPlatformEvent default:  event->Type = 32777
00> [00:00:25.512][detail][DL] OpenThread State Changed (Flags: 0x10000040)
00> [00:00:31.536][info  ][DL] _OnPlatformEvent default:  event->Type = 32777
00> [00:00:31.536][detail][DL] OpenThread State Changed (Flags: 0x200012a4)
00> [00:00:31.536][detail][DL]    Device Role: CHILD
00> [00:00:31.536][detail][DL]    Partition Id: 0x730C92D3
00> [00:00:31.537][info  ][DL] _OnPlatformEvent default:  event->Type = 32777
00> [00:00:31.537][detail][DL] Thread Attached updating Multicast address
00> [00:00:31.537][info  ][SVR] Joining Multicast groups
00> [00:00:31.538][detail][DL] OpenThread State Changed (Flags: 0x00000001)
00> [00:00:31.538][detail][DL]    Thread Unicast Addresses:
00> [00:00:31.538][detail][DL]         fda0:a80:e5a4:1:cd18:4537:cfb7:acde/64 valid preferred
00> [00:00:31.539][detail][DL]         fd30:2557:1980:b79f:0:ff:fe00:bc01/64 valid rloc
00> [00:00:31.539][detail][DL]         fd30:2557:1980:b79f:bd17:2eb9:bb46:a113/64 valid
00> [00:00:31.539][detail][DL]         fe80:0:0:0:90a4:b5cc:4f3:66fc/64 va[00:00:34.541][info  ][SWU] Stopping the watchdog timer
00> [00:00:34.541][info  ][SWU] Starting the periodic query timer, timeout: 86400 seconds
00> [00:00:34.541][detail][DMG] Endpoint 0, Cluster 0x0000_002A update version to df580c9e
00> [00:00:34.542][info  ][ZCL] Cluster callback: 0x0000_002A
00> [00:00:34.542][detail][DMG] Endpoint 0, Cluster 0x0000_002A update version to df580c9f
00> [00:00:34.542][info  ][ZCL] Cluster callback: 0x0000_002A
00> [00:00:42.227][info  ][DL] SRP Client was started, detected server: fd30:2557:1980:b79f:42aa:ebd7:b3d1:11de
00> [00:00:42.227][info  ][DL] _OnPlatformEvent default:  event->Type = 32777
00> [00:00:42.227][detail][DL] OpenThread State Changed (Flags: 0x00000200)
00> [00:00:42.742][detail][DL] SRP update succeeded
00> [00:00:42.742][info  ][DIS] Setting operational delegate post init
00> [00:00:42.742][info  ][DL] _OnPlatformEvent default:  event->Type = 32784
00> [00:00:42.742][error ][SVR] Server initialization complete
00> [00:00:42.742][info  ][DIS] Updating services using commissioning mode 0
00> [00:00:42.743][detail][DL] Using Thread extended MAC for hostname.
00> [00:00:42.743][info  ][DIS] Advertise operational node 248092C7195391ED-0000000000000002
00> [00:00:42.743][info  ][DL] advertising srp service: 248092C7195391ED-0000000000000002._matter._tcp
00> [00:00:42.743][detail][DL] Using Thread extended MAC for hostname.
00> [00:00:42.744][info  ][DIS] Advertise commission parameter vendorID=65521 productID=32773 discriminator=3840/15 cm=[00:00:44.335][detail][DL] SRP update succeeded
00> [00:00:49.021][detail][DL] Start BLE advertissement
00> [00:00:55.453][info  ][DL] Connect Event for handle : 1
00> [00:00:55.568][info  ][DL] evt_UNKNOWN id = 090600a0
00> [00:00:57.030][info  ][DL] Char Write Req, char : 23
00> [00:00:57.030][detail][DL] Write request/command received for CHIPoBLE RX characteristic (con 1, len 9)
00> [00:00:57.031][info  ][DL] _OnPlatformEvent kCHIPoBLEWriteReceived
00> [00:00:57.031][info  ][BLE] local and remote recv window sizes = 5
00> [00:00:57.031][info  ][BLE] selected BTP version 4
00> [00:00:57.031][info  ][BLE] using BTP fragment sizes rx 244 / tx 244.
00> [00:00:57.127][info  ][DL] HandleTXcharCCCDWrite - Config Flags value : 2
00> [00:00:57.127][info  ][DL] CHIPoBLE subscribe received
00> [00:00:57.128][info  ][DL] _OnPlatformEvent kCHIPoBLESubscribe
00> [00:00:57.128][detail][IN] BLE EndPoint 0x200099e0 Connection Complete
00> [00:00:57.128][info  ][DL] _OnPlatformEvent default:  event->Type = 32775
00> [00:00:57.224][info  ][DL] Tx Confirmation received
00> [00:00:57.225][info  ][DL]  stop soft timer
00> [00:00:57.225][info  ][DL] _OnPlatformEvent kCHIPoBLEIndicateConfirm
00> [00:00:57.226][info  ][DL] Char Write Req, char : 23
00> [00:00:57.226][detail][DL] Write request/command received for CHIPoBLE RX characteristic (con 1, len 73)
00> [00:00:57.227][info  ][DL] _OnPlatformEvent kCHIPoBLEWriteReceived
00> [00:00:57.227][info  ][EM] >>> [E:26834r S:0 M:263958489] (U) Msg RX from 0:C8F66327FD0418E7 [0000] --- Type 0000:20 (SecureChannel:PBKDFParamRequest)
00> [00:00:59.809][info  ][DL] Tx Confirmation received
00> [00:00:59.809][info  ][DL]  stop soft timer
00> [00:00:59.809][info  ][DL] _OnPlatformEvent kCHIPoBLEIndicateConfirm
00> [00:01:02.295][info  ][DL] Char Write Req, char : 23
00> [00:01:02.295][detail][DL] Write request/command received for CHIPoBLE RX characteristic (con 1, len 3)
00> [00:01:02.296][info  ][DL] _OnPlatformEvent kCHIPoBLEWriteReceived
00> [00:01:04.879][info  ][DL] Tx Confirmation received
00> [00:01:04.879][info  ][DL]  stop soft timer
00> [00:01:04.879][info  ][DL] _OnPlatformEvent kCHIPoBLEIndicateConfirm
00> [00:01:05.416][info  ][DL] HandleTXcharCCCDWrite - Config Flags value : 0
00> [00:01:05.416][info  ][DL] CHIPoBLE unsubscribe received
00> [00:01:05.416][info  ][DL] _OnPlatformEvent kCHIPoBLEUnsubscribe
00> [00:01:05.416][detail][IN] Clearing BLE pending packets.
00> [00:01:05.416][info  ][BLE] Releasing end point's BLE connection back to application.
00> [00:01:07.512][info  ][DL] Disconnect Event for handle : 1
00> [00:01:07.512][info  ][DL] BLE GATT connection closed (con 1, reason 4115)
00> [00:01:07.512][info  ][DL] _OnPlatformEvent kCHIPoBLEConnectionError
00> [00:01:07.512][detail][DL] Start BLE advertissement
00> [00:01:35.126][info  ][DL] _OnPlatformEvent default:  event->Type = 32777
00> [00:01:35.126][detail][DL] OpenThread State Changed (Flags: 0x00000064)
00> [00:01:35.126][detail][DL]    Device Role: ROUTER
00> [00:01:37.515][detail][DL] bleAdv Timeout : Start slow advertissment
ReneJosefsen commented 1 year ago

The EFR32 does join the network and does report a successful SRP entry 🤔

00> [00:00:42.742][detail][DL] SRP update succeeded

Is the PI using a TH image ?

Peugot206 commented 1 year ago

Thanks for your help. I'm not sure what you mean with a TH image? 😅

The Raspberry Pi runs on Ubuntu 22.04, I built the Matter SDK on it from scratch and that seems to run fine (apart from sometimes having to restart some Bluetooth services). No pre-built images.

ReneJosefsen commented 1 year ago

Okay, I was wondering if it was based on the Test Harness (TH) images that are created for the specific matter releases.

I'm not super familiar with this setup, but it would be interesting to somehow verify if the routes in the network break when the thread dataset is changes.

Any chance you can generate a log of the whole scenario, so successful commissioning, toggle on/off, change dataset on EFR and OTBR and then the subsequent failure to communicate??