project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.34k stars 1.97k forks source link

Generic Switch goes offline after sometime in Homekit (CASE Session establishment failed: 54) #25805

Closed absoluteabutaj closed 1 year ago

absoluteabutaj commented 1 year ago

Hello there,

I'm working Matter Generic Switch (0x000F) device type and I started off with Nordic semiconductor template application and able to commission the deivce to the Apple fabric successfully.

cswitch-to-apple-fabric

The device stays online for about 5-6 Hours and goes offline.

Rebooting the device makes it online and the cycle repeats.

When the device goes offline, I get this in my terminal.

00> I: 729849 [SC]Received Sigma1 msg
00> D: 729849 [SC]Found MRP parameters in the message
00> D: 729849 [SC]Peer assigned session key ID 58240
00> I: 729854 [SC]CASE matched destination ID: fabricIndex 1, NodeID 0x000000001EE63FF2
00> D: 730081 [SC]Including MRP parameters
00> I: 730082 [EM]<<< [E:43689r M:134778314 (Ack:146813535)] (U) Msg TX to 0:0000000000000000 [0000] --- Type 0000:31 (SecureChannel:CASE_Sigma2)
00> I: 730083 [IN](U) Sending msg 134778314 to IP address 'UDP:[fda9:49eb:c24a:1:4f8:92e2:c038:4f08]:54215'
00> I: 730084 [SC]Sent Sigma2 msg
00> I: 730085 [EM]>>> [E:4450i M:108836740 (Ack:55569058)] (S) Msg RX from 1:6726C27A4F6B7C34 [51F1] --- Type 0001:01 (IM:StatusResponse)
00> D: 730086 [EM]Found matching exchange: 4450i, Delegate: 0x2000f32c
00> D: 730086 [EM]Rxd Ack; Removing MessageCounter:55569058 from Retrans Table on exchange 4450i
00> I: 730087 [IM]Received status response, status is 0x00
00> D: 730087 [DMG]<RE> OnReportConfirm: NumReports = 0
00> D: 730087 [DMG]IM RH moving to [GeneratingReports]
00> I: 730088 [EM]<<< [E:4450i M:55569059 (Ack:108836740)] (S) Msg TX to 1:6726C27A4F6B7C34 [51F1] --- Type 0000:10 (SecureChannel:StandaloneAck)
00> I: 730089 [IN](S) Sending msg 55569059 on secure session with LSID: 46710
00> D: 730090 [EM]Flushed pending ack for MessageCounter:108836740 on exchange 4450i
00> D: 730090 [DMG]All ReadHandler-s are clean, clear GlobalDirtySet
00> D: 730301 [IN]Received a duplicate message with MessageCounter:108836740 on exchange 4450i
00> I: 730302 [EM]>>> [E:4450i M:108836740 (Ack:55569058)] (S) Msg RX from 1:6726C27A4F6B7C34 [51F1] --- Type 0001:01 (IM:StatusResponse)
00> D: 730302 [EM]Generating StandaloneAck via exchange: 4450i
00> D: 730302 [EM]Forcing tx of solitary ack for duplicate MessageCounter:108836740 on exchange 4450i
00> I: 730304 [EM]<<< [E:4450i M:55569060 (Ack:108836740)] (S) Msg TX to 1:6726C27A4F6B7C34 [51F1] --- Type 0000:10 (SecureChannel:StandaloneAck)
00> I: 730304 [IN](S) Sending msg 55569060 on secure session with LSID: 46710
00> D: 730326 [IN]Received a duplicate message with MessageCounter:108836740 on exchange 4450i
00> I: 730327 [EM]>>> [E:4450i M:108836740 (Ack:55569058)] (S) Msg RX from 1:6726C27A4F6B7C34 [51F1] --- Type 0001:01 (IM:StatusResponse)
00> D: 730327 [EM]Generating StandaloneAck via exchange: 4450i
00> D: 730328 [EM]Forcing tx of solitary ack for duplicate MessageCounter:108836740 on exchange 4450i
00> I: 730329 [EM]<<< [E:4450i M:55569061 (Ack:108836740)] (S) Msg TX to 1:6726C27A4F6B7C34 [51F1] --- Type 0000:10 (SecureChannel:StandaloneAck)
00> I: 730329 [IN](S) Sending msg 55569061 on secure session with LSID: 46710
00> D: 730563 [EM]Retransmitting MessageCounter:134778314 on exchange 43689r Send Cnt 1
00> I: 730563 [IN](U) Sending msg 134778314 to IP address 'UDP:[fda9:49eb:c24a:1:4f8:92e2:c038:4f08]:54215'
00> I: 730600 [EM]>>> [E:43689r M:146813536 (Ack:134778314)] (U) Msg RX from 0:23AD9FCEEF2D72BC [0000] --- Type 0000:40 (SecureChannel:StatusReport)
00> D: 730601 [EM]Found matching exchange: 43689r, Delegate: 0x20004454
00> D: 730601 [EM]Rxd Ack; Removing MessageCounter:134778314 from Retrans Table on exchange 43689r
00> E: 730602 [SC]Received error (protocol code 2) during pairing process: 54
00> E: 730603 [IN]CASE Session establishment failed: 54

Once the CASE Session establishment failed, no activity regarding the network in the terminal.

Setup: Hardware: nRF52840 SDK: nRF Connect SDK v2.2.0

Fabric Setup: BR: Homepod Mini -> tvOS 16.3.2 Controller: iPhone 13 -> iOS 16.3.1

bzbarsky-apple commented 1 year ago

@absoluteabutaj What version of the SDK (which exact git commit SHA) are you using for the device? What that looks like is that the Sigma2 the device sends is not correct and the other side ends up not establishing CASE. But presumably even before this something went wrong, because otherwise why is the other side trying to re-establish CASE at all?

So:

  1. Why is there a CASE re-establishment happening to start with?
  2. What is wrong with the Sigma2 the device is sending and why?
absoluteabutaj commented 1 year ago

@bzbarsky-apple - f4a75a5

Started with the template from here.

Further logs here: case-session-failing.txt

bzbarsky-apple commented 1 year ago

Further logs here:

That's a very different failure. That shows some sort of SRP timeout, followed by the server getting a Sigma1 and sending an ack, then sending a Sigma2, then getting a retransmitted Sigma1 (indicating the the ack and Sigma2 were both lost), sending an ack for the duplicate Sigma1, then not getting any more Sigma1 messages (because the ack got through, or because connectivity is lost completely?) but also not getting any acks for any of the Sigma2 retransmits it does, until it gives up.

absoluteabutaj commented 1 year ago

SRP timeout is the root for all of this? If yes, the attached log provides the reason for SRP timeout?

adminy commented 1 year ago

I'm getting the same thing: nordic sdk v2.3.0 nrf52840dongle_nrf52840

<dbg> chip: LogV: [DL]SRP update succeeded
<dbg> chip: LogV: [DL]Write request received for CHIPoBLE RX characteristic (ConnId 0x00)
<dbg> chip: LogV: [DL]Sending indication for CHIPoBLE TX characteristic (ConnId 00, len 3)
<dbg> chip: LogV: [DL]Indication for CHIPoBLE TX characteristic done (ConnId 0x00, result 0x00)
<dbg> chip: LogV: [DL]Write request received for CHIPoBLE RX characteristic (ConnId 0x00)
<dbg> chip: LogV: [DL]Sending indication for CHIPoBLE TX characteristic (ConnId 00, len 3)
<dbg> chip: LogV: [DL]Indication for CHIPoBLE TX characteristic done (ConnId 0x00, result 0x00)
<dbg> chip: LogV: [DL]Write request received for CHIPoBLE RX characteristic (ConnId 0x00)
<dbg> chip: LogV: [DL]Sending indication for CHIPoBLE TX characteristic (ConnId 00, len 3)
<dbg> chip: LogV: [DL]Indication for CHIPoBLE TX characteristic done (ConnId 0x00, result 0x00)
<dbg> chip: LogV: [DL]Write request received for CHIPoBLE RX characteristic (ConnId 0x00)
<dbg> chip: LogV: [DL]Sending indication for CHIPoBLE TX characteristic (ConnId 00, len 3)
<dbg> chip: LogV: [DL]Indication for CHIPoBLE TX characteristic done (ConnId 0x00, result 0x00)
<dbg> chip: LogV: [DL]Write request received for CHIPoBLE RX characteristic (ConnId 0x00)
<dbg> chip: LogV: [DL]Sending indication for CHIPoBLE TX characteristic (ConnId 00, len 3)
<dbg> chip: LogV: [DL]Indication for CHIPoBLE TX characteristic done (ConnId 0x00, result 0x00)
<dbg> chip: LogV: [DL]Write request received for CHIPoBLE RX characteristic (ConnId 0x00)
<dbg> chip: LogV: [DL]Sending indication for CHIPoBLE TX characteristic (ConnId 00, len 3)
<dbg> chip: LogV: [DL]Indication for CHIPoBLE TX characteristic done (ConnId 0x00, result 0x00)
<dbg> chip: LogV: [DL]Write request received for CHIPoBLE RX characteristic (ConnId 0x00)
<dbg> chip: LogV: [DL]Sending indication for CHIPoBLE TX characteristic (ConnId 00, len 3)
<dbg> chip: LogV: [DL]Indication for CHIPoBLE TX characteristic done (ConnId 0x00, result 0x00)
<inf> chip: [EM]>>> [E:56959r S:0 M:138206359] (U) Msg RX from 0:93427D31E0321BE5 [0000] --- Type 0000:30 (SecureChannel:CASE_Sigma1)
<inf> chip: [IN]CASE Server received Sigma1 message . Starting handshake. EC 0x20004878
<inf> chip: [IN]CASE Server disabling CASE session setups
<inf> chip: [EM]<<< [E:56959r S:0 M:134838261 (Ack:138206359)] (U) Msg TX to 0:0000000000000000 [0000] --- Type 0000:10 (SecureChannel:StandaloneAck)
<inf> chip: [IN](U) Sending msg 134838261 to IP address 'UDP:[2001:bb6:54cc:9100:3d95:8c2c:fc97:6495]:53945'
<inf> chip: [SC]Received Sigma1 msg
<inf> chip: [SC]CASE matched destination ID: fabricIndex 1, NodeID 0x00000000142A910F
<inf> chip: [EM]<<< [E:56959r S:0 M:134838262 (Ack:138206359)] (U) Msg TX to 0:0000000000000000 [0000] --- Type 0000:31 (SecureChannel:CASE_Sigma2)
<inf> chip: [IN](U) Sending msg 134838262 to IP address 'UDP:[2001:bb6:54cc:9100:3d95:8c2c:fc97:6495]:53945'
<inf> chip: [SC]Sent Sigma2 msg
<inf> chip: [EM]>>> [E:56959r S:0 M:138206360 (Ack:134838262)] (U) Msg RX from 0:93427D31E0321BE5 [0000] --- Type 0000:40 (SecureChannel:StatusReport)
<err> chip: [SC]Received error (protocol code 2) during pairing process: 54
<err> chip: [IN]CASE Session establishment failed: 54
<inf> chip: [IN]CASE Server enabling CASE session setups
<dbg> chip: LogV: [IN]SecureSession[0x20002f88]: Released - Type:2 LSID:49755
<dbg> chip: LogV: [IN]SecureSession[0x20002f88]: Allocated Type:2 LSID:49756
<inf> chip: [EM]<<< [E:56959r S:0 M:134838263 (Ack:138206360)] (U) Msg TX to 0:0000000000000000 [0000] --- Type 0000:10 (SecureChannel:StandaloneAck)
<inf> chip: [IN](U) Sending msg 134838263 to IP address 'UDP:[2001:bb6:54cc:9100:3d95:8c2c:fc97:6495]:53945'
<inf> chip: [EM]>>> [E:56960r S:0 M:138206361] (U) Msg RX from 0:2B79512F975BCE75 [0000] --- Type 0000:30 (SecureChannel:CASE_Sigma1)
<inf> chip: [IN]CASE Server received Sigma1 message . Starting handshake. EC 0x20004878
<inf> chip: [IN]CASE Server disabling CASE session setups
<inf> chip: [EM]<<< [E:56960r S:0 M:134838264 (Ack:138206361)] (U) Msg TX to 0:0000000000000000 [0000] --- Type 0000:10 (SecureChannel:StandaloneAck)
<inf> chip: [IN](U) Sending msg 134838264 to IP address 'UDP:[2001:bb6:54cc:9100:3d95:8c2c:fc97:6495]:53945'
<inf> chip: [SC]Received Sigma1 msg
<inf> chip: [SC]CASE matched destination ID: fabricIndex 1, NodeID 0x00000000142A910F
<inf> chip: [EM]<<< [E:56960r S:0 M:134838265 (Ack:138206361)] (U) Msg TX to 0:0000000000000000 [0000] --- Type 0000:31 (SecureChannel:CASE_Sigma2)
<inf> chip: [IN](U) Sending msg 134838265 to IP address 'UDP:[2001:bb6:54cc:9100:3d95:8c2c:fc97:6495]:53945'
<inf> chip: [SC]Sent Sigma2 msg
<dbg> chip: LogV: [DL]Write request received for CHIPoBLE RX characteristic (ConnId 0x00)
<inf> chip: [EM]>>> [E:56960r S:0 M:138206362 (Ack:134838265)] (U) Msg RX from 0:2B79512F975BCE75 [0000] --- Type 0000:40 (SecureChannel:StatusReport)
<err> chip: [SC]Received error (protocol code 2) during pairing process: 54
<err> chip: [IN]CASE Session establishment failed: 54
<inf> chip: [IN]CASE Server enabling CASE session setups
<dbg> chip: LogV: [IN]SecureSession[0x20002f88]: Released - Type:2 LSID:49756
<dbg> chip: LogV: [IN]SecureSession[0x20002f88]: Allocated Type:2 LSID:49757
<inf> chip: [EM]<<< [E:56960r S:0 M:134838266 (Ack:138206362)] (U) Msg TX to 0:0000000000000000 [0000] --- Type 0000:10 (SecureChannel:StandaloneAck)
<inf> chip: [IN](U) Sending msg 134838266 to IP address 'UDP:[2001:bb6:54cc:9100:3d95:8c2c:fc97:6495]:53945'
<dbg> chip: LogV: [DL]Sending indication for CHIPoBLE TX characteristic (ConnId 00, len 3)
<dbg> chip: LogV: [DL]Indication for CHIPoBLE TX characteristic done (ConnId 0x00, result 0x00)
<dbg> chip: LogV: [DL]Write request received for CHIPoBLE RX characteristic (ConnId 0x00)
<dbg> chip: LogV: [DL]Sending indication for CHIPoBLE TX characteristic (ConnId 00, len 3)
<dbg> chip: LogV: [DL]Indication for CHIPoBLE TX characteristic done (ConnId 0x00, result 0x00)
<dbg> chip: LogV: [DL]Write request received for CHIPoBLE RX characteristic (ConnId 0x00)
<dbg> chip: LogV: [DL]Sending indication for CHIPoBLE TX characteristic (ConnId 00, len 3)
<dbg> chip: LogV: [DL]Indication for CHIPoBLE TX characteristic done (ConnId 0x00, result 0x00)
<dbg> chip: LogV: [DL]Write request received for CHIPoBLE RX characteristic (ConnId 0x00)
<dbg> chip: LogV: [DL]Sending indication for CHIPoBLE TX characteristic (ConnId 00, len 3)
<dbg> chip: LogV: [DL]Indication for CHIPoBLE TX characteristic done (ConnId 0x00, result 0x00)
<inf> chip: [FS]Fail-safe timer expired
<err> chip: [SVR]Failsafe timer expired
<dbg> chip: LogV: [IN]SecureSession[0x20002ed0]: MarkForEviction Type:1 LSID:49754
<inf> chip: [SC]SecureSession[0x20002ed0]: Moving from state 'kActive' --> 'kPendingEviction'
<dbg> chip: LogV: [IN]SecureSession[0x20002ed0]: Released - Type:1 LSID:49754
<err> chip: [SVR]Commissioning failed (attempt 1): 32
<dbg> chip: LogV: [IN]Clearing BLE pending packets.
<inf> chip: [BLE]Releasing end point's BLE connection back to application.
<inf> chip: [DL]Closing BLE GATT connection (ConnId 00)
<dbg> chip: LogV: [IN]SecureSession[0x20002ed0]: Allocated Type:1 LSID:49758
<dbg> chip: LogV: [DL]CHIPoBLE advertising set to on
<inf> chip: [DIS]Updating services using commissioning mode 1
<dbg> chip: LogV: [DL]Using Thread extended MAC for hostname.
<inf> chip: [DIS]Advertise operational node 3DBC3C9EC277FD5C-00000000142A910F
<dbg> chip: LogV: [DL]Using Thread extended MAC for hostname.
<inf> chip: [DIS]Advertise commission parameter vendorID=65521 productID=32773 discriminator=3840/15 cm=1
<inf> chip: [DL]advertising srp service: 8A24DE0E184A6D83._matterc._udp
<err> chip: [ZCL]OpCreds: Got FailSafeTimerExpired
<err> chip: [ZCL]OpCreds: Proceeding to FailSafeCleanup on fail-safe expiry!
<dbg> chip: LogV: [IN]Expiring all sessions for fabric 0x1!!
<err> chip: [FP]Reverting pending fabric data for fabric 0x1
<err> chip: [FP]Warning: metadata not found during delete of fabric 0x1
<inf> chip: [FP]Fabric (0x1) deleted.
<inf> chip: [ZCL]UserLabel: Last Fabric index 0x1 was removed
<inf> chip: [ZCL]OpCreds: Fabric index 0x1 was removed
<inf> chip: [DIS]Updating services using commissioning mode 1
<dbg> chip: LogV: [DL]Using Thread extended MAC for hostname.
<inf> chip: [DIS]Advertise commission parameter vendorID=65521 productID=32773 discriminator=3840/15 cm=1
<inf> chip: [DL]advertising srp service: 8A24DE0E184A6D83._matterc._udp
<inf> chip: [DL]removing srp service: 3DBC3C9EC277FD5C-00000000142A910F._matter._tcp
<inf> chip: [TS]Pending Last Known Good Time: 2023-04-08T09:07:00
<inf> chip: [TS]Previous Last Known Good Time: 2023-04-08T09:07:00
<inf> chip: [TS]Reverted Last Known Good Time to previous value
<err> chip: [FP]Warning: metadata not found during delete of fabric 0x1
<err> chip: [ZCL]OpCreds: failed to delete fabric at index 1: d8
<inf> chip: [NP]Reverting Thread operational dataset
<err> chip: [DL]Long dispatch time: 181 ms, for event type 32782
<inf> chip: [FS]Fail-safe cleanly disarmed
bzbarsky-apple commented 1 year ago

@adminy Are you getting this randomly, or every time? Would you be able to log the payloads involved to see what's going on, exactly?

absoluteabutaj commented 1 year ago

This issue persists almost always.

Steps to reproduce:

  1. Build window covering example (I built for nRF)
  2. Commission to Homekit.

Sometimes it goes offline like in 15-20 minutes and the only way to recover is to factory reset it and re-commission and the cycle continues.

Just to understand, is that something to do with CSL (Coordinated Sample Listening)?

bzbarsky-apple commented 1 year ago

This issue persists almost always.

@absoluteabutaj OK, in that case can you do what I asked and log the payloads so we can tell what data is actually going across?

Just to understand, is that something to do with CSL (Coordinated Sample Listening)?

I really doubt it.

adminy commented 1 year ago

These are consistent errors

app: Init CHIP stack
[DL]BLE address: D9:FC:87:41:7F:5F
[DL]OpenThread started: OK
[DL]Setting OpenThread device type to ROUTER
[SVR]Subscription persistence not supported
[SVR]Server initializing...
[TS]Last Known Good Time: 2023-05-12T19:43:16
[DMG]AccessControl: initializing
[DMG]Examples::AccessControlDelegate::Init
[DMG]AccessControl: setting
[DMG]DefaultAclStorage: initializing
[DMG]DefaultAclStorage: 0 entries loaded
LogV: [IN]UDP::Init bind&listen port=5540
[IN]SO_REUSEPORT failed: 109
LogV: [IN]UDP::Init bound to port=5540
LogV: [IN]BLEBase::Init - setting/overriding transport
LogV: [IN]TransportMgr initialized
LogV: [DL]Using Thread extended MAC for hostname.
[ZCL]Using ZAP configuration...
[DMG]AccessControlCluster: initializing
[ZCL]Initiating Admin Commissioning cluster.
LogV: [DL]Boot reason: 6
[ZCL]DeviceInfoProvider is not registered
[ZCL]Trying to write invalid Calendar Type
[ZCL]Failed to write calendar type with error: 0x87
[ZCL]Endpoint 1 On/off already set to new value
[ZCL]Cluster LevelControl: attribute CurrentLevel set to 254
LogV: [IN]SecureSession[0x20002ed0]: Allocated Type:1 LSID:24923
LogV: [DL]CHIPoBLE advertising set to on
[DIS]Updating services using commissioning mode 1
[DIS]Failed to remove advertised services: 3
LogV: [DL]Using Thread extended MAC for hostname.
[DIS]Advertise commission parameter vendorID=65521 productID=32773 discriminator=3840/15 cm=1
[DIS]Failed to advertise commissionable node: 3
[DIS]Failed to finalize service update: 3
[DIS]Updating services using commissioning mode 1
[DIS]Failed to remove advertised services: 3
LogV: [DL]Using Thread extended MAC for hostname.
[DIS]Advertise commission parameter vendorID=65521 productID=32773 discriminator=3840/15 cm=1
[DIS]Failed to advertise commissionable node: 3
[DIS]Failed to finalize service update: 3
[DIS]Delaying proxy of operational discovery: missing delegate
[IN]CASE Server enabling CASE session setups
LogV: [IN]SecureSession[0x20002f88]: Allocated Type:2 LSID:24924
[SVR]Joining Multicast groups
[SVR]Server Listening...
[DL]Device Configuration:
[DL]  Serial Number: 11223344556677889900
[DL]  Vendor Id: 65521 (0xFFF1)
[DL]  Product Id: 32773 (0x8005)
[DL]  Product Name: not-specified
[DL]  Hardware Version: 0
[DL]  Setup Pin Code (0 for UNKNOWN/ERROR): 20202021
[DL]  Setup Discriminator (0xFFFF for UNKNOWN/ERROR): 3840 (0xF00)
[DL]  Manufacturing Date: (not set)
[DL]  Device Type: 65535 (0xFFFF)
[SVR]SetupQRCode: [MT:6FCJ142C00KA0648G00]
[SVR]Copy/paste the below URL in a browser to see the QR Code:
[SVR]https://project-chip.github.io/connectedhomeip/qrcode.html?data=MT%3A6FCJ142C00KA0648G00
[SVR]Manual pairing code: [34970112332]
LE advertising is already enabled
LogV: [IN]SecureSession[0x20002ed0]: Released - Type:1 LSID:24923
LogV: [IN]SecureSession[0x20002ed0]: Allocated Type:1 LSID:24925
[DIS]Updating services using commissioning mode 1
[DIS]Failed to remove advertised services: 3
LogV: [DL]Using Thread extended MAC for hostname.
[DIS]Advertise commission parameter vendorID=65521 productID=32773 discriminator=3840/15 cm=1
[DIS]Failed to advertise commissionable node: 3
[DIS]Failed to finalize service update: 3
[DL]CHIP task running
[DL]CHIPoBLE advertising started
bzbarsky-apple commented 1 year ago

@adminy That log you pasted seems unrelated to this issue and is not showing any particularly unexpected errors: it just shows a BLE-enabled device that is not connected to any IP network coming up.

woody-apple commented 1 year ago

Hi there, are you still seeing this issue? If so please file a feedback ticket like so: