project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.45k stars 1.99k forks source link

[BUG] lighting-app commissioning fails with Home app on iOS 16.5 #27325

Open bndkk opened 1 year ago

bndkk commented 1 year ago

Reproduction steps

  1. Build lighting-app example
  2. Run it on a raspberry: sudo ./chip-lighting-app
  3. Scan QR Code using Home app on iOS 16.5 or higher and add to Home.
  4. Ignore "Uncertified Accessory" warning.
  5. Wait

The commissioning process never finishes and after a while the Home app shows a message that it had failed to commission. In the logs errors start to occur after the last successful command - CSRRequest. After that a Fail-safe timer expires and "Commissioning failed" is logged. Further down "Failed AddTrustedRootCertificate request" is also logged. See the attached lighting-app.log.

This bug is specific to iOS 16.5 and higher, because it just works if I test it with 16.4. All in all I tested with the following versions (also listing the version of the Apple TV I used as Home Hub):

The raspberry is running Raspberry PI OS 64-Bit.

Bug prevalence

Whenever I do this

GitHub hash of the SDK that was being used

4088a77f557e8571a39338fad51a1d8eb0131d79

Platform

darwin, raspi

Platform Version(s)

No response

Anything else?

lighting-app.log

bzbarsky-apple commented 1 year ago

From the log:

[1687182260.787122][1028:1028] CHIP:ZCL: OpCreds: Failed AddTrustedRootCertificate request with IM error 0xca (err = ../third_party/connectedhomeip/src/app/clusters/operational-credentials-server/operational-credentials-server.cpp:1143: CHIP Error 0x0000002F: Invalid argument)

0xCA is "FAILSAFE_REQUIRED".

@bndkk Are you able to file a Feedback Assistant ticket that would include client logs (from just the iPhone; the Apple TV logs should not be relevant for this failure), so we can investigate what's going on here? Please list the feedback assistant ticket id here if you do, or lest me know if you can't do that. Instructions for feedback assistant ticket:

bzbarsky-apple commented 1 year ago

Looking at the log some more, there is a CSRRequest command, then the client does not talk to the server for 30 seconds, so the fail-safe timer expires, and then right after that we get an AddTrustedRootCertificate command.

[1687182230.277012][1028:1028] CHIP:FS: GeneralCommissioning: Received ArmFailSafe (30s)
...
[1687182230.294231][1028:1028] CHIP:ZCL: OpCreds: Received a CSRRequest command
...
[1687182230.297395][1028:1028] CHIP:EM: <<< [E:63642r M:207330839 (Ack:77126170)] (S) Msg TX to 1:3BB143508D95B2FB [DE2D] --- Type 0001:09 (IM:InvokeCommandResponse)
...
[1687182230.305950][1028:1028] CHIP:EM: Rxd Ack; Removing MessageCounter:207330839 from Retrans Table on exchange 63642r
[1687182260.276991][1028:1028] CHIP:FS: Fail-safe timer expired
...
[1687182260.786570][1028:1028] CHIP:EM: >>> [E:63643r M:77126172] (S) Msg RX from 1:3BB143508D95B2FB [DE2D] --- Type 0001:08 (IM:InvokeCommandRequest)

It would be interesting to see what the server log from a success case looks like, by the way.

bndkk commented 1 year ago

@bzbarsky-apple I think I mixed up the version numbers. It works with 16.5 but not with 16.6 when testing both with Apple TV 16.6. I corrected that and created a feedback ticket also including a sysdiagnose from the failure and a log from a success case with iOS 16.5: FB12407416

bzbarsky-apple commented 1 year ago

@bndkk Thanks! So it looks like there are two things going on here:

  1. The device seems to set FailSafeExpiryLengthSeconds to 30, which means the fail-safe expires quite quickly.
  2. After the phone receives the CSR, it does some keychain operations each of which takes 4-5 seconds, so overall it takes 33 seconds to send the root cert to the device. By which point the fail-safe has expired.

I am looking into why the keychain bits are so slow, but I do wonder whether the difference is 16.5 vs 16.6 or whether it's different keychain contents on the two phones involved.

bndkk commented 1 year ago

I also tested commissioning with a phone running 17 beta and the same thing happens.

bzbarsky-apple commented 1 year ago

@bndkk Thank you, that is useful info....