Closed cjandhyala closed 2 years ago
It works for me, however I used our own BLE/Thread device instead of Lighting app. It looks like an mdns/routing problem, can you try with both controllers on same raspi as a test?
It doesn't seem like an mdns problem to me - if you look at line 370/371 on the 2ndController.txt, you can see that avahi resolves the address, and I confirmed with Chaitanya that it's as expected. However, there is an error on line 386 that's concerning - have you seen OnMessageReceived failed, err = ../../src/messaging/ExchangeMgr.cpp:298: CHIP Error 0x0000002A: Invalid message type
happen before? Do you know what that might be about?
I see a lot of flakyness on our platform too with multi-admin (EFR32) on both TE6 branch and ToT. The commissionning flow fails for the 2nd admin
controller1 resolves ipv6 address fd7b:a81b:9770:0:dea6:32ff:fe8d:7072 but controller2 gets given ipv4 address 192.168.4.223. Then avahi resolves ipv4 address 192.168.6.222 which is again different. My setup uses ipv6 exclusively and it works.
So it looks like controller2 correctly resolves the device. It sends a Sigma1 (line 383 of 2ndController.txt). The device receives it (line 2358 of accessory_log.txt) and sends a Sigma2 (line 2394). The commissioner receives the Sigma2 but (starting line 384 in 2ndController.txt):
[1633646951.870298][3191:3199] CHIP:EM: Received message of type 0x31 with vendorId 0x0000 and protocolId 0x0000 on exchange 13378i
[1633646951.870406][3191:3199] CHIP:EM: Handling via exchange: 13378i, Delegate: 0x(nil)
[1633646951.870469][3191:3199] CHIP:EM: OnMessageReceived failed, err = ../../src/messaging/ExchangeMgr.cpp:298: CHIP Error 0x0000002A: Invalid message type
So most pressingly, the exchange delegate is null, so the Sigma2 message is not delivered to the CASE state machine and then the handshake fails.
I am also not sure what the deal is with the "invalid message type" message there. That means this check tested true:
if (ec->IsEncryptionRequired() != packetHeader.GetFlags().Has(Header::FlagValues::kEncryptedMessage))
which is pretty odd in this case.
Anyway, after this the commissioner retransmits Sigma1 a few times, and the other side retransmits Sigma2 a few times, but they all get dropped as duplicates. And in any case, on the server side we had, 5 seconds after sending Sigma2 (line 2396 in the accessory log):
[1633646956.576393][3488:3488] CHIP:SVR: Commissioning failed with error ../../examples/lighting-app/linux/third_party/connectedhomeip/src/platform/DeviceControlServer.cpp:48: CHIP Error 0x00000032: Timeout
[1633646956.576453][3488:3488] CHIP:SVR: Commissioning failed (attempt 1): ../../examples/lighting-app/linux/third_party/connectedhomeip/src/platform/DeviceControlServer.cpp:48: CHIP Error 0x00000032: Timeout
which happens before the retransmits.
But the big question is what closed the CASE establishment exchange on 2ndController after sending Sigma1, so that the delegate ended up null. @pan-apple any ideas?
Here is what closed the CASE exchange: In accessory_log.txt: At 1633646951: Arm Fail-safe command with ExpiryLengthSeconds 5, meaning 5 seconds max till receiving CommissioningComplete command, At 1633646956 i.e. 5 seconds later: CHIP Error 0x00000032: Timeout Looks like 5 seconds is not enough for commissioning.
5 seconds is definitely no enough! CASE can take longer than that on very constrained CPUs
K, so let's bump that up and see if it resolves things - that was set pre-case.
Here is what closed the CASE exchange: In accessory_log.txt:
That's on the server, though. But the exchange is already closed on the client (commissioner) when we get the Sigma2....
That said, @cjandhyala could you try the change in #10357 and see if that helps?
Hi, I cherry-picked #10357 on the test_event_6 branch (82bcd4e) for the two controllers.
Multi-admin still often fails on my end for the second admin. I use the same setup as describe in this ticket except that my end device node is a EFR32 lighting app
I mostly see two different errors when it fails
1) controller # 2 give me the following error at the OpCSRResponse
Sigma3_TO_Controller#1.txt Sigma3_TO_EFR32.txt Sigma3_TO_Controller#2.txt
Received certificate signing request from the device
[1634068509.841325][3044:3052] CHIP:CTL: Getting certificate chain for the device from the issuer
[1634068509.841372][3044:3052] CHIP:CTL: Verifying Certificate Signing Request
[1634068509.841956][3044:3052] CHIP:CR: ssl err elliptic curve routines EC_POINT_set_affine_coordinates point is not on curve
[1634068509.842054][3044:3052] CHIP:CR: ssl err elliptic curve routines o2i_ECPublicKey EC lib
[1634068509.842138][3044:3052] CHIP:CR: ssl err elliptic curve routines eckey_pub_decode decode error
[1634068509.842217][3044:3052] CHIP:CR: ssl err x509 certificate routines x509_pubkey_decode public key decode error
[1634068509.842292][3044:3052] CHIP:CTL: Failed to process the certificate signing request
[1634068509.842371][3044:3052] CHIP:CTL: SyncSetKeyValue on StartKeyID
[1634068509.842531][3044:3052] CHIP:CTL: SyncSetKeyValue on PairedDevicec9
Failed to establish secure session to device: 172
2) Controller # 2 timeout in on packet after sending SigmaR3 msg
Sent SigmaR3 msg
[1634070546.199595][3124:3132] CHIP:IN: New secure session created for device 0x00000000000000CA, key 3!!
[1634070546.199758][3124:3132] CHIP:CTL: Calling commissioning complete
[1634070546.199841][3124:3132] CHIP:DMG: ICR moving to [Initialize]
[1634070546.199961][3124:3132] CHIP:DMG: ICR moving to [AddCommand]
[1634070546.200395][3124:3132] CHIP:IN: Secure message was encrypted: Msg ID 0
[1634070546.200421][3124:3132] CHIP:IN: Build encrypted message 0x36f60d90 to 0x00000000000000CA of type 8 and protocolId 1 on exchange 6460r with MessageCounter 0.
[1634070546.200555][3124:3132] CHIP:IN: Sending encrypted msg 0x36f60d90 to 0x00000000000000CA at utc time: 12316362 msec
[1634070546.200580][3124:3132] CHIP:IN: Sending msg on generic transport
[1634070546.200705][3124:3132] CHIP:DMG: ICR moving to [ Sending]
[1634070551.187118][3124:3132] CHIP:IN: Sending plaintext msg 0x36f60d78 to 0x0000000000000000 at utc time: 12321348 msec
[1634070551.187218][3124:3132] CHIP:IN: Sending msg on generic transport
[1634070551.187434][3124:3132] CHIP:EM: Retransmit MessageCounter:A5298C7A Send Cnt 1
[1634070551.187521][3124:3132] CHIP:IN: Sending encrypted msg 0x36f60d90 to 0x00000000000000CA at utc time: 12321349 msec
[1634070551.187572][3124:3132] CHIP:IN: Sending msg on generic transport
[1634070551.187852][3124:3132] CHIP:EM: Retransmit MessageCounter:00000000 Send Cnt 1
[1634070551.440478][3124:3132] CHIP:IN: Sending plaintext msg 0x36f60d78 to 0x0000000000000000 at utc time: 12321602 msec
[1634070551.440554][3124:3132] CHIP:IN: Sending msg on generic transport
[1634070551.440724][3124:3132] CHIP:EM: Retransmit MessageCounter:A5298C7A Send Cnt 2
[1634070551.440808][3124:3132] CHIP:IN: Sending encrypted msg 0x36f60d90 to 0x00000000000000CA at utc time: 12321602 msec
[1634070551.440896][3124:3132] CHIP:IN: Sending msg on generic transport
[1634070551.441142][3124:3132] CHIP:EM: Retransmit MessageCounter:00000000 Send Cnt 2
[1634070551.696626][3124:3132] CHIP:IN: Sending plaintext msg 0x36f60d78 to 0x0000000000000000 at utc time: 12321858 msec
[1634070551.696701][3124:3132] CHIP:IN: Sending msg on generic transport
[1634070551.696857][3124:3132] CHIP:EM: Retransmit MessageCounter:A5298C7A Send Cnt 3
[1634070551.696937][3124:3132] CHIP:IN: Sending encrypted msg 0x36f60d90 to 0x00000000000000CA at utc time: 12321858 msec
[1634070551.697051][3124:3132] CHIP:IN: Sending msg on generic transport
[1634070551.697226][3124:3132] CHIP:EM: Retransmit MessageCounter:00000000 Send Cnt 3
[1634070551.952746][3124:3132] CHIP:EM: Failed to Send CHIP MessageCounter:A5298C7A sendCount: 3 max retries: 3
[1634070551.952855][3124:3132] CHIP:EM: Failed to Send CHIP MessageCounter:00000000 sendCount: 3 max retries: 3
[1634070558.203486][3124:3132] CHIP:DMG: Time out! failed to receive invoke command response from Exchange: 6460i
[1634070558.203663][3124:3132] CHIP:ZCL: DefaultResponse:
[1634070558.203723][3124:3132] CHIP:ZCL: Transaction: 0xffffabaf7390
[1634070558.203771][3124:3132] CHIP:ZCL: status: EMBER_ZCL_STATUS_FAILURE (0x01)
[1634070558.203832][3124:3132] CHIP:CTL: Received failure response 1
[1634070558.203943][3124:3132] CHIP:CTL: SyncSetKeyValue on StartKeyID
[1634070558.204161][3124:3132] CHIP:CTL: SyncSetKeyValue on PairedDeviceca
Failed to establish secure session to device: 1
OTBR fails to update the last SRP service after receiving SigmaR3. I am not sure if it is related. The following log snipped is from the EFR32 lighting app.
00> <detail> [SC] Received SigmaR3 msg
00> <detail> [DL] Thread packet RCVD: UDP, len 100
00> <detail> [DL] src FD11:33::DEA6:32FF:FE72:7537, port 44111
00> <detail> [DL] dest FD11:22::F387:97AD:2458:B238, port 5540
00> <error > [DL] OnSrpClientNotification: Operation refused for security reasons
00> <error > [DL] OnSrpClientNotification: Operation refused for security reasons
00> <detail> [DL] Thread packet RCVD: UDP, len 645
00> <detail> [DL] src FD11:33::DEA6:32FF:FE72:7537, port 44111
00> <detail> [DL] dest FD11:22::F387:97AD:2458:B238, port 5540
SSL_ERR_Controller#1.txt SSL_ERR_EFR32.txt SSL_ERR_Controller#2.txt
I am attaching all 3 logs for both scenario. (Ctrl 1, Ctrl 2, EFR32_Light_app)
Can you make any sense of either of those failures?
I think I'm running into the same issue now, and I'm noticing some logs that suggest we may be adding opcerts twice. It's hard to tell because the device attestation code re-uses some of the same stuff, but I'm seeing two entries in the fabric table for one commissioning.
https://github.com/project-chip/connectedhomeip/issues/12821 will contribute to this.
@cjandhyala Is this still an issue?
Issue Review: Closing issues prior to TE8. Please re-file if this is still an issue on newer SDKs
Adding 2nd admin(2nd python control) fails On TE6 branch (82bcd4e36f4856a4216cb76a16116faeec40e223).
Steps to reproduce :
Attached logs, 1st_controller.txt 2ndController.txt accessory_log.txt