project-chip / rs-matter

Rust implementation of the Matter protocol. Status: Experimental
Apache License 2.0
310 stars 43 forks source link

Error when trying to add device to Google Home #102

Closed jasta closed 6 months ago

jasta commented 11 months ago

I got the onoff_light example working on my ESP32-C3-DevKitM-1 (after some tweaking to remove IPv6 support, see #100), but I'm having a couple of issues getting the device to appear and be controllable from Google Home.

First, trying to scan the QR code doesn't work and immediately produces an error as soon as I move my phone camera nearby to the QR code (see [1] below). If I switch to manually entering the pairing code things progress and I see a lot of log spamming from the device (which all looks successful generally and nothing appears to timeout).

Second, after seemingly trying to pair with the new device and bring it into Google Home I'm presented with [2] and I can't do anything to try again or proceed, only back out.

Full logs from boot-up attached here: https://gist.github.com/jasta/ed826fed780f9cc6109499b3fbe4f852

Is this expected? If no, anything I can try to do to debug and figure out what's wrong? I was able to successfully add a device using the main C++ matter repository and the all-clusters-app example and it seemingly worked well even to control the device. This was however a few months ago with a beta of Google Home.

[1] [2]

EDIT: Apologies I just realized the screenshots are in Spanish. [1] is saying that it's not a matter QR code, and [2] is saying that the device isn't certified for use with Matter.

jasta commented 11 months ago

Oops, looks like I need to do something in the Google Home developer console to make this work: https://developers.home.google.com/matter/integration/create

After doing this however, I got a new, more vague error message ("An error occurred", roughly translated to English) in Google Home, and an error logged on the esp32 running rs-matter as well: https://gist.github.com/jasta/7718387af24bde6462d7da54c10f7163

Looks like the relevant error is:

I (98667) rs_matter::data_model::sdm::general_commissioning: Handling command ARM Fail Safe
E (98677) rs_matter::data_model::sdm::failsafe: Received Fail-Safe Arm with different session modes; current Pase, incoming Case(CaseDetails { fab_idx: 1, cat_ids: [65537, 0, 0] })

Not sure what any of this really means though?

ssnover commented 11 months ago

It looks like Google Home controller might be arming a fail-safe expiration during the commissioning process (PASE) and then attempting to do so with CASE before completing commissioning? It's normal for a fail-safe to be armed during the commissioning process when the controller is talking to the device over a PASE session. However, setting it over a CASE session is usually done for NOC management.

Looking through your logs, it looks like Commissioning is never completed (I don't see a message saying "Handling command Commissioning Complete"). So it looks like the controller is emitting an Arm Fail Safe command when it shouldn't.

jasta commented 11 months ago

Thanks that's useful context but I'm not sure how we would go about fixing this. I'm interested in adding sufficient support to get a pet project off the ground but I'm not sure where to start fixing the commissioning flow.

Fwiw I'm also interested in adding BTP (for BLE commissioning) support, but I want to make sure the core flow works first.

ssnover commented 11 months ago

To be clear, I'm not a maintainer of this crate, just an interested party! I suspect that more information is needed in the logs in order to debug. I've primarily been testing with a SmartThings hub and filing issues where they come up, but I can also try out my Google Home with some additional logging.

ivmarkov commented 11 months ago

I got the onoff_light example working on my ESP32-C3-DevKitM-1 (after some tweaking to remove IPv6 support, see #100), but I'm having a couple of issues getting the device to appear and be controllable from Google Home.

You can't get the commissioning to complete - for Google Home specifically - without support for IPv6. You can do IPv4 only with chiptool though.

I'll comment more in #100.

jasta commented 11 months ago

Interesting, I dropped back to esp-idf 4.4.x and indeed I do not get the broadcast send error anymore (I haven't investigated why exactly but the error was when calling send_to here: https://github.com/project-chip/rs-matter/blob/main/rs-matter/src/transport/udp.rs#L146). I'll investigate why that would fail with ESP_IDF_VERSION="release/v5.0" soon.

But, more importantly, even with 4.4.x and IPv6 working I still can't connect to Google Home. First, the UX is different with IPv6 and the device is discovered automatically instead of making me force Google Home to try through Matter. But the error in the logs seems to be the same, but this time with some new errors spamming around Incorrect TLV list:

https://gist.github.com/jasta/a6644468ee77e0f5e10266576599064d

It looks like Matter itself is generating packets it's parser can't print? Note that this behaviour is observed before and after updating to the latest main branch (from maybe a week or two ago when I did the first test). The logs are from the latest main as of this writing.

jasta commented 11 months ago

Hmm, well this is odd. I just reran a test and while it still didn't work (i.e. adding the device failed with generic error screen from Google Home), this time I didn't receive any "Incorrect TLV list" errors. Something is starting to feel like a race condition either on the wire or in rs-matter code. @ivmarkov, do you have logs from a working session with Google Home that I can use for comparison?

I'm starting to think maybe #103 might be the answer as well. I noticed that we don't have structured logging of incoming or outgoing TLV structs so the logs are missing quite a bit of context if you suspect one particular message might not be getting handled (or sent) correctly.

jasta commented 11 months ago

Oh and also I tried disabling the error return in FailSafe::arm when the session modes differ but Google Home still presents an error immediately after issuing the second arm command:

E (51603) rs_matter::data_model::sdm::failsafe: IGNORING!!! Received Fail-Safe Arm with different session modes; current Pase, incoming Case(CaseDetails { fab_idx: 1, cat_ids: [65537, 0, 0] })
I (51623) rs_matter::transport::proto_hdr: [encode] ExId: 48006, Proto: 1, Opcode: 9, Flags: R|A|
I (51633) rs_matter::transport::packet: Sending packet IM:InvokeResponse: 
I (51643) rs_matter::tlv::parser: TLV list:
I (51643) rs_matter::tlv::parser: S8(64)
I (51653) rs_matter::tlv::parser: ---------
I (51733) rs_matter::transport::core: Transport: waiting for incoming packets
I (51733) rs_matter::transport::plain_hdr: [decode] flags: (empty), session type: Encrypted, sess_id: 2, ctr: 123089286
I (51743) rs_matter::transport::proto_hdr: [decode] ExId: 48006, Proto: 0, Opcode: 16, Flags: A|I| 
I (51753) rs_matter::transport::packet: Got packet SC:MRPStandAloneAck: 
I (51753) rs_matter::tlv::parser: TLV list:
I (51763) rs_matter::tlv::parser: ---------
I (51763) rs_matter::transport::core: Handler 0: Exchange completed

And then nothing else with the Google Home app in a "something went wrong" type of UI state.

ivmarkov commented 11 months ago

Some updates:

In any case, this needs debugging. I'll try to figure out what is going on in the next days.

UPDATE: I doubt there is any racing error though. It is all single threaded.

UPDATE 2: The "incorrect TLV list" might be a false alarm. I'll follow up on this as well.

ivmarkov commented 6 months ago

@kedars I don't know what exactly had changed in my Google Controller, but I'm no longer seeing this. Shall we close?

kedars commented 6 months ago

makes sense, can be reopened if we see it again. I suspect, maybe the unordered fix that Andrei did might have cleared this, since that's what changed in the last few weeks.