home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.34k stars 29.88k forks source link

Matter integration - can't connect devices to HA #99857

Closed rohitbhatia19 closed 10 months ago

rohitbhatia19 commented 1 year ago

The problem

Hi HA community,

I am trying to integrate a KASA/TAPO matter-based smart switch KP125M to the HA Matter (Beta) integration. I have a flat network behind the router. Yet, adding a device fails. I am able to add this same switch (and another matter dimmer) to the Samsung SmartThings station without any issues - so it is unlikely that my network is the issue. Can someone help?

HA info: Home Assistant 2023.8.4 Supervisor 2023.08.1 Operating System 10.5 Frontend 20230802.1 - latest

Matter server version: 4.9.0

Here’s the Matter Server log:

2023-09-01 13:25:05 core-matter-server chip.DMG[126] DEBUG ICR moving to [AddedComma]
2023-09-01 13:25:05 core-matter-server chip.EM[126] INFO <<< [E:39086i S:17259 M:15537546] (S) Msg TX to 0:FFFFFFFB00000000 [0000] — Type 0001:08 (IM:InvokeCommandRequest)
2023-09-01 13:25:05 core-matter-server chip.IN[126] INFO (S) Sending msg 15537546 on secure session with LSID: 17259
2023-09-01 13:25:05 core-matter-server chip.DMG[126] DEBUG ICR moving to [CommandSen]
2023-09-01 13:25:05 core-matter-server chip.DMG[126] DEBUG ICR moving to [AwaitingDe]
2023-09-01 13:25:11 core-matter-server chip.EM[126] DEBUG Retransmitting MessageCounter:15537546 on exchange 39086i Send Cnt 1
2023-09-01 13:25:11 core-matter-server chip.IN[126] INFO (S) Sending msg 15537546 on secure session with LSID: 17259
2023-09-01 13:25:16 core-matter-server chip.EM[126] DEBUG Retransmitting MessageCounter:15537546 on exchange 39086i Send Cnt 2
2023-09-01 13:25:16 core-matter-server chip.IN[126] INFO (S) Sending msg 15537546 on secure session with LSID: 17259
2023-09-01 13:25:27 core-matter-server chip.EM[126] DEBUG Retransmitting MessageCounter:15537546 on exchange 39086i Send Cnt 3
2023-09-01 13:25:27 core-matter-server chip.IN[126] INFO (S) Sending msg 15537546 on secure session with LSID: 17259
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for SRV records)
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for non-srv records)
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for SRV records)
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for non-srv records)
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for SRV records)
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for non-srv records)
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for SRV records)
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for non-srv records)
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for SRV records)
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for non-srv records)
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for SRV records)
2023-09-01 13:25:35 core-matter-server chip.DIS[126] ERROR DNSSD packet parsing failed (for non-srv records)
2023-09-01 13:25:41 core-matter-server chip.EM[126] DEBUG Retransmitting MessageCounter:15537546 on exchange 39086i Send Cnt 4
2023-09-01 13:25:41 core-matter-server chip.IN[126] INFO (S) Sending msg 15537546 on secure session with LSID: 17259
2023-09-01 13:26:06 core-matter-server chip.EM[126] ERROR Failed to Send CHIP MessageCounter:15537546 on exchange 39086i sendCount: 4 max retries: 4
2023-09-01 13:26:18 core-matter-server chip.DMG[126] INFO Time out! failed to receive invoke command response from Exchange: 39086i
2023-09-01 13:26:18 core-matter-server chip.CTL[126] INFO Received failure response when disarming failsafesrc/app/CommandSender.cpp:248: CHIP Error 0x00000032: Timeout
2023-09-01 13:26:18 core-matter-server chip.CTL[126] INFO Successfully finished commissioning step ‘Cleanup’
2023-09-01 13:26:18 core-matter-server chip.IN[126] DEBUG SecureSession[0xffff6c01aba0]: MarkForEviction Type:1 LSID:17259
2023-09-01 13:26:18 core-matter-server matter_server.server.device_controller[126] DEBUG Interviewing node: 7
2023-09-01 13:26:18 core-matter-server chip.SC[126] INFO SecureSession[0xffff6c01aba0]: Moving from state ‘kActive’ → ‘kPendingEviction’
2023-09-01 13:26:18 core-matter-server matter_server.server.device_controller[126] DEBUG Attempting to resolve node 7…
2023-09-01 13:26:18 core-matter-server chip.IN[126] DEBUG SecureSession[0xffff6c01aba0]: Released - Type:1 LSID:17259
2023-09-01 13:26:18 core-matter-server chip.DMG[126] DEBUG ICR moving to [AwaitingDe]
2023-09-01 13:26:18 core-matter-server chip.CSM[126] DEBUG FindOrEstablishSession: PeerId = [1:0000000000000007]
2023-09-01 13:26:18 core-matter-server chip.CSM[126] DEBUG FindOrEstablishSession: No existing OperationalSessionSetup instance found
2023-09-01 13:26:18 core-matter-server chip.DIS[126] DEBUG OperationalSessionSetup[1:0000000000000007]: State change 1 → 2
2023-09-01 13:26:18 core-matter-server chip.DIS[126] INFO Checking node lookup status after 200 ms
2023-09-01 13:26:49 core-matter-server chip.DIS[126] ERROR Timeout waiting for mDNS resolution.
2023-09-01 13:27:03 core-matter-server chip.DIS[126] INFO Checking node lookup status after 45010 ms
2023-09-01 13:27:03 core-matter-server chip.DIS[126] ERROR OperationalSessionSetup[1:0000000000000007]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:114: CHIP Error 0x00000032: Timeout
2023-09-01 13:27:03 core-matter-server matter_server.server.device_controller[126] DEBUG Attempting to resolve node 7…
2023-09-01 13:27:03 core-matter-server chip.CSM[126] DEBUG FindOrEstablishSession: PeerId = [1:0000000000000007]
2023-09-01 13:27:03 core-matter-server chip.CSM[126] DEBUG FindOrEstablishSession: No existing OperationalSessionSetup instance found
2023-09-01 13:27:03 core-matter-server chip.DIS[126] DEBUG OperationalSessionSetup[1:0000000000000007]: State change 1 → 2
2023-09-01 13:27:03 core-matter-server chip.DIS[126] INFO Checking node lookup status after 201 ms
2023-09-01 13:27:34 core-matter-server chip.DIS[126] ERROR Timeout waiting for mDNS resolution.
2023-09-01 13:27:48 core-matter-server chip.DIS[126] INFO Checking node lookup status after 45003 ms
2023-09-01 13:27:48 core-matter-server chip.DIS[126] ERROR OperationalSessionSetup[1:0000000000000007]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:114: CHIP Error 0x00000032: Timeout
2023-09-01 13:27:48 core-matter-server matter_server.server.device_controller[126] DEBUG Attempting to resolve node 7…
2023-09-01 13:27:48 core-matter-server chip.CSM[126] DEBUG FindOrEstablishSession: PeerId = [1:0000000000000007]
2023-09-01 13:27:48 core-matter-server chip.CSM[126] DEBUG FindOrEstablishSession: No existing OperationalSessionSetup instance found
2023-09-01 13:27:48 core-matter-server chip.DIS[126] DEBUG OperationalSessionSetup[1:0000000000000007]: State change 1 → 2
2023-09-01 13:27:48 core-matter-server chip.DIS[126] INFO Checking node lookup status after 201 ms
2023-09-01 13:28:19 core-matter-server chip.DIS[126] ERROR Timeout waiting for mDNS resolution.
2023-09-01 13:28:33 core-matter-server chip.DIS[126] INFO Checking node lookup status after 45002 ms
2023-09-01 13:28:33 core-matter-server chip.DIS[126] ERROR OperationalSessionSetup[1:0000000000000007]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:114: CHIP Error 0x00000032: Timeout
2023-09-01 13:28:33 core-matter-server matter_server.server.client_handler[126] ERROR [281472721071888] Error handling message: CommandMessage(message_id=‘31085f1fcc824888a457c9122878620c’, command=‘commission_on_network’, args={‘setup_pin_code’: 99225754})
Traceback (most recent call last):
  File “/usr/local/lib/python3.11/site-packages/matter_server/server/device_controller.py”, line 889, in _resolve_node
    await self._call_sdk(
  File “/usr/local/lib/python3.11/site-packages/matter_server/server/device_controller.py”, line 766, in _call_sdk
    await self.server.loop.run_in_executor(
  File “/usr/local/lib/python3.11/concurrent/futures/thread.py”, line 58, in run
    result = self.fn(*self.args, **self.kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File “/usr/local/lib/python3.11/site-packages/chip/ChipDeviceCtrl.py”, line 491, in ResolveNode
    self.GetConnectedDeviceSync(nodeid, allowPASE=False)
  File “/usr/local/lib/python3.11/site-packages/chip/ChipDeviceCtrl.py”, line 731, in GetConnectedDeviceSync
    returnErr.raise_on_error()
  File “/usr/local/lib/python3.11/site-packages/chip/native/init.py”, line 67, in raise_on_error
    raise self.to_exception()
chip.exceptions.ChipStackError: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:114: CHIP Error 0x00000032: Timeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File “/usr/local/lib/python3.11/site-packages/matter_server/server/device_controller.py”, line 304, in interview_node
    await self._resolve_node(node_id=node_id)
  File “/usr/local/lib/python3.11/site-packages/matter_server/server/device_controller.py”, line 897, in _resolve_node
    await self._resolve_node(node_id=node_id, retries=retries - 1)
  File “/usr/local/lib/python3.11/site-packages/matter_server/server/device_controller.py”, line 897, in _resolve_node
    await self._resolve_node(node_id=node_id, retries=retries - 1)
  File “/usr/local/lib/python3.11/site-packages/matter_server/server/device_controller.py”, line 896, in _resolve_node
    raise NodeNotResolving(f"Unable to resolve Node {node_id}") from err
matter_server.common.errors.NodeNotResolving: Unable to resolve Node 7

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File “/usr/local/lib/python3.11/site-packages/matter_server/server/client_handler.py”, line 188, in _run_handler
    result = await result
^^^^^^^^^^^^
  File “/usr/local/lib/python3.11/site-packages/matter_server/server/device_controller.py”, line 221, in commission_on_network
    await self.interview_node(node_id)
  File “/usr/local/lib/python3.11/site-packages/matter_server/server/device_controller.py”, line 314, in interview_node
    raise NodeInterviewFailed(f"Failed to interview node {node_id}") from err
matter_server.common.errors.NodeInterviewFailed: Failed to interview node 7

Thanks in advance

What version of Home Assistant Core has the issue?

2023.9.0

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Matter

Link to integration documentation on our website

https://www.home-assistant.io/integrations/matter/

Diagnostics information

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 1 year ago

Hey there @home-assistant/matter, mind taking a look at this issue as it has been labeled with an integration (matter) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `matter` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign matter` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


matter documentation matter source (message by IssueLinks)

marcelveldt commented 1 year ago

Well, that your Samsung SmartThings hub works does not have to say your network is not the issue. The Samsung hub is probably connected over WiFi instead of ethernet on the same Access point as the Tapo plug ?

For matter to work reliably, any (mdns) multicast Ipv6 traffic has to travel freely as that is how devices are discovered and communicating. Some network setups are blocking this traffic by default, such as Ubiquity Unifi and you have to disable that features. Look in your router for options like "Multicast enhancements" and disable it. Also some IGMP solutions are known to be broken so it helps to toggle it on or off as test.

In your log there is a really clear indication of mDNS lookup issues so your network is indeed the issue, sorry.

marcelveldt commented 12 months ago

@rohitbhatia19 did you ever find out the culprit of the issue ?

rohitbhatia19 commented 11 months ago

Hi. Unfortunately, I am traveling for work and in another country, so I can't do thorough tests. However, some notes:

  1. I have an Orbi mesh router and 2 satellites - the HA is connected to the router over Ethernet and the smart plug I was testing was connected to a satellite
  2. I ensured, after your feedback, that mdns isn't blocked
  3. When attempting to connect the plug to HA, I "wiresharked" the traffic coming from HA instance and could see some MDNS traffic - I am not sure if it was making it to the plug though

I plan to work on this after I return home toward the end of October. I'll keep you updated. Thanks.

marcelveldt commented 10 months ago

@rohitbhatia19 can we close here ?

This is strictly taken not an issue but more a help request for your specific situation. Our discord server is a very good place to get (community) support with these kind of questions so we can use the issue tracker here for bug reports of the code.

A few tips/hint:

See our documentation about network topologies. Matter is based around IPv6 multicast (mDNS) which is not always properly handled by network gear, especially not the semi professional stuff like Unifi which tends to block as much multicast as possible (which makes sense in a way for business environments).

So as soon as the IPv6 multicast traffic can travel freely in your network, issues like these will resolve itself.

rohitbhatia19 commented 10 months ago

I'll seek help from the community, that's fine. However, I thought I'd let you know 2 things:

  1. Using Wireshark, I am able to see MDNS traffic from and to the Home Assistant (HA Blue hardware)
  2. I added a matter device to the Samsung SmartthingsStation, then, I chose the option to share that device with HA. I was able to add it to HA as a result. Strange. Thanks anyway!
marcelveldt commented 10 months ago

1) Be aware that Matter is IPv6 and some switches/routers do not have a solid MLD snooping implementation (IGMP snooping for ipv6) and some filter out parts of the IPv6 multicast traffic. We've seen weird cases. Even examples where HA Matter on the WiFi would work perfectly fine but connecting it over ethernet doesn't.

2) Ah good that sharing it from Samsung to HA worked! At least that means that traffic itself is working. Maybe it was just a random fluke then. Was your phone on the same 2,4 Ghz as the device when you tried to commission using the HA companion app ?

If you need more support, check out the HA discord

rohitbhatia19 commented 10 months ago

Hi there.

Did some more tests. I'm not looking for help, but only letting you know with the intent of improving the matter integration. After trying several things, nothing worked. So, I reset the smart plug, removed the Matter integration and Matter Server addon, reinstalled the integration, which reinstalls the addon, and tried to add the smart plug. It connected the first time. All this was done without any additional changes to my network gear. That makes me question if it actually is a network issue. When trying to add the second device of the same type (also factory reset), I get an error as in the screenshot.

HA matter

My last message here. I'll check HA discord if I need help. But I thought it's important that I give this feedback.

Thanks.

TRusselo commented 10 months ago

Hi there.

Did some more tests. I'm not looking for help, but only letting you know with the intent of improving the matter integration. After trying several things, nothing worked. So, I reset the smart plug, removed the Matter integration and Matter Server addon, reinstalled the integration, which reinstalls the addon, and tried to add the smart plug. It connected the first time. All this was done without any additional changes to my network gear. That makes me question if it actually is a network issue. When trying to add the second device of the same type (also factory reset), I get an error as in the screenshot.

My last message here. I'll check HA discord if I need help. But I thought it's important that I give this feedback.

Thanks.

that matches my experience, fist plug worked. 2nd 3rd ect would not connect. deleted 1st, and... wont connect again.

pseud0sphere commented 9 months ago

I was having this same issue of the first device pairing correctly, but subsequent devices would get to the final "Connecting device to Home Assistant" prompt before erroring out with "Something went wrong".

Some random steps (couldn't reproduce) of changing configs and repeatedly restarting both HA and the Python Matter server (both in docker containers) eventually got me to a second, but I could never get a third device, and after unpairing the second, I got into the same trouble again. After some random cycle of restarting and changing configs, I eventually got a second, but never past that.

After downgrading to HA 2023.8.2, I was able to pair everything in one shot without any errors. More recent images that I had, starting with 2023.9.3 and later, all had the same errors as the most recent release (2023.12.1), so something seems to have broken in an update between 2023.8.2 and 2023.9.3.