home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

TP-Link Kasa Integration not working. Device status shows "Failed setup, will retry". #103977

Closed JackBeQuick87 closed 8 months ago

JackBeQuick87 commented 11 months ago

The problem

All devices using the TP-Link Kasa integration show "Failed setup, will retry". Hovering over the status shows "Unable to get discovery response for [device's hostname]."

Issue appeared immediately after updating to Home-Assistant v2023.11. The integration was known to work before upgrading (I upgraded from v2023.9 to v2023.11, skipping 2023.10).

Debug log shows evidence of successful connection to device.

What version of Home Assistant Core has the issue?

core-2023.11.2

What was the last working version of Home Assistant Core?

core-2023.9 (or closest build)

What type of installation are you running?

Home Assistant Container

Integration causing the issue

tplink

Link to integration documentation on our website

https://www.home-assistant.io/integrations/tplink

Diagnostics information

home-assistant_tplink_2023-11-14T17-58-32.092Z.log

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Nothing abnormal was found.

Additional information

home-assistant[bot] commented 11 months ago

Hey there @rytilahti, @thegardenmonkey, mind taking a look at this issue as it has been labeled with an integration (tplink) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `tplink` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign tplink` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)

joshua-nord commented 11 months ago

I experienced the same symptoms on a previously working setup when I upgraded to Home-Assistant v2023.11. I worked around it by allowing traffic from the HomeAssistant controller to the devices on ports 9999 and 20002, UDP and TCP, which appear to be Kasa's discovery ports.

sweharris commented 11 months ago

Just to note that I'm also seeing identical issues (as noted in a comment on #99449); as I reported there I'm seeing get_sysinfo calls working and seeing UDP traffic flowing fine.

My setup also has the devices on a separate VLAN to HomeAssistant, but the firewall rules permit HA to talk to the IoT VLAN without restriction.

JackBeQuick87 commented 11 months ago

Looking through the commit history involving the tplink integration, there was a recent commit that changed to using a newer version of the python-kasa package. I found a related issue in that repository: python-kasa/python-kasa#543.

The problem in Home-Assistant seems to be rooted in the python-kasa package.

As a workaround, you can re-add the devices using their IP addresses. This will obviously cause problems if you do not use fixed IPs or DHCP reservations.

A fix of the python-kasa library is still needed.
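
The IP-address workaround can also be applied in a script: resolve the hostname yourself and hand the resulting address to whatever talks to the device. The following is a minimal sketch using only the Python standard library; `resolve_to_ipv4` is a hypothetical helper name, not part of python-kasa or Home Assistant.

```python
import ipaddress
import socket


def resolve_to_ipv4(host: str) -> str:
    """Return `host` unchanged if it is already an IP literal,
    otherwise return the first IPv4 address it resolves to."""
    try:
        ipaddress.ip_address(host)
        return host  # already an IP address, nothing to resolve
    except ValueError:
        pass
    # getaddrinfo performs the DNS (or hosts-file) lookup and raises
    # socket.gaierror if the name cannot be resolved.
    infos = socket.getaddrinfo(
        host, None, family=socket.AF_INET, type=socket.SOCK_DGRAM
    )
    return infos[0][4][0]
```

If the lookup itself fails with `socket.gaierror`, the problem is in name resolution rather than in the integration.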

JackBeQuick87 commented 11 months ago

> I experienced the same symptoms on a previously working setup when I upgraded to Home-Assistant v2023.11. I worked around it by allowing traffic from the HomeAssistant controller to the devices on ports 9999 and 20002, UDP and TCP, which appear to be Kasa's discovery ports.

Thanks, but this problem seems to be different. My HomeAssistant instance can already/still reach the Kasa devices on those ports.

sweharris commented 11 months ago

> As a workaround, you can re-add the devices using their IP addresses. This will obviously cause problems if you do not use fixed IPs or DHCP reservations.

This seems to be a good workaround.

In my case I stopped the service, edited .storage/core.config_entries and replaced the DNS names with IP addresses, and then restarted. The devices are now properly found.

Thanks!
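
The manual edit described above could be scripted roughly as follows. This is only a sketch: it assumes the file is JSON with the entries under `data` → `entries`, each carrying a `domain` and a `data` dictionary with a `host` key. That layout is an assumption and should be checked against your own file. `pin_tplink_hosts` and the hostname-to-IP mapping are hypothetical. Stop Home Assistant and back up the file before changing it.

```python
import json
from typing import Callable


def pin_tplink_hosts(config: dict, resolve: Callable[[str], str]) -> int:
    """Replace the host of every tplink entry with resolve(host).

    Returns the number of entries whose host was changed.
    """
    changed = 0
    for entry in config.get("data", {}).get("entries", []):
        if entry.get("domain") != "tplink":
            continue  # leave other integrations untouched
        data = entry.get("data", {})
        host = data.get("host")
        if not host:
            continue
        new_host = resolve(host)
        if new_host != host:
            data["host"] = new_host
            changed += 1
    return changed


# In-memory demonstration with made-up values (assumed file layout).
sample = json.loads(
    '{"data": {"entries": ['
    '{"domain": "tplink", "data": {"host": "bedroomlamp.lan"}}]}}'
)
lookup = {"bedroomlamp.lan": "192.168.1.70"}  # hypothetical mapping
pin_tplink_hosts(sample, lambda h: lookup.get(h, h))
print(json.dumps(sample))
```

Passing the resolver as a function keeps the rewrite logic separate from the lookup, so the same helper could later map IP addresses back to hostnames.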

JackBeQuick87 commented 11 months ago

> > As a workaround, you can re-add the devices using their IP addresses. This will obviously cause problems if you do not use fixed IPs or DHCP reservations.
>
> This seems to be a good workaround.
>
> In my case I stopped the service, edited .storage/core.config_entries and replaced the DNS names with IP addresses, and then restarted. The devices are now properly found.
>
> Thanks!

Ah, thank you so much for that tip with the core.config_entries file!

If, in the GUI, you try to add a new device using an IP of a device that was formerly registered using its hostname, then it seems the HA core or integration will automatically replace the device's host name with the newly entered IP. However, I don't know if this process would work in reverse, whenever the hostname issue is fixed. Using your method of editing the core.config_entries file will be convenient for the switch back to hostnames, too.

TheCodeJanitor-dotcom commented 11 months ago

I started experiencing this also. I have four TP-Link devices, all configured with static IP addresses on a subnet. By entering the host IP when adding the devices to the integration, they work for a time ranging from minutes to hours, then stop working with the "Failed setup" indication. They always work from the Kasa app. I examined core.config_entries and found that the host addresses for all four devices were set to the address of one of my routers. I edited the file, replacing the host addresses with the correct static addresses, and restarted HA. All devices worked again, but eventually failed over the next few hours. The host addresses in core.config_entries had reverted to that same router address. I assume this is a function of the auto-discovery mechanism, but I don't know enough to be sure, or how to disable it.

latteetanne commented 11 months ago

I'm having this same behaviour. Can ping and use all three of them via official app but Home Assistant discovery can't find any of them via auto discovery and gives out error "No devices found on the network". If I try to enter IP when adding manually I get "Failed to connect" error. Prior to deleting and trying to re-add them I also checked that in core.config_entries they had same IP addresses as they currently do so it's weird. They used to work really reliably until now. :(

TheCodeJanitor-dotcom commented 11 months ago

> I started experiencing this also. I have four TP-Link devices, all configured with static IP addresses on a subnet. By entering the host IP when adding the devices to the integration, they work for a time ranging from minutes to hours, then stop working with the "Failed setup" indication. They always work from the Kasa app. I examined core.config_entries and found that the host addresses for all four devices were set to the address of one of my routers. I edited the file, replacing the host addresses with the correct static addresses, and restarted HA. All devices worked again, but eventually failed over the next few hours. The host addresses in core.config_entries had reverted to that same router address. I assume this is a function of the auto-discovery mechanism, but I don't know enough to be sure, or how to disable it.

I found a workaround that may only be applicable in my particular subnet configuration: I used IP/MAC binding in my subnets' router to pin the TP-Link/Kasa device MAC addresses to the static IPs I had previously assigned them. This seems to prevent whatever discovery mechanism kept changing the IP addresses in core.config_entries, and I haven't had a recurrence of 'Failed setup' (yet).

TheCodeJanitor-dotcom commented 11 months ago

> I found a workaround that may only be applicable in my particular subnet configuration: I used IP/MAC binding in my subnets' router to pin the TP-Link/Kasa device MAC addresses to the static IPs I had previously assigned them. This seems to prevent whatever discovery mechanism kept changing the IP addresses in core.config_entries, and I haven't had a recurrence of 'Failed setup' (yet).

Forget it. Failed. Same result, the IP of the device reverts to the WAN address of the subnet router.

TermiNaderTL commented 11 months ago

Same issue for me

seancrites commented 11 months ago

I too run with a vlan/firewalled app & iot networks. Also have been having the same issue where my TP-LINK (HS105?) switch suddenly stopped working within the past month or two. I've also got DHCP reservation giving predictable IP addresses to the TP-LINK as well.

My old firewall was just permitting TCP 9999 from HA OS -> IOT. I enabled logging to see what was not permitted between the two and found that HA OS is now sending traffic via UDP on 9999 & 200002. I updated my firewall and that fixed the issue.
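
To check whether a firewall lets UDP through in both directions, a small send-and-wait probe can help. This is a generic sketch using the standard library; `udp_probe` is a hypothetical helper. A real Kasa device is only expected to answer a correctly formed discovery payload, so a `None` result on its own does not prove the firewall is at fault.

```python
import socket
from typing import Optional


def udp_probe(host: str, port: int, payload: bytes,
              timeout: float = 2.0) -> Optional[bytes]:
    """Send `payload` to host:port over UDP and return the first reply.

    Returns None if nothing arrives within `timeout` seconds or the
    operating system reports the port as unreachable.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (host, port))
        try:
            data, _addr = sock.recvfrom(4096)
        except OSError:  # includes socket.timeout / TimeoutError
            return None
        return data
```

Running the probe from the Home Assistant host against the device's UDP ports, while watching the firewall log, shows which direction is being dropped.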

TheCodeJanitor-dotcom commented 11 months ago

> I too run with a vlan/firewalled app & iot networks. Also have been having the same issue where my TP-LINK (HS105?) switch suddenly stopped working within the past month or two. I've also got DHCP reservation giving predictable IP addresses to the TP-LINK as well.
>
> My old firewall was just permitting TCP 9999 from HA OS -> IOT. I enabled logging to see what was not permitted between the two and found that HA OS is now sending traffic via UDP on 9999 & 200002. I updated my firewall and that fixed the issue.

I disabled SPI on my routers, and got a different behavior, but discovered something interesting: my IOT router is also TP-Link, and the router is responding to the probes from HA OS, but obviously not in the way that the TP-Link integration is expecting.

So, all four of my TP-Link outlets (a KP303, an HS300, an HS107, and an EP40) get tried, but a failure for each is reported from the TP-Link routers' IP address...

It also appears that whatever fix was submitted above has stalled...

iointerrupt commented 10 months ago

It appears to be a DNS-related issue in how python-kasa parses the hostname. For example, running

`kasa --host bedroomlamp.lan on` fails with: `No --type defined, discovering.. Got error: SmartDeviceException('Unable to get discovery response for bedroomlamp.lan')`.

Whereas specifying the IP address of bedroomlamp.lan works fine: `kasa --host 1.2.3.4 on`

[EDIT]: For now, I have reverted my python-kasa install back to 0.5.3 and it's working as expected. Looks like something changed in 0.5.4.

trustno1foxm commented 10 months ago

> I too run with a vlan/firewalled app & iot networks. Also have been having the same issue where my TP-LINK (HS105?) switch suddenly stopped working within the past month or two. I've also got DHCP reservation giving predictable IP addresses to the TP-LINK as well.
>
> My old firewall was just permitting TCP 9999 from HA OS -> IOT. I enabled logging to see what was not permitted between the two and found that HA OS is now sending traffic via UDP on 9999 & 200002. I updated my firewall and that fixed the issue.

Oh, that was the fix for me too! Thank you! Nice that there hasn't been any documentation about that...

Shredder5262 commented 10 months ago

I believe my issue is related to this also. I see the error message below in the logs and since updating to HA core 2023.12.2 (I'm on 2023.12.3 now) I have been experiencing either large delays with my devices turning on or not turning on at all.

Logger: homeassistant.components.tplink.coordinator
Source: helpers/update_coordinator.py:332
Integration: TP-Link Kasa Smart (documentation, issues)
First occurred: December 14, 2023 at 7:32:59 PM (218 occurrences)
Last logged: 10:15:51 AM

Error fetching 192.168.1.30 data: Unable to query the device 192.168.1.30:9999: [Errno 104] Connection reset by peer
Error fetching 192.168.1.195 data: Unable to query the device 192.168.1.195:9999: [Errno 104] Connection reset by peer
Error fetching 192.168.1.148 data: Unable to query the device 192.168.1.148:9999:
Error fetching 192.168.1.30 data: Unable to connect to the device: 192.168.1.30:9999: [Errno 104] Connect call failed ('192.168.1.30', 9999)
Error fetching 192.168.1.30 data: Unable to query the device 192.168.1.30:9999:

mmccool commented 10 months ago

Two things to add here:

Shredder5262 commented 10 months ago

I've only had one occurrence of a device outright failing to connect and instead showing offline; otherwise the devices seem to respond as they should. The error message itself seems to be a lot of noise that I'm trying to reduce in my environment.

tomlyo commented 10 months ago

I have this happen on all my Kasa devices whenever my WiFi goes out temporarily. As soon as it's back up, all of them are in a state of "Failed setup, will retry". I've restarted Home Assistant in each case, and the OS itself (running HassOS). The devices work fine in the Kasa app, and they work in Home Assistant IF I remove the device, then re-add it by its static IP address.

In my case, the switches are on a different VLAN than my HASS server. It's a bit frustrating, but every time we get a power outage, or my Wifi AP goes offline, I just have to go through deleting all my Kasa devices, and re-adding them. Also discovery is just wack, not sure what it's doing as I get logs like this:

2023-12-20 19:37:33.242 DEBUG (MainThread) [kasa.discover] [DISCOVERY] ('192.168.68.100', 9999) >> {'system': {'get_sysinfo': None}}
2023-12-20 19:37:33.246 DEBUG (MainThread) [kasa.discover] Waiting a total of 10 seconds for responses...
2023-12-20 19:37:39.826 DEBUG (MainThread) [kasa.discover] [DISCOVERY] ('192.168.68.101', 9999) >> {'system': {'get_sysinfo': None}}
2023-12-20 19:37:39.829 DEBUG (MainThread) [kasa.discover] Waiting a total of 10 seconds for responses...
2023-12-20 19:37:40.803 DEBUG (MainThread) [kasa.discover] [DISCOVERY] ('192.168.68.109', 9999) >> {'system': {'get_sysinfo': None}}
2023-12-20 19:37:40.807 DEBUG (MainThread) [kasa.discover] Waiting a total of 10 seconds for responses...
2023-12-20 19:37:48.649 DEBUG (MainThread) [kasa.discover] [DISCOVERY] ('192.168.68.100', 9999) >> {'system': {'get_sysinfo': None}}

Those IPs are completely random, as my network and VLANs use nothing even close to that.

EDIT:

Re-reading the OP: "Unable to get discovery response for [device's hostname]." In my case this would explain why the devices aren't connecting: it's trying completely random IP addresses that aren't even the IP addresses I specified when I set the devices up in the integration.

In my case my switches use IP address 192.168.1.70-79
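
For context on the `{'system': {'get_sysinfo': None}}` payload in the log above: on port 9999 that JSON query is, as far as is commonly documented for older Kasa devices, obfuscated with a simple XOR "autokey" scheme starting from the key 171. The sketch below illustrates that scheme under that assumption. It is not taken from python-kasa's source, the function names are hypothetical, and the newer protocol on port 20002 works differently and is not covered.

```python
import json

INITIAL_KEY = 171  # starting key commonly documented for the legacy protocol


def kasa_encrypt(plaintext: str) -> bytes:
    """XOR each byte with the previous ciphertext byte (autokey)."""
    key = INITIAL_KEY
    out = bytearray()
    for byte in plaintext.encode("utf-8"):
        key ^= byte      # the ciphertext byte becomes the next key
        out.append(key)
    return bytes(out)


def kasa_decrypt(ciphertext: bytes) -> str:
    """Reverse kasa_encrypt."""
    key = INITIAL_KEY
    out = bytearray()
    for byte in ciphertext:
        out.append(key ^ byte)
        key = byte       # the next key is the ciphertext byte just read
    return out.decode("utf-8")


# The same query that appears in the debug log, serialized as JSON.
DISCOVERY_QUERY = json.dumps({"system": {"get_sysinfo": None}})
```

Decrypting a captured UDP payload with `kasa_decrypt` is one way to see which host actually answered a probe, and with what.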

joeidea commented 10 months ago

I've had similar problems with the Kasa HS300 smart power strip. I first noticed at 2023.11.2. After reading here, I re-did the integration specifying IP (I use DHCP IP address reservation); that worked, for a while. It broke again about 2 weeks ago; but after about 2 days resolved by itself. I am now at Core 2023.12.3.

It broke again about 2 days ago. The behavior is different. I re-did the integration, specifying IP, but discovery fails. I then leave the Host blank, and discovery works, shows the correct IP address; but no entities are discovered, and I see endless failed/retry loop.

I will watch closely this week. I believe the trigger is my router rebooting (which I do weekly for maintenance.).

I resolved (temporarily?) doing this:

  1. Shutdown HA VM (Windows 11 Hyper-V)
  2. Change DHCP IP (192.168.1xxx)
  3. Reboot Router
  4. Start HA VM.

ecroskery commented 10 months ago

Went from HASS 2023.8 to 2024.1 today and after the upgrade the Kasa devices all failed to initialize and I found this thread.

My Kasa devices are also on a different VLAN from HASS and I was previously allowing HASS to connect to the Kasa devices via TCP on port 9999

With the information here I added a rule to allow HASS to also connect via UDP on port 20002 (* not 200002) and everything seems to be working again just fine.

Thanks very much for the information to get this working again

paulbraren commented 9 months ago

I have Home Assistant with Terminal installed. Not sure of the exact command for allowing discovery via UDP port 20002, but I'm curious, if I get that figured out, and add my 4 EP25 devices seen by Home Assistant, do you think those devices will persist through future updates? Eventually hoping Home Assistant incorporates a fix for these latest rev. EP25 devices not being discoverable, but I have no idea how long that may take. I'm admittedly rather new to the Home Assistant community.

rytilahti commented 9 months ago

@paulbraren that's a different issue from what is being discussed here, sorry.

Anyway, when #105143 gets merged, the UDP connectivity will only be required for the initial discovery and for potential firmware updates that change the low-level communication parameters (different encryption etc.) which are only available using the discovery protocol. All device communications are done using TCP.

In practice, this means that as long as IP addresses remain stable or homeassistant is in the same network and can use DHCP traffic to update the addresses, the integration should keep working without any issues even on less favorable network conditions.

paulbraren commented 9 months ago

Thank you for the polite redirect to https://github.com/home-assistant/core/pull/105143 that I've been watching, much appreciated. I have DHCP reservations for my 4 new EP25 devices, so the IP addresses won't change. It's good to hear UDP is only used initially for discovery, thanks for clarifying. Looking forward to finally being able to pair my Lutron Pico remotes with my 4 new EP25 devices, especially since that works so well with my older EP25 units that were easily discovered by Home Assistant.

sdb9696 commented 8 months ago

This issue should be fixed in the latest HA release 2024.02

sdb9696 commented 8 months ago

@home-assistant close

TheCodeJanitor-dotcom commented 8 months ago

Thank you all for your efforts, it appears to be working correctly now!

TheCodeJanitor-dotcom commented 8 months ago

As before, the integration worked for a few minutes, but auto-discovery apparently continues to run and corrupts the configuration of my Kasa devices (all outlets/outlet strips). Possibly related to the presence of a TP-Link SR20 router on my network, as the error message reports "Unable to connect to the device: :9999"

rytilahti commented 8 months ago

Without any network traces or other information, it is really time-consuming to figure out what is going wrong and how to fix it. The way DHCP discovery works is that it listens for broadcast DHCP requests, and uses the requested IP addresses within those requests and the MAC address of the packet to inform integrations about address changes. Perhaps the router is acting as a weird, non-standard-conforming relay that writes its own IP address into the requests the homeassistant host is receiving?

If that's the case, there are no easy real fixes that can be done at homeassistant's end. One stop-gap solution, as long as your devices have static IP addresses, is to disable the dhcp discovery completely. To my knowledge, this could be done by removing the default_config and configuring all wanted integrations listed on that page manually. This approach is not recommended, as any future changes to defaults will not be automatically applied, and it will most likely break unexpectedly at some point in the future.
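
To make the DHCP discovery description above concrete, here is a sketch of pulling the client MAC address and the requested IP address out of a raw DHCP payload. It follows the standard BOOTP/DHCP layout (236-byte fixed header, magic cookie, then options, with option 50 as the requested IP). It is not Home Assistant's implementation, and `parse_dhcp_request` is a hypothetical name. The relay agent field (`giaddr`) is also returned because a relayed request would normally carry the relay's address there, which may help when checking the relay hypothesis in a packet capture.

```python
import socket
from typing import Optional, Tuple

BOOTP_HEADER_LEN = 236            # fixed part before the options
MAGIC_COOKIE = b"\x63\x82\x53\x63"
OPT_PAD, OPT_REQUESTED_IP, OPT_END = 0, 50, 255


def parse_dhcp_request(
    packet: bytes,
) -> Optional[Tuple[str, Optional[str], str]]:
    """Return (client_mac, requested_ip, giaddr), or None if malformed."""
    start = BOOTP_HEADER_LEN
    if len(packet) < start + 4 or packet[start:start + 4] != MAGIC_COOKIE:
        return None
    hlen = min(packet[2], 16)                     # hardware address length
    giaddr = socket.inet_ntoa(packet[24:28])      # relay agent address
    mac = ":".join(f"{b:02x}" for b in packet[28:28 + hlen])
    requested_ip = None
    i = start + 4
    while i < len(packet):                        # walk the option list
        code = packet[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:
            i += 1
            continue
        if i + 1 >= len(packet):
            break                                 # truncated option
        length = packet[i + 1]
        value = packet[i + 2:i + 2 + length]
        if code == OPT_REQUESTED_IP and len(value) == 4:
            requested_ip = socket.inet_ntoa(value)
        i += 2 + length
    return mac, requested_ip, giaddr
```

Comparing the requested IP and `giaddr` of captured requests with the address that ends up in core.config_entries would show whether the router is rewriting the packets.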