home-assistant / core

🏡 Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

TP-Link and Tapo devices failed setup with connect call failed errors since update to 2024.7 #123181

Open stuartford opened 1 month ago

stuartford commented 1 month ago

The problem

Since updating to 2024.7 the TP-Link and Tapo integrations are a mess. Most devices are working, but each integration has a hardcore few that fail.

Errors from each integration attached. HA continually tries to reload these devices, but they just end up back in the "Needs attention" list. Reloading the devices manually has the same result.

TP-Link devices make up the majority of my smart home estate, so this is a VM rollback incident for me.

Has anyone else had this issue, and, if so, how did you resolve it?

[Screenshots: 2024-08-05 at 11 27 55, 2024-08-05 at 11 28 02]

What version of Home Assistant Core has the issue?

2024.7.4

What was the last working version of Home Assistant Core?

2024.6.2

What type of installation are you running?

Home Assistant OS

Integration causing the issue

TP-Link

Link to integration documentation on our website

https://www.home-assistant.io/integrations/tplink

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 1 month ago

Hey there @rytilahti, @bdraco, @sdb9696, mind taking a look at this issue as it has been labeled with an integration (tplink) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `tplink` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign tplink` Removes the current integration label and assignees on the issue, add the integration domain after the command.
- `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
- `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


tplink documentation tplink source (message by IssueLinks)

stuartford commented 1 month ago

Update: Rolled-back to 2024.6.2, but the problems persist, so it's not to do with the update.

I am still left with a raft of non-functional devices, however, so I would appreciate some help with these errors!

sdb9696 commented 1 month ago

It may be that the devices have changed IP address. Have you tried using the integration's discovery to see whether new devices are discovered (leave the host field blank)?

stuartford commented 1 month ago

I solved this by power-cycling the devices. I don't know why it was required for roughly a third of the devices, but that's what I did and now they are working again.

speakxj7 commented 1 month ago

all my devices use static IP assignment.

i've found that since the big update - it takes a while for them to settle down, that smells like a rate-limit to me, somehow. other thing i would say is that i added some newer ep25 devices to my setup (auth required), while most of my other hardware is older (not required).

i have a fairly 'large' count of devices relative to most, i expect. (also a factor for my rate-limit theory, when they all try to shuffle in at the same time following a HA reboot). I have roughly 30 devices tied to the integration.

if it is rate-limit, then perhaps a staggered spin up is in order? (powering down devices could also make the same timing-shift effect)
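
A minimal sketch of the staggered spin-up idea, assuming the goal is simply to avoid every device being contacted at the same instant. This is not how Home Assistant sets up the integration; `staggered_setup` and the `connect` callable are hypothetical names used only for illustration.

```python
import asyncio


async def staggered_setup(hosts, connect, delay=0.5):
    """Call connect(host) for each host, starting them `delay` seconds apart.

    `connect` is a hypothetical async callable supplied by the caller.
    Returns a dict mapping each host to its result, or to the exception
    that connect raised for it.
    """

    async def delayed(index, host):
        # The first host starts immediately, each later one waits a bit longer.
        await asyncio.sleep(index * delay)
        return await connect(host)

    results = await asyncio.gather(
        *(delayed(i, host) for i, host in enumerate(hosts)),
        return_exceptions=True,  # one failing device must not abort the rest
    )
    return dict(zip(hosts, results))
```

With roughly 30 devices and a half-second delay, the last connection attempt would start about 15 seconds after the first.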

msmith000 commented 4 weeks ago

I am also having this issue - I cannot add new devices manually, and the integration reports that there are no new devices when I try the discovery. I am also having the same issue as @stuartford. Please note the screenshots below:

[Screenshots: IMG_5310, IMG_5309, IMG_5308, IMG_5307]

Daguse commented 3 weeks ago

I'm having the same issue after update 8.2024.

Tried power cycling devices but most did not come back.

rytilahti commented 3 weeks ago

> i've found that since the big update - it takes a while for them to settle down, that smells like a rate-limit to me, somehow. other thing i would say is that i added some newer ep25 devices to my setup (auth required), while most of my other hardware is older (not required).
>
> i have a fairly 'large' count of devices relative to most, i expect. (also a factor for my rate-limit theory, when they all try to shuffle in at the same time following a HA reboot). I have roughly 30 devices tied to the integration.

Are you having this problem also with the older devices, or does it just concern ep25s? We are not currently fetching the firmware update status for older devices, so if that's the case, it may very well have something to do with the cloud throttling the requests.

Daguse commented 3 weeks ago

I'm not sure how to tell if a device is ep25. However, I believe they are older devices as I don't remember having to do any auth requests.

rytilahti commented 3 weeks ago

If you go to the device page, it will tell you the model. Alternatively, downloading the diagnostics info will also have more information (like whether authentication was used).

Anyway, it looks like this also affects non-auth models, given that the connections are attempted over port 9999, which is only used by the older protocol. That being said, this error only means that the device is not responding to the query for some reason. This might be due to network configuration (e.g., a firewall not allowing connections), the firmware having gotten stuck somehow, etc., and is rather hard to debug...
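
One way to check whether a device is answering on that port at all is to attempt a plain TCP connection from the machine running Home Assistant. The following is a minimal sketch, assuming the "connect call failed" error reflects a refused or timed-out TCP connection; the function name `port_open` and the example address in the note below are made up for illustration.

```python
import socket


def port_open(host, port=9999, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_open("192.168.1.50")` returning `False` while the device still answers pings would suggest that the device itself, or something between it and the host, is rejecting connections on port 9999.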

speakxj7 commented 3 weeks ago

> > i've found that since the big update - it takes a while for them to settle down, that smells like a rate-limit to me, somehow. other thing i would say is that i added some newer ep25 devices to my setup (auth required), while most of my other hardware is older (not required). i have a fairly 'large' count of devices relative to most, i expect. (also a factor for my rate-limit theory, when they all try to shuffle in at the same time following a HA reboot). I have roughly 30 devices tied to the integration.
>
> Are you having this problem also with the older devices, or does it just concern ep25s? We are not currently fetching the firmware update status for older devices, so if that's the case, it may very well have something to do with the cloud throttling the requests.

i would say that it's statistically worse for the auth-required new ep25's (pretty much 100% flake out on cold start and also take the longest to settle down), but i definitely see the problem as a more general problem. as i speak (after rebooting my HA system for some updates) kl130, hs220, ep10, and old ep25 (no auth) - all in the connect call failed cycle. i expect in an hour or two that pretty much all my devices will be fine.

rytilahti commented 3 weeks ago

Could it be a network issue at your end? You could try installing python-kasa and running `kasa discover` to see whether the devices are detected correctly. If that works reliably, we can possibly rule that out.
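
The same check can be scripted with the python-kasa library, which makes it easy to compare the discovered devices against the addresses you expect. This is a rough sketch, assuming python-kasa is installed and that `Discover.discover()` returns a mapping keyed by IP address; `missing_hosts` and `discover_hosts` are illustrative helper names, not part of the library.

```python
import asyncio


def missing_hosts(expected, discovered):
    """Return the expected IP addresses that discovery did not find, sorted."""
    return sorted(set(expected) - set(discovered))


async def discover_hosts():
    """Broadcast a discovery request and return the IP addresses that answered."""
    # Imported here so that missing_hosts() works without python-kasa installed.
    from kasa import Discover

    found = await Discover.discover()  # mapping of IP address -> device object
    return list(found)
```

Running `asyncio.run(discover_hosts())` a few times and passing the result to `missing_hosts` together with your static IP list shows whether the same devices drop out on every attempt.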

Daguse commented 3 weeks ago

So to add some flavor, I did a rollback to 7.2024 and still had the same issue. To add to it, I'm also having an issue with Govee and IPP after updating. Something must have changed in the network.

DarthSonic commented 3 weeks ago

Same issue over and over again. Must power cycle my Tapo P100 and P110 every few days to keep them connected to HA.

geecy84 commented 3 weeks ago

Mine have been like this too since I think 2024.7. They’re just so unreliable now, it’s just pot luck if they’re online or not. Anyone know of any alternative integrations that will bring the Tapo P100s into HA?

rdperkins commented 1 week ago

I've had this problem with TP-Link devices, old and new, switches, plugs, for nearly a year. Why can Alexa see these devices and work with them with no problem while HA can't? Note that only 10 of my 35 TP-Link devices are having this problem, and 2-3 are truly offline by design (unplugged plugs for Xmas use). I'm also having a similar problem with WLED, Reolink and Roku.

In the meantime I just installed a new HS200 switch and the integration asks for user credentials. I used the ones I use to sign in to the Kasa app but they don't work. What credentials are they talking about?

carpii commented 1 week ago

> I've had this problem with TP-Link devices, old and new, switches, plugs, for nearly a year. Why can Alexa see these devices and work with them with no problem while HA can't?

If you ssh into your HA host, can you ping the devices?

I've just solved a similar problem with 10 KP-115 plugs. It was a combination of the plugs locking up (refusing connections, even though I could ping them), and some routing issue between HA and the devices.

Power cycling the plugs didn't help. I had to factory reset each plug before it would start accepting connections again, and then reboot the HA host to get the integration talking to them.

This tool was very helpful in interrogating the plugs outside of HA:

https://github.com/softScheck/tplink-smartplug

rytilahti commented 1 week ago

The upcoming release will disable the firmware update information (#124930) which should improve stability as the devices do not connect to the cloud anymore to fetch that information.

@rdperkins

> Why can Alexa see these devices and work with them with no problem while HA can't?

Maybe Alexa controls these devices over the cloud, and by doing so is not affected by any issues in the local controls?

> I'm also having a similar problem with WLED, Reolink and Roku.

This sounds like a deeper problem in your network, to be honest. These devices are controlled by completely different protocols and thus should have no effect on tplink devices.

> In the meantime I just installed a new HS200 switch and the integration asks for user credentials. I used the ones I use to sign in to the Kasa app but they don't work. What credentials are they talking about?

They are the credentials that were used when you provisioned the device using the app.

@carpii you can also use python-kasa directly to control these devices from the console. It is the very same library that is used by the integration.