home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
70.92k stars 29.58k forks source link

TP-Link P115 suddenly throws errors: ValueError: The length of the provided data is not a multiple of the block length. #122090

Closed frdmn closed 1 month ago

frdmn commented 1 month ago

The problem

Starting today, all of my P115 smart plugs stopped working with the following error in the logs:

homeassistant  | 2024-07-17 16:45:01.158 ERROR (MainThread) [homeassistant.config_entries] Error setting up entry Dehumidifier P115 for tplink
homeassistant  | Traceback (most recent call last):
homeassistant  |   File "/usr/src/homeassistant/homeassistant/config_entries.py", line 586, in async_setup
homeassistant  |     result = await component.async_setup_entry(hass, self)
homeassistant  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/src/homeassistant/homeassistant/components/tplink/__init__.py", line 165, in async_setup_entry
homeassistant  |     device: Device = await Device.connect(config=config)
homeassistant  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/kasa/device.py", line 233, in connect
homeassistant  |     return await connect(host=host, config=config)  # type: ignore[arg-type]
homeassistant  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/kasa/device_factory.py", line 71, in connect
homeassistant  |     return await _connect(config, protocol)
homeassistant  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/kasa/device_factory.py", line 110, in _connect
homeassistant  |     await device.update()
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/kasa/smart/smartdevice.py", line 156, in update
homeassistant  |     await self._negotiate()
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/kasa/smart/smartdevice.py", line 132, in _negotiate
homeassistant  |     resp = await self.protocol.query(initial_query)
homeassistant  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/kasa/smartprotocol.py", line 68, in query
homeassistant  |     return await self._query(request, retry_count)
homeassistant  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/kasa/smartprotocol.py", line 73, in _query
homeassistant  |     return await self._execute_query(
homeassistant  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/kasa/smartprotocol.py", line 202, in _execute_query
homeassistant  |     return await self._execute_multiple_query(request, retry_count)
homeassistant  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/kasa/smartprotocol.py", line 145, in _execute_multiple_query
homeassistant  |     response_step = await self._transport.send(smart_request)
homeassistant  |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/kasa/klaptransport.py", line 356, in send
homeassistant  |     decrypted_response = self._encryption_session.decrypt(response_data)
homeassistant  |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/kasa/klaptransport.py", line 491, in decrypt
homeassistant  |     dp = decryptor.update(msg[32:]) + decryptor.finalize()
homeassistant  |                                       ^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/cryptography/hazmat/primitives/ciphers/base.py", line 184, in finalize
homeassistant  |     data = self._ctx.finalize()
homeassistant  |            ^^^^^^^^^^^^^^^^^^^^
homeassistant  |   File "/usr/local/lib/python3.12/site-packages/cryptography/hazmat/backends/openssl/ciphers.py", line 223, in finalize
homeassistant  |     raise ValueError(
homeassistant  | ValueError: The length of the provided data is not a multiple of the block length.

I managed to fix it temporarily by removing all affected devices, rebooting them, then adding them back in HA - however, the problem occured again after just 30 minutes.

The devices were working fine for weeks and just today this problem started to appear (without any HA update).

Anyone has an idea how to fix this?

What version of Home Assistant Core has the issue?

core-2024.7.2

What was the last working version of Home Assistant Core?

core-2024.7.2

What type of installation are you running?

Home Assistant Container

Integration causing the issue

TP Link Tapo

Link to integration documentation on our website

https://www.home-assistant.io/integrations/tplink_tapo/#usage-information

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 1 month ago

tplink_tapo documentation tplink_tapo source

mediakuu commented 1 month ago

Exact same issue here with Tapo P115 (6 of them).

dani-rodr commented 1 month ago

The issue might be with TP Link Tapo App or their servers, it is being unresponsive on the TP Link Tapo app on my end Another thread suggested to block the internet access of those devices and it worked for me

mediakuu commented 1 month ago

The issue might be with TP Link Tapo itself, it is being unresponsive on the TP Link Tapo app. Another thread suggested to block the internet access of those devices

All the devices are working fine in Tapo App, haven't notice any issues.

kylroy1 commented 1 month ago

Happening to me too, issues started around 8:35 GMT many sockets disconnected. I was able to fix some of them but still having issues with three P110 and one P100.

kylroy1 commented 1 month ago

The issue might be with TP Link Tapo itself, it is being unresponsive on the TP Link Tapo app. Another thread suggested to block the internet access of those devices

Yes that helped. I blocked internet access and I was able to fix all but one P110.

kylroy1 commented 1 month ago

On the other hand all sensors related to power and energy reporting are gone from HA.

image

autoSteve commented 1 month ago

Identical log messages for me, too. Similar timeframe.

Same device firmware, same HA version. Tapo app appears to have no issue accessing all the devices.

All P110 plugs are getting this message. An L920 light strip is working fine from HA.

BeaverUI commented 1 month ago

Kilroy, your cloud connection is still active. Try to block the device from internet.

autoSteve commented 1 month ago

Power cycling my P110 plugs seemed to be bringing them back after a reload of each, but I am getting weird value / unavailable sensor readings, alternating every few seconds like something's possessed. The logbook for each sensor is going crazy with flapping.

(edit) The log is being spammed with the log message noted above, but at least the devices are kind-of working at times.

image image image
nicola-spreafico commented 1 month ago

Same problem for me. I have 4 P110 and around 14h ago started all to give this error:

2024-07-18 08:45:22.957 ERROR (MainThread) [homeassistant.components.tplink.coordinator] Unexpected error fetching 192.168.50.245 data
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 312, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/tplink/coordinator.py", line 49, in _async_update_data
    await self.device.update(update_children=False)
  File "/usr/local/lib/python3.12/site-packages/kasa/smart/smartdevice.py", line 165, in update
    self._last_update = resp = await self.protocol.query(req)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kasa/smartprotocol.py", line 68, in query
    return await self._query(request, retry_count)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kasa/smartprotocol.py", line 73, in _query
    return await self._execute_query(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kasa/smartprotocol.py", line 202, in _execute_query
    return await self._execute_multiple_query(request, retry_count)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kasa/smartprotocol.py", line 145, in _execute_multiple_query
    response_step = await self._transport.send(smart_request)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kasa/klaptransport.py", line 356, in send
    decrypted_response = self._encryption_session.decrypt(response_data)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kasa/klaptransport.py", line 491, in decrypt
    dp = decryptor.update(msg[32:]) + decryptor.finalize()
                                      ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/cryptography/hazmat/primitives/ciphers/base.py", line 184, in finalize
    data = self._ctx.finalize()
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/cryptography/hazmat/backends/openssl/ciphers.py", line 223, in finalize
    raise ValueError(
ValueError: The length of the provided data is not a multiple of the block length.

Tried to unplug/plug into socket but did not solve the issue. I now blocked the internet connection (from the FritzBox router) as suggested, after reloading the integration it seems the problem is now fixed.

kylroy1 commented 1 month ago

Kilroy, your cloud connection is still active. Try to block the device from internet.

It is enabled because I was trying to see if doing it would bring energy and power reporting back.

Disabling the connection allowed me to reconnect the sockets into HA but the reason I bought these socket model is to also track energy and power usage.

Sadly, those sensors are gone with or without internet connection.

kylroy1 commented 1 month ago

I was able to reconnect to HA the las P110 that was failing. Interestingly I was able to do it allowing it to connect to the internet AND I got back reporting of power and energy into HA but this is the price to pay:

The order I did it was a bit different but re-adding it into HA with a different IP was what solved all problems for me. It didn't even ask to re-enter credentials, it just worked. Maybe you can try changing the IP first before doing the full reset.

Now I must repeat all this dance with all the other sockets, super fun.

kylroy1 commented 1 month ago

Bad news, after reconfiguring the P110 switch I had to move it back to where it belongs and doing that required reconnecting the socket in a different place. After restarting, the switch function is back but the power and energy sensors are gone from HA. Same behaviour with or without internet access.

At least the switch function is still working.

autoSteve commented 1 month ago

I finally have things stable with Tapo/HA. But it's not good news...

I had to power cycle every smart plug to get HA to see them again. And then the logs were going nuts with the error above.

Then I created network blocks to prevent them from seeing the Internet, and much quietened down. With one exception...

One smart plug was showing "Unavailable" with a lack of Internet. So I power cycled it. No go. Then I allowed it Internet access, and it came back up and appears happy like that.

This is too weird.

So one out of thirteen smart plugs is happy as a clam with Internet, and the rest not. Is it the designated thirteenth? Unlucky for some, I guess...

I can't help thinking we're treating the symptom here, and not getting to the root cause of why around a day ago Tapo smart plugs mostly turned to custard from a Home Assistant POV... What happened???

autoSteve commented 1 month ago

@kylroy1, you don't have thirteen of these P110s do you, and this was the thirteenth? šŸ˜‚šŸ¤¦šŸ»

I believe that enabling Internet alone for that guy would have been enough, given my experience with my lucky 13 plug.

kylroy1 commented 1 month ago

@autoSteve HA! this is funny I have exactly 13 P110 but sadly this is not the problem :) Yesterday I had issues with almost all of them P100 and KP105.

Interestingly one of the P110s was still reporting power and energy back to HA but as soon as they are powercycled this stops working.

Even more, as I was not using this P110s I removed it from HA and added it again. The new device does not even have the power and energy sensors anymore.

In the logs I see this:

Logger: kasa.smart.smartdevice
Source: components/tplink/config_flow.py:403
First occurred: 17 July 2024 at 20:38:10 (4 occurrences)
Last logged: 15:38:46                       <-- i.e. a few minutes ago in my timezone.

Error processing Energy for device 192.168.xx.yy, module will be unavailable: get_energy_usage for Energy (error_code=JSON_DECODE_FAIL_ERROR)
Error processing Energy for device 192.168.xx.zz, module will be unavailable: get_energy_usage for Energy (error_code=JSON_DECODE_FAIL_ERROR)
Error processing Energy for device <dns-name>, module will be unavailable: get_energy_usage for Energy (error_code=JSON_DECODE_FAIL_ERROR)

Weird that for one of them it resolved the name associated with the IP. My internal DDNS servers are in sync with the DHCP servers so all IPs have the correct name in the reverse domain.

Right now I disabled internet access later when I have more time I'll some more testing but I agree with you that this is dealing with the symptom.

autoSteve commented 1 month ago

Thirteen. šŸ¤£ It's a thing... There be goblins 'n' demons in thy Tapos, @kylroy1 šŸ˜‚...

A root cause of all this for the win.

For now, I will back away slowly, 'secure' in the thought that 'kludges' are working for me. Well, at least stopping my logs getting to thirteen-bajillion bytes long per day.

Until tomorrow? Who knows? šŸ¤·ā€ā™‚ļø

Your DDNS/DHCP, and other local networking? Pfft. This is beyond our control. You run your network in your way, and I run my Ubiquiti UDM set up, with local BGP DNS. And we're both hitting the same problem. As are many others. And it happened at around the same time of day.

Like we all would dearly love to say while dating a dud, yet do not have the guts, "Sorry hon, it's you and not me..."

nicola-spreafico commented 1 month ago

While blocking the internet access seems to be solved the integration issue (the integration is now fine), The Power (W) value is now "unavailable" any time, even if the appliance is running.

Getting this error from the logs while trying to restart integration

2024-07-18 18:53:28.536 ERROR (MainThread) [kasa.smart.smartdevice] Error processing Energy for device 192.168.50.245, module will be unavailable: get_energy_usage for Energy (error_code=JSON_DECODE_FAIL_ERROR)
2024-07-18 18:53:28.539 WARNING (MainThread) [homeassistant.components.tplink.entity] Unable to read data for <DeviceType.Plug at 192.168.50.245 - Lavastoviglie  (P110)> None: get_latest_fw for Firmware (error_code=UNSPECIFIC_ERROR)
kylroy1 commented 1 month ago

Some progress dealing with the symptom:

  1. Blocked internet access to all sockets.
  2. Removed one of the sockets that I'm not using from both HA and the Tapo app.
  3. Reseted the socket keeping the button pressed until it start blinking
  4. Re-added the socket into the Tapo app. The app offers to reuse the old config, I rejected but I re-used the same name and IPs
  5. Re-added the socket into HA and all was back, in particular power and energy reporting.
  6. After power-cycling the socket power and energy reporting are gone.
  7. Removed the socket from HA (nothing changed in the APP)
  8. Re-Added the socket into HA and power and energy are now back.

So, provided I don't have a blackout things should be fine :P

Another thing I'm going to do is that for each switch I will create 3 group helpers, one for switch, one for the power and one for the energy. Then I will use these groups everywhere instead of the real entities.

This will make replacing sockets way easier and most important I will not lose usage history.

nicola-spreafico commented 1 month ago
Another thing I'm going to do is that for each switch I will create 3 group helpers, one for switch, one for the power and one for the energy. Then I will use these groups everywhere instead of the real entities.

This will make replacing sockets way easier and most important I will not lose usage history.

Isn't it enough to use the same entity_id if you need to change the physical sensor? You should be able to preserve all the historical data associated to that id

kylroy1 commented 1 month ago

@nicola-spreafico hmm yea that's right.

I was thinking on a method where I could replace one socket that is half broken with another that it's fixed without minimal disruption to the devices that are connected to it. Probably an overkill.

kylroy1 commented 1 month ago

Reloading the socket in the TP integration also brings back power and energy reporting to HA

image image image

autoSteve commented 1 month ago

Some progress dealing with the symptom

After going through that lot, @kylroy1, are your logs still getting spammed with ValueError: The length of the provided data is not a multiple of the block length.?

kylroy1 commented 1 month ago

@autoSteve I downloaded the log file and the first entry in the file has the timestamp 2024-07-17 17:58:02.765 so sadly it only goes back to Wed. For what I can see the errors have stopped, this is the number of errors per hour in my log file:

$ egrep -e '(^2024|The length of the provided data is not a multiple of the)' log.txt  |\
    egrep -B1 -e^Value | egrep ^2024 | cut -f1 -d: | sort  | uniq -c
      3 2024-07-17 17
     36 2024-07-17 18
    249 2024-07-17 19
    177 2024-07-17 20
    120 2024-07-17 21
    630 2024-07-17 22
    235 2024-07-17 23
    412 2024-07-18 00
    357 2024-07-18 01
    230 2024-07-18 02
    230 2024-07-18 03
    343 2024-07-18 04
    714 2024-07-18 05
    719 2024-07-18 06
    717 2024-07-18 07
    551 2024-07-18 08
     91 2024-07-18 13
      1 2024-07-18 19

The last error in my logs is from yesterday --> 2024-07-18 19:37:28.332

autoSteve commented 1 month ago

Nice @kylroy1. I "blunt instrument" restarted HA entirely. The log spamming stopped, but one device refuses to come back on line, whether it can get to the Internet or not.

I am philosophical about that one. Power consumption for one fridge... Whatever. It also throws the error, but just once on startup. Amusingly this is the same "lucky 13" device that was working when it could see Internet, which all others couldn't. It's turned itself into "unlucky 13" overnight...

I got nothin'.

Some localtuya users hit a similar sounding thing back in May. I can't find what the root of that issue was, as the fix seems to be buried somewhere in an unrelated commit. But of interest to me is that it was a change to the integration code that solved it for them.

This is cryptography related. Surely not something as simple as a cert expiring, or being revoked? That would explain each of us hitting it at the same time though...

kylroy1 commented 1 month ago

@nicola-spreafico nice, that explains. Since I blocked all access to internet and reconfigured them things are back to normal. Let's see how long the sockets survive without contacting the "mothership".

nicola-spreafico commented 1 month ago

This is the step I did: 1) Blocking internet access from the router firewall (Fritzbox router) 2) Entirely removed the plugs from Home Assistant 3) Unplug/Plug the plugs from the sockets 4) Configured again the device in HA

Right now the integration seems to be working again, sadly the Power entity is now missing... Is not even unavailable, is totally missing from the device entities image

From the app the Power reading is displayed, so I don't think it is a sensor failure image

kylroy1 commented 1 month ago

@nicola-spreafico what worked for me to bring the power and energy sensors back was:

  1. block the socket internet access
  2. remove the socket from the Tapo app (not need to remove them in HA)
  3. add the socket back into the Tapo app (no need to reset them from the button)
  4. reload the socket in HA from the integration (see my post above, the reload button is somewhat hard to find in the UI)

    after doing that all should be back to normal.

nicola-spreafico commented 1 month ago

@kylroy1 Yes, I was missing a part. I removed the device from the TAPO App and I configured it again (like you suggested), and I can see the difference: image

After pairing again with the TAPO app and reloading the HA integration, 3 more entities have been added to the device, including the Power reader that is the most important thing: image

tjorim commented 1 month ago

Duplicate of #122083.

frdmn commented 1 month ago

The 2024.7.3 update seems to have fixed the issue! šŸ„³

2fst4u commented 2 weeks ago

The 2024.7.3 update seems to have fixed the issue! šŸ„³

Disagree. I've been having this issue continuously. Trying to re-add my devices continues to give the same error.

nicola-spreafico commented 2 weeks ago

The 2024.7.3 update seems to have fixed the issue! šŸ„³

It was indeed solved (for me), until 2 days ago where it started showing the same behaviour. The internet access of the plugs is still disabled so it cannot be a firmware update, also I did not updated HA or else.

Right now the "power" sensor is currently missing for the plugs, I tried to delete and add it again and the power sensor is visibile but "unavailable" in any case