rstrouse / ESPSomfy-RTS

A controller for Somfy RTS shades and blinds
The Unlicense
442 stars 32 forks source link

Semi-random disconnects from HA, resolved by manual reload of HA-integration #270

Closed nl0pvm closed 2 months ago

nl0pvm commented 5 months ago

Hardware

ESP32

Firmware version

v2.3.3

Application version

v2.3.3

What happened? What did you expect to happen?

I have 29 curtains that I operate automatically using a script. The curtains are sequentially operated so ESPSomfyRTS is not overloaded with commands. In the past 3 weeks that i am running ESPSomfyRTS now i have had on average 2 crashes per week. As far as my memory goes these all occured during a script that was operating all of those 29 curtains. Hence the "semi-random".

How to reproduce it (setp by step)

1. Give 29 curtains sequentially with enough time in between for the curtain to move a move command (in 7 minutes time)
2. Hope that the issue occurs (twice per week) and the device becomes unavailable in HA (see the Many ERROR messages in the log below)
3. Go to HA -> Settings -> Devices and Services -> reload the ESPSomfyRTS integration (see the single WARNING (SyncWorker_4) [websocket] websocket connected message in the log below)
4. Enjoy life again

Logs

2024-02-11 09:04:31.434 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:31.993 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:32.538 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:33.087 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:33.640 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:34.188 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:34.738 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:35.288 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:35.837 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:36.389 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:36.937 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:37.487 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:38.037 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:38.592 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:39.137 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:39.687 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:40.238 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:40.786 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:41.336 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:41.885 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:42.438 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:42.985 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:43.535 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:44.085 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:44.641 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:45.185 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:45.735 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:46.287 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:46.835 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:47.385 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:47.935 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:48.485 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:49.036 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:49.585 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:50.135 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:50.690 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:51.237 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:51.785 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:52.336 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:52.889 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:53.439 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:53.987 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:54.533 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:55.087 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:55.637 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:56.187 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:56.737 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:57.286 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:57.836 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:58.386 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:58.936 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:04:59.489 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:05:00.035 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:05:00.585 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:05:01.135 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:05:01.686 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:05:02.234 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:05:02.785 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:05:03.334 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:05:03.884 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:05:04.435 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:05:04.985 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)
2024-02-11 09:05:05.538 ERROR (SyncWorker_7) [websocket] error from callback <bound method SocketListener.ws_onmessage of <SocketListener(Thread-16186, initial)>>: Expecting value: line 1 column 157 (char 156)

[AFTER integration RELOAD]
2024-02-11 09:13:23.044 WARNING (SyncWorker_4) [websocket] websocket connected
nl0pvm commented 5 months ago

By the way

  1. there is a typo in the bug form.. (setp by step)
  2. I love ESPsomfyRTS, i am very grateful for that.
rstrouse commented 5 months ago

That single warning message is simply HA saying that a socket has connected. Not sure why they do this but they always do it. Perhaps it is in line with the warning they place in the log when a plugin is loaded that is not part of their core. It annoyed me at first but I'm sure it helps them to know when a 3rd party plugin is involved. The reason why it is a warning and not simply info is so it shows in the logs more clearly.

When you say crashes do you mean that the devices in HA stop responding. Is the ESPSomfy web gui still up. This message seems to indicate that the plugin is receiving an invalid socket payload. I will add some debug messages to see if we can see what is causing this for you.

rstrouse commented 5 months ago

I fixed the typo in the issue form Thanks!

v2.4.0 of the ESPSomfy-RTS-HA plugin now waits to make sure the ESPSomfy RTS device is available before attempting to set up the entities. In your logs from HA above this is what happens when the startup for an integration occurs and the backend device is not available. It will now wait until it has a response from your ESPSomfy RTS device before attempting to start the plugin. This is why reloading the plugin makes the device available again.

Sadly, there seems to be no way for an integration to know whether the device has already been initialized or not and there is no way to reinitialize the existing entities. So it will now wait until it sees a response from the ESPSomfy RTS before it attempts to initialize the plugin.

nl0pvm commented 5 months ago

Okay this was weird, a single curtain was not longer available in HA. Therefore my script that changes all curtains crashed. However i could control the rest in HA. Thus I reloaded the plugin in homeassistant now all curtains are unavailable. The webinterface of my ESPSomfyRTS is unavailable, and the device is still pingable. I can't see the console (as it is powering the device) so the only thing I can do is a power cycle.

rstrouse commented 4 months ago

That is very weird. The really weird thing is that there seems to be a socket message being sent to the integration that contains no data payload that is not a ping or pong for the socket. At least that is what the log output above seems to indicate.

nl0pvm commented 4 months ago

Had another crash, reloading the HA plugin made all entities unavailable. The ESP webinterface was also not available; had to powercyle the ESP to get everything operational. Is there a way i can create sensible logging for this issue?

rstrouse commented 4 months ago

Actually, if you monitor the Wifi strength in the HA plugin you will see the when the disconnect occurs. Are you running a Mesh network. If you are then I have determined that the esp32 wifi module is not very smart. It connects to the first channel that it sees in its scan for the SSID. So I set up a test scenario where I have a mesh node on the outer fringes.

As you can see from my monitoring there is a node close by but it still connects to a node far away. In the chart you can see that the ESP originally connects to the strong node but this happened only after I made changes to force it to connect to the stronger node then if anything changes (node is rebooted, channel is changed, ... etc) and it reconnects, it drops back to the ridiculous node.

I have some code that I will post this weekend that forces the ESP32 to roam properly as the ESP32 WiFi libraries are severely lacking. I cannot believe that this is so poor and unless you are looking for the issue you will never assume that it is doing what it does.

image

rstrouse commented 4 months ago

Please install the most current pre-release of v2.4.1. I have added roaming capabilities to the wifi connection. This will monitor the network for stronger APs and connect to them dynamically as they become available.

nl0pvm commented 4 months ago

I have updated today to the pre-release of v2.4.1 will keep you posted.

PS. I have marked the 3 "historical crash moments" in my wifi strength graph. That suggests that for me the wifi roaming issue might not be the root cause of my crashes. image

rstrouse commented 3 months ago

Wow those are some really wide swings for RSSI. From a comfortable -40dbm all the way down to -90dbm.

How has it been since you installed 2.4.1? Is it more constant on the channel?

nl0pvm commented 3 months ago

Nope, not more constant, it stays as noisy. However you triggered me and in my Deco m5 Wifi Mesh settings i have disabled mesh and forced 2.4 GHz... lets see what that brings..

rstrouse commented 3 months ago

Is there something in your environment that is creating noise? If it was steady at -90 it would likely be stable but given the measurable output you can expect that there are complete dropouts that aren't even being registered. Also, v2.4.1 has been released so update to the release version of v2.4.1.

nl0pvm commented 3 months ago

Regarding the noise there are multiple known noise generators (in order of expected impact);

  1. Eufy video camera system
  2. Milight LED spots
  3. zigbee
  4. bluetooth devices
  5. Microwave

yep I have updated to v2.4.1, of course ;)

rstrouse commented 3 months ago

I assume those are wifi cameras so I wouldn't expect them to impact the link quality. The LED spots would only be an issue very close to the LED driver. I assume they too are wifi so that should not have an impact on the link quality. Then there is Zigbee and bluetooth. While Blue teeth are nasty to look at, they commonly do not create interference with WiFi on channels 1, 6, or 11. Zigbee on the other hand can be a menace. You can shut down your Zigbee devices for a bit and see if the link quality improves, if it does there are things you can do using channel selection to mitigate the issues.

Here are some other things to try. First try using narrow channel banding and switch from 40/80 to 20 if the TPLink supports it.
Shut down QoS on the network to see if it is beating up the radio, If you have one, use a WiFi analyzer to see what is out there and pin the WiFi channel to a less noisy channel.

Finally, the WiFi radio on the ESP32 may be faulty and this may not be noise at all and just a simply unmitigated crappy part on the board. Swings from -40 to over -100 will definitely drop the connection. You are likely not seeing the bottom of the link quality since losing connection will not result in any values and there will be nothing to report.

EDIT: I recently did a build with an ESP32s3 that uses an external WiFi antenna. The connectivity on it is striking although the build is a bit of a tinker. https://github.com/rstrouse/ESPSomfy-RTS/wiki/Mini-ESP32s3-Device-Build

rstrouse commented 3 months ago

@nl0pvm how has your network stability been since you upgraded? At this time I also recommend you update to the pre-release v2.4.2 as I have further optimized the http responses. This dramatically reduces the memory footprint into smaller chunks while the device is communicating over wifi.

nl0pvm commented 2 months ago

EDIT: I recently did a build with an ESP32s3 that uses an external WiFi antenna. The connectivity on it is striking although the build is a bit of a tinker. https://github.com/rstrouse/ESPSomfy-RTS/wiki/Mini-ESP32s3-Device-Build

That build is very neat and thanks for your headsup.. i have many external antenna's lying around so i will go for that option.

@nl0pvm how has your network stability been since you upgraded? image

no luck there.. Again i had a crash at the 10th of april.

At this time I also recommend you update to the pre-release v2.4.2 as I have further optimized the http responses. This dramatically reduces the memory footprint into smaller chunks while the device is communicating over wifi.

A reboot was required to show other update options than "main". But i updated. After the update i needed to restore from backup as all shades were gone.

rstrouse commented 2 months ago

Ok so I am pretty sure I have found the cause of the stability issue. As suspected it was networking related but ultimately came down to a small memory leak when sending responses for SSDP/UDP. Please update to the official release of v2.4.2

nl0pvm commented 2 months ago

Okay here it comes.. it seems that since the update to the official release of v2.4.2 some build parameters have been changed that require the EPS32 to be connected to a pc in order to be able to boot. I will try to downgrade back to an older version when i have figured out how to do so.

rstrouse commented 2 months ago

Okay here it comes.. it seems that since the update to the official release of v2.4.2 some build parameters have been changed that require the EPS32 to be connected to a pc in order to be able to boot.

What do you mean? That is not true. Which ESP32 do you have?

nl0pvm commented 2 months ago

What do you mean?

Before the update to 2.4.2 i could boot it on any usb power supply, after the update it suddenly does not boot anymore on any of my usb chargers but 1 1 specific one AND my laptop and my pc. So that's really weird. (it looks like this issue.. but.. I don't have an ESP32-S3!!

That is not true.

ok :)

Which ESP32 do you have?

ESP-WROOM-32

rstrouse commented 2 months ago

The wroom/wrover builds have not changed. That is very odd. I have never had an issue with specific power supplies. The issue you mentioned does have one thing in common though. Both solutions are using a peripheral attached to the 3v3 and GND. Interestingly a data cable can sometimes provide the ground through the data pins. Double check to ensure the ground and 3v3 are solid between the transceiver and the esp32.

rstrouse commented 2 months ago

@nl0pvm is there any update to your random disconnects? I have done a deep dive into the ESP32 core and as a result bumped the Arduino core to 2.0.16 from 2.0.14. There were some PRs related to network stability as well as some weirdness with the USB RX and TX pins. Update to the v2.4.3 pre-release if you are still having issues.

rstrouse commented 2 months ago

@nl0pvm I just released v2.4.3 that contains the library updates as well as a myriad of network fixes. Open a new issue and reference this one if you are still having issues.