Closed nl0pvm closed 2 months ago
By the way
That single warning message is simply HA saying that a socket has connected. Not sure why they do this but they always do it. Perhaps it is in line with the warning they place in the log when a plugin is loaded that is not part of their core. It annoyed me at first but I'm sure it helps them to know when a 3rd party plugin is involved. The reason why it is a warning and not simply info is so it shows in the logs more clearly.
When you say crashes do you mean that the devices in HA stop responding. Is the ESPSomfy web gui still up. This message seems to indicate that the plugin is receiving an invalid socket payload. I will add some debug messages to see if we can see what is causing this for you.
I fixed the typo in the issue form Thanks!
v2.4.0 of the ESPSomfy-RTS-HA plugin now waits to make sure the ESPSomfy RTS device is available before attempting to set up the entities. In your logs from HA above this is what happens when the startup for an integration occurs and the backend device is not available. It will now wait until it has a response from your ESPSomfy RTS device before attempting to start the plugin. This is why reloading the plugin makes the device available again.
Sadly, there seems to be no way for an integration to know whether the device has already been initialized or not and there is no way to reinitialize the existing entities. So it will now wait until it sees a response from the ESPSomfy RTS before it attempts to initialize the plugin.
Okay this was weird, a single curtain was not longer available in HA. Therefore my script that changes all curtains crashed. However i could control the rest in HA. Thus I reloaded the plugin in homeassistant now all curtains are unavailable. The webinterface of my ESPSomfyRTS is unavailable, and the device is still pingable. I can't see the console (as it is powering the device) so the only thing I can do is a power cycle.
That is very weird. The really weird thing is that there seems to be a socket message being sent to the integration that contains no data payload that is not a ping or pong for the socket. At least that is what the log output above seems to indicate.
Had another crash, reloading the HA plugin made all entities unavailable. The ESP webinterface was also not available; had to powercyle the ESP to get everything operational. Is there a way i can create sensible logging for this issue?
Actually, if you monitor the Wifi strength in the HA plugin you will see the when the disconnect occurs. Are you running a Mesh network. If you are then I have determined that the esp32 wifi module is not very smart. It connects to the first channel that it sees in its scan for the SSID. So I set up a test scenario where I have a mesh node on the outer fringes.
As you can see from my monitoring there is a node close by but it still connects to a node far away. In the chart you can see that the ESP originally connects to the strong node but this happened only after I made changes to force it to connect to the stronger node then if anything changes (node is rebooted, channel is changed, ... etc) and it reconnects, it drops back to the ridiculous node.
I have some code that I will post this weekend that forces the ESP32 to roam properly as the ESP32 WiFi libraries are severely lacking. I cannot believe that this is so poor and unless you are looking for the issue you will never assume that it is doing what it does.
Please install the most current pre-release of v2.4.1. I have added roaming capabilities to the wifi connection. This will monitor the network for stronger APs and connect to them dynamically as they become available.
I have updated today to the pre-release of v2.4.1 will keep you posted.
PS. I have marked the 3 "historical crash moments" in my wifi strength graph. That suggests that for me the wifi roaming issue might not be the root cause of my crashes.
Wow those are some really wide swings for RSSI. From a comfortable -40dbm all the way down to -90dbm.
How has it been since you installed 2.4.1? Is it more constant on the channel?
Nope, not more constant, it stays as noisy. However you triggered me and in my Deco m5 Wifi Mesh settings i have disabled mesh and forced 2.4 GHz... lets see what that brings..
Is there something in your environment that is creating noise? If it was steady at -90 it would likely be stable but given the measurable output you can expect that there are complete dropouts that aren't even being registered. Also, v2.4.1 has been released so update to the release version of v2.4.1.
Regarding the noise there are multiple known noise generators (in order of expected impact);
yep I have updated to v2.4.1, of course ;)
I assume those are wifi cameras so I wouldn't expect them to impact the link quality. The LED spots would only be an issue very close to the LED driver. I assume they too are wifi so that should not have an impact on the link quality. Then there is Zigbee and bluetooth. While Blue teeth are nasty to look at, they commonly do not create interference with WiFi on channels 1, 6, or 11. Zigbee on the other hand can be a menace. You can shut down your Zigbee devices for a bit and see if the link quality improves, if it does there are things you can do using channel selection to mitigate the issues.
Here are some other things to try. First try using narrow channel banding and switch from 40/80 to 20 if the TPLink supports it.
Shut down QoS on the network to see if it is beating up the radio, If you have one, use a WiFi analyzer to see what is out there and pin the WiFi channel to a less noisy channel.
Finally, the WiFi radio on the ESP32 may be faulty and this may not be noise at all and just a simply unmitigated crappy part on the board. Swings from -40 to over -100 will definitely drop the connection. You are likely not seeing the bottom of the link quality since losing connection will not result in any values and there will be nothing to report.
EDIT: I recently did a build with an ESP32s3 that uses an external WiFi antenna. The connectivity on it is striking although the build is a bit of a tinker. https://github.com/rstrouse/ESPSomfy-RTS/wiki/Mini-ESP32s3-Device-Build
@nl0pvm how has your network stability been since you upgraded? At this time I also recommend you update to the pre-release v2.4.2 as I have further optimized the http responses. This dramatically reduces the memory footprint into smaller chunks while the device is communicating over wifi.
EDIT: I recently did a build with an ESP32s3 that uses an external WiFi antenna. The connectivity on it is striking although the build is a bit of a tinker. https://github.com/rstrouse/ESPSomfy-RTS/wiki/Mini-ESP32s3-Device-Build
That build is very neat and thanks for your headsup.. i have many external antenna's lying around so i will go for that option.
@nl0pvm how has your network stability been since you upgraded?
no luck there.. Again i had a crash at the 10th of april.
At this time I also recommend you update to the pre-release v2.4.2 as I have further optimized the http responses. This dramatically reduces the memory footprint into smaller chunks while the device is communicating over wifi.
A reboot was required to show other update options than "main". But i updated. After the update i needed to restore from backup as all shades were gone.
Ok so I am pretty sure I have found the cause of the stability issue. As suspected it was networking related but ultimately came down to a small memory leak when sending responses for SSDP/UDP. Please update to the official release of v2.4.2
Okay here it comes.. it seems that since the update to the official release of v2.4.2 some build parameters have been changed that require the EPS32 to be connected to a pc in order to be able to boot. I will try to downgrade back to an older version when i have figured out how to do so.
Okay here it comes.. it seems that since the update to the official release of v2.4.2 some build parameters have been changed that require the EPS32 to be connected to a pc in order to be able to boot.
What do you mean? That is not true. Which ESP32 do you have?
What do you mean?
Before the update to 2.4.2 i could boot it on any usb power supply, after the update it suddenly does not boot anymore on any of my usb chargers but 1 1 specific one AND my laptop and my pc. So that's really weird. (it looks like this issue.. but.. I don't have an ESP32-S3!!
That is not true.
ok :)
Which ESP32 do you have?
ESP-WROOM-32
The wroom/wrover builds have not changed. That is very odd. I have never had an issue with specific power supplies. The issue you mentioned does have one thing in common though. Both solutions are using a peripheral attached to the 3v3 and GND. Interestingly a data cable can sometimes provide the ground through the data pins. Double check to ensure the ground and 3v3 are solid between the transceiver and the esp32.
@nl0pvm is there any update to your random disconnects? I have done a deep dive into the ESP32 core and as a result bumped the Arduino core to 2.0.16 from 2.0.14. There were some PRs related to network stability as well as some weirdness with the USB RX and TX pins. Update to the v2.4.3 pre-release if you are still having issues.
@nl0pvm I just released v2.4.3 that contains the library updates as well as a myriad of network fixes. Open a new issue and reference this one if you are still having issues.
Hardware
ESP32
Firmware version
v2.3.3
Application version
v2.3.3
What happened? What did you expect to happen?
I have 29 curtains that I operate automatically using a script. The curtains are sequentially operated so ESPSomfyRTS is not overloaded with commands. In the past 3 weeks that i am running ESPSomfyRTS now i have had on average 2 crashes per week. As far as my memory goes these all occured during a script that was operating all of those 29 curtains. Hence the "semi-random".
How to reproduce it (setp by step)
Logs