rstrouse / ESPSomfy-RTS

A controller for Somfy RTS shades and blinds
The Unlicense

Hardware goes offline every few days must power cycle to recover #46

Closed FlaMike closed 1 year ago

FlaMike commented 1 year ago

Well, it's been a few days since I bothered you, so I guess I'm tardy in reporting an issue to you.

Ever since I installed my ESPSomfy-RTS hardware & got everything going, the controller goes offline after a few days & does not recover unless I power cycle it. It then comes back online & works flawlessly for several more days. I have not been able to find a cause for this behavior. I have tried associating the controller with 3 different APs (moving it around the house so it has a strong signal to the AP). It's always been able to control all of my shades whenever it's online, but...

It went offline most recently about 90 minutes ago. I moved it to another new location very close to an AP and it has a WiFi RSSI of -52 dBm. It's paired currently to a Unifi AC-Pro. Previously it was paired with a Unifi Nano (RSSI was in the mid -50 dBm range) and another AC-Pro (weak signal of around -85 dBm, but it was online for 5 days or so until a few minutes ago). As far as I can tell, none of the other devices on these APs have been going offline. Those that occasionally do go offline are able to re-pair to an AP without intervention by me.

Each AP is assigned its own WiFi channels for 2.4 & 5 GHz. I do run a Zigbee network, but it's on channel 25 and the 2.4 GHz WiFi channels I run are 1, 6, and 11 so they should not interfere with each other very much.
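The channel reasoning above can be checked with a little arithmetic. The following is a minimal sketch, not anything from ESPSomfy-RTS: it uses the standard 2.4 GHz channel center frequencies for Wi-Fi and Zigbee (IEEE 802.15.4), and it assumes nominal widths of 22 MHz per Wi-Fi channel and 2 MHz per Zigbee channel. The function names are hypothetical.

```python
# Sketch: which 2.4 GHz Wi-Fi channels overlap a given Zigbee channel?
# Assumptions: nominal 22 MHz Wi-Fi width and 2 MHz Zigbee width.


def wifi_band(channel, width_mhz=22.0):
    """Return the (low, high) edges in MHz of Wi-Fi channel 1-13."""
    if not 1 <= channel <= 13:
        raise ValueError("expected a 2.4 GHz Wi-Fi channel from 1 to 13")
    center = 2407 + 5 * channel
    return center - width_mhz / 2, center + width_mhz / 2


def zigbee_band(channel, width_mhz=2.0):
    """Return the (low, high) edges in MHz of Zigbee channel 11-26."""
    if not 11 <= channel <= 26:
        raise ValueError("expected a Zigbee channel from 11 to 26")
    center = 2405 + 5 * (channel - 11)
    return center - width_mhz / 2, center + width_mhz / 2


def overlaps(band_a, band_b):
    """True if two (low, high) frequency ranges share any spectrum."""
    return band_a[0] < band_b[1] and band_b[0] < band_a[1]


def conflicting_wifi_channels(wifi_channels, zigbee_channel):
    """List the Wi-Fi channels whose band overlaps the Zigbee channel."""
    zigbee = zigbee_band(zigbee_channel)
    return [c for c in wifi_channels if overlaps(wifi_band(c), zigbee)]


if __name__ == "__main__":
    print(conflicting_wifi_channels([1, 6, 11], 25))
```

Under these assumptions Zigbee channel 25 sits just above the upper edge of Wi-Fi channel 11, which matches the expectation that the two should not interfere very much. Real transmitters leak outside their nominal band, so this is only a first-order check.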

I have around 70 WiFi devices in the house. 5 running ESPHome, around 55 running Tasmota, and a variety of Android devices. In addition to the commercial switches and all that were flashed with Tasmota and ESPHome, there is a mixture of ESP32s and 8266s embedded in devices I built.

All or nearly all of the WiFi clients have fixed IP addresses assigned by the router.

Have you or others experienced the controller going offline & not recovering on its own? As far as I can tell, even when offline, the controller has power (the LED is illuminated on the ESP32). I am not aware of any power interruptions that occur when it goes offline, and we have a whole house generator that kicks in within 30-60 seconds of a loss of power. The generator has not been roused from its sleep in months.

My controller and the transceiver are soldered to a PCB; it's as close a clone of your build in the hardware instructions that I could create. Same case from Amazon, & the larger antenna you recommended. I am not aware of any loose connections in the build and the unit is only touched when I move it to a different location. Two of its installation spots have been inside cabinets & one was on a high window ledge.

I am totally flummoxed & hoping you can help me get to the bottom of the problem.

As always, I appreciate your time, help, & patience!

FlaMike commented 1 year ago

Oh, I am running 1.5.3 of the controller and 0.4.4 of the HA integration. When the controller is offline, it is not accessible via HA or the controller's web URL.

rstrouse commented 1 year ago

When it goes offline does it start the AP? The reason I asked is because if the software detects a loss of connection it will start the AP. Did you reserve the IP in your DHCP server?

FlaMike commented 1 year ago

Actually, the way I found out it was offline today was I tried to go directly to the IP of the Somfy controller. I just double-checked and its IP is "fixed" (reserved) on the router. As shown here:

image

Normally I can access the controller directly or I can control the devices via the HA integration. When I wasn't able to reach the controller via its AP, I looked at HA & saw all of the shades were unavailable.

That was when I power cycled the Somfy controller & was able to access it by its AP and control the shades again in HA.

Here's a screenshot of the controller I took moments ago:

image

And here are the master BR shades in HA, again just taken as I write this response:

image

This is why I'm so puzzled about the controller's behavior.

rstrouse commented 1 year ago

No I mean look in the available Wi-Fi networks when it is not available. If it cannot find your network it will open its own network.

FlaMike commented 1 year ago

Ah, as I sent my note I thought that might be what you meant. I will have to check for its network the next time it poops out. I assume I will be able to recognize a new/different network with my phone & I can just join the network & then access the device(?).

rstrouse commented 1 year ago

Yes

FlaMike commented 1 year ago

OK. I will let you know what's happening the next time it goes offline. I just re-read the Configuring the Software section in the wiki & assume the same SSID will appear & I can go to 192.168.4.1 to get to the device, just as if it's fresh out of the box from an initial firmware installation. If there is any data I should collect or things I should look for, please let me know & I will mine whatever data I can.

Thanks & have a super evening!

rstrouse commented 1 year ago

Yes, the same SSID and IP address will be assigned.
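For anyone wanting to collect data the next time the controller drops off, a small probe can distinguish "online at its normal address" from "fallback AP is up". This is a rough sketch using only the Python standard library. The address 192.168.4.1 comes from this thread; port 80 for the web interface and the function names are assumptions, and the fallback address only answers if the machine running the probe has joined the controller's own network.

```python
import socket

# Address of the controller's own network, as mentioned in this thread.
FALLBACK_AP_ADDRESS = "192.168.4.1"


def is_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def describe_state(reserved_ip, port=80, timeout=3.0):
    """Classify the controller as online, in fallback AP mode, or silent."""
    if is_reachable(reserved_ip, port, timeout):
        return "online at reserved address"
    if is_reachable(FALLBACK_AP_ADDRESS, port, timeout):
        return "fallback AP is up"
    return "unreachable"
```

A call such as `describe_state("192.168.1.50")` (a made-up reserved address) could be run from a scheduled job and logged with a timestamp to narrow down when the device goes quiet.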

As a side note, none of the ones that are near me have ever gone offline. Just to be sure, double check the number of external devices connected to the ESPSomfy device. Unfortunately, ESP32 as of now only supports 5 TCP connections at a time.

In a classic Deadpool rundown: I assume you have the Home Assistant integration running, so that is 1. Each browser that you have connected consumes 1. So if you have one open on your PC, one open on your phone, and one open on your iPad, that makes 4. If you then also have MQTT connected, that makes 5. Unfortunately, Espressif forgot their bag in the taxi so there is a limited amount of ammo.

So if your router disconnects wifi clients so it can change channels then it will go back to the end of the movie and start counting bullets again. The first one it encounters will be the first one it reconnects. Not saying that is what is happening here I am just tossing around the spaghetti to see what remains stuck to the wall.

FlaMike commented 1 year ago

Wow! That's fascinating (and slightly depressing) information about maximum ESP connections. I had no idea. My wife & I each have HA companion on our phones--2 connections there all day & night. It's also on my tablet, but I tend to turn it off when I'm not using it/reading books, etc., like now, for example. HA does not run continuously on my PC, but certainly I browse there several times/day from my PC. Once in a blue moon, I'll go to the ESPSomfy's web page to check things from whatever device I'm on. I did not configure MQTT on the Somfy ESP, but it sounds like I'm darned close to the limit more often than I imagined.

Each AP in the house runs on a separate dedicated channel, for both 2.4 and 5 GHz. Not running a mesh since all APs are hardwired. But as I move about the house, the phones & tablet may hop from one AP to another. The phones are on 5 GHz, but the tablet & all IOT stuff runs at 2.4 GHz.

I also run Wireguard on my phone so I can access HA & all its glory when I'm not in WiFi range (lets me open & close garage doors, check cameras, & other occasionally useful things). Allegedly, the Wireguard connection is only active when WiFi isn't available, but who knows what lurks in the shadows? Is it sucking an ESP connection 24/7? I've been thinking about switching to Cloudflare, but don't know if that would help this situation or not.

If the connection between ESPSomfy and HA adds to the count (perhaps in lieu of MQTT), then I'm in danger of hitting the wall frequently. :-(

The connection limit may be a contributor to this issue. Now to figure out how to reduce the number of connections. (The sound you're hearing is my head & chin being scratched.) Hmmmmmm...

rstrouse commented 1 year ago

Here is why that number is ok. Only one connection is consumed for HA no matter how many HA users you have. So if you and the wife, the dog, your 16 cats, and the 22 goats are all connected to HA then that is fine. That is a single TCP client. One bullet takes out multiple bad guys. It is when you open a connection via a web browser (and just leave it open). In your case there is no MQTT so it isn't snagging a TCP connection.
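The counting rule described in this thread can be written down as a few lines of bookkeeping. This is only an illustrative sketch: the limit of 5 is the figure quoted above rather than a value read from any ESP32 API, and the function and constant names are hypothetical.

```python
# Sketch of the TCP client budget as described in this thread.
# Assumption: the limit of 5 is the figure quoted above, hard-coded here.
MAX_TCP_CLIENTS = 5


def slots_used(ha_integration, open_browser_sessions, mqtt):
    """Count the TCP client slots consumed on the controller.

    The HA integration counts once no matter how many HA users there are,
    each browser session left open on the controller's own web page counts
    once, and an MQTT connection counts once.
    """
    used = open_browser_sessions
    if ha_integration:
        used += 1
    if mqtt:
        used += 1
    return used


def slots_free(ha_integration, open_browser_sessions, mqtt):
    """Return how many client slots remain (negative means over budget)."""
    used = slots_used(ha_integration, open_browser_sessions, mqtt)
    return MAX_TCP_CLIENTS - used


if __name__ == "__main__":
    # HA integration only, no browser sessions left open, no MQTT.
    print(slots_free(ha_integration=True, open_browser_sessions=0, mqtt=False))
```

With the HA integration alone, only one slot is taken regardless of how many phones and tablets talk to HA, which is why the setup described here stays well under the limit.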

rstrouse commented 1 year ago

Btw, why no mesh? Beamforming is magical and a wired backhaul is Shangri-La. I thought you were running AiMesh.

I did sign up for the Nabu-Casa stuff. It works really well. Not free but good value.

FlaMike commented 1 year ago

Good news, indeed! Connections via browsers are only temporary & I'm the only one foolish enough to go that route here. It will be interesting to see if the connection stays up for a while. I'll dive in via its AP if it's available.

This time around the ESP is powered with a wall wart that's plugged into a zigbee smart plug. Much easier to power cycle, if nothing else.

BTW, who told you about my goats?

FlaMike commented 1 year ago

I thought Unifi frowned on running a mesh if the APs are hardwired. I've never lost a connection due to strolling around the house as far as I can tell.

That being said, I am willing to revisit if it makes sense to do so. I am certainly not a networking guru!

There are certainly advantages of using Nabu Casa. Again, I can certainly give it a whirl.

rstrouse commented 1 year ago

Doesn't everybody have goats?

FlaMike commented 1 year ago

😂

On 5/12/23 6:26 PM, rstrouse wrote:

Doesn't everybody have goats?


FlaMike commented 1 year ago

Went offline again last night while we were out. Just did a quick power-cycle so I could close the shades quickly as my bride was not happy being in a fish bowl.

Just popped the ESP and transceiver off the PCB & Dupont wired them back together. Of course, it's working at the moment. Hoping it stays online--if so, I'll make a new PCB--possible intermittent wiring fault? I'll get out the meter & check again but??? I thought it was OK, but the world may be telling me otherwise.

FlaMike commented 1 year ago

A couple of Monday morning updates for your enjoyment(?).

There appears to be no problem with the PCB implementation. The Dupont throwback worked for less than 12 hours before going offline--even worse than the PCB. Although the board's LED was illuminated indicating it was powered, the AP never materialized, so I could not access the board on its own network & had to power cycle the board to bring it back up.

So, I'm back to the PCB 'cause it just looks better (and is soldered). I updated to 1.5.4 a few minutes ago & am crossing my fingers. Current WiFi RSSI is -53 dBm.

If things go south again, and I fear they may, I'm thinking of switching to a POE board & just getting rid of the WiFi issue altogether. It looks like I can get the Lilygo board (and adapter) from the far east in about a month. I can also get the Olimex board for a few bucks more & possibly sooner.

From your hardware page, it looks like you may have implemented both POE solutions during development. Do you have a preference/recommendation between the two? I'm hoping if I go the POE route I can get out of your hair on this project & we can both be happier :-)

Thanks for your thoughts!

rstrouse commented 1 year ago

Yeah this has to be some weirdness with the wifi or power. The watchdog would kick in if the hardware allowed and the thing would reboot. I have a bunch of these on my desk and several in production. Never once have I had it lock up hard. If you had the serial connection hooked up you might be able to see what is in the serial console when it goes unavailable.

Does it appear in your router when it is MIA or do you see anything in the router logs that might indicate that it has been sent to seclusion?

FlaMike commented 1 year ago

At the router end, it just goes offline. I've not seen anything other than, poof! It's no longer connected. Power cycling makes everyone happy again--except me.

Bad ESP board perhaps? I can flash another & grab the backup, if that makes sense.

FlaMike commented 1 year ago

I'm assuming that after I flash a new board, I can just restore the backup file & not have to configure the shades et al.

rstrouse commented 1 year ago

That is correct. The shade configuration will restore.

You will need to set up the initial connection first.

FlaMike commented 1 year ago

Thanks! About to flash another ESP & see if the board was bad. If not, I'll go the POE route.

isngofoz commented 1 year ago

Looks like I may have a similar problem? Two days in a row, lost connection with HA. Required a power cycle. Did not open a browser to ESPSomfy. Did not check whether the AP was present, will check next time. Screenshot 2023-05-19 165643 Screenshot 2023-05-19 165656

rstrouse commented 1 year ago

@isngofoz Just to be sure. Did you make a DHCP reservation for the ESP32 in your router so it does not drop its lease? Also, do you have 802.11b enabled on your router?

isngofoz commented 1 year ago

I may not have setup the DHCP manual IP address correctly, so I have tried again.

I think 802.11b is enabled? image

rstrouse commented 1 year ago

If you do not have the DHCP server set up to reserve the address, your router can reassign a different address when the lease expires. It used to be that you would need to go around and fix the IP address on every local device and set the DHCP server's range to start outside those addresses, but these days setting a reservation keeps the DHCP server from doling out the reserved address to another device.

802.11b should be disabled unless you have an 802.11b device which is very unlikely these days.

https://techunwrapped.com/disable-802-11b-g-wifi-on-your-router-to-improve-wireless-speed/

isngofoz commented 1 year ago

Hi there, I was pretty certain that I had previously reserved the IP address correctly, as the address is outside the DHCP auto assign range, although upon first checking, the IP address appeared to be labelled as a Manual setting and bound to the MAC address. So I thought I had not set it correctly.

However, after multiple attempts at setting up the IP addresses (including upgrading the firmware of the router), it seems to be a typical setup for an Asus router with an ESP chip (I think). If I attempt to unbind the MAC address from the IP address, the IP address falls back into the auto assign range.

I have IP cameras on fixed IP addresses as well but they are labelled as Static setting (which is what I am used to).

Although after I upgraded the router firmware and did all the above, (touch wood) it has not lost its connection again.

rstrouse commented 1 year ago

Yeah setting a fixed IP address on a device can be a double edged sword. While the connection is made with the IP address it is only guaranteed when the device is connected. This means every potential static address will need to be reserved in the DHCP server. Most often this is done by range and every static IP device is managed individually.

When you bind the reservation using the MAC, the DHCP server will only give the address to a device with that MAC address. You can accomplish the task either way but MAC reservations centralize the entire operation at the DHCP server.

Given your symptoms above it really does look like there was an IP conflict on your network. It acquired the pages from the browser cache but refused to return any of the socket endpoints. The same will happen if there are too many clients for the ESP to handle.
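The static-address pitfall discussed above, where a device is given a fixed address that the DHCP server is still free to hand out, can be checked mechanically. This is a minimal sketch using Python's standard `ipaddress` module. The pool boundaries and device addresses in the example are made up for illustration, and the function names are hypothetical.

```python
import ipaddress


def in_dhcp_pool(address, pool_start, pool_end):
    """True if address lies inside the DHCP server's dynamic range."""
    addr = ipaddress.ip_address(address)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return start <= addr <= end


def conflicting_static_addresses(static_addresses, pool_start, pool_end):
    """List the static addresses the DHCP server could also hand out.

    Any address returned here is a candidate for an IP conflict unless
    it is also covered by a MAC reservation on the DHCP server.
    """
    return [
        address
        for address in static_addresses
        if in_dhcp_pool(address, pool_start, pool_end)
    ]


if __name__ == "__main__":
    # Hypothetical network: dynamic pool is 192.168.1.100-192.168.1.200.
    statics = ["192.168.1.20", "192.168.1.150"]
    print(conflicting_static_addresses(statics, "192.168.1.100", "192.168.1.200"))
```

A MAC reservation sidesteps this check entirely, since the DHCP server itself keeps track of which address belongs to which device.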

FlaMike commented 1 year ago

I think it's been a couple of weeks+ since we talked about ESPSomfy-RTS going offline regularly & not recovering. I'm going to knock on a giant redwood near your domicile before I say anything, but the system has been solid as a rock for the past 2+ weeks. What changed? Very little, actually.

I moved the controller to a hiding place in our master closet. Why there? First, there's a Nano HD AP about 10-12' from the RTS controller, as the WiFi signal flies. Most of that distance is vertical height, with perhaps 4' of horizontal difference in location. Second, and perhaps most important, there are no other IOT devices in the closet. Other locations where I've placed the RTS controller had WiFi devices, Zigbee devices (on channel 25, but still...), BLE, and RF433 fan controllers nearby. And at least one location had audio amplifiers, PCs, subwoofers, & who knows what else in the same cabinet.

Since it did not appear the ESP had more than its limit of client connections on WiFi, I thought a "cleaner" neighborhood was worth a shot. And it appears, so far, to have been a successful experiment (at least until I report it back to you! Then the spirits in the nearby cemetery will avenge my heathen sacrilege). I run the APs on channels 1, 6, & 11, so they shouldn't be affected by Zigbee, but I know one must be careful with such things.

There! Now that I've come clean about my installation, I expect all heck to break loose. Still, I felt it was apropos to let you know what is & isn't happening. Hoping the solid performance of the RTS controller continues far into the future. I've also thrown caution to the wind & just upgraded to 1.6.1. Sometimes, ya gotta live life on the edge, right?

Thanks for all of your help & patience. Be well!

FlaMike commented 1 year ago

P.S., I'm still running the controller on the same ESP32 I've been using all along. Never flashed a different board.

If you want to close this issue, it's OK with me as long as the RTS spirits agree :-)

rstrouse commented 1 year ago

Thanks for letting me know