rvdbreemen / OTGW-firmware

An ESP8266 devkit firmware for the Nodoshop version of the Opentherm Gateway (OTGW)
MIT License

OTGW connection is lost #232

Closed: ajongen closed this issue 3 months ago

ajongen commented 8 months ago

Hi Robert,

I am running the latest OTGW version (0.10.2+50c3ed2). The WiFi connection is amazing and everything works as expected. Still, every few weeks (sometimes only after a few months) the OTGW connection is lost and information is no longer exchanged. I have a regular ping to the OTGW IP address defined in Monit, which warns me when it is no longer reachable. This morning it happened again. In otmonitor I see:

image

This shows that regular communication stopped at around 06:49:44, and only two incoming commands to set OT followed some minutes later. Looking at reboot_log.txt I see:

image

At 06:15 there was a Watchdog restart and the next reboot was a hard reset done by me (RST button pushed on WEMOS D1 mini) at 07:33.

Looking at the code, I would expect the Watchdog to restart the WEMOS in case of a lost WiFi connection, but it does not seem to work in all cases.

Any ideas?

Thanks for all the good work!

Cheers, Armand

ajongen commented 7 months ago

Had the same issue today. Seems like it is occurring more often lately... :-(

rvdbreemen commented 5 months ago

Hi @ajongen

Did you by any chance look at the debug information in the UI to see what the WiFi quality is?

In the latest firmware I added a quality estimate.

Sudden changes in the behavior of the OTGW are hardly ever a firmware issue; sometimes it's the ESP dying on you, or a change in the network environment (like a new AP near you).

Let me know if you resolved this, or whether it is still an issue.

Robert

ajongen commented 4 months ago

Hi Robert,

Yes, as also stated in the original message, WiFi quality is amazing.

image

I am currently working on an external watchdog based on an ESP01S. Node-RED can trigger it when the MQTT message of the OTGW indicates offline for a longer time, and it then pulls the RST pin of the Wemos low for a short time to reset it. That should be enough of a fallback in case the ESP dies on me again.
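The decision logic of such an external watchdog could look roughly like the Python sketch below. It is only an illustration, not the actual ESP01S or Node-RED code from this thread: the class name, the `"online"`/`"offline"` status strings and the 300-second timeout are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OfflineWatchdog:
    """Decides when an external watchdog should pulse the RST pin."""

    offline_timeout_s: float = 300.0          # assumed grace period, not from the thread
    _offline_since: Optional[float] = None    # time the current offline streak started

    def update(self, status: str, now: float) -> bool:
        """Feed the latest status; return True when a reset pulse is due."""
        if status == "online":                # hypothetical MQTT payload
            self._offline_since = None
            return False
        if self._offline_since is None:
            self._offline_since = now         # first offline report starts the timer
        if now - self._offline_since >= self.offline_timeout_s:
            self._offline_since = now         # restart the timer so the ESP can boot
            return True
        return False
```

Restarting the timer after each pulse means a board that stays offline is reset at most once per timeout period instead of continuously.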

Will keep you posted.

Greetings, Armand

hapklaar commented 3 months ago

I have the same or a similar issue. I don't mean to hijack this thread, but I'm mentioning it in case it is helpful.

For me, if I reboot my access point, all my WiFi devices come back online, including many ESPs. Not so for the OTGW: I have to remove power and reconnect it before it responds again.

Could there be an issue with recovering from an outage, or could the recovery be improved?

rvdbreemen commented 3 months ago

@hapklaar recovery of the WiFi should happen if the WiFi AP stays the same, password and all. The logic is to retry the connection to WiFi and, after a while, go to AP mode (you can see the ESP doing that) so you can reconfigure it. If you then wait 4 minutes, the cycle will happen once more.
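As a rough illustration of that cycle (not the firmware's actual code), the retry / AP mode / retry loop can be written as a small state machine. The 4-minute AP window comes from the description above; the 60-second retry window and all names are assumptions made for this sketch.

```python
from enum import Enum


class WifiState(Enum):
    CONNECTED = "connected"
    RETRYING = "retrying"
    AP_MODE = "ap_mode"


RETRY_WINDOW_S = 60.0   # assumed: how long to retry before opening the AP
AP_WINDOW_S = 240.0     # from the thread: about 4 minutes in AP mode


def next_state(state: WifiState, seconds_in_state: float, link_up: bool) -> WifiState:
    """Return the next state given the time spent in the current one."""
    if link_up:
        # Simplification: any successful station connection ends the cycle.
        return WifiState.CONNECTED
    if state is WifiState.CONNECTED:
        return WifiState.RETRYING          # link just dropped, start retrying
    if state is WifiState.RETRYING and seconds_in_state >= RETRY_WINDOW_S:
        return WifiState.AP_MODE           # give the user a chance to reconfigure
    if state is WifiState.AP_MODE and seconds_in_state >= AP_WINDOW_S:
        return WifiState.RETRYING          # nobody reconfigured, try the old AP again
    return state
```

If a device stays unreachable after an access point reboot, logs showing which of these states it is stuck in would help narrow things down.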

It would be helpful to capture logs, and share them just to see what happens.

rvdbreemen commented 3 months ago

> Yes, as also stated in the original message, Wifi quality is amazing.

I saw that later, sorry, I just missed it. So the wifi quality does not seem to be an issue.

> I am currently working on an external WatchDog based on an ESP01S that can be triggered via Nodered if MQTT message of OTGW indicates offline for a longer time and that triggers the RST pin of the Wemos (pulls it low for a short time) to RESET it. That should be enough fallback in case the ESP is dying on me again.

Hmmm, in older hardware versions we had an external watchdog (with an ATtiny), but it was removed because we saw no reboots caused by the external watchdog. So we ended up removing the watchdog in a hardware release (I don't remember which one; maybe @tjfsteele can explain when that happened).

If you get it working based on this, then let us know here. Always willing to learn.

0crap commented 3 months ago

Without changing anything it started here as well. Didn't change firmware, so not related to that.

2024-03-18 07:18:16 - reboot cause: Software/System restart (4)
2024-03-17 22:36:43 - reboot cause: Software/System restart (4)
2024-03-17 21:56:28 - reboot cause: Software/System restart (4)
2106-02-07 07:28:29 - reboot cause: Software/System restart (4)
2023-09-14 10:26:39 - reboot cause: Software/System restart (4)
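For anyone who wants to analyse such a log, here is a minimal Python sketch that parses entries in the format shown above. The function name and the year-2100 cutoff are assumptions. The 2106-02-07 entry is flagged because it lies right at the limit of an unsigned 32-bit Unix timestamp (2106-02-07 06:28:15 UTC), which might mean the clock was not set yet at that reboot; that reading is a guess, not something confirmed in this thread.

```python
import re
from datetime import datetime

# Matches "<timestamp> - reboot cause: <text> (<code>)" anywhere in the text,
# so it works whether the entries are on separate lines or run together.
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - reboot cause: (.+?) \((\d+)\)"
)


def parse_reboot_log(text: str) -> list:
    """Return one dict per reboot entry found in the log text."""
    entries = []
    for stamp, cause, code in LINE_RE.findall(text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        entries.append({
            "when": when,
            "cause": cause,
            "code": int(code),
            # Assumed heuristic: a date this far in the future is not a real reboot time.
            "clock_suspect": when.year >= 2100,
        })
    return entries
```

Counting the entries per day (ignoring the suspect ones) gives a quick view of whether the reboots are becoming more frequent.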

I already replaced the phone charger (which is the power supply for my OTGW) with a brand new one, with the same issues. Is it possible that my WEMOS D1 MINI is slowly dying? It's from the Nodo-Shop, from January 2020.

Screenshot 2024-03-20 174351

ajongen commented 3 months ago

Murphy seems to be in play here... Since my last post, things have been running stably. My other test ESP8266 with the external watchdog (ESP01) has not been triggered either. I am thinking it might all be related to some router (Ziggo ConnectBox) issues that occurred earlier and are now solved. If things change, I will report again in this topic.

rvdbreemen commented 3 months ago

I will close the topic for now.

0crap commented 3 months ago

In case a Google search brings you here: I replaced my WEMOS D1 MINI module (AliExpress) and the reboots are gone. The WEMOS module sits in a socket on the main board. I think shady socket connections could very well also have been a cause of my reboot issues. (Soldering it to the main board would be better.)