rvdbreemen / OTGW-firmware

An ESP8266 devkit firmware for the Nodoshop version of the Opentherm Gateway (OTGW)
MIT License

Gateway stops working after a few days of not being used #240

Closed rotilho closed 3 months ago

rotilho commented 5 months ago

I'm facing a bug that takes a long time to reproduce. I'm using the OTGW in stand-alone mode where I control the setpoint based on multiple sensors.

I also have an automation that lowers the setpoint of each room based on how far I am from home, down to a minimum of 16 degrees. So when I travel, the house is normally set to 16 degrees, and the boiler does not kick in for several days. After around 3 or 4 days, by the time the temperature is low enough, the OTGW is no longer responsive, and I have to restart it remotely. There are no errors in the logs; it's just stuck.

Edit: hot water pre-heat is disabled.

rvdbreemen commented 5 months ago

Do you have any logs to share? If not could you try to capture logs?

hvxl commented 5 months ago

Which part is stuck? The Wemos or the PIC? Can you still access the web interface on the Wemos? Can you connect to port 25238? If so, do things start working again if you send a GW=R command?
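
For reference, the port 25238 check and the GW=R command can be scripted. The following is a minimal Python sketch, not part of the firmware. It assumes the gateway forwards raw OTGW serial traffic on TCP port 25238 (as the question above implies) and that commands are terminated with CR LF. The helper names `build_command`, `exchange`, and `send_command` are illustrative only.

```python
import socket
import time

OTGW_SERIAL_PORT = 25238  # port number taken from the thread


def build_command(code: str, value: str) -> bytes:
    """Format a two-letter OTGW command such as GW=R.

    Assumption: the command is terminated with CR LF.
    """
    if len(code) != 2 or not code.isalpha():
        raise ValueError("command code must be two letters, e.g. 'GW'")
    return f"{code.upper()}={value}\r\n".encode("ascii")


def exchange(sock: socket.socket, command: bytes,
             listen_seconds: float = 2.0) -> str:
    """Send one command, then collect incoming text for a fixed window.

    A fixed window is used because the gateway may keep streaming
    OpenTherm messages, so waiting for silence could block forever.
    """
    sock.sendall(command)
    sock.settimeout(0.25)
    deadline = time.monotonic() + listen_seconds
    chunks = []
    while time.monotonic() < deadline:
        try:
            data = sock.recv(1024)
        except socket.timeout:
            continue  # nothing arrived in this slice; keep listening
        if not data:
            break  # the other side closed the connection
        chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def send_command(host: str, code: str, value: str,
                 port: int = OTGW_SERIAL_PORT) -> str:
    """Open a connection, send one command, return the text received."""
    with socket.create_connection((host, port), timeout=5.0) as sock:
        return exchange(sock, build_command(code, value))
```

Calling `send_command("192.0.2.10", "GW", "R")`, with the gateway's real address in place of that placeholder, would send the reset and return any text received during the listening window. A refused connection or an empty reply may already help narrow down whether the Wemos or the PIC is the part that is stuck.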

rotilho commented 5 months ago

Sorry guys, I already restarted it, and my place was very cold, so I didn't spend much time debugging. This is the fourth time it has happened. I suspected that upgrading to the latest version would fix it, but this time I already had the latest version.

The web interface was working. I connected to the serial port, but nothing was happening. Any command to set the temperature was accepted, but after a few seconds the value went back to zero. Everything was reporting zero or no information.

Next time I'll try to collect more meaningful information before being on my way home.

JvHummel commented 4 months ago

I might have had a similar experience yesterday.

After months of stable operation, I restarted Home Assistant, which then wasn't able to connect over the socket anymore. The web page was still up, but not displaying any info about the OTGW status anymore. OTGW itself was working, communication between boiler and thermostat still worked. I was able to reboot using the web page, after which functionality was restored. Logs showed no new entries since the previous reboot.

I'm using the latest ESP firmware 0.10.2 and PIC firmware 6.5.

dwar commented 4 months ago

Since I recently started pushing the outside temperature, I have seen the same problem twice (empty UI and no connection with HA, even after reloading the OTGW socket plugin).

On the empty page there was the message: "PS=1 mode; No UI updates."

How can I obtain useful debug info next time?

rvdbreemen commented 4 months ago

Hi @dwar

That message means you are using the serial connection somehow and have put the OTGW in PS=1 mode. That mode prevents the web UI from getting updates.

If you want this to work I recommend using the MQTT integration for OTGW using Home Assistant.

Check out the wiki on how to set up MQTT with Auto Discovery:

https://github.com/rvdbreemen/OTGW-firmware/wiki/How-to-setup-another-OTGW-using-the-WebUI

rotilho commented 4 months ago

Okay guys. A few days without using it and the problem appeared again.

Here is everything I collected:

Trying 192.168.2.40...
Connected to 192.168.2.40.
Escape character is '^]'.
18:04:07.363651 (  12392| 11880) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:07.502457 (  13928| 12528) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:07.639094 (  13928| 12528) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:07.784869 (  14600| 12528) handleOTGW  (1771): Net2Ser: Sending to OTGW: [PR=I] (4)
18:04:08.014900 (  11800| 11232) checkOTGWcmd(1293): CmdQueue: Checking if command is in in queue [PR: I=11] (8)
18:04:08.636180 (  13144| 12664) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:09.633980 (  13472| 12368) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:10.631923 (  13144| 12144) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:11.630608 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:12.627204 (  14480| 13208) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:13.625744 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:14.624155 (  15400| 14504) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:15.621442 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:16.619026 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:17.617316 (  15400| 14504) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:18.163610 (  14400| 13688) handleOTGW  (1771): Net2Ser: Sending to OTGW: [PR=I] (4)
18:04:18.178821 (  11792| 11096) checkOTGWcmd(1293): CmdQueue: Checking if command is in in queue [PR: I=11] (8)
18:04:18.616931 (  14480| 13208) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:19.613016 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:20.612575 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:21.608490 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:22.606201 (  13808| 13208) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:23.604771 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:24.602882 (  15344| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:25.600831 (  13808| 13208) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:26.597438 (  13808| 13208) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:27.595410 (  13808| 13208) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:28.317772 (  15824| 14728) handleOTGW  (1771): Net2Ser: Sending to OTGW: [PR=I] (4)
18:04:28.564279 (  13808| 13208) checkOTGWcmd(1293): CmdQueue: Checking if command is in in queue [PR: I=11] (8)
18:04:28.727358 (  15344| 14504) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:29.626360 (  13808| 13208) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:30.588597 (  15400| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:31.587418 (  13808| 13208) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:32.584916 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:33.583013 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:34.579991 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:35.577788 (  14480| 13856) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:36.575655 (  13808| 13208) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
18:04:37.573828 (  13808| 13208) processOT   (1676): Request Boiler    R00000000 (9)[MsgID=  0][READ_DATA       ]>Status = Master [-----W--]
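
A capture like the one above can be summarised with a short script. This Python sketch assumes every debug line follows the layout seen in this log (time, two bracketed numbers, function name, source line number, message). It also assumes OpenTherm frames appear as a source letter plus eight hex digits, with B marking frames from the boiler and R marking requests sent to it, as in the OTGW message format. The function names are hypothetical, and the meaning of the two bracketed numbers is not stated in the thread, so they are kept as opaque fields.

```python
import re
from collections import Counter

# Layout assumed from the capture in this thread.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"\(\s*(?P<n1>\d+)\|\s*(?P<n2>\d+)\)\s+"
    r"(?P<func>\w+)\s*\((?P<line>\d+)\):\s+"
    r"(?P<text>.*)$"
)

# One source letter followed by eight hex digits, e.g. R00000000.
FRAME_RE = re.compile(r"\b(?P<source>[TBRAE])(?P<payload>[0-9A-F]{8})\b")


def parse_line(line):
    """Return the fields of one debug line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count log entries per function and OpenTherm frames per source."""
    funcs = Counter()
    sources = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # telnet banner or other non-log text
        funcs[record["func"]] += 1
        frame = FRAME_RE.search(record["text"])
        if frame:
            sources[frame.group("source")] += 1
    return funcs, sources


def boiler_is_silent(sources):
    """True if requests or thermostat frames were seen but no boiler frames."""
    return sources["B"] == 0 and (sources["R"] + sources["T"]) > 0
```

Applied to the capture above, `summarize` would count only frames with the R prefix and none with the B prefix, so `boiler_is_silent` would return True. That may indicate the boiler is not answering, although it does not say why.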

Attached screenshots: Screenshot_20240301-232539, Screenshot_20240301-232629, Screenshot_20240301-232657, Screenshot_20240301-232736-EDIT

rotilho commented 3 months ago

I'm starting to suspect that it may be my boiler. A restart didn't work this time, just after using HW.

htca commented 3 months ago

I have the same issue. I checked several things, but somehow I cannot determine what's wrong. Could the PIC be faulty, or is it the OTGW board? I encountered the same behaviour after reflashing and returning to default parameters with no MQTT connection. I am close to the point of tossing the thing in a corner and trying the diyless device…

hvxl commented 3 months ago

As you didn't provide any logs or other information about the things you checked, it's not really possible to answer your questions.

htca commented 3 months ago

otdata.txt otlog-20240312.txt Sorry for my frustration... I am really thankful for any help. I just made these: I rebooted the OTGW and logged it until I needed to go to the office.

hvxl commented 3 months ago

A few things stand out:

Can you confirm that the reason the log stops is not due to the WiFi connection dropping? Can you switch off whatever is sending all those serial commands to see if that makes any difference? When the problem happens again, can you monitor TCP port 23 while doing the following:
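
Monitoring TCP port 23, as asked above, can be scripted so that the capture survives a closed terminal and each line carries a full date. This is a minimal Python sketch under the assumption that the firmware's debug output is plain text lines on port 23, as in the telnet capture earlier in the thread. The function name `capture`, the file name, and the address in the commented usage are placeholders.

```python
import socket
import time
from datetime import datetime


def capture(sock, out, duration_seconds: float) -> int:
    """Copy complete lines from sock to out for a fixed duration.

    Each line is prefixed with the local date and time of the machine
    running the script. Returns the number of lines written. An
    unterminated trailing fragment is discarded.
    """
    sock.settimeout(0.5)
    deadline = time.monotonic() + duration_seconds
    buffer = b""
    count = 0
    while time.monotonic() < deadline:
        try:
            data = sock.recv(4096)
        except socket.timeout:
            continue  # quiet period; keep waiting until the deadline
        if not data:
            break  # connection closed by the other side
        buffer += data
        while b"\n" in buffer:
            raw, buffer = buffer.split(b"\n", 1)
            text = raw.decode("utf-8", errors="replace").rstrip("\r")
            stamp = datetime.now().isoformat(timespec="seconds")
            out.write(f"{stamp} {text}\n")
            count += 1
    out.flush()
    return count


# Example usage (placeholder address and file name):
# with socket.create_connection(("192.0.2.10", 23), timeout=5.0) as s, \
#         open("otgw-debug.log", "a", encoding="utf-8") as f:
#     capture(s, f, duration_seconds=3600)
```

If the capture ends early because the connection closed, the last timestamp in the file gives a rough time for when the link dropped, which may help separate a WiFi problem from a stuck gateway.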

htca commented 3 months ago

OK, I used a brand-new Wemos and flashed it, and improved the WiFi by moving the device a bit... I will let it run for a while and report later.

htca commented 3 months ago

Just to add: I also reconnected the OTGW integration in Home Assistant, and it still works as before. I suspect that the WiFi was the issue (it is OK at the moment, but I need some time to expand the WiFi range properly).

rvdbreemen commented 3 months ago

@htca what kind of "integration" are you using? Are you using the MQTT way of integrating? If you use the native component in HA, then you use serial over WiFi to control the gateway. This is not what the ESP8266 firmware was built for; the MQTT integration is the preferred way (imho).

So I'm wondering how your new setup is working, and how you have set up your integration with OTGW from the HA perspective. MQTT or Serial over Network integration?

rvdbreemen commented 3 months ago

@htca I will close the issue, as the WiFi now seems to be fixed, which solves the problem. I'm still interested in your answers, so feel free to reopen the issue.

htca commented 3 months ago

Actually I used both: MQTT to get the status in HA, and the HA integration to have an easy way to update the outside temperature. I think something changed in HA in one of the updates. Up to a few months ago it worked as expected, although I regularly had a freeze (once every few weeks), but then the freezes became more frequent, with at most a few hours between them. I assumed I could use both interfaces simultaneously (MQTT and serial), but I now use only MQTT, with an automation that publishes the external temperature. Thanks for all your good work and effort! Fine if you close the issue, of course.

rvdbreemen commented 3 months ago

@htca thanks for the response. Combining both integrations has worked for others before, not sure what has changed. But glad to know that the MQTT integration works as designed for you.

Will keep topic closed then.