rvdbreemen / OTGW-firmware

An ESP8266 devkit firmware for the Nodoshop version of the Opentherm Gateway (OTGW)
MIT License

Stops talking to OTGW after few hours #245

Closed avk999 closed 2 months ago

avk999 commented 2 months ago

Hello, I run 0.10.3+e334c42 on a Wemos D1. After some time (hours or days), HA starts reporting the error "Unable to connect to serial port" and the web interface of the Wemos shows no OT data: [image]

I've replaced the D1, but nothing changed: it still stops receiving OT data.

Logs from telnet port when the data stopped coming in:

22:56:50.457774 (  15168| 14112) processOT   (1676): Thermostat        T007D0000 (9)[MsgID=125][READ_DATA       ] OpenThermVersionSlave = 0.00 ^M
22:56:50.475591 (  15168| 14112) processOT   (1676): Boiler            B407D0203 (9)[MsgID=125][READ_ACK        ]-OpenThermVersionSlave = 2.01  <ignored> ^M
22:56:50.486529 (  15168| 14112) processOT   (1676): Answer Thermostat AC07D0300 (9)[MsgID=125][READ_ACK        ]>OpenThermVersionSlave = 3.00 ^M
22:56:50.501037 (  15168| 14112) processOT   (1676): Thermostat        T00000300 (9)[MsgID=  0][READ_DATA       ]>Status = Master [CD---W--]^M
22:56:50.520481 (  15168| 14112) processOT   (1676): Boiler            BC0000300 (9)[MsgID=  0][READ_ACK        ]>Status = Slave  [--------]^M
22:56:50.534605 (  15168| 14112) processOT   (1676): Thermostat        T00030000 (9)[MsgID=  3][READ_DATA       ] SlaveConfigMemberIDcode = Slave Config[00000000] MemberID code [  0]^M
22:56:50.542943 (  15840| 14112) handleOTGW  (1771): Net2Ser: Sending to OTGW: [PS=1] (4)^M
22:56:50.571080 (  15248| 14112) checkOTGWcmd(1293): CmdQueue: Checking if command is in in queue [PS: 1] (5)^M
22:56:50.746921 (  14600| 13464) processOT   (1707): Not processed, received from OTGW => (00000011/00000000,41.00,00000000/00000000,0.00,0.00,100.00,0/0,0.00,0.00,0.90,0.00,0.00,0.00,26.50,31.00,0.00,26.00,0.00,0,65/35,75/20,60.00,0.00,00000000/00000000,0,0,0,0,0,0,0,0,0,0) [183]^M
22:57:00.055853 (  16928| 14112) doTaskMinute( 279): Minute changed:^M
22:57:55.057099 (  16520| 14112) loopNTP     ( 267): Time resync needed^M
22:57:55.061405 (  15176| 14112) loopNTP     ( 237): Start time syncing^M
22:57:55.062341 (  15176| 14112) loopNTP     ( 239): Starting timezone lookup for [Europe/Amsterdam]^M
22:57:55.066031 (  15848| 14112) loopNTP     ( 254): 22:57:55 22-04-2024
^M22:57:55.067334 (  15176| 14112) loopNTP     ( 259): Time synced!^M
22:58:00.056527 (  16744| 14112) doTaskMinute( 279): Minute changed:^M
22:58:30.725603 (  16608| 14112) sendMQTTupti( 157): Uptime seconds: 108245^M
22:59:00.056404 (  16912| 14112) doTaskMinute( 279): Minute changed:^M

After 22:56 no Thermostat or Boiler messages were logged.
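To pin down the moment the data stops in a longer capture, a small script can scan the telnet log for the last Thermostat/Boiler line. This is only a minimal sketch: the regular expression is inferred from the log lines pasted above, not taken from the firmware source, and `last_ot_message_time` is a hypothetical helper name.

```python
import re

# Start of a telnet debug line, as seen in the paste above:
# "HH:MM:SS.ffffff ( number| number) function(line): message"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\.\d+ \(.*?\) (\w+)\s*\(\s*\d+\): (.*)$")

def last_ot_message_time(log_text):
    """Return HH:MM:SS of the last Thermostat/Boiler/Answer line, or None."""
    last = None
    for raw in log_text.splitlines():
        # The pasted log carries literal "^M" markers for carriage returns.
        line = raw.replace("^M", "").strip()
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp, function, message = match.groups()
        if function == "processOT" and message.startswith(("Thermostat", "Boiler", "Answer")):
            last = stamp
    return last

SAMPLE = """\
22:56:50.534605 (  15168| 14112) processOT   (1676): Thermostat        T00030000 (9)[MsgID=  3][READ_DATA       ] SlaveConfigMemberIDcode = Slave Config[00000000] MemberID code [  0]^M
22:56:50.542943 (  15840| 14112) handleOTGW  (1771): Net2Ser: Sending to OTGW: [PS=1] (4)^M
22:57:00.055853 (  16928| 14112) doTaskMinute( 279): Minute changed:^M
"""

print(last_ot_message_time(SAMPLE))
```

On the three sample lines this reports 22:56:50, matching the observation that nothing OT-related is logged after 22:56.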

This condition is always resolved by a software reboot of the Wemos with:

curl 'http://192.168.1.10/ReBoot?SUBMIT=ReBoot' 

I would be happy to receive any advice on what could cause this. I've already made sure the board doesn't overheat (added a temporary fan) and replaced the power supply, to no avail.

avk999 commented 2 months ago

Debug information: image

avk999 commented 2 months ago

The reboot log on LittleFS contains:

2024-04-23 08:23:53 - reboot cause: Software/System restart (4) 
2024-04-21 15:56:49 - reboot cause: Software/System restart (4) 
2024-04-21 12:41:43 - reboot cause: Software/System restart (4) 

I believe those times correspond to failures.
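To check how far apart the failures are, the reboot log can be parsed and the gaps between entries computed. A rough sketch, assuming every entry follows the `YYYY-MM-DD HH:MM:SS - reboot cause: ...` layout shown above; `parse_reboot_log` and `gaps_between_reboots` are hypothetical helper names.

```python
from datetime import datetime

REBOOT_LOG = """\
2024-04-23 08:23:53 - reboot cause: Software/System restart (4)
2024-04-21 15:56:49 - reboot cause: Software/System restart (4)
2024-04-21 12:41:43 - reboot cause: Software/System restart (4)
"""

def parse_reboot_log(text):
    """Return (timestamp, cause) tuples sorted oldest first."""
    entries = []
    for line in text.splitlines():
        stamp, sep, cause = line.strip().partition(" - reboot cause: ")
        if not sep:
            continue  # skip blank or unrecognised lines
        entries.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), cause))
    return sorted(entries)

def gaps_between_reboots(entries):
    """Return the time elapsed between each pair of consecutive reboots."""
    times = [stamp for stamp, _ in entries]
    return [later - earlier for earlier, later in zip(times, times[1:])]

for gap in gaps_between_reboots(parse_reboot_log(REBOOT_LOG)):
    print(gap)
```

For the three entries above the gaps are 3:15:06 and 1 day, 16:27:04, so the interval between reboots is far from regular.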

Settings.ini content:


{
  "hostname": "OTGWNEW",
  "MQTTenable": false,
  "MQTTbroker": "homeassistant.local",
  "MQTTbrokerPort": 1883,
  "MQTTuser": "",
  "MQTTpasswd": "",
  "MQTTtoptopic": "OTGW",
  "MQTThaprefix": "homeassistant",
  "MQTTuniqueid": "otgw-C8C9A30DF315",
  "MQTTOTmessage": false,
  "MQTTharebootdetection": true,
  "NTPenable": true,
  "NTPtimezone": "Europe/Amsterdam",
  "NTPhostname": "pool.ntp.org",
  "LEDblink": false,
  "GPIOSENSORSenabled": false,
  "GPIOSENSORSpin": 13,
  "GPIOSENSORSinterval": 20,
  "S0COUNTERenabled": false,
  "S0COUNTERpin": 12,
  "S0COUNTERdebouncetime": 80,
  "S0COUNTERpulsekw": 1000,
  "S0COUNTERinterval": 60,
  "OTGWcommandenable": true,
  "OTGWcommands": "GW=1",
  "GPIOOUTPUTSenabled": false,
  "GPIOOUTPUTSpin": 16,
  "GPIOOUTPUTStriggerBit": 0
}

rvdbreemen commented 2 months ago

The PS=1 command pulls a summary report of the values, after which the gateway stops reporting the individual msgids.

This is by design. It looks like you are using the 'OTGW' integration over the serial network port.

Could you try setting up MQTT and then use that integration?

Turn off the native OTGW component in Home Assistant.

That way we can rule out issues with the serial interface.
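The PS=1 explanation can be checked against a captured telnet log: find the point where `PS=1` is forwarded to the gateway and count how many Thermostat/Boiler lines follow it. A minimal sketch, assuming the `Sending to OTGW: [...]` wording from the log in this issue; `ot_lines_after_ps1` is a hypothetical helper name.

```python
import re

# Command forwarded to the gateway, e.g. "Net2Ser: Sending to OTGW: [PS=1] (4)"
SENT_RE = re.compile(r"Sending to OTGW: \[(.+?)\]")
# A decoded OpenTherm message line logged by processOT
OT_RE = re.compile(r"processOT\s*\(\s*\d+\): (Thermostat|Boiler|Answer)")

def ot_lines_after_ps1(log_text):
    """Count decoded OT lines after the most recent PS=1; None if PS=1 was never sent."""
    seen_ps1 = False
    count = 0
    for line in log_text.splitlines():
        sent = SENT_RE.search(line)
        if sent and sent.group(1) == "PS=1":
            seen_ps1 = True
            count = 0  # restart the count at every PS=1
        elif seen_ps1 and OT_RE.search(line):
            count += 1
    return count if seen_ps1 else None

CAPTURE = """\
22:56:50.520481 (  15168| 14112) processOT   (1676): Boiler            BC0000300 (9)[MsgID=  0][READ_ACK        ]>Status = Slave  [--------]^M
22:56:50.542943 (  15840| 14112) handleOTGW  (1771): Net2Ser: Sending to OTGW: [PS=1] (4)^M
22:57:00.055853 (  16928| 14112) doTaskMinute( 279): Minute changed:^M
"""

print(ot_lines_after_ps1(CAPTURE))
```

A result of 0 on a long capture is consistent with the gateway having been switched to summary mode by a network client rather than with a hardware fault.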

Thanks, Robert

rvdbreemen commented 2 months ago

Closing this issue; the discussion continued in the Discord community. It turned out that there was a problem with the power supply, which caused this strange behavior. Once that was found and addressed, the freezing stopped.

avk999 commented 2 months ago

Another possible cause may be the software thermostat sometimes calling the HA service that sets the CH temperature 5 times in quick succession.
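If repeated calls do turn out to matter, one way to rule them out is to drop identical setpoints that arrive within a short window before they reach the gateway. The sketch below is purely illustrative: `SetpointThrottle` is a hypothetical class, not part of Home Assistant or the OTGW firmware, and the 5-second window is an arbitrary assumption.

```python
class SetpointThrottle:
    """Suppress identical setpoints repeated within min_interval_s seconds."""

    def __init__(self, min_interval_s=5.0):
        self.min_interval_s = min_interval_s
        self._last_value = None
        self._last_time = None

    def should_send(self, value, now):
        """Return True if this setpoint should be forwarded at time `now` (seconds)."""
        is_repeat = (
            self._last_time is not None
            and value == self._last_value
            and now - self._last_time < self.min_interval_s
        )
        if is_repeat:
            return False
        # Remember only the setpoints that were actually forwarded.
        self._last_value = value
        self._last_time = now
        return True

throttle = SetpointThrottle(min_interval_s=5.0)
# Five identical calls in quick succession: only the first one is forwarded.
burst = [throttle.should_send(20.5, t) for t in (0.0, 0.1, 0.2, 0.3, 0.4)]
print(burst)
```

The timestamp is passed in explicitly so the behavior can be checked without waiting; in real use it could come from `time.monotonic()`.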