rvdbreemen / OTGW-firmware

An ESP8266 devkit firmware for the Nodoshop version of the OpenTherm Gateway (OTGW)
MIT License
152 stars 34 forks

Loses connection with PIC #247

Closed ArJay60 closed 4 months ago

ArJay60 commented 6 months ago

At irregular intervals (usually after a day or two) the OTGW firmware seems to lose its connection with the PIC. When this happens, the home page of the OTGW web server shows only the header and the button bar, without a single row of sensor names and values. Values are also no longer passed on over MQTT to HA.

While the home page is in this error state, I am still able to read the PIC firmware versions using Advanced - PIC Firmware in the OTGW web server. So communication with the PIC doesn't seem to be the issue. The values shown there are:

Firmware name   Version   Size
diagnose.hex    2.1       9196
gateway.hex     6.5       26124
interface.hex   2.0       10488

Resetting the PIC with the reset button on the OTGW PCB (I am using the one sold in the Nodo shop, combined with an ESP8266 for MQTT communication with HA) makes the values available again within a few seconds, both on the home page of the OTGW web server and in HA.

What goes wrong? Is it possible for the OTGW software to monitor for failures to retrieve sensor values and then reset the PIC? Or is it possible to reset the PIC remotely through OTGW, so that I can monitor whether I still receive values and then trigger a remote reset from HA?

Firmware Version: 0.10.2+50c3ed2
PIC Available: true
PIC Firmware Version: 6.5
PIC Device ID: pic16f1847

rvdbreemen commented 6 months ago

Hi @ArJay60

Could you supply some logging using port 23?

Check the wiki: https://github.com/rvdbreemen/OTGW-firmware/wiki/How-to-debug-the-OTGW-firmware

That will help analyse your situation.

And yes, you can reset using the GW=R command. But before you go that route, let's try to analyse what is going on.

Thanks Robert

ArJay60 commented 6 months ago

OK. I will do that the next time I have the problem. Currently the logging is filled with the regular stuff because I did the manual reset. Regards Richard

ArJay60 commented 6 months ago

Still waiting for it to fail. Will keep monitoring.

rvdbreemen commented 6 months ago

Once you have it, then share it. By the way, did you know there is a new release available? 0.10.3

A whole lot of small things got fixed after a year of collecting small patches.

Robert

ArJay60 commented 5 months ago

I have the fault situation now. Below is the log (but not from the moment it first failed; I don't have that).

If I do nothing, only the 'Minute changed' and 'Uptime seconds' messages appear. I asked once for the PIC firmware through the OTGW web interface. This resulted in the other lines.

I will leave it in the faulty state for now and wait for you, because you might have other things that you want to investigate.

Log:

09:01:00.426068 ( 16312| 14104) doTaskMinute( 279): Minute changed:
09:01:04.263059 ( 16312| 14104) sendMQTTupti( 157): Uptime seconds: 1618299
09:02:00.426395 ( 16152| 14104) doTaskMinute( 279): Minute changed:
09:03:00.427296 ( 16120| 14104) doTaskMinute( 279): Minute changed:
09:04:00.426309 ( 16072| 14104) doTaskMinute( 279): Minute changed:
09:05:00.426012 ( 16176| 14104) doTaskMinute( 279): Minute changed:
09:06:00.426629 ( 16152| 14104) doTaskMinute( 279): Minute changed:
09:06:04.264146 ( 16152| 14104) sendMQTTupti( 157): Uptime seconds: 1618598
09:07:00.426776 ( 16088| 14104) doTaskMinute( 279): Minute changed:
09:07:05.122146 ( 16136| 14104) apifirmwaref( 134): API: apifirmwarefilelist()
09:07:05.124318 ( 14768| 13456) apifirmwaref( 141): dirpath=/pic16f1847
09:07:05.143614 ( 14528| 13456) apifirmwaref( 147): dir.fileName()=diagnose.hex
09:07:06.154324 ( 15152| 14104) apifirmwaref( 160): version=2.1
09:07:06.156924 ( 13768| 12808) GetVersion ( 16): GetVersion opening /pic16f1847/diagnose.hex
09:07:06.221590 ( 13808| 12808) apifirmwaref( 164): GetVersion(/pic16f1847/diagnose.hex) returned []
09:07:06.223772 ( 13888| 12808) apifirmwaref( 147): dir.fileName()=diagnose.ver
09:07:06.225142 ( 13888| 12808) apifirmwaref( 147): dir.fileName()=gateway.hex
09:07:06.234684 ( 13808| 12808) apifirmwaref( 160): version=6.5
09:07:06.236940 ( 13768| 12808) GetVersion ( 16): GetVersion opening /pic16f1847/gateway.hex
09:07:06.409213 ( 15152| 14104) apifirmwaref( 164): GetVersion(/pic16f1847/gateway.hex) returned [6.5]
09:07:06.412582 ( 13888| 12808) apifirmwaref( 147): dir.fileName()=gateway.ver
09:07:06.414136 ( 13888| 12808) apifirmwaref( 147): dir.fileName()=interface.hex
09:07:07.424292 ( 15152| 14104) apifirmwaref( 160): version=2.0
09:07:08.426771 ( 13768| 12808) GetVersion ( 16): GetVersion opening /pic16f1847/interface.hex
09:07:08.504889 ( 14480| 12808) apifirmwaref( 164): GetVersion(/pic16f1847/interface.hex) returned []
09:07:08.507818 ( 13888| 12808) apifirmwaref( 147): dir.fileName()=interface.ver
09:07:08.508553 ( 13888| 12808) apifirmwaref( 178): filelist response: [{"name":"diagnose.hex","version":"2.1","size":9196},{"name":"gateway.hex","version":"6.5","size":26124},{"name":"interface.hex","version":"2.0","size":10488}]
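For anyone who wants to spot this state automatically: a minimal sketch that reads lines captured from the port 23 log and flags the PIC as silent when no message-processing entries have appeared for a while. It assumes one log entry per line, and it assumes that `processOT` entries are the ones that indicate traffic from the PIC (that is how they appear in the logs in this thread). The names `parse_entry` and `pic_silent` and the 120 second threshold are hypothetical choices for illustration, not part of the firmware.

```python
import re
from typing import Iterable, Optional, Tuple

# Layout taken from the log lines in this thread:
#   hh:mm:ss.micros ( number| number) functionName( line): message
LOG_LINE = re.compile(
    r"^(\d{1,2}):(\d{2}):(\d{2})\.\d+\s+"  # timestamp
    r"\(\s*\d+\|\s*\d+\)\s+"               # two numeric columns, not interpreted here
    r"(\w+)\s*\(\s*\d+\):"                 # function name and source line number
)


def parse_entry(line: str) -> Optional[Tuple[int, str]]:
    """Return (seconds since midnight, function name), or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    hours, minutes, seconds, func = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds), func


def pic_silent(lines: Iterable[str], max_gap_s: int = 120,
               traffic_func: str = "processOT") -> bool:
    """True if the newest log entry is more than max_gap_s after the last traffic entry.

    Assumption: entries logged by `traffic_func` mean the PIC is still talking.
    Midnight wrap-around of the timestamps is ignored in this sketch.
    """
    newest = None
    last_traffic = None
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        newest, func = entry
        if func == traffic_func:
            last_traffic = newest
    if newest is None:
        return False  # no parsable log lines, so nothing to judge
    if last_traffic is None:
        return True   # log is alive but contains no traffic entries at all
    return newest - last_traffic > max_gap_s
```

Fed with the log above, which contains only `doTaskMinute`, `sendMQTTupti` and firmware file list entries, `pic_silent` returns `True`.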

rvdbreemen commented 5 months ago

@ArJay60 Well, the ESP firmware works, but there are no responses or messages from the PIC anymore.

So could you try the following:

  1. Log on to port 23 (where you got the logging).
  2. Hit the 'p' key to do a manual reset of the PIC.
  3. See if the PIC is rebooted.
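The same steps could also be scripted. This is a minimal sketch, assuming the debug port accepts a single raw `p` keystroke over a plain TCP connection and that a successful reset is reported with the text "Pic detected" (the wording seen in the log of firmware 0.10.2 later in this thread; other versions may word it differently). The function name `reset_pic_via_debug_port` and its parameters are hypothetical.

```python
import socket
import time


def reset_pic_via_debug_port(host: str, port: int = 23, wait_s: float = 5.0) -> bool:
    """Send 'p' to the debug port and report whether 'Pic detected' shows up in time."""
    deadline = time.monotonic() + wait_s
    seen = b""
    with socket.create_connection((host, port), timeout=wait_s) as conn:
        conn.sendall(b"p")  # same keystroke as typing 'p' in the telnet session
        while time.monotonic() < deadline:
            # Never wait past the overall deadline for the next chunk of log output.
            conn.settimeout(max(0.05, deadline - time.monotonic()))
            try:
                chunk = conn.recv(1024)
            except socket.timeout:
                break
            if not chunk:
                break  # the other side closed the connection
            seen += chunk
            if b"Pic detected" in seen:
                return True
    return False
```

A call such as `reset_pic_via_debug_port("otgw.local")` would return `True` only if the confirmation text arrives within `wait_s` seconds; the host name here is the default one mentioned in this thread and may differ on your network.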

Let me know the result please.

Thanks Robert

ArJay60 commented 5 months ago

It reboots with 'p'. Below is the log directly after rebooting:

8:16:00.899064 ( 16064| 14104) doTaskMinute( 279): Minute changed:
p
18:16:09.917746 ( 16064| 14104) sendMQTTupti( 157): Uptime seconds: 1737955
p
18:16:30.595982 ( 15824| 14104) handleDebug ( 22): Manual reset PIC
18:16:30.765409 ( 15744| 14104) detectPIC ( 122): ETX found after reset: Pic detected!
18:16:30.770059 ( 13136| 12160) handleDebug ( 22): Manual reset PIC
18:16:31.937884 ( 15824| 14104) detectPIC ( 122): ETX found after reset: Pic detected!
18:16:32.027417 ( 15904| 14104) fwreportinfo(1967): Callback: fwreportinfo
18:16:32.029910 ( 13216| 12160) fwreportinfo(1970): Current firmware version: 6.5
18:16:32.031134 ( 13216| 12160) fwreportinfo(1972): Current device id: pic16f1847
18:16:32.031874 ( 13216| 12160) fwreportinfo(1975): Current firmware type: gateway
18:16:32.032675 ( 13216| 12160) processOT (1701): Current firmware version: 6.5
18:16:32.033456 ( 13216| 12160) processOT (1703): Current device id: pic16f1847
18:16:32.034155 ( 13216| 12160) processOT (1705): Current firmware type: gateway
18:16:32.146972 ( 10976| 10296) processOT (1676): Boiler BC0193D00 (9)[MsgID= 25][READ_ACK ]>Tboiler = 61.00 °C
18:16:32.306859 ( 12968| 12160) processOT (1676): Thermostat T101815A3 (9)[MsgID= 24][WRITE_DATA ]>Tr = 21.64 °C
18:16:33.146211 ( 13664| 12808) processOT (1676): Boiler BD01815A3 (9)[MsgID= 24][WRITE_ACK ] Tr = 21.64 °C
18:16:33.311065 ( 12296| 11512) processOT (1676): Thermostat T80000200 (9)[MsgID= 0][READ_DATA ]>Status = Master [-D---W--]
18:16:34.149338 ( 12320| 11512) processOT (1676): Boiler B40000200 (9)[MsgID= 0][READ_ACK ]>Status = Slave [--------]
18:16:34.323290 ( 12968| 12160) processOT (1676): Thermostat T10010600 (9)[MsgID= 1][WRITE_DATA ]>TSet = 6.00 °C
18:16:35.182797 ( 13640| 12808) processOT (1676): Boiler BD0010600 (9)[MsgID= 1][WRITE_ACK ] TSet = 6.00 °C
18:16:35.318330 ( 15848| 14104) processOT (1676): Thermostat T00110000 (9)[MsgID= 17][READ_DATA ] RelModLevel = 0.00 %
18:16:36.153529 ( 14304| 13456) processOT (1676): Boiler BC0110000 (9)[MsgID= 17][READ_ACK ]>RelModLevel = 0.00 %
18:16:36.335057 ( 13640| 12808) processOT (1676): Thermostat T80190000 (9)[MsgID= 25][READ_DATA ] Tboiler = 0.00 °C
18:16:37.178881 ( 12968| 11512) processOT (1676): Boiler B401933CC (9)[MsgID= 25][READ_ACK ]>Tboiler = 51.80 °C
18:16:37.317545 ( 15848| 14104) processOT (1676): Thermostat T00090000 (9)[MsgID= 9][READ_DATA ] TrOverride = 0.00 °C
18:16:37.460332 ( 15176| 14104) processOT (1676): Boiler BF0090000 (9)[MsgID= 9][UNKNOWN_DATA_ID ]-TrOverride = 0.00 °C
18:16:38.155406 ( 12992| 12160) processOT (1676): Answer Thermostat AC0090000 (9)[MsgID= 9][READ_ACK ]>TrOverride = 0.00 °C
18:16:38.328709 ( 12296| 11512) processOT (1676): Thermostat T80000200 (9)[MsgID= 0][READ_DATA ]>Status = Master [-D---W--]
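To make the raw messages in this log easier to read, here is a minimal sketch of how a 9-character message such as `BC0193D00` breaks down, following the OpenTherm frame layout: one parity bit, a 3-bit message type, 4 spare bits, an 8-bit data ID and a 16-bit data value. The function name `decode_frame` is hypothetical, the type names use the spelling from the log, and the signed fixed-point (f8.8) interpretation only applies to temperature-like IDs such as 24 and 25; the Status message (ID 0) carries flag bits instead.

```python
MSG_TYPES = {
    0: "READ_DATA", 1: "WRITE_DATA", 2: "INVALID_DATA", 3: "RESERVED",
    4: "READ_ACK", 5: "WRITE_ACK", 6: "DATA_INVALID", 7: "UNKNOWN_DATA_ID",
}
# Prefix letters as they appear in the log above; other letters are passed through as-is.
SOURCES = {"T": "Thermostat", "B": "Boiler", "A": "Answer Thermostat"}


def decode_frame(text: str) -> dict:
    """Split a message like 'BC0193D00' into its OpenTherm fields."""
    if len(text) != 9:
        raise ValueError("expected one prefix letter followed by 8 hex digits")
    frame = int(text[1:], 16)
    raw = frame & 0xFFFF                                 # 16-bit data value
    signed = raw - 0x10000 if raw & 0x8000 else raw      # two's complement
    return {
        "source": SOURCES.get(text[0], text[0]),
        "type": MSG_TYPES[(frame >> 28) & 0x7],          # bits 30..28
        "data_id": (frame >> 16) & 0xFF,                 # bits 23..16
        "raw": raw,
        "f8_8": signed / 256,                            # value as signed fixed point
        "parity_ok": bin(frame).count("1") % 2 == 0,     # total number of 1 bits must be even
    }
```

For `BC0193D00` this gives type `READ_ACK`, data ID 25 and an f8.8 value of 61.0, which matches the `Tboiler = 61.00 °C` entry in the log.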

rvdbreemen commented 5 months ago

Good, that confirms that somehow the PIC needs a reboot. I can add that to the firmware.

When I have the time I will ping you to test it for me.

Robert

hvxl commented 5 months ago

Now the question becomes why the PIC needs a reset. I have not heard from other people that they experience the PIC getting stuck.

rvdbreemen commented 5 months ago

Great question @hvxl, I was wondering about that too. Any idea how we could find out what is going on in his case?

hvxl commented 5 months ago

I see two possible ways to think about this:

rvdbreemen commented 5 months ago

@ArJay60 could you possibly run a long-term log to capture the situation that seems to cause this PIC fault? Another option is to replace the PIC; you can order one at the Nodo shop. That way a hardware issue can be ruled out.

Let me know if you are willing to find the underlying problem.

hvxl commented 5 months ago

As I said, the easiest thing to try first is to reflash the PIC firmware. If the problem still occurs after that, you can try some more involved avenues.

ArJay60 commented 5 months ago

Can I reflash the PIC while it is installed? Which instructions should I follow?

rvdbreemen commented 5 months ago

That's simple:

  1. Go to otgw.local (or whatever the IP address of your OTGW ESP is).
  2. Click on the Advanced button at the top right. [image]
  3. Click on PIC firmware. [image]
  4. You should get a webpage like this: [image]
  5. Now click on the "install this firmware" icon to the right of the "gateway.hex" entry. So click this icon: [image]

It takes a few seconds for the firmware to be installed, and then the screen will refresh. You should be back at the default webpage of the OTGW, and the PIC should now be running the new firmware.

If you want to observe the upgrade, just log on to port 23 and watch the firmware flash process in the log.

ArJay60 commented 5 months ago

Done (the PIC upgrade was successful). I will monitor the status of the PIC/OTGW for the next couple of days to see if it continues to work properly. I will report back here regularly.

ArJay60 commented 4 months ago

Seems to be solved with a reflash of the PIC. I will close this and keep my fingers crossed that the problem is gone.