hvxl / otmonitor

Monitor tool for the OTGW (http://otgw.tclcode.com/)
GNU Lesser General Public License v3.0
27 stars 10 forks source link

Otmonitor loses TCP connection #6

Closed SenH closed 4 years ago

SenH commented 4 years ago

I have a Nodo shop OTGW with NodeMCU (ser2net). The TCP connection with Otmonitor seems to break (log attached) and messages are not received anymore by Otmonitor. Clicking 'Connect' in Connection Settings re-establishes the connection and the messages re-appear in Otmonitor.

I'm also using py-otgw-mqtt with the same TCP connection in parallel which does not seem to break on the same moment, it keeps receiving messages.

Sep 06 11:59:36 router systemd[1]: Started OTMonitor Daemon.
Sep 06 16:59:11 router tclkit-x64[25740]: bad index "4294967296": must be integer?[+-]integer? or end?[+-]integer?
Sep 06 16:59:11 router tclkit-x64[25740]:     while executing
Sep 06 16:59:11 router tclkit-x64[25740]: "lindex $track($arg) $i"
Sep 06 16:59:11 router tclkit-x64[25740]:     (procedure "vartrace" line 16)
Sep 06 16:59:11 router tclkit-x64[25740]:     invoked from within
Sep 06 16:59:11 router tclkit-x64[25740]: "vartrace gui returntemp write"
Sep 06 16:59:11 router tclkit-x64[25740]:     (in namespace inscope "::track" script line 1)
Sep 06 16:59:11 router tclkit-x64[25740]:     invoked from within
Sep 06 16:59:11 router tclkit-x64[25740]: "::namespace inscope ::track vartrace gui returntemp write"
Sep 06 16:59:11 router tclkit-x64[25740]:     (write trace on "gui(returntemp)")
Sep 06 16:59:11 router tclkit-x64[25740]:     invoked from within
Sep 06 16:59:11 router tclkit-x64[25740]: "set gui($name) $float"
Sep 06 16:59:11 router tclkit-x64[25740]:     (procedure "guifloat" line 7)
Sep 06 16:59:11 router tclkit-x64[25740]:     invoked from within
Sep 06 16:59:11 router tclkit-x64[25740]: "guifloat returntemp returnwatertemperature $list"
Sep 06 16:59:11 router tclkit-x64[25740]:     (procedure "returntemp" line 3)
Sep 06 16:59:11 router tclkit-x64[25740]:     invoked from within
Sep 06 16:59:11 router tclkit-x64[25740]: "returntemp 4208"
Sep 06 16:59:11 router tclkit-x64[25740]:     invoked from within
Sep 06 16:59:11 router tclkit-x64[25740]: "specialcoro force"
Sep 06 16:59:11 router tclkit-x64[25740]:     (procedure "otmessage" line 23)
Sep 06 16:59:11 router tclkit-x64[25740]:     invoked from within
Sep 06 16:59:11 router tclkit-x64[25740]: "otmessage [clock microseconds] $line [expr {$type & 7}] $id $data"
Sep 06 16:59:11 router tclkit-x64[25740]:     (procedure "process" line 12)
Sep 06 16:59:11 router tclkit-x64[25740]:     invoked from within
Sep 06 16:59:11 router tclkit-x64[25740]: "process [append data $line]"
Sep 06 16:59:11 router tclkit-x64[25740]:     (procedure "receive" line 6)
Sep 06 16:59:11 router tclkit-x64[25740]:     invoked from within
Sep 06 16:59:11 router tclkit-x64[25740]: "receive"
Sep 07 04:59:45 router tclkit-x64[25740]: bad index "4294967296": must be integer?[+-]integer? or end?[+-]integer?
Sep 07 04:59:45 router tclkit-x64[25740]:     while executing
Sep 07 04:59:45 router tclkit-x64[25740]: "lindex $track($arg) $i"
Sep 07 04:59:45 router tclkit-x64[25740]:     (procedure "vartrace" line 16)
Sep 07 04:59:45 router tclkit-x64[25740]:     invoked from within
Sep 07 04:59:45 router tclkit-x64[25740]: "vartrace gui returntemp write"
Sep 07 04:59:45 router tclkit-x64[25740]:     (in namespace inscope "::track" script line 1)
Sep 07 04:59:45 router tclkit-x64[25740]:     invoked from within
Sep 07 04:59:45 router tclkit-x64[25740]: "::namespace inscope ::track vartrace gui returntemp write"
Sep 07 04:59:45 router tclkit-x64[25740]:     (write trace on "gui(returntemp)")
Sep 07 04:59:45 router tclkit-x64[25740]:     invoked from within
Sep 07 04:59:45 router tclkit-x64[25740]: "set gui($name) $float"
Sep 07 04:59:45 router tclkit-x64[25740]:     (procedure "guifloat" line 7)
Sep 07 04:59:45 router tclkit-x64[25740]:     invoked from within
Sep 07 04:59:45 router tclkit-x64[25740]: "guifloat returntemp returnwatertemperature $list"
Sep 07 04:59:45 router tclkit-x64[25740]:     (procedure "returntemp" line 3)
Sep 07 04:59:45 router tclkit-x64[25740]:     invoked from within
Sep 07 04:59:45 router tclkit-x64[25740]: "returntemp 4352"
Sep 07 04:59:45 router tclkit-x64[25740]:     invoked from within
Sep 07 04:59:45 router tclkit-x64[25740]: "specialcoro force"
Sep 07 04:59:45 router tclkit-x64[25740]:     (procedure "otmessage" line 23)
Sep 07 04:59:45 router tclkit-x64[25740]:     invoked from within
Sep 07 04:59:45 router tclkit-x64[25740]: "otmessage [clock microseconds] $line [expr {$type & 7}] $id $data"
Sep 07 04:59:45 router tclkit-x64[25740]:     (procedure "process" line 12)
Sep 07 04:59:45 router tclkit-x64[25740]:     invoked from within
Sep 07 04:59:45 router tclkit-x64[25740]: "process [append data $line]"
Sep 07 04:59:45 router tclkit-x64[25740]:     (procedure "receive" line 6)
Sep 07 04:59:45 router tclkit-x64[25740]:     invoked from within
Sep 07 04:59:45 router tclkit-x64[25740]: "receive"
hvxl commented 4 years ago

The only way I can see this happening is when the return water temperature is somehow queried from the boiler at intervals greater than 2 hours. Are you adding MsgID 28 as an alternative message and then later deleting it again?

Obviously, this is incorrect behaviour of the tool. But it should only manifest itself in very unusual circumstances. (Which also explains why nobody else has reported it.)

SenH commented 4 years ago

I added AA=28 upon install of the OTGW (since the thermostat was not requesting this). According to the summary ID 28 is requested every 6 seconds.

Hex | Dec | Dir | Description | Freq | Value
-- | -- | -- | -- | -- | --
00 | 0 | Read | Status | 6 | 00000010 00000000
01 | 1 | Write | Control setpoint | 6 | 10.00
02 | 2 | Write | Master configuration | 47 | 00000001 13
03 | 3 | Read | Slave configuration | 47 | 01010001 33
05 | 5 | Read | Application-specific flags | 47 | 00000000 0
09 | 9 | Read | Remote override room setpoint | 6 | 0.00
0E | 14 | Write | Maximum relative modulation level | 47 | 100.00
10 | 16 | Write | Room setpoint | 47 | 15.00
11 | 17 | Read | Relative modulation level | 5 | 0.00
18 | 24 | Write | Room temperature | 47 | 21.45
19 | 25 | Read | Boiler water temperature | 6 | 19.20
1C | 28 | Read | Return water temperature | 6 | 16.94
39 | 57 | Read | Max CH water setpoint | 47 | 80.00

The strange thing is that I have the exact same setup (boiler, OTGW, thermostat) two times in my house (separate heating for home & office) but the other otmonitor does not exhibit this problem (no errors in log, keeps running for weeks).

Here is the summary for the 2nd OTGW

Hex | Dec | Dir | Description | Freq | Value
-- | -- | -- | -- | -- | --
00 | 0 | Read | Status | 6 | 00000010 00000000
01 | 1 | Write | Control setpoint | 6 | 10.00
02 | 2 | Write | Master configuration | 48 | 00000001 13
03 | 3 | Read | Slave configuration | 48 | 01010001 33
05 | 5 | Read | Application-specific flags | 48 | 00000000 0
09 | 9 | Read | Remote override room setpoint | 6 | 0.00
0E | 14 | Write | Maximum relative modulation level | 48 | 100.00
10 | 16 | Write | Room setpoint | 48 | 15.00
11 | 17 | Read | Relative modulation level | 5 | 0.00
18 | 24 | Write | Room temperature | 48 | 20.95
19 | 25 | Read | Boiler water temperature | 6 | 32.80
1C | 28 | Read | Return water temperature | 6 | 22.62
39 | 57 | Read | Max CH water setpoint | 48 | 80.00
hvxl commented 4 years ago

I found that the problem doesn't mean that the value hasn't been queried for more than 2 hours, but that the value hasn't changed for that long. Then when it does change, you get this error. It should be fixed by 61ef646252e5303f543344c0e97899a4d270bf10.

SenH commented 4 years ago

Thanks a lot for looking into it. I'm running otmonitor now with the latest commit, let's see if the error disappears.

SenH commented 4 years ago

Fixed from my side. Closing.