Closed andriej closed 5 years ago
Reviewing the changes in HA between 0.99.3 and 0.100.2 I don't see anything that could cause this. pyotgw 0.5b0 will be in the next HA version (probably 0.101) but the only change there is an added disconnect() function. Does your HA log provide more info? Please enable debug logging in HA for the following modules:
So far I try to find situation which may cause issues. So far everything seems stable (still), will turn on debug after I find something may cause it.
@mvn23 I've been able to reproduce it 'easily', just changing value via UI. I've redacted the quite-long log
Here is the log - if there's anything to check, let me know.
During the outage HA showed OTGW as available and even the readings as you can see:
Last time I've been using iSense workaround fix - as it was no-heating time of year, I'm right now not sure if it got migrated into the final code.
I've tried to reproduce issue, so far without success.
The iSense fix was integrated in v0.4b3 and I can tell from the logs that it is active in your setup, but thanks for mentioning it as it may be related. In the log you provided, there seem to be 7 minutes missing around line 145. During that time, the temperature override was activated - seemingly without triggering the issue - while later on changing the override does trigger the issue. How did you activate the override the first time?
Via HA UI
Could you include that part of the log as well?
I misunderstood, here's the missing part
Are you making any calls to the opentherm_gw.set_outside_temperature
service? Maybe in an automation? Can you confirm that this is breaking stuff when called manually?
I'm making indeed - I don't have physical outside temp. sensor connected to boiler so I emulate it with my Aqara zigbee one. Everytime it change it's state, it's reported to boiler (and showed properly on thermostat too).
- alias: "OTGW: set outside"
initial_state: true
trigger:
platform: state
entity_id: sensor.temperature_zewnatrz
action:
service: opentherm_gw.set_outside_temperature
data_template:
temperature: "{{ states('sensor.temperature_zewnatrz') }}"
gateway_id: "termostat"
I have just tested it by calling it from 'Service' part of HA and - funny part - it showed the -2.5 value (-2.3 reported) on thermostat, but other parts didn't update (like temp change). In log it seems it got stuck 10 minutes at least before my test, as I was restarting HA. here's the log
Thanks for the log. Also here it seems that the issue is triggered by the outside temperature command. Can you run some tests with the automation disabled? Just to see if manually running the command causes the issue? I can't reproduce it locally so anything that can narrow down the root cause will help a lot.
One more test-pattern I've run. Here is the log and here are the steps: 22:28 - HA restart 22:29 - OTGW loads and propagate values 22:30 - 20.5 propagates (set before restart) 22:30 - changed to 21.5 (21 is default on thermostat) 22:31 - changed to 'back to program' on thermostat -> 21 -> HA shows the change 22:33 - on thermostat i set 20 it seems dead now, no further messages (B/T in debug)
I will disable the automation right now and restart (power on/off) the gateway
So here it is with only one outside report quite long before dead-lock. Log file pattern: 22:36 -> 20 -> 21 on thermostat (21 default, "continue program") 22:38 -> 21 -> 22 on thermostat [propagated to HA] 22:39 -> 22 -> 20.5 on HA UI 22:40 -> set outside for -2.3 (default from example) 22:41 -> 20.5 to 21 on thermostat 22:42 -> 21 to 21.5 in HA UI connection dead (no B/T messages)
I'd like to come back with any new findings, but since 2019-10-21 02:50:38 (last restart of HA) connectivity is working. I've restarted HA just right now to get logfile lower (it got like >20MB in size). Also, what I can't see in log are reports of temperature outside, even tho automation is enabled. It may be because I was changing some things without restarts, will check.
Just random finding. Next day without 'issue', except - temperature changed first on thermostat, then i UI gives 'O' command all the time in the log. Except, it still works. logfile maybe it will broaden search for issue
That would be the iSense fix doing its thing.
Ok, glad it's working then. :-) Will look in the log for any issues if they appear.
Got it! It 'stopped' working exactly the moment that scheduled (on thermostat) value kicked in for night. log
I've tried to 'reset gateway' with service from HA. It looks in log like it still can see the values previously, even tho there's no data incoming. Maybe it's worth checking if the B/T messages are there? Just an idea. log of service reset
what has helped (just wanted to test) was pulling out and putting back in the RJ45 LAN cable. watchdog triggered and otgw has reconnected logfile from the moment - I guess it confirms it's somehow worth checking the B/T messages
The fact that commands are still working and the watchdog is still active leads me to believe that the connection with the gateway remains intact, but the message processing task gets stopped/blocked somehow. Of course I did not build in enough debugging output to confirm this, so we'll have to modify some files first. If I provide you with some .patch files, can you modify your existing pyotgw files or would you prefer I create a custom component?
I can do both, but patch files seems easier :-) I have it in venv so will be easy.
One more situation when it was all hanged and re-plug of LAN cable made it reconnect. log
Just informational: I've moved to 0.101 (if it make any difference for patch files)
Sorry for the delay, I will have time to look at this later this weekend.
Not a problem, I will also be able to start debugging since sunday minimum. :-) Good thing is, that untouched - it just works.
Can you reproduce the issue with the following patch applied to protocol.py
and post the debug log?
logging.diff
First try: thermostat -> to 22.5 HA -> to 22 thermostat -> to 22.5 thermostat -> to 21 [dead? didn't update in HA] (tried) HA -> to 23 [temperature changed on thermostat] [22:29] thermostat -> to 20.5 (again) [no update in HA] [22:30] HA -> to 22.5 [updated on thermostat... what is happening?] [22:31] HA -> to 22 [ok] [22:33] HA -> to 21 [22:33] thermostat -> to 20.5 [not updated in HA, HA is on 21] [22:35] thermostat -> to 20 [not updated in HA, HA is on 21] [22:36] HA -> to 19 [thermostat updated to 19...] [22:36] HA -> to 18.5 [ok] [22:37] thermostat to 18 [no updated in HA, stayed on 18.5] [22:38] on thermostat I cancel override and back to scheduled 21 HA still on 18.5 [22:39] thermostat -> 21.5 [HA still 18.5] [22:40] HA -> 20 thermostat updated So far I can't reproduce total connection loss which worries me somehow, as patch is the only LOGGER lines added. logfile
[22:56] HA -> 20.5 [22:57] HA -> 20 [22:58] TSTAT -> 19 [22:59] TSTAT -> 20 (not visible in HA, HA still on 19) [23:00] TSTAT -> cancel override, back to 21 from schedule automatically (HA stay still and show 19) [23:01] TSTAT -> 21.5 (so back to overriding scheduled value; HA still on 19) [23:02] HA -> 22 (ok, value changed after a while on thermostat, first 'override' hand icon then the value propagates) [23:03] TSTAT -> back to 19 (HA still on 22) [23:04] TSTAT -> cancel override, back to 21 (HA still on 22)
So instead of 'dying' now I have more like a one-way solution. logfile
Looks like my suspicion is being confirmed: The standard T/B/R/A messages stop being processed for some reason and the message queue keeps growing. On both logs you provided, the last message being processed shows a line like this: Processing: A xx 09 xx xx
(this is exactly where the iSense quirk is handled)
Can you try a few times and just confirm that the last processed message before the command queue starts growing is one of those? In that case it's definitely a bug in the iSense quirk handling and I will know where to look. This will also explain why I'm not seeing the issue at home.
Can you try this possible fix? maybe_fix.diff
[16:32] TSTAT -> 28.5 [16:40] HA -> 28 (didn't change on termostat so...) [16:43] HA -> 27.5
so no effect on iSense while message counter 'works', doesn't go up. logfile
I can tell from the log that your thermostat is not receiving the updated temperatures. The problem is that the thermostat is supposed to request these updates from the OpenTherm Gateway (message T00090000
). In the last log you posted (with fix) this does not happen at all.
As the initial problem was likely triggered by the gateway's response to these messages, there's no way to tell if it's fixed from this log.
Do you have an option to reset your thermostat? If it's powered by the gateway this can be achieved by unplugging and re-plugging the gateway.
Sorry I was resetting my iSense to default and could turn turn off remote control. Ignore last log, will re-check properly all settings before next try!
[18:31] HA -> 21 [18:31] TSTAT -> 21.5 [18:33] HA -> 22 (for a moment HA showed 20 as setpoint and then proper 22 - I know it's as it should be, just notifying) [18:34] TSTAT -> 22.5 [18:35] HA -> 23 (again, 20 and updated to 23 seconds after) [18:35] TSAT -> 23.5 [18:36] HA -> 24 (again, 20 -> 24) [18:37] TSTAT -> cancel override back to default 20) [18:37] HA -> 21
ALL of above commands worked properly, problem seems to be fixed! I'm once again sorry for the reset of iSense and forgot the option of 'remote access' not turned on. I've seen it as F200 error - no external temperature sensor while I was setting whole thing. :-)
Logfile just to confirm it works.
Fixed in 6962e5ad9158646452d90f27a3ad2841fb608c03
Is 0.5b0 used in 0.100.2? Since this version I've started to have disconnect issues again (so OTGW seems connected, but no updates)
It is even more strange that I've been using OTGW without issues for a quite long time. Also, I have LAN gateway so no wifi issues