home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

OTGW losing connection after a few hours #29961

Closed PeeVv closed 4 years ago

PeeVv commented 4 years ago

Home Assistant release with the issue: Home Assistant 0.103.0

Last working Home Assistant release (if known): Unknown

Operating environment (Hass.io/Docker/Windows/etc.): Running Hass.io (supervisor 192) on an x64 Debian server

Integration: OTGW

Description of problem: When the server starts I get a few errors (Timed out waiting for command), but the integration works without problems. However, after a few hours (the last run lasted 3.5 hours) the updates stop coming in and the connection no longer works. After a restart of Home Assistant it all works again for a few hours, so it seems the integration does not try to reconnect unless HA is restarted.

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):

Traceback (if applicable):

Exception in set_characteristics: 1
14 December 2019 21:32 components/homekit/type_thermostats.py (ERROR) - message first occurred at 14 December 2019 21:32 and appears 3 times
Timed out waiting for command: PR, value: S.
14 December 2019 18:25 components/opentherm_gw/__init__.py (ERROR)

These are the only errors mentioning OTGW or thermostat

Additional information: The OTGW itself is a wifi version with a NodeMCU from nodo-shop. The wifi is a UniFi setup, so it should be stable enough, and the NodeMCU is close to an AP, so wifi reception should be fine.

probot-home-assistant[bot] commented 4 years ago

Hey there @mvn23, mind taking a look at this issue as it's been labeled with an integration (opentherm_gw) you are listed as a codeowner for? Thanks!

frenck commented 4 years ago

Please use the issue template that was presented when creating the issue. Could you please adjust your opening post to contain it? Thanks! 👍

PeeVv commented 4 years ago

I have changed the opening post. Yesterday the OTGW worked for 13 minutes before losing the connection. I have set the logger to debug for OT:

logger:
  default: info
  logs:
    homeassistant.components.opentherm_gw: debug

But it doesn't seem to add any relevant log entries.

mvn23 commented 4 years ago

Please try enabling debug logging for the following components as well:

pyotgw.pyotgw
pyotgw.protocol

This will help determine whether or not the automatic reconnect system is performing as it should. Also, please post a full debug log (apart from passwords/private urls) and leave the determination of relevancy up to us/me. Please use a service like pastebin/hastebin to avoid polluting the thread here. Apart from that, random timeouts normally indicate connection issues, but the library should automatically reconnect in such cases. The debug log should indicate why that's not working for you.

PeeVv commented 4 years ago

Two days ago I changed some settings of my Pi-hole since my Chromecasts didn't show up anymore. I disabled the "Never forward non-FQDNs" and "Never forward reverse lookups for private IP ranges" settings, and the OTGW has now been running for 48 hours without dropping out, so it seems this also fixed the OTGW issues.

davey400 commented 4 years ago

I have had the same issue here for several days now (since the day I started to use the integration). I am not using a Pi-hole, and I don't see how the internet connection would affect the communication between one local device and another.

Also, just calling the service opentherm_gw/reset_gateway does not fix the issue; I can only work around it by restarting HA (call homeassistant.restart).

mvn23 commented 4 years ago

@davey400 please provide logs as requested above, thanks.

davey400 commented 4 years ago

I did enable it just a moment ago and will keep you updated.

cyberjunky commented 4 years ago

For what it's worth: I started using my OTGW and the opentherm_gw component a few days ago, and it works without the issue mentioned above. I use a Wemos D1 Mini with ESP 2.x mega-20191208 firmware and a Serial Server defined. The OTGW runs firmware OpenTherm Gateway 4.2.5, and I use socket://192.168.x.y:3210 as the connect string, if that's of any help.

Screenshot from 2019-12-24 12-54-49

BTW, it seems this component cannot be configured with YAML; you have to use the GUI to enable the entities, of which it has a lot. It is very tedious to enable/disable over 40 of them. Any plans to add YAML support back for the sensors and binary_sensors?

davey400 commented 4 years ago

Ah, I think I might try ESP 2.x mega-20191208 to find out if that does the trick. However, when using OTMonitor instead of the HA integration, the information flows in constantly, at least for several days. [Screenshot: Capture]

What I think is happening is that it measures the data, but only sometimes. And it does not hold the data. When it crashes, it holds the most recent measurements. I wanted to take a screenshot to accompany this message, and it seems that it crashed again a few minutes ago. I will check the logs soon.

What people on the Dutch Tweakers Home Assistant forum say is that they just use MQTT to make the OTGW work with HA. I think this works, and they don't see the crippled data, because their HA only reads the last value from MQTT, so they never have a 'no value' situation.

mvn23 commented 4 years ago

What I think is happening is that it measures the data, but only sometimes. And it does not hold the data. When it crashes, it holds the most recent measurements.

What the integration does is quite similar to how MQTT would report the data. It listens for any changes to the various parameters on the OpenTherm connection and informs the sensors in HA of any changes. MQTT would also only send a message when otmonitor sends an update. May I ask what you used to get the screenshot you provided? In my HA history view it does not show the separate measurements.
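To illustrate the "report on change" behaviour described above, here is a minimal sketch in Python. It is not the actual opentherm_gw or pyotgw code; the class `StatusPublisher` and its methods are hypothetical names chosen for the example. It assumes the client keeps the last known value of every parameter and only calls its subscribers when a value actually changes.

```python
from typing import Any, Callable, Dict, List

_MISSING = object()  # sentinel so that a stored None still counts as a value


class StatusPublisher:
    """Hypothetical stand-in for a gateway client that tracks parameters."""

    def __init__(self) -> None:
        self._status: Dict[str, Any] = {}
        self._subscribers: List[Callable[[Dict[str, Any]], None]] = []

    def subscribe(self, callback: Callable[[Dict[str, Any]], None]) -> None:
        """Register a callback that receives a snapshot after each change."""
        self._subscribers.append(callback)

    def update(self, key: str, value: Any) -> bool:
        """Store a value; notify subscribers only if it differs from the last one."""
        if self._status.get(key, _MISSING) == value:
            return False  # unchanged, so nothing is reported
        self._status[key] = value
        snapshot = dict(self._status)  # copy so callbacks cannot mutate our state
        for callback in self._subscribers:
            callback(snapshot)
        return True
```

In such a design a sensor never sees a "no value" gap between updates: the last value stays in the status dictionary until a different one arrives.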

BTW, it seems this component cannot be configured with YAML; you have to use the GUI to enable the entities, of which it has a lot. It is very tedious to enable/disable over 40 of them. Any plans to add YAML support back for the sensors and binary_sensors?

No plans for that, no. The idea is that most people only need a few sensors (if any) and you'll only need to configure them once. I do agree that the interface to enable/disable entities could use some improvement, but that's not an opentherm_gw issue.

cyberjunky commented 4 years ago

No plans for that, no. The idea is that most people only need a few sensors (if any) and you'll only need to configure them once. I do agree that the interface to enable/disable entities could use some improvement, but that's not an opentherm_gw issue.

The issue I had is that to find out which of the values are supported by my boiler, I had to enable them all first and then choose which ones are useful. That is a lot of clicking; a multi-select or "enable all" option in the GUI would indeed be helpful, so it is not a component issue. Or, alternatively, implement an integration option 'disable all by default'. True/False?

davey400 commented 4 years ago

After enabling debug logging I am seeing something that I think should not be there: a lot of watchdog resets. Is that normal?

Dave

2019-12-24 20:55:47 DEBUG (MainThread) [pyotgw.protocol] Received line 52366: T80190000
2019-12-24 20:55:47 DEBUG (MainThread) [pyotgw.protocol] Added line 52366 to message queue. Queue size: 1
2019-12-24 20:55:47 DEBUG (MainThread) [pyotgw.protocol] Processing: T 00 19 00 00
2019-12-24 20:55:47 DEBUG (MainThread) [pyotgw.protocol] Watchdog reset!
2019-12-24 20:55:47 DEBUG (MainThread) [pyotgw.protocol] Received line 52367: B4019254C
2019-12-24 20:55:47 DEBUG (MainThread) [pyotgw.protocol] Added line 52367 to message queue. Queue size: 1
2019-12-24 20:55:47 DEBUG (MainThread) [pyotgw.protocol] Processing: B 04 19 25 4c
2019-12-24 20:55:47 DEBUG (MainThread) [pyotgw.protocol] Watchdog reset!
2019-12-24 20:55:48 DEBUG (MainThread) [pyotgw.protocol] Received line 52368: T10383C00
2019-12-24 20:55:48 DEBUG (MainThread) [pyotgw.protocol] Added line 52368 to message queue. Queue size: 1
2019-12-24 20:55:48 DEBUG (MainThread) [pyotgw.protocol] Processing: T 01 38 3c 00
2019-12-24 20:55:48 DEBUG (MainThread) [pyotgw.protocol] Watchdog reset!
2019-12-24 20:55:48 DEBUG (MainThread) [pyotgw.protocol] Received line 52369: BD0383C00
2019-12-24 20:55:48 DEBUG (MainThread) [pyotgw.protocol] Added line 52369 to message queue. Queue size: 1
2019-12-24 20:55:48 DEBUG (MainThread) [pyotgw.protocol] Processing: B 05 38 3c 00
2019-12-24 20:55:48 DEBUG (MainThread) [pyotgw.protocol] Watchdog reset!
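For readers trying to follow the log, the relation between a "Received line" entry and the matching "Processing" entry can be sketched in Python as below. This is inferred only from the four samples in the log (a source letter followed by four hex bytes, with the message type taken from bits 4 to 6 of the first byte); it is not taken from the pyotgw source, and the names `Frame`, `parse_line` and `format_frame` are made up for this example.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    source: str      # first character of the line, e.g. "T" or "B"
    msg_type: int    # bits 4-6 of the first byte
    data_id: int     # second byte
    high_byte: int   # third byte
    low_byte: int    # fourth byte


def parse_line(line: str) -> Frame:
    """Split a 9-character gateway line into its fields."""
    line = line.strip()
    if len(line) != 9:
        raise ValueError("expected 1 source character + 8 hex digits")
    payload = bytes.fromhex(line[1:])  # raises ValueError on non-hex input
    msg_type = (payload[0] >> 4) & 0x7  # drop the top bit, keep the 3 type bits
    return Frame(line[0], msg_type, payload[1], payload[2], payload[3])


def format_frame(frame: Frame) -> str:
    """Render a frame in the same shape as the 'Processing:' log entries."""
    return "{} {:02x} {:02x} {:02x} {:02x}".format(*frame)
```

With this reading, `format_frame(parse_line("BD0383C00"))` gives `B 05 38 3c 00`, which matches the last "Processing" entry in the log.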

mvn23 commented 4 years ago

Is that normal?

Yes.

davey400 commented 4 years ago

home-assistant.zip: here is the full log I grabbed yesterday. Only my domain name has been replaced.

mvn23 commented 4 years ago

Thanks for the log. "Unfortunately", everything looks OK from here. The last update is logged <30 seconds before the end of the log. Did the issue occur while you were capturing this log? If not, can you try to capture when it does happen?

cyberjunky commented 4 years ago

I enabled and checked my pyotgw.protocol log and also see a lot of 'Watchdog reset' entries. I guess it means a 'watchdog alive' signal rather than a watchdog reboot, right? If so, it might be good to rename it to 'Watchdog timer reset' so it's less scary... :sweat_smile:
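A minimal sketch of that interpretation, in Python with asyncio: a timer that is restarted on every received message and only fires when no message arrives for a while. This is an illustration under my own assumptions, not pyotgw's implementation; the class name `Watchdog`, the `timeout` parameter and the `on_timeout` callback are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class Watchdog:
    """Countdown timer that fires only if reset() is not called in time."""

    def __init__(self, timeout: float,
                 on_timeout: Callable[[], Awaitable[None]]) -> None:
        self._timeout = timeout
        self._on_timeout = on_timeout  # e.g. a coroutine that reconnects
        self._task: Optional[asyncio.Task] = None

    def reset(self) -> None:
        """Restart the countdown; meant to be called for every received line."""
        self.stop()
        self._task = asyncio.get_running_loop().create_task(self._run())

    def stop(self) -> None:
        """Cancel the countdown without firing."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._timeout)
        await self._on_timeout()  # reached only when no reset arrived in time
```

Under this reading, a steady stream of "Watchdog reset!" entries is the healthy case: each one means a message arrived before the timer ran out.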

davey400 commented 4 years ago

The last correct measurement seems to have been around 20:42. The log is based on a copy that I made at a random moment later that evening.

mvn23 commented 4 years ago

Nothing strange around that time either. The connection to the gateway seems to remain intact, so the problem must lie elsewhere. Any chance you can provide a debug log of all HA (core/opentherm_gw) modules as well when the problem occurs?

stale[bot] commented 4 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.