mvn23 / pyotgw

A library to interface with the OpenTherm Gateway through serial or network connection.
GNU General Public License v3.0

Causing unstable status of HA? #3

Closed andriej closed 5 years ago

andriej commented 5 years ago

Since I started using the native HA integration (after the latest fix), it works: it reconnects and I'm able to control my thermostat. However, after running for many hours, HA seems to get more and more unstable and eventually loses some socket-based integrations like Xiaomi Aqara.

Is it possible that there's a memory leak, or that the socket limit is being eaten up by the component? I'd like to help; just let me know how I can participate.

gdschut commented 5 years ago

I think I have the same problem. See also here: https://community.home-assistant.io/t/help-automation-stops-working-after-hass-running-for-a-day-and-returns-to-normal-working-after-reloading-automations/92571

Today I found out that I had a pyotgw crash/connection loss about 15 minutes before some of my automations stopped working (of course, I'm not sure whether that's a coincidence). I monitor memory usage on my Raspberry Pi, and I don't think memory is the problem: the amount of memory used has been very stable over time. I reinstalled everything on my RPi this weekend, so everything is clean again; before that, I had a lot of errors from InfluxDB and the recorder about not getting write access to the database. Since the reinstall, the only errors in the log file come from pyotgw:

Jan 20 13:49:25 pi hass[16694]: 2019-01-20 13:49:25 ERROR (MainThread) [pyotgw.protocol] Disconnected: None
Jan 20 13:49:33 pi hass[16694]: 2019-01-20 13:49:33 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: T.
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: T=11
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: T=11, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: A=OpenTherm Gateway 4.2.5
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: A=OpenTherm Gateway 4.2.5, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: V=3
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: V=3, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: C=4 MHz
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: C=4 MHz, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: R=C
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: R=C, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: P=Low power
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: P=Low power, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: O=N
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: O=N, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: W=A
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: W=A, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: G=00
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: G=00, retrying...
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: M=G
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: M=G, retrying...
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: L=FXOMPC
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: L=FXOMPC, retrying...
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: S=16.00
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: S=16.00, retrying...

Is there anything I can do to help solve the problem?

andriej commented 5 years ago

Exactly! My problems start with stopped automations or a lagging interface. Resource monitoring shows that the load varies (screenshot not included), but I don't have a single process that could cause that. I only run InfluxDB, Grafana and Home Assistant.

gdschut commented 5 years ago

My InfluxDB was causing a lot of memory peaks. Perhaps you could try stopping it for a while to see what happens?

andriej commented 5 years ago

Might be, but then I'd be without history :-) Still, InfluxDB wasn't a problem before.

andriej commented 5 years ago

Now Dark Sky has trouble connecting, with the error: (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known',))

Could it be caused by another async component?

gdschut commented 5 years ago

I'm using darksky also, but without any errors right now. What is an async component?

andriej commented 5 years ago

@gdschut async is the way components work in HA, so that they don't block each other while performing connections.
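To illustrate what "not blocking each other" means, here is a minimal asyncio sketch (not Home Assistant code; `ticker`, `cooperative_io`, `blocking_io` and `run` are hypothetical names). A component that waits with `await` lets other tasks keep running on the shared event loop, while one that makes a blocking call (here `time.sleep()`) holds the loop and starves everything else until it returns:

```python
import asyncio
import time


async def ticker(log, n=3, interval=0.01):
    """Stands in for another integration that wants to run periodically."""
    for i in range(n):
        log.append(f"tick {i}")
        await asyncio.sleep(interval)


async def cooperative_io(log):
    """Non-blocking wait: yields to the event loop so other tasks can run."""
    log.append("io start")
    await asyncio.sleep(0.05)
    log.append("io done")


async def blocking_io(log):
    """Blocking wait: time.sleep() holds the event loop, so nothing else runs."""
    log.append("io start")
    time.sleep(0.05)
    log.append("io done")


async def run(io_func):
    """Run one 'I/O' task alongside the ticker and return the event order."""
    log = []
    await asyncio.gather(io_func(log), ticker(log))
    return log


if __name__ == "__main__":
    # Ticks interleave while the cooperative task waits.
    print(asyncio.run(run(cooperative_io)))
    # Ticks only start after the blocking task has finished.
    print(asyncio.run(run(blocking_io)))
```

In the cooperative run, the ticks happen while the I/O task is waiting; in the blocking run, every tick is delayed until the blocking call returns. This is why a single misbehaving async component can make the whole HA instance feel sluggish.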

andriej commented 5 years ago

As a test, I've switched back to the MQTT OTGW gateway, and after a whole night there was not a single connection error for other entities. The UI (and HA restarts) also seem much faster. Maybe it would be worth supporting this gateway via MQTT and reporting only the values to HA? The other solution on GitHub supports far fewer status reads.

gdschut commented 5 years ago

That's interesting! I never had a config for the OTGW with MQTT, but I already use MQTT for other sensors. How do I switch to MQTT? For now I have added an automation that reloads all automations at night; I hope this works for now...

gdschut commented 5 years ago

I moved to OTGW via MQTT using the TCP branch of https://github.com/jodur/py-otgw-mqtt/tree/tcp. This is a Python implementation, so perhaps it is useful for migrating to HA? I also moved almost all of my automations to Node-RED, resulting in a very stable environment again. If there is any development on the OTGW component for HA, I am still interested, but for now I prefer a stable central heating system, without getting my wife upset about home automation... Thanks anyway for your effort.

andriej commented 5 years ago

Just getting back to the component: I'm on 0.89.1 and came back here to check how things are going. :-)

mvn23 commented 5 years ago

This could be a result of connections not being cleaned up on disconnect. May be fixed by #5 so please check if the problem persists with the next version (will be released soon).
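For readers unfamiliar with the idea, the following is a minimal asyncio sketch of what "cleaning up connections on disconnect" can mean. It is not pyotgw's code and not the actual change in #5; `GatewayProtocol`, `connect_once` and `demo` are hypothetical names, and a local server that hangs up immediately stands in for the gateway. The point is that when a connection drops, everything tied to it (the transport/socket and any background tasks) is closed or cancelled, so repeated reconnects do not pile up stale sockets and tasks on HA's event loop:

```python
import asyncio


class GatewayProtocol(asyncio.Protocol):
    """Hypothetical protocol for a gateway connection (illustration only)."""

    def __init__(self, lost):
        self.transport = None
        self.watchdog = None
        self._lost = lost  # future resolved when the connection goes away

    def connection_made(self, transport):
        self.transport = transport
        # A background task that belongs to this connection, e.g. a keep-alive.
        self.watchdog = asyncio.get_running_loop().create_task(self._keepalive())

    async def _keepalive(self):
        while True:
            await asyncio.sleep(1)

    def connection_lost(self, exc):
        # Clean up everything tied to this connection before any reconnect:
        # cancel its background task and close/drop its transport.
        self.watchdog.cancel()
        self.transport.close()  # safe to call on an already-closing transport
        self.transport = None
        if not self._lost.done():
            self._lost.set_result(exc)


async def connect_once(host, port):
    """Open one connection; return the protocol and a 'connection lost' future."""
    loop = asyncio.get_running_loop()
    lost = loop.create_future()
    _, protocol = await loop.create_connection(
        lambda: GatewayProtocol(lost), host, port
    )
    return protocol, lost


async def demo():
    """Connect to a local stand-in server that hangs up immediately."""

    async def hang_up(reader, writer):
        writer.close()
        await writer.wait_closed()

    server = await asyncio.start_server(hang_up, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    protocol, lost = await connect_once("127.0.0.1", port)
    await lost               # wait until connection_lost() has run
    await asyncio.sleep(0)   # let the watchdog process its cancellation
    server.close()
    await server.wait_closed()
    return protocol


if __name__ == "__main__":
    p = asyncio.run(demo())
    print(p.transport is None, p.watchdog.cancelled())
```

A real reconnect loop would call something like `connect_once()` again after `lost` resolves; without the cleanup in `connection_lost()`, each reconnect could leave another keep-alive task running in the background.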

andriej commented 5 years ago

I still had problems even after the fixes, less with HA and more with Wi-Fi I think: a lot of errors and reconnections. Today I've switched both to 0.90 (which includes 3 fixes found in the meantime) and to an Ethernet adapter for the OTGW (so no more goddamn disconnections).

Works smooth so far, will monitor.

andriej commented 5 years ago

Just an update: for me it seems to be fixed already after moving from the ESP NodeMCU to the LAN gateway, and the fixes are waiting for the next HA release. The issue can probably be closed.

mvn23 commented 5 years ago

Is this still present in 0.4b2 (HA 0.90.2)?

gdschut commented 5 years ago

I'm currently on MQTT communication, but I hope to switch back to the native OTGW component as soon as I have time to do it, and also time for testing and possibly rolling back if it doesn't work... Right now I also have connection problems with other components, so perhaps it would be better to wait for a stable situation...

lwestenberg commented 5 years ago

So far I have had no more OTGW issues with HA 0.90.2.

andriej commented 5 years ago

@gdschut those problems could be caused by unstable integrations. I had the same, and together with the component's author I worked out a few fixes that made it stable :-) I was also using MQTT earlier.

andriej commented 5 years ago

I'm closing this issue, as all of the problems disappeared thanks to both the hard debugging work and code from mvn23 and the change on the connectivity side to a wired LAN adapter. @gdschut make sure you try the latest version of HA when switching back to the OTGW component.