Closed by andriej 5 years ago
I think I have the same problem. See also here: https://community.home-assistant.io/t/help-automation-stops-working-after-hass-running-for-a-day-and-returns-to-normal-working-after-reloading-automations/92571 Today I found out that pyotgw crashed or lost its connection about 15 minutes before some of my automations stopped working (of course, this may be a coincidence). I monitor memory usage on my Raspberry Pi and I don't think memory is the problem: the amount of memory used is very stable over time. I did a clean reinstall of everything on my RPi this weekend. Before that, I had a lot of errors from influxdb and the recorder about not getting write access to the db. Since the reinstall, the only error in the log file is from pyotgw:
Jan 20 13:49:25 pi hass[16694]: 2019-01-20 13:49:25 ERROR (MainThread) [pyotgw.protocol] Disconnected: None
Jan 20 13:49:33 pi hass[16694]: 2019-01-20 13:49:33 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: T.
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: T=11
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: T=11, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: A=OpenTherm Gateway 4.2.5
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: A=OpenTherm Gateway 4.2.5, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: V=3
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: V=3, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: C=4 MHz
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: C=4 MHz, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: R=C
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: R=C, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: P=Low power
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: P=Low power, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: O=N
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: O=N, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: W=A
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: W=A, retrying...
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: G=00
Jan 20 13:49:34 pi hass[16694]: 2019-01-20 13:49:34 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: G=00, retrying...
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: M=G
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: M=G, retrying...
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: L=FXOMPC
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: L=FXOMPC, retrying...
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: S=16.00
Jan 20 13:49:35 pi hass[16694]: 2019-01-20 13:49:35 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: S=16.00, retrying...
Is there anything I can do to help solve the problem?
Exactly! My problems start with stopped automations or a lagging interface. From resource monitoring you can see that the 'load' varies, but I don't have a single process that could cause that. I run only influxdb, grafana and homeassistant.
My Influxdb was causing a lot of memory peaks. Perhaps you can try to stop it for a while to see what happens?
Might be, but then I will be without history :-) But still, influx wasn't any problem before anyway.
Now darksky has trouble connecting, with this error:
(Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known',))
Could it be caused by another async component?
I'm using darksky also, but without any errors right now. What is an async component?
@gdschut async is the way components work in HA, so that they do not block each other while performing connections.
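To illustrate what "not blocking each other" means, here is a minimal sketch using plain Python asyncio. It is not Home Assistant code: `slow_fetch` and the component names passed to it are made up for the example, and `asyncio.sleep` stands in for a network call.

```python
import asyncio
import time


async def slow_fetch(name, delay):
    # Hypothetical stand-in for a component waiting on the network.
    # While this coroutine sleeps, the event loop is free to run others.
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.monotonic()
    # Both "components" wait at the same time instead of one after another.
    results = await asyncio.gather(
        slow_fetch("otgw", 0.2),
        slow_fetch("darksky", 0.2),
    )
    elapsed = time.monotonic() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

The flip side is that a single component that blocks the event loop (or never releases its connections) can slow down everything else sharing that loop.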
For a test, I've switched back to the MQTT OTGW gateway. After a whole night there was not a single connection error for other entities, and the UI (and its restart) seems much faster. Maybe it would be worth supporting this gateway via MQTT and reporting only values to HA? The other solution on GitHub supports far fewer status reads.
That's interesting! I never had a config for OTGW with MQTT, but I already use MQTT for other sensors. How do I change to MQTT? For now I have added an automation that reloads all automations at night; I hope this works for now...
I moved to OTGW via MQTT using the TCP branch of https://github.com/jodur/py-otgw-mqtt/tree/tcp. This is a Python implementation, so perhaps it is useful for migrating to HA? I also moved almost all of my automations to Node-Red, resulting in a very stable environment again. If there is any development on the OTGW component for HA, I am still interested, but for now I prefer a stable central heating system, without getting my wife upset about home automations... Thanks anyway for your effort.
Just getting back to component - I'm on 0.89.1 and came back here to check how things are going. :-)
This could be a result of connections not being cleaned up on disconnect. May be fixed by #5 so please check if the problem persists with the next version (will be released soon).
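For anyone wondering what "cleaned up on disconnect" could look like, here is a rough sketch. It is not pyotgw's actual code: `GatewayClient` and its methods are hypothetical, and the local test server exists only so the example runs on its own. The idea is simply to close the old stream before opening a new one, so reconnects do not pile up sockets.

```python
import asyncio


class GatewayClient:
    """Hypothetical client, for illustration only (not pyotgw's API)."""

    def __init__(self):
        self.writer = None

    async def connect(self, host, port):
        # Drop any stale connection before opening a new one.
        await self.disconnect()
        _reader, self.writer = await asyncio.open_connection(host, port)

    async def disconnect(self):
        if self.writer is not None:
            self.writer.close()
            await self.writer.wait_closed()
            self.writer = None


async def demo():
    async def handle(reader, writer):
        await reader.read()  # returns once the client closes its side
        writer.close()

    # Throwaway local server on a free port, only for this demo.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    client = GatewayClient()
    await client.connect("127.0.0.1", port)
    first = client.writer
    await client.connect("127.0.0.1", port)  # simulated reconnect
    stale_closed = first.is_closing()  # the old stream was closed

    await client.disconnect()
    server.close()
    return stale_closed


stale_closed = asyncio.run(demo())
print("stale connection closed:", stale_closed)
```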
I still had problems even after the fixes, less with HA and more with wifi I think: a lot of errors and reconnections. Today I've switched both to 0.90 (which includes 3 fixes found in the meantime) and to an Ethernet adapter for the OTGW (so no more goddamn disconnections).
Works smooth so far, will monitor.
Just an update: for me it seems to be fixed already after moving from the ESP-nodemcu to a LAN gateway, and the fixes are waiting for the next HA release. The issue can probably be closed.
Is this still present in 0.4b2 (HA 0.90.2)?
I'm currently on MQTT communication, but I hope to switch back to native OTGW as soon as I have time to do it, to test it and, if necessary, to roll back.... Right now I have connection problems with other components as well, so perhaps it would be better to wait for a stable situation...
So far I have had no more OTGW issues with HA 0.90.2.
@gdschut those problems could be caused by unstable integrations. I had the same and, together with the author of the component, worked out a few fixes that made it stable :-) I was also using MQTT earlier.
I'm closing this issue, as all of the problems disappeared thanks to both the hard work on debugging and code from mvn23 and the change on the connectivity side to a fixed LAN adapter. @gdschut make sure you try the latest versions of HA when switching back to the OTGW component.
Since I started using the native HA integration (after the latest fix) it works, meaning it reconnects and I'm able to control my thermostat. But after running for many hours HA seems to get more and more unstable, eventually losing some socket-based integrations like Xiaomi Aqara.
Is it possible that there's a memory leak, or that the socket limit is being exhausted by the component? I'd like to help; just let me know how I can participate.
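One way to check the socket-limit theory might be to watch the number of open file descriptors over time. Below is a small sketch, assuming Linux (it reads `/proc`). The helper names `count_open_fds` and `fd_limit` are made up for this example, and reading another process's `/proc/<pid>/fd` needs sufficient permissions.

```python
import os
import resource
import socket


def count_open_fds(pid="self"):
    # Each entry in /proc/<pid>/fd is one open descriptor
    # (files, pipes and sockets alike). Linux only.
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_limit():
    # Soft limit on open descriptors for the *current* process.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


# Small self-check: opening a socket uses one descriptor,
# closing it gives the descriptor back.
before = count_open_fds()
sock = socket.socket()
during = count_open_fds()
sock.close()
after = count_open_fds()

print(f"open fds: {before} -> {during} -> {after}, soft limit: {fd_limit()}")
```

Calling `count_open_fds(<pid of hass>)` every few minutes and logging the result would show whether the count keeps climbing toward the limit, which would point at leaked connections rather than memory.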