home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
74.03k stars 31.06k forks source link

OpenTherm gateway: unhandled error #22074

Closed andriej closed 5 years ago

andriej commented 5 years ago

Home Assistant release with the issue: 0.89.2 with pyotgw 0.4b2 Last working Home Assistant release (if known): n/a

Operating environment (Hass.io/Docker/Windows/etc.): Linux / venv / python 3.7.2 Component/platform: opentherm gateway

Description of problem: Error shown in log, component failed to work further - reported values stayed as for the moment of error (so I can see boiler working non-stop even tho it's off since at least 12h)

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant): not relevant, socket to nodemcu with espeasy as serial server

Traceback (if applicable):

Unhandled error in exception handler
context: {'message': 'Task exception was never retrieved', 'exception': RecursionError('maximum recursion depth exceeded while calling a Python object'), 'future': <Task finished coro=<protocol._watchdog() done, defined at /srv/homeassistant/lib/python3.7/site-packages/pyotgw/protocol.py:120> exception=RecursionError('maximum recursion depth exceeded while calling a Python object')>}
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 1643, in call_exception_handler
    self._exception_handler(self, context)
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/core.py", line 110, in async_loop_exception_handler
    "Error doing job: %s", context['message'], **kwargs)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1412, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1519, in _log
    self.handle(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1529, in handle
    self.callHandlers(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1591, in callHandlers
    hdlr.handle(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 905, in handle
    self.emit(record)
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/system_log/__init__.py", line 168, in emit
    _figure_out_source(record, stack, self.hass))
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/system_log/__init__.py", line 98, in __init__
    traceback.format_exception(*record.exc_info))
  File "/usr/local/lib/python3.7/traceback.py", line 121, in format_exception
    type(value), value, tb, limit=limit).format(chain=chain))
  File "/usr/local/lib/python3.7/traceback.py", line 497, in __init__
    _seen=_seen)
  File "/usr/local/lib/python3.7/traceback.py", line 497, in __init__
    _seen=_seen)
  File "/usr/local/lib/python3.7/traceback.py", line 497, in __init__
    _seen=_seen)
  [Previous line repeated 486 more times]
  File "/usr/local/lib/python3.7/traceback.py", line 473, in __init__
    _seen.add(id(exc_value))
RecursionError: maximum recursion depth exceeded while calling a Python object

Additional information: pyotgw is from latest version which should have 'reconnection issue' fixed - 0.4b2 @mvn23

andriej commented 5 years ago

It could be fault of some connection issue with the gateway being unavailable, but right now even restarting whole HA cause it's being unavailable: 2019-03-15 14:50:37 ERROR (MainThread) [pyotgw.pyotgw] Could not connect to serial device on socket://192.168.x.x:xxxx. Will keep trying. Reported error was: Could not open port socket://192.168.x.x:xxxx: [Errno 113] No route to host

mvn23 commented 5 years ago

Did you have a lot of disconnect errors between HA start and this problem occuring?

andriej commented 5 years ago

Yes - sort of. Before the fix some ofthe disconnection happened to freeze HA totally. After the fix they seem to get receonnect and work properly, but still it's nodemcu serial server, so WiFi and possible connection dropouts.

mvn23 commented 5 years ago

Trying to reproduce it here by stressing the watchdog/reconnect logic, but so far without success. Any further details about the connection drops (frequency, duration) and/or a debug log could be helpful.

andriej commented 5 years ago

The thing is that when that error occured - probably the nodemcu otgw itself somehow hanged/lost connection. But what's worse, HA didn't success to start until that nodemcu got hard-restarted (power off/power on). So basically when it occured, HA waited when startup until the restart - without that I've tried to restart HA itself and no successs.

(totally in meanwhile I wait for ethernet module for OTGW to arrive then I'll do tests over LAN, but it's still few weeks before delivery tho...)

In logfile, just after the restart of otgw gateway - after this message HA finished to startup (after loooong wait from my side):

Mar 15 16:00:49 hass[14329]: 2019-03-15 16:00:49 WARNING (MainThread) [pyotgw.protocol] Command PS failed with Low power, retrying...
Mar 15 16:00:49 hass[14329]: 2019-03-15 16:00:49 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: Low power
mvn23 commented 5 years ago

The thing is that when that error occured - probably the nodemcu otgw itself somehow hanged/lost connection. But what's worse, HA didn't success to start until that nodemcu got hard-restarted (power off/power on).

I see the same behavior indeed. If the gateway is unreachable when HA starts, HA's interface will not be loaded until the connection to the gateway is made. During this time, I do see other activity in the debug log so it's not completely blocked. It seems to be related to how the connecting routine is scheduled, looking into that now.

andriej commented 5 years ago

Great! I hope to help finding the bug and making component even better :-)

andriej commented 5 years ago

Seems fixed with #22121 and #22106. Will test next release and reopen if still an issue.