home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
70.47k stars 29.4k forks source link

opentherm_gw via socket tcp crashes hass #20554

Closed popoviciri closed 5 years ago

popoviciri commented 5 years ago

Home Assistant release with the issue:

0.86.3

Last working Home Assistant release (if known):

Operating environment (Hass.io/Docker/Windows/etc.):

Raspbian Stretch on rpi 3B+ Linux 4.14.79-v7+ HASS install: sudo pip3 install homeassistant (not in venv) service: home-assistant@pi alongside with influxdb and grafana have a razberry hat (zwave) connected to ttyAMA0 hass is working like this for 2 years or so.

Component/platform:

https://www.home-assistant.io/components/opentherm_gw/

Description of problem: Recently I installed the otgw and connected it via wifi (esp-link) on port 23 to hass. It works! The problem is that after a number of disconnect/reconnects it stops entirely and sometimes home assistant UI crashes. Normally the backend keeps running (i know this because influxdb is still logging) but sometimes it crashes altogether. I can still log into the pi and restart hass remotely. Maybe this is a particular issue which doesn't worth fixing, which is totally fine. Not many has the OTGW hooked up like this. Because the fault happens somewhere between 24 and 48 hrs I need some help to prepare the next log. What should I look for to identify the issue?

These are the relevant bits from the last log. After the last line below, I couldn't load the hass UI but influxdb still logged everything except the OTGW. A "sudo systemctl restart home-assistant@pi" gets everything back to light.

2019-01-28 16:02:42 ERROR (MainThread) [pyotgw.protocol] Disconnected: None
2019-01-28 16:03:12 ERROR (MainThread) [pyotgw.protocol] Disconnected: None
2019-01-28 17:09:24 ERROR (MainThread) [pyotgw.protocol] Disconnected: None
2019-01-28 17:09:58 ERROR (MainThread) [pyotgw.protocol] Disconnected: None
2019-01-28 17:12:50 ERROR (MainThread) [pyotgw.protocol] Disconnected: None
2019-01-28 18:17:29 ERROR (MainThread) [pyotgw.protocol] Disconnected: None
2019-01-28 18:17:36 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: S.
2019-01-28 18:17:37 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: S=16.00
2019-01-28 18:17:37 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: S=16.00, retrying...
2019-01-28 18:17:37 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: R=I
2019-01-28 18:17:37 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: R=I, retrying...
2019-01-28 18:17:37 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: T=11
2019-01-28 18:17:37 WARNING (MainThread) [pyotgw.protocol] Command PS failed with PR: T=11, retrying...
2019-01-28 18:17:37 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "/home/radu/.homeassistant/deps/lib/python3.5/site-packages/pyotgw/protocol.py", line 125, in _watchdog
    await self._watchdog_cb()
  File "/home/radu/.homeassistant/deps/lib/python3.5/site-packages/pyotgw/pyotgw.py", line 93, in reconnect
    connection_timeout, inactivity_timeout)
  File "/home/radu/.homeassistant/deps/lib/python3.5/site-packages/pyotgw/pyotgw.py", line 99, in connect
    await self.get_status()
  File "/home/radu/.homeassistant/deps/lib/python3.5/site-packages/pyotgw/pyotgw.py", line 330, in get_status
    DATA_SLAVE_DHW_MAX_SETP: int(dhw_setp_bounds[0]),
ValueError: invalid literal for int() with base 10: '0.00'
2019-01-28 18:17:37 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: 00000010/0100000038.10,00000011/00000011100.00,0/0,19.50,0.00,0.00,19.63,59.60,0.00,3.40,0.00,60/40,80/20,60.00,90.00,65535,65535,54632,65535,10902,14444,41066,1217
2019-01-28 18:17:37 WARNING (MainThread) [pyotgw.protocol] Command PS failed with 00000010/0100000038.10,00000011/00000011100.00,0/0,19.50,0.00,0.00,19.63,59.60,0.00,3.40,0.00,60/40,80/20,60.00,90.00,65535,65535,54632,65535,10902,14444,41066,1217, retrying...
2019-01-28 19:31:44 ERROR (MainThread) [pyotgw.protocol] Disconnected: None
2019-01-28 19:31:51 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: W.
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: W=A
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: W=A, retrying...
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: B=17:59 20-10-2015
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: B=17:59 20-10-2015, retrying...
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: V=3
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: V=3, retrying...
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: L=FXOMPC
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: L=FXOMPC, retrying...
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: A=OpenTherm Gateway 4.2.5
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: A=OpenTherm Gateway 4.2.5, retrying...
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: I=00
2019-01-28 19:31:53 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: I=00, retrying...
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: C=4 MHz
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: C=4 MHz, retrying...
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: P=Low power
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: P=Low power, retrying...
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: O=N
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: O=N, retrying...
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: M=G
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: M=G, retrying...
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: S=16.00
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: S=16.00, retrying...
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: R=I
2019-01-28 19:31:54 WARNING (MainThread) [pyotgw.protocol] Command PR failed with PR: R=I, retrying...
2019-01-28 19:31:55 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: PR: T=11
2019-01-28 19:31:55 WARNING (MainThread) [pyotgw.protocol] Command PS failed with PR: T=11, retrying...
2019-01-28 19:31:55 WARNING (MainThread) [pyotgw.protocol] Unknown message in command queue: 00000010/01000000,6.00,00000011/00000011100.00,0/0,19.50,0.00,0.00,21.80,52.90,0.00,3.30,0.00,60/40,80/20,60.00,90.00,65535,65535,54632,65535,10904,14444,41068,1218
2019-01-28 19:31:55 WARNING (MainThread) [pyotgw.protocol] Command PS failed with 00000010/01000000,6.00,00000011/00000011100.00,0/0,19.50,0.00,0.00,21.80,52.90,0.00,3.30,0.00,60/40,80/20,60.00,90.00,65535,65535,54632,65535,10904,14444,41068,1218, retrying...
2019-01-28 19:35:32 ERROR (MainThread) [pyotgw.protocol] Disconnected: None 

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):

opentherm_gw:
  device: socket://esp-link:23
  climate:
    name: Thermostat
    precision: 0.5
  monitored_variables:
    - room_setpoint
    - room_temp
    - outside_temp
    - max_ch_setpoint
    - slave_flame_on
    - slave_max_relative_modulation
    - slave_fault_indication
    - slave_gas_fault
    - otgw_about

automation (set outside temperature); not relevant since this was happening before I added this:

- id: thermostat_outside_temp
  alias: thermostat outside temp
  trigger:
    platform: time_pattern
    minutes: '/30'
  action:
    service: opentherm_gw.set_outside_temperature
    data_template:
      temperature: >-
        {% if states('sensor.temperature_outdoor') != 'unknown' and
              states('sensor.temperature_outdoor') | float > -30 and
              states('sensor.temperature_outdoor') | float < 40 %}
          {{ states('sensor.temperature_outdoor') | float | round(1) }}
        {% else %}
          99
        {% endif %}

Traceback (if applicable):

Additional information:

mvn23 commented 5 years ago

Please enable debug logging for the pyotgw library:

# configuration.yaml
logger:
  default: warning
  logs:
    pyotgw.pyotgw: debug
    pyotgw.protocol: debug
Sloapert commented 5 years ago

Below the last part of my log, I have the same issue with the otgw component. I'm running hass in docker. The last line of the log is also the last line before Hass crashes.


2019-01-27 13:37:26 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value A
2019-01-27 13:37:27 INFO (MainThread) [homeassistant.components.climate] Setting up climate.opentherm_gw
2019-01-27 13:37:27 DEBUG (MainThread) [homeassistant.components.climate.opentherm_gw] Added device Thermostaat
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PR: PR: A=OpenTherm Gateway 4.2.5
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value B
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PR: PR: B=17:59 20-10-2015
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value C
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PR: PR: C=4 MHz
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value G
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PR: PR: G=00
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value I
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PR: PR: I=00
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value L
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PR: PR: L=FXOMPC
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value M
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PR: PR: M=G
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value O
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PR: PR: O=T21.00
2019-01-27 13:37:27 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value P
2019-01-27 13:37:28 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PR: PR: P=Low power
2019-01-27 13:37:28 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value R
2019-01-27 13:37:31 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: R.
2019-01-27 13:37:31 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value S
2019-01-27 13:37:34 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: S.
2019-01-27 13:37:34 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value T
2019-01-27 13:37:34 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PR: PR: T=11
2019-01-27 13:37:34 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value V
2019-01-27 13:37:37 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: V.
2019-01-27 13:37:37 DEBUG (MainThread) [pyotgw.protocol] Sending command: PR with value W
2019-01-27 13:37:37 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PR: PR: W=A
2019-01-27 13:37:37 DEBUG (MainThread) [pyotgw.protocol] Sending command: PS with value 1
2019-01-27 13:37:40 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PS, value: 1.
2019-01-27 13:37:40 DEBUG (MainThread) [homeassistant.components.opentherm_gw] Connected to OpenTherm Gateway at socket://otgw.abcd.ab:6638
2019-01-27 13:37:40 DEBUG (MainThread) [pyotgw.protocol] Sending command: PS with value 0
2019-01-27 13:37:40 DEBUG (MainThread) [pyotgw.protocol] Got possible response for command PS: PS: 0
2019-01-27 14:46:54 DEBUG (MainThread) [pyotgw.protocol] Watchdog triggered!
2019-01-27 14:46:54 DEBUG (MainThread) [pyotgw.protocol] Canceling Watchdog task.
2019-01-27 14:46:54 DEBUG (MainThread) [pyotgw.pyotgw] Scheduling reconnect...
2019-01-27 14:46:54 DEBUG (MainThread) [pyotgw.pyotgw] Reconnecting to serial device on socket://otgw.abcd.ab:6638
2019-01-27 14:46:54 ERROR (MainThread) [pyotgw.protocol] Disconnected: None
'''
popoviciri commented 5 years ago

OK, I've been running for two days with debug logging the components and HASS still didn't crash. I get the occasional

2019-01-30 15:12:53 DEBUG (MainThread) [pyotgw.protocol] Watchdog triggered!
2019-01-30 15:12:53 DEBUG (MainThread) [pyotgw.protocol] Canceling Watchdog task.
2019-01-30 15:12:53 DEBUG (MainThread) [pyotgw.pyotgw] Scheduling reconnect...
2019-01-30 15:12:53 DEBUG (MainThread) [pyotgw.pyotgw] Reconnecting to serial device on socket://192.168.1.175:23
2019-01-30 15:12:53 ERROR (MainThread) [pyotgw.protocol] Disconnected: None
2019-01-30 15:12:56 DEBUG (MainThread) [pyotgw.pyotgw] Connected to serial device on socket://192.168.1.175:23

but everything runs smooth. Puzzling. I'll report back in a couple of days or when HASS crashes. Cheers

popoviciri commented 5 years ago

4 days without a hitch now. I'm closing this, though it did bother me for couple of weeks. It just won't crash again. I'll start using it in more advanced automations. Thanks for the opentherm_gw component! Cheers!