home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.11k stars 29.79k forks source link

Opentherm Gateway component not working #17263

Closed Wummeke closed 5 years ago

Wummeke commented 5 years ago

Home Assistant release with the issue: 0.79.3

Last working Home Assistant release (if known): None

Operating environment (Hass.io/Docker/Windows/etc.): I am running Home Assistant from Docker

Component/platform: https://www.home-assistant.io/components/climate.opentherm_gw/

Description of problem: Component does not work basicly. I sometimes have to reboot Home Assistant a few times to get the component going:

2018-10-08 22:23:24 WARNING (MainThread) [homeassistant.components.climate] Setup of platform opentherm_gw is taking over 10 seconds.
2018-10-08 22:23:30 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/app/homeassistant/helpers/entity_platform.py", line 345, in _async_add_entity
    await entity.async_added_to_hass()
  File "/usr/src/app/homeassistant/components/climate/opentherm_gw.py", line 66, in async_added_to_hass
    await self.gateway.connect(self.hass.loop, self._device)
  File "/usr/local/lib/python3.6/site-packages/pyotgw/pyotgw.py", line 55, in connect
    await self.get_reports()
  File "/usr/local/lib/python3.6/site-packages/pyotgw/pyotgw.py", line 170, in get_reports
    OTGW_DEFAULT_TIMEOUT, loop=self.loop)
  File "/usr/local/lib/python3.6/asyncio/tasks.py", line 362, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError

and When it finally works, it stops working after a few minutes. When I try to change the temperature an error is added to the log:

2018-10-08 22:13:51 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/app/homeassistant/helpers/service.py", line 226, in _handle_service_platform_call
    await func(entity, data)
  File "/usr/src/app/homeassistant/components/climate/__init__.py", line 679, in async_service_temperature_set
    await entity.async_set_temperature(**kwargs)
  File "/usr/src/app/homeassistant/components/climate/opentherm_gw.py", line 174, in async_set_temperature
    temp)
  File "/usr/local/lib/python3.6/site-packages/pyotgw/pyotgw.py", line 91, in set_target_temp
    self._protocol.issue_cmd(cmd, value), timeout, loop=self.loop))
  File "/usr/local/lib/python3.6/asyncio/tasks.py", line 362, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError

My openthermgateway is connected to a nodeMCU with ESP Easy running ser2net.

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):

climate:
  - platform: opentherm_gw
    device: socket://192.168.0.6:6638
    name: Thermostat
    precision: 0.5

Additional information: Here a log of a 'successful' connection after reboot and then almost imediately a crash of the component when I try to set a temperature:

2018-10-08 22:29:12 INFO (MainThread) [homeassistant.components.climate] Setting up climate.opentherm_gw
2018-10-08 22:29:17 DEBUG (MainThread) [homeassistant.components.climate.opentherm_gw] Connected to Thermostat on socket://192.168.0.6:6638
2018-10-08 22:29:17 DEBUG (MainThread) [homeassistant.components.climate.opentherm_gw] Received report: {'control_setpoint': 10.0, 'otgw_about': 'OpenTherm Gateway 4.2.5', 'otgw_build': '17:59 20-10-2015', 'otgw_clockmhz': '4 MHz', 'otgw_gpio_a': 0, 'otgw_gpio_b': 0, 'otgw_gpio_a_state': 0, 'otgw_gpio_b_state': 0, 'otgw_led_a': 'F', 'otgw_led_b': 'X', 'otgw_led_c': 'O', 'otgw_led_d': 'T', 'otgw_led_e': 'P', 'otgw_led_f': 'C', 'otgw_mode': 'G', 'otgw_setpoint_ovrd_mode': 'N', 'otgw_smart_pwr': 'Low power', 'otgw_thermostat_detect': 'D', 'otgw_setback_temp': '16.00', 'otgw_ignore_transitions': 1, 'otgw_ovrd_high_byte': 1, 'otgw_vref': 3, 'otgw_dhw_ovrd': 'A', 'master_ch_enabled': 0, 'master_dhw_enabled': 1, 'master_cooling_enabled': 0, 'master_otc_enabled': 0, 'master_ch2_enabled': 0, 'slave_fault_indication': 0, 'slave_ch_active': 0, 'slave_dhw_active': 0, 'slave_flame_on': 0, 'slave_cooling_active': 0, 'slave_ch2_active': 0, 'slave_diagnostic_indication': 0, 'remote_transfer_dhw': 0, 'remote_transfer_max_ch': 0, 'remote_rw_dhw': 0, 'remote_rw_max_ch': 0, 'slave_max_relative_modulation': 0.0, 'slave_max_capacity': 0, 'slave_min_mod_level': 0, 'room_setpoint': 0.0, 'relative_mod_level': 0.0, 'ch_water_pressure': 0.0, 'room_temp': 0.0, 'ch_water_temp': 0.0, 'dhw_temp': 0.0, 'outside_temp': 0.0, 'return_water_temp': 0.0, 'slave_dhw_max_setp': 0, 'slave_dhw_min_setp': 0, 'slave_ch_max_setp': 0, 'slave_ch_min_setp': 0, 'dhw_setpoint': 0.0, 'max_ch_setpoint': 90.0, 'ch_burner_starts': 0, 'ch_pump_starts': 0, 'dhw_pump_starts': 0, 'dhw_burner_starts': 0, 'ch_burner_hours': 0, 'ch_pump_hours': 0, 'dhw_pump_hours': 0, 'dhw_burner_hours': 0, 'date': datetime.datetime(2018, 10, 8, 22, 29, 17, 661521)}
2018-10-08 22:30:01 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/app/homeassistant/helpers/service.py", line 226, in _handle_service_platform_call
    await func(entity, data)
  File "/usr/src/app/homeassistant/components/climate/__init__.py", line 679, in async_service_temperature_set
    await entity.async_set_temperature(**kwargs)
  File "/usr/src/app/homeassistant/components/climate/opentherm_gw.py", line 174, in async_set_temperature
    temp)
  File "/usr/local/lib/python3.6/site-packages/pyotgw/pyotgw.py", line 91, in set_target_temp
    self._protocol.issue_cmd(cmd, value), timeout, loop=self.loop))
  File "/usr/local/lib/python3.6/asyncio/tasks.py", line 362, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
joep2000 commented 5 years ago

Hi all, I've updated my Hass in Docker to version 0.82.0 and now there is no (stable) connection to my OTGW anymore. Last working version for me was 0.81.6.

my OpenTherm Monitor still receives the stream fine over telnet connection.

dev-info output: Timed out waiting for command: PR, value: I. Are you connecting to the OpenTherm Gateway? 10:54 util/async_.py (ERROR) Unknown message in command queue: OpenTherm Gateway 4.2.5 10:54 util/async_.py (WARNING)

joep2000 commented 5 years ago

To follow up, I believe it is something within HA... The OT Monitor log shows 'PR: I=11' every ten seconds or so. This indicates that HA is sending commands to the gateway and the gateway in turn receives and acknowledges the request. However, HA does not receive or handles the messages.

mvn23 commented 5 years ago

@joep2000 your log indicates that your gateway has rebooted unexpectedly while HA was waiting for a response to a PR=I command. Nothing I can do about that. The periodical PR: I=11 messages you're seeing are there because your GPIO pins have been configured with a function other than 0. Polling is the only way to keep an eye on their state, but I may optimize this in a future version (no polling if configured always high/low). If you don't see related errors in HA's log this is expected behavior and the messages are being handled.

andriej commented 5 years ago

@mvn23: how come that adding debug level to logger and restarting HASS made it to work? it's working now, TT changes the thermostat value and the only 'errors' in log I can see are: Unknown message in command queue: Medium power Unknown message in command queue: Low power

But it's working. Strange as hell, as I didn't do anything new...

gdschut commented 5 years ago

I guess it is a matter of synchronisation. Sometimes when disconnected I can restore the connection by restarting HASS, and sometimes by restarting the OTGW/ESP combination.

andriej commented 5 years ago

Just quick update: I've restarted HASS dozens of times (reconfiguration to OT component from external-MQTT). Not a problem even once so far.

thefoxmangit commented 5 years ago

I've meanwhile updated to 0.82, and migrated from ResinOS to HassOS. Reporting seems to be stable now, until the OTGW/NodeMCU reboots (the watchdog auto-reboots it from time to time), after which the connection is lost and a restart of home assistant is required.

andriej commented 5 years ago

I didn't try rebooting otgw/nodemcu itself yet...

mvn23 commented 5 years ago

the OTGW/NodeMCU reboots (the watchdog auto-reboots it from time to time), after which the connection is lost and a restart of home assistant is required.

I'm working on an auto-reconnect on inactivity feature, but it's difficult to test on a stable connection while I'm away from home (happens quite a lot). Can't you disable the watchdog in the mean time? Neither the gateway nor the NodeMCU should not require any reboots...

andriej commented 5 years ago

@mvn23 thing with TC vs TT - seems to show up after I changed setpoint on thermostat, it's not relevant in UI anymore (not updated), even tho boiler started working and it's status (heating) is proper in HASS

mvn23 commented 5 years ago

@mvn23 thing with TC vs TT - seems to show up after I changed setpoint on thermostat, it's not relevant in UI anymore (not updated), even tho boiler started working and it's status (heating) is proper in HASS

That bug was fixed in #18286

hjbout commented 5 years ago

Even from otmonitor I often mis a response for given commands. So i doubt that this is a HA issue. It looks as if I time my command between two "data-bursts" the response comes back. As if commands not getting queued or something. Just a noob observation ;)

andriej commented 5 years ago

Ok, now after few days and restarting HASS it 'lost' connectivity to (not restarted) OTGW.

mvn23 commented 5 years ago

It looks as if I time my command between two "data-bursts" the response comes back.

This is what I am suspecting as well, that transmitting and receiving at the same time causes problems. I took some time to implement retry/reconnect logic in the library to work around this. New code can be found at the retry_reconnect branch of pyotgw, so feel free to test. If it's a success I will push an update soon.

hjbout commented 5 years ago

so feel free to test. If it's a success I will push an update soon.

I definitely want to but I’m not sure if I have the opportunity coming days/weekend. If I do, I’ll let you know. Anyway, great that you took the effort to tackle this issue. Appreciate it very much!

gdschut commented 5 years ago

What is the way to test? Copy the pyotgw.py, protocol.py and vars.py from github over the existing ones in site-packages/pyotgw folder, and restart HA?

mvn23 commented 5 years ago

That's what I do when testing for longer periods.

gdschut commented 5 years ago

Seems like it works. Got this in my logfile:

Dec 07 02:13:24 pi hass[31520]: 2018-12-07 02:13:24 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: T. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:31 pi hass[31520]: 2018-12-07 02:13:31 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: W. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:37 pi hass[31520]: 2018-12-07 02:13:37 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: I. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:40 pi hass[31520]: 2018-12-07 02:13:40 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: O. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:40 pi hass[31520]: 2018-12-07 02:13:40 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: A. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:43 pi hass[31520]: 2018-12-07 02:13:43 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: S. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:45 pi hass[31520]: 2018-12-07 02:13:45 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: L. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:46 pi hass[31520]: 2018-12-07 02:13:46 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: W. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:48 pi hass[31520]: 2018-12-07 02:13:48 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: P. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:49 pi hass[31520]: 2018-12-07 02:13:49 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: I. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:54 pi hass[31520]: 2018-12-07 02:13:54 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: R. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:54 pi hass[31520]: 2018-12-07 02:13:54 ERROR (MainThread) [pyotgw.pyotgw] Timed out waiting for command: PR, value: O. Are you connecting to the OpenTherm Gateway?
Dec 07 02:13:56 pi hass[31520]: 2018-12-07 02:13:55 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Dec 07 02:13:56 pi hass[31520]: Traceback (most recent call last):
Dec 07 02:13:56 pi hass[31520]:   File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
Dec 07 02:13:56 pi hass[31520]:     result = coro.send(None)
Dec 07 02:13:56 pi hass[31520]:   File "/home/pi/opt/homeassistant/lib/python3.5/site-packages/pyotgw/pyotgw.py", line 98, in connect
Dec 07 02:13:56 pi hass[31520]:     await self.get_reports()
Dec 07 02:13:56 pi hass[31520]:   File "/home/pi/opt/homeassistant/lib/python3.5/site-packages/pyotgw/pyotgw.py", line 270, in get_reports
Dec 07 02:13:56 pi hass[31520]:     status.update({OTGW_SB_TEMP: float(sb_temp)})
Dec 07 02:13:56 pi hass[31520]: ValueError: could not convert string to float: 'OpenTherm Gateway 4.2.5'

But OTGW is still connected!

mvn23 commented 5 years ago

That's a very strange error to say the least. It may indicate an unexpected reset of the gateway. I added some debug logging to the retry_reconnect branch. Could you update and enable debug logging for the library?

logger:
  default: warning
  logs:
    pyotgw.pyotgw: debug
    pyotgw.protocol: debug
andriej commented 5 years ago

I'm testing retry_reconnect too. First observation - it was able to 'reconnect' itself to proper data, when normal restarts of HASS didn't help. Will see how it goes further.

I also really appreciate your work. Would like to have it stable as controller. :-)

gdschut commented 5 years ago

Thanks for the update! Now it makes it hard to recognize when there was a connectionproblem, due to the huge amount of debug logs and the system reconnects itself... How can I find if and when the problem has occured?

mvn23 commented 5 years ago

It will throw one or more WARNING messages in the log if a command has to be resent due to an erroneous response, or the usual TimeOut if it gets no response at all. If it detects inactivity on the connection, it will throw DEBUG messages like Reconnecting to serial device on <path>. I just realized maybe it should throw a WARNING there as well... It will throw an ERROR if it fails to connect.

andriej commented 5 years ago

What showed in log (no debug tho). It looks like it managed to reconnect, as it seems to still work.


ERROR:
Could not connect to serial device on socket://x.x.x.x:xxxx. Will keep trying. Reported error was: Could not open port socket://x.x.x.x:xxx: timed out

ERROR:
Error doing job: Task was destroyed but it is pending!
hjbout commented 5 years ago

Ok, finally have some time to try this, but... Don't know how to copy the 3 files to hassio. Tried to replace the files in /usr/local/lib/python3.6/site-packages/pyotgw while in the container. But after a restart I belief they have been replaced with the original ones. If I do a find from outside the container I see 3 occurences of the files. Twice with a path like /var/lib/docker/overlay2/3d06......8ab9518/diff/usr/local/lib/python3.6/site-packages/pyotgw/ and once with /var/lib/docker/overlay2/3d06......8ab9518/merged/usr/local/lib/python3.6/site-packages/pyotgw. Do I replace any of those files? Anyone care to learn me something ;)

andriej commented 5 years ago

I'm testing it for few days already. Is it possible that while it has connection/reconnection issues, the component makes whole async overloaded with task and other components may also have problems? I think it's causing some kind of hiccups of performance or other sensors refreshing issues which I didn't have before (like darksky or synologydsm on local network)

mvn23 commented 5 years ago

Is it possible that while it has connection/reconnection issues, the component makes whole async overloaded with task and other components may also have problems?

I'll look into this soon.

gdschut commented 5 years ago

I am running with the debug version for a while now:

Some are relating to times when I was restarting HASS, but others did not.

So what other information do you like to get?

mvn23 commented 5 years ago

Any warnings, errors and all output from pyotgw would be great. That will be a lot of data, so please use hastebin or similar.

gdschut commented 5 years ago

I do not like to share my complete log on the internet, but no problem to send it you. Are you the same mvn23 at Tweakers? I can send you a pm over there.

mvn23 commented 5 years ago

Are you the same mvn23 at Tweakers?

The very same.

mvn23 commented 5 years ago

May be reopened due to a bug in the updated lib. Working on a patch.