home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.62k stars 29.93k forks source link

deconz devices becoming unavailable #37372

Closed aguilaair closed 3 years ago

aguilaair commented 4 years ago

The problem

My ZigBee devices connected to deConz are becoming constantly unavailable without any apparent reason. I have multiple brands including Phillips and Aqara. They're all bettery-run so none are acting as repeaters.

Environment

docker_version | 19.03.8 host_os | HassOS 4.10 os_version | 4.19.126-v8 python_version | 3.7.7 supervisor | 228 version | 0.112.0 frontend | 20200603.3 - latest deconz | 5.3.6

Logs

Nothing shows up in the logs when this happens.

Images

imagen imagen imagen

aguilaair commented 4 years ago

Duplicate of https://github.com/home-assistant/hassio-addons/issues/1438, wasn't quite sure which repo to post it on (sorry)

Kane610 commented 4 years ago

Most likely not related to the integration

aguilaair commented 4 years ago

They worked fine a few days ago... I'll continue investigating, thanks!

ASNNetworks commented 4 years ago

I upgraded to latest Deconz just now and have the exact same issue. Restarted Deconz an HA multiple times, all Zigbee devices just disconnect after a while (looks as if it crashes). Gonna restore to a snapshow as we speak...

I do remember Deconz having this exact same issue a few versions ago. It happened multiple times a day. An update fixed that then.

aguilaair commented 4 years ago

If it works please let me know, it's driving me crazy 🤣

probot-home-assistant[bot] commented 4 years ago

deconz documentation deconz source (message by IssueLinks)

Fusionalex commented 4 years ago

I have exactly the same problem since the latest Deconz update...

Grodesh commented 4 years ago

FWIW, I was having this problem since the latest update, but upon closer inspection of the logs, I saw that a firmware update was available for my Conbee II. All my devices are stable once again with the new firmware.

Fusionalex commented 4 years ago

For me the problem still exists sadly, I already have the latest firmware for my Conbee (not II)

FirinBlanks commented 4 years ago

I’ve tried updating the firmware and rebooting but each time I go back in to deconz it tells me the same firmware update is available. I can confirm that this issue happened prior to the HA core update but I’ve also now updated to the latest core in the hope that it was an incompatibility issue, I can sadly confirm that it wasn’t.

aguilaair commented 4 years ago

I've just updated the firmware from the Windows 10 application. If the issue is fixed I'll let you guys know

aguilaair commented 4 years ago

Well, that did not seem to help

aguilaair commented 4 years ago

For the time being I've migrated to ZHA, once this is sorted I'll probably move back

ASNNetworks commented 4 years ago

If it works please let me know, it's driving me crazy 🤣

I restored my snapshot (and that way downgraded Deconz) I did have some issues with losing connections after that still. Then rebooted the hardware (OS, so not just restart HA server). After that seems stable again like before (no lost connections for 48 hours).

Kane610 commented 4 years ago

By account of previous comments this is related to latest deconz release. Closing this issue since it is not related to the integration per se.

aguilaair commented 4 years ago

Where should this issue be posted then?

Fusionalex commented 4 years ago

I dont get where else this should be posted then either.. The issue in the hassio-addons was closed too?

Kane610 commented 4 years ago

Deconz github if it's related to an upgrade of deconz.

aguilaair commented 4 years ago

They told me it's related to HA, running it standalone has no issues

ASNNetworks commented 4 years ago

It is definitely related to HA in some way. I downgraded Deconz to 5.3.2 by restoring HA. I noticed a lot of issues still after that, after that I rebooted the OS entirely (not just HA Server) and also upgraded to 112. Zigbee is more stable now than with Deconz update, BUT all my devices still become unavailable for 15 seconds after a certain amount of time (sometimes after an hour, sometimes after 12 hours, very random). Unfortunately sometimes this happens right when I want to trigger something (which is rare, but still unreliable).

I had this exact issues a few HA versions ago. To say it's entirely Deconz is not correct, since I reverted the update.

Kane610 commented 4 years ago

Ok then. Share logs showcasing the issue

aguilaair commented 4 years ago

That's the thing, the logs don't really give that much info: HA Log: 2020-07-02 23:25:30 DEBUG (MainThread) [pydeconz.websocket] {"attr":{"lastannounced":null,"lastseen":"2020-07-02T21:25:30Z","manufacturername":"dresden elektronik","modelid":"RaspBee","name":"Configuration tool 1","swversion":"0x264a0700","type":"Configuration tool","uniqueid":"00:21:2e:ff:ff:05:d7:cf-01"},"e":"changed","id":"1","r":"lights","t":"event","uniqueid":"00:21:2e:ff:ff:05:d7:cf-01"} 2020-07-02 23:25:50 INFO (MainThread) [volvooncall.volvooncall] Updating 2020-07-02 23:25:54 DEBUG (MainThread) [pydeconz.websocket] Websocket starting 2020-07-02 23:25:54 DEBUG (MainThread) [pydeconz.websocket] Reconnecting to deCONZ in 15. 2020-07-02 23:26:09 DEBUG (MainThread) [pydeconz.websocket] Websocket starting 2020-07-02 23:26:09 DEBUG (MainThread) [pydeconz.websocket] Websocket running 2020-07-02 23:26:09 DEBUG (MainThread) [pydeconz.websocket] {"attr":{"lastannounced":null,"lastseen":"2020-07-02T21:26:09Z","manufacturername":"dresden elektronik","modelid":"RaspBee","name":"Configuration tool 1","swversion":"0x264a0700","type":"Configuration tool","uniqueid":"00:21:2e:ff:ff:05:d7:cf-01"},"e":"changed","id":"1","r":"lights","t":"event","uniqueid":"00:21:2e:ff:ff:05:d7:cf-01"}

Deconz log: 15:16:38:909 APS-DATA.indication request id: 188 -> finished 15:16:38:909 APS-DATA.request id: 188 erase from queue 15:16:38:909 ZDP status = 0x00 -> SUCCESS 15:16:38:910 ZDP Mgmt_Lqi_rsp zdpSeq: 45 from 0x00212EFFFF05D7CF total: 5, startIndex: 4, listCount: 1 15:16:38:910 * neighbor: 0x00158D00045D44DE (0x0B66), LQI: 255, relation: 0x01 rxOnWHenIdle: 0 =======AT 15:07:02 devices became unreachable===== 15:17:10:432 don't close database yet, keep open for 900 seconds 15:17:10:434 Remove websocket 172.30.32.1:51086 after error Unknown error 172.30.32.2 - - [03/Jul/2020:15:17:10 +0200] "GET /api/2C916F87F2/groups?_=1593778553416 HTTP/1.1" 200 1155 "https://rlc616egup6ikom710nd2uv6k3qg9m56.ui.nabu.casa/api/hassio_ingress/Y6CjyPLqCkHCWJS8gU3lwF1Iw86AR-NMdch8hfC_JaM/pwa/index.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0" 15:17:10:439 APS-DATA.indication srcAddr: 0x27e9, srcEp: 0x02 dstAddrMode: 2, profile: 0x0104, cluster: 0x0001, lqi: 111, rssi: -80 15:17:10:440 no button map for: RWL021 ep: 0x02 cl: 0x0001 cmd: 0x0A pl[0]: 021 15:17:10:440 ZCL attribute report 0x00178801080CDF75 for cluster: 0x0001, ep: 0x02, frame control: 0x18, mfcode: 0x0000

https://paste.ubuntu.com/p/jmBpXkNd8y/

Kane610 commented 4 years ago

The deconz log is unnecessary, the part from the ha log you shared only show the raspbee device, which isn't relevant more than to show that I should probably ignore it completely. New logs please :)

ASNNetworks commented 4 years ago

At the exact time that all Zigbee devices were unavailable, I saw these lines. Perhaps something else causing stuff to hang, which then disconnects HA to Deconz? Configuration files are all ok though... Very strange.

2020-07-08 16:37:38 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.cpu_temperatuur is taking over 10 seconds 2020-07-08 16:37:43 ERROR (SyncWorker_7) [homeassistant.components.command_line.sensor] Timeout for command: cat /sys/class/thermal/thermal_zone0/temp

Kane610 commented 4 years ago

Disable those and try again

ASNNetworks commented 4 years ago

Disable those and try again

Ok, I will keep an eye to the history now that I removed that. I also noticed my iPhone was registered multiple times as a device. At the same time that I got another short disconnect (after removing previous sensor lines), I saw that error in log. Removed those as well.

ASNNetworks commented 4 years ago

So once again the devices had a brief moment of unavailable. I see no logs at that exact time. The logs 3 minutes before all devices became unavailable were the following three logs. So I'm not sure this has anything to do with it. Just to clarify, I have deleted home-assistant_v2.db multiple times now (I saw recorder, so figured perhaps corrupt database). The last time I deleted the database was around 24 hours ago.

FIRST ONE Logger: homeassistant.components.recorder.util Source: components/recorder/util.py:92 Integration: Recorder (documentation, issues) First occurred: 19:34:09 (78 occurrences) Last logged: 20:51:36

Error executing query: (sqlite3.OperationalError) database is locked [SQL: SELECT states.domain AS states_domain, states.entity_id AS states_entity_id, states.state AS states_state, states.attributes AS states_attributes, states.last_changed AS states_last_changed, states.last_updated AS states_last_updated, states.created AS states_created FROM states WHERE (states.domain IN (?, ?, ?, ?, ?) OR states.last_changed = states.last_updated) AND states.last_updated > ? AND states.entity_id IN (?) AND states.last_updated < ? ORDER BY states.entity_id, states.last_updated] [parameters: ('climate', 'device_tracker', 'humidifier', 'thermostat', 'water_heater', '2020-07-08 18:39:37.044000', 'sensor.load_1m', '2020-07-08 18:40:07.928000')] (Background on this error at: http://sqlalche.me/e/e3q8) Error executing query: (sqlite3.OperationalError) database is locked [SQL: SELECT states.domain AS states_domain, states.entity_id AS states_entity_id, states.state AS states_state, states.attributes AS states_attributes, states.last_changed AS states_last_changed, states.last_updated AS states_last_updated, states.created AS states_created FROM states WHERE (states.domain IN (?, ?, ?, ?, ?) OR states.last_changed = states.last_updated) AND states.last_updated > ? AND states.entity_id IN (?) AND states.last_updated < ? ORDER BY states.entity_id, states.last_updated] [parameters: ('climate', 'device_tracker', 'humidifier', 'thermostat', 'water_heater', '2020-07-07 18:51:08.849000', 'light.pc', '2020-07-08 18:51:08.849000')] (Background on this error at: http://sqlalche.me/e/e3q8) Error executing query: (sqlite3.OperationalError) database is locked [SQL: SELECT states.domain AS states_domain, states.entity_id AS states_entity_id, states.state AS states_state, states.attributes AS states_attributes, states.last_changed AS states_last_changed, states.last_updated AS states_last_updated, states.created AS states_created FROM states WHERE (states.domain IN (?, ?, ?, ?, ?) OR states.last_changed = states.last_updated) AND states.last_updated > ? AND states.entity_id IN (?) AND states.last_updated < ? ORDER BY states.entity_id, states.last_updated] [parameters: ('climate', 'device_tracker', 'humidifier', 'thermostat', 'water_heater', '2020-07-08 18:51:08.849000', 'light.pc', '2020-07-08 18:51:12.496000')] (Background on this error at: http://sqlalche.me/e/e3q8) Error executing query: (sqlite3.OperationalError) database is locked [SQL: SELECT states.domain AS states_domain, states.entity_id AS states_entity_id, states.state AS states_state, states.attributes AS states_attributes, states.last_changed AS states_last_changed, states.last_updated AS states_last_updated, states.created AS states_created FROM states WHERE (states.domain IN (?, ?, ?, ?, ?) OR states.last_changed = states.last_updated) AND states.last_updated > ? AND states.entity_id IN (?) AND states.last_updated < ? ORDER BY states.entity_id, states.last_updated] [parameters: ('climate', 'device_tracker', 'humidifier', 'thermostat', 'water_heater', '2020-07-07 18:51:16.453000', 'light.pc', '2020-07-08 18:51:16.453000')] (Background on this error at: http://sqlalche.me/e/e3q8) Error executing query: (sqlite3.OperationalError) database is locked [SQL: SELECT states.domain AS states_domain, states.entity_id AS states_entity_id, states.state AS states_state, states.attributes AS states_attributes, states.last_changed AS states_last_changed, states.last_updated AS states_last_updated, states.created AS states_created FROM states WHERE (states.domain IN (?, ?, ?, ?, ?) OR states.last_changed = states.last_updated) AND states.last_updated > ? AND states.entity_id IN (?) AND states.last_updated < ? ORDER BY states.entity_id, states.last_updated] [parameters: ('climate', 'device_tracker', 'humidifier', 'thermostat', 'water_heater', '2020-07-08 18:50:29.880000', 'media_player.multispeakers_cast', '2020-07-08 18:51:29.890000')] (Background on this error at: http://sqlalche.me/e/e3q8)

SECOND ONE Logger: homeassistant.components.recorder.util Source: components/recorder/util.py:32 Integration: Recorder (documentation, issues) First occurred: 19:34:09 (26 occurrences) Last logged: 20:51:36

Error executing query: (sqlite3.OperationalError) database is locked [SQL: SELECT states.domain AS states_domain, states.entity_id AS states_entity_id, states.state AS states_state, states.attributes AS states_attributes, states.last_changed AS states_last_changed, states.last_updated AS states_last_updated, states.created AS states_created FROM states WHERE (states.domain IN (?, ?, ?, ?, ?) OR states.last_changed = states.last_updated) AND states.last_updated > ? AND states.entity_id IN (?) AND states.last_updated < ? ORDER BY states.entity_id, states.last_updated] [parameters: ('climate', 'device_tracker', 'humidifier', 'thermostat', 'water_heater', '2020-07-08 18:39:37.044000', 'sensor.load_1m', '2020-07-08 18:40:07.928000')] (Background on this error at: http://sqlalche.me/e/e3q8) Error executing query: (sqlite3.OperationalError) database is locked [SQL: SELECT states.domain AS states_domain, states.entity_id AS states_entity_id, states.state AS states_state, states.attributes AS states_attributes, states.last_changed AS states_last_changed, states.last_updated AS states_last_updated, states.created AS states_created FROM states WHERE (states.domain IN (?, ?, ?, ?, ?) OR states.last_changed = states.last_updated) AND states.last_updated > ? AND states.entity_id IN (?) AND states.last_updated < ? ORDER BY states.entity_id, states.last_updated] [parameters: ('climate', 'device_tracker', 'humidifier', 'thermostat', 'water_heater', '2020-07-07 18:51:08.849000', 'light.pc', '2020-07-08 18:51:08.849000')] (Background on this error at: http://sqlalche.me/e/e3q8) Error executing query: (sqlite3.OperationalError) database is locked [SQL: SELECT states.domain AS states_domain, states.entity_id AS states_entity_id, states.state AS states_state, states.attributes AS states_attributes, states.last_changed AS states_last_changed, states.last_updated AS states_last_updated, states.created AS states_created FROM states WHERE (states.domain IN (?, ?, ?, ?, ?) OR states.last_changed = states.last_updated) AND states.last_updated > ? AND states.entity_id IN (?) AND states.last_updated < ? ORDER BY states.entity_id, states.last_updated] [parameters: ('climate', 'device_tracker', 'humidifier', 'thermostat', 'water_heater', '2020-07-08 18:51:08.849000', 'light.pc', '2020-07-08 18:51:12.496000')] (Background on this error at: http://sqlalche.me/e/e3q8) Error executing query: (sqlite3.OperationalError) database is locked [SQL: SELECT states.domain AS states_domain, states.entity_id AS states_entity_id, states.state AS states_state, states.attributes AS states_attributes, states.last_changed AS states_last_changed, states.last_updated AS states_last_updated, states.created AS states_created FROM states WHERE (states.domain IN (?, ?, ?, ?, ?) OR states.last_changed = states.last_updated) AND states.last_updated > ? AND states.entity_id IN (?) AND states.last_updated < ? ORDER BY states.entity_id, states.last_updated] [parameters: ('climate', 'device_tracker', 'humidifier', 'thermostat', 'water_heater', '2020-07-07 18:51:16.453000', 'light.pc', '2020-07-08 18:51:16.453000')] (Background on this error at: http://sqlalche.me/e/e3q8) Error executing query: (sqlite3.OperationalError) database is locked [SQL: SELECT states.domain AS states_domain, states.entity_id AS states_entity_id, states.state AS states_state, states.attributes AS states_attributes, states.last_changed AS states_last_changed, states.last_updated AS states_last_updated, states.created AS states_created FROM states WHERE (states.domain IN (?, ?, ?, ?, ?) OR states.last_changed = states.last_updated) AND states.last_updated > ? AND states.entity_id IN (?) AND states.last_updated < ? ORDER BY states.entity_id, states.last_updated] [parameters: ('climate', 'device_tracker', 'humidifier', 'thermostat', 'water_heater', '2020-07-08 18:50:29.880000', 'media_player.multispeakers_cast', '2020-07-08 18:51:29.890000')] (Background on this error at: http://sqlalche.me/e/e3q8)

THIRD ONE Logger: aiohttp.server Source: components/recorder/util.py:75 First occurred: 19:34:09 (22 occurrences) Last logged: 20:51:36

Error handling request Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1284, in _execute_context cursor, statement, parameters, context File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute cursor.execute(statement, parameters) sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/aiohttp/web_protocol.py", line 418, in start resp = await task File "/usr/local/lib/python3.7/site-packages/aiohttp/web_app.py", line 458, in _handle resp = await handler(request) File "/usr/local/lib/python3.7/site-packages/aiohttp/web_middlewares.py", line 119, in impl return await handler(request) File "/usr/src/homeassistant/homeassistant/components/http/real_ip.py", line 39, in real_ip_middleware return await handler(request) File "/usr/src/homeassistant/homeassistant/components/http/ban.py", line 73, in ban_middleware return await handler(request) File "/usr/src/homeassistant/homeassistant/components/http/auth.py", line 127, in auth_middleware return await handler(request) File "/usr/src/homeassistant/homeassistant/components/http/view.py", line 129, in handle result = await result File "/usr/src/homeassistant/homeassistant/components/history/init.py", line 475, in get minimal_response, File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run result = self.fn(*self.args, **self.kwargs) File "/usr/src/homeassistant/homeassistant/components/history/init.py", line 502, in _sorted_significant_states_json minimal_response, File "/usr/src/homeassistant/homeassistant/components/history/init.py", line 138, in _get_significant_states states = execute(query) File "/usr/src/homeassistant/homeassistant/components/recorder/util.py", line 75, in execute result = list(qry) File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3481, in iter return self._execute_and_instances(context) File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3506, in _execute_and_instances result = conn.execute(querycontext.statement, self._params) File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1020, in execute return meth(self, multiparams, params) File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_clauseelement distilled_params, File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1324, in _execute_context e, statement, parameters, cursor, context File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1518, in _handle_dbapi_exception sqlalchemy_exception, with_traceback=excinfo[2], from=e File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_ raise exception File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1284, in _execute_context cursor, statement, parameters, context File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute cursor.execute(statement, parameters) sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked [SQL: SELECT states.domain AS states_domain, states.entity_id AS states_entity_id, states.state AS states_state, states.attributes AS states_attributes, states.last_changed AS states_last_changed, states.last_updated AS states_last_updated, states.created AS states_created FROM states WHERE (states.domain IN (?, ?, ?, ?, ?) OR states.last_changed = states.last_updated) AND states.last_updated > ? AND states.entity_id IN (?) AND states.last_updated < ? ORDER BY states.entity_id, states.last_updated] [parameters: ('climate', 'device_tracker', 'humidifier', 'thermostat', 'water_heater', '2020-07-08 17:33:01.245000', 'sensor.load_1m', '2020-07-08 17:34:03.011000')] (Background on this error at: http://sqlalche.me/e/e3q8)

Kane610 commented 4 years ago

Can you enable debug for deconz in hass for more detailed logs,

ASNNetworks commented 4 years ago

Can you enable debug for deconz in hass for more detailed logs,

Allright, enabling debug in configuration now. Will report back once it happens again with logs

digidude commented 4 years ago

I have a couple of hours of logs as well...not sure which parts would be helpful.

Every period of "unavailable" in the log file has a corresponding entry in the log file which like this...

2020-07-08 17:23:44 DEBUG (MainThread) [pydeconz.websocket] Websocket starting
2020-07-08 17:23:44 DEBUG (MainThread) [pydeconz.websocket] Reconnecting to deCONZ in 15.
2020-07-08 17:23:59 DEBUG (MainThread) [pydeconz.websocket] Websocket starting
2020-07-08 17:24:31 DEBUG (MainThread) [pydeconz.websocket] Websocket running

Many...but not all, have the immediate lines AFTER, is a script trigger fired from a zigbee motion sensor

2020-07-08 17:24:32 INFO (MainThread) [homeassistant.components.automation] Executing room-occupancy-trigger
2020-07-08 17:24:32 INFO (MainThread) [homeassistant.components.automation] room-occupancy-trigger: Running script
2020-07-08 17:24:32 INFO (MainThread) [homeassistant.components.automation] room-occupancy-trigger: Executing step call service
2020-07-08 17:24:32 INFO (MainThread) [homeassistant.components.automation] Executing room-occupancy-trigger
2020-07-08 17:24:32 INFO (MainThread) [homeassistant.components.automation] room-occupancy-trigger: Executing step call service
digidude commented 4 years ago

Another clip show more info

2020-07-08 17:26:24 INFO (MainThread) [homeassistant.components.automation] Executing room-unoccupied
2020-07-08 17:26:24 INFO (MainThread) [homeassistant.components.automation] room-unoccupied: Running script
2020-07-08 17:26:24 INFO (MainThread) [homeassistant.components.automation] room-unoccupied: Executing step call service
2020-07-08 17:26:24 INFO (MainThread) [homeassistant.components.automation] Executing Room Unoccupied
2020-07-08 17:26:24 INFO (MainThread) [homeassistant.components.automation] Room Unoccupied: Running script
2020-07-08 17:26:24 INFO (MainThread) [homeassistant.components.automation] Room Unoccupied: Executing step call service
2020-07-08 17:26:24 DEBUG (MainThread) [pydeconz.gateway] Sending "put" "{'on': False}" to "172.30.33.0 /groups/6/action"
2020-07-08 17:26:27 DEBUG (MainThread) [pydeconz.websocket] Websocket starting
2020-07-08 17:26:27 DEBUG (MainThread) [pydeconz.websocket] Reconnecting to deCONZ in 15.
2020-07-08 17:26:33 DEBUG (MainThread) [pydeconz.gateway] HTTP request response: [{'success': {'/groups/6/action/on': False}}]
2020-07-08 17:26:42 DEBUG (MainThread) [pydeconz.websocket] Websocket starting
2020-07-08 17:26:46 DEBUG (MainThread) [pydeconz.websocket] Websocket running
2020-07-08 17:26:46 INFO (MainThread) [homeassistant.components.automation] Executing room-occupancy-trigger
2020-07-08 17:26:46 INFO (MainThread) [homeassistant.components.automation] room-occupancy-trigger: Running script
2020-07-08 17:26:46 INFO (MainThread) [homeassistant.components.automation] room-occupancy-trigger: Executing step call service
2020-07-08 17:26:46 INFO (MainThread) [homeassistant.components.automation] Executing room-occupancy-trigger
2020-07-08 17:26:46 INFO (MainThread) [homeassistant.components.automation] room-occupancy-trigger: Executing step call service
2020-07-08 17:26:46 DEBUG (MainThread) [pydeconz.websocket] {"attr":{"lastannounced":null,"lastseen":"2020-07-08T21:26:46Z","manufacturername":"Philips","modelid":"LLC011","name":"bloom-1","swversion":"5.127.1.26581","type":"Color light","uniqueid":"00:17:88:01:00:1d:24:7a-0b"},"e":"changed","id":"3","r":"lights","t":"event","uniqueid":"00:17:88:01:00:1d:24:7a-0b"}
2020-07-08 17:26:46 INFO (MainThread) [homeassistant.components.automation] room-occupancy-trigger: Executing step call service
2020-07-08 17:26:46 INFO (MainThread) [homeassistant.components.automation] room-occupancy-trigger: Executing step call service
2020-07-08 17:26:46 DEBUG (MainThread) [pydeconz.websocket] {"attr":{"lastannounced":null,"lastseen":"2020-07-08T21:26:46Z","manufacturername":"Philips","modelid":"LCT016","name":"bathrrom-vanity-1","swversion":"1.50.2_r30933","type":"Extended color light","uniqueid":"00:17:88:01:03:ee:53:fc-0b"},"e":"changed","id":"12","r":"lights","t":"event","uniqueid":"00:17:88:01:03:ee:53:fc-0b"}
2020-07-08 17:26:47 DEBUG (MainThread) [pydeconz.websocket] {"attr":{"lastannounced":null,"lastseen":"2020-07-08T21:26:47Z","manufacturername":"Philips","modelid":"LCT016","name":"livingroom-light-table","swversion":"1.50.2_r30933","type":"Extended color light","uniqueid":"00:17:88:01:03:bd:5d:b6-0b"},"e":"changed","id":"4","r":"lights","t":"event","uniqueid":"00:17:88:01:03:bd:5d:b6-0b"}
2020-07-08 17:26:48 DEBUG (MainThread) [pydeconz.websocket] {"attr":{"lastannounced":null,"lastseen":"2020-07-08T21:26:48Z","manufacturername":"Philips","modelid":"LCT012","name":"livingroom-light-candle1","swversion":"1.50.2_r30933","type":"Extended color light","uniqueid":"00:17:88:01:03:ba:15:92-0b"},"e":"changed","id":"5","r":"lights","t":"event","uniqueid":"00:17:88:01:03:ba:15:92-0b"}
...
digidude commented 4 years ago

Let me know if you would like a zip file of the log or anything else that might be helpful

digidude commented 4 years ago

Another odd bit... "connection closed by client."

2020-07-08 17:14:17 DEBUG (MainThread) [pydeconz.websocket] Websocket starting
2020-07-08 17:14:17 DEBUG (MainThread) [pydeconz.websocket] Reconnecting to deCONZ in 15.
2020-07-08 17:14:27 INFO (MainThread) [homeassistant.components.websocket_api.http.connection.2849320784] Connection closed by client
2020-07-08 17:14:32 DEBUG (MainThread) [pydeconz.websocket] Websocket starting
2020-07-08 17:14:32 DEBUG (MainThread) [pydeconz.websocket] Websocket running
2020-07-08 17:14:32 INFO (MainThread) [homeassistant.components.automation] Executing room-occupancy-trigger
Kane610 commented 4 years ago

That connection closed by client is probably a web client you've closed.

Well there is not much to go on. Indeed the connection to Deconz goes down for some reason.

How are you running deconz? What other integrations are you running in parallell? I don't think it's an issue with the integration but something else in your environment.

digidude commented 4 years ago

Not a lot going on here....

image

Was working without trouble untill about two weeks ago. Always available and working from the Phoscon app, when unavailable in HA

I DID have a problem while ago where the Integration app showed TWO identical connections -- both with the same Config Entry Name (long string of numbers.). This was after the OS Update. A reboot, I think, fixed it.

the mqtt server is running on a separate Pi and not in use. I can remove it from HA if helpful.

I would just flush the whole thing and rebuild it ... but I have 47 devices that would need to be reconnected. Not thrilled about that idea.

digidude commented 4 years ago

deconz via an ConBeeII in the usb on the same Pi as HA Accessing the Phoscon app via the integration panel, and linked in the HA left rail Phoscon has a password t access... and had to be activated in the Config settings I can can also VNC into the app on port :5900

automation triggered on a motion detector and dispatched via a service call (currently oversimplified, but functional here.)

- alias: Room Occupancy
  trigger:
    platform: state
    entity_id: 
    - input_boolean.bathroom_occupancy
    - input_boolean.bedroom_occupancy
    - input_boolean.kitchen_occupancy
    to: 'on'
  condition:
  action:
    - service: scene.turn_on
      data_template:
        entity_id: scene.{{ trigger.entity_id.split('.')[1].split('_', 1)[0] | lower }}_daytime_scene
digidude commented 4 years ago

That's actually the wrong code... sorry about that. (and most likely not relevant. Of out 27 dropouts today... 21 of them are immediately followed by motion detector trigger. I know, most likely not related. But oddly frequent.

- alias: room-occupancy-trigger
    initial_state: true
    trigger:
    # the sensor list for room occupancy motion sensors
    - entity_id: 
      - binary_sensor.motion_sensor_6
      - binary_sensor.motion_sensor_7
      - binary_sensor.motion_sensor_8
      platform: state
      to: 'on'
    # Occupied is occupied, it's not conditional
    condition: []
    action:
    - service: timer.start
      data_template:
        entity_id: "timer.{{ trigger.to_state.attributes.friendly_name.split(' ', 1)[0] | lower }}_motion_timer"
    - service: input_boolean.turn_on
      data_template:
        entity_id: "input_boolean.{{ trigger.to_state.attributes.friendly_name.split(' ', 1)[0] | lower }}_occupancy"
ASNNetworks commented 4 years ago

Just got the short disconnect again after 10 hours now. There only one entry in log:

Logger: asyncio Source: components/hassio/ingress.py:154 First occurred: 9:36:59 (36 occurrences) Last logged: 9:37:00

socket.send() raised exception

This happened while I was in HA Server and I noticed everything hanged for that moment as well. Perhaps something is locking my system up, which then re-connects to Deconz?

Also in Deconz at that same moment this happened:

09:36:24:849 skip binding for attribute reporting of ep: 0x00 cluster 0x0101 (end-device might sleep) 09:36:58:636 Remove websocket 172.30.32.1:51808 after error Unknown error 09:36:58:804 New websocket 172.30.32.1:35438 (state: 3) 09:37:00:732 skip binding for attribute reporting of ep: 0x00 cluster 0x1000 (end-device might sleep) 09:37:02:822 ZCL attribute report 0x90FD9FFFFE137215 for cluster: 0x0000, ep: 0x01, frame control: 0x08, mfcode: 0x0000 09:37:05:849 skip binding for attribute reporting of ep: 0x00 cluster 0x0101 (end-device might sleep) 09:37:09:370 Current channel 25 09:37:09:378 Device TTL 897 s flags: 0x7 09:37:10:392 dev /dev/ttyAMA0 09:37:10:395 GW firmware version: 0x264a0700 09:37:10:395 GW firmware version is up to date: 0x264a0700 09:37:10:850 skip binding for attribute reporting of ep: 0x00 cluster 0x1000 (end-device might sleep) 09:37:15:849 skip binding for attribute reporting of ep: 0x00 cluster 0x0101 (end-device might sleep) 09:37:17:763 no button map for: lumi.sensor_magnet ep: 0x01 cl: 0x0000 cmd: 0x0A pl[0]: 002 09:37:17:763 ZCL attribute report 0x00158D0004264F65 for cluster: 0x0000, ep: 0x01, frame control: 0x1C, mfcode: 0x1234 09:37:17:764 0x00158D0004264F65 extract Xiaomi special attribute 0xFF02 09:37:17:764 64 on/off 0 09:37:17:764 01 battery 3035 (0x0BDB) 09:37:17:764 03 unsupported index (data type 0x21) 09:37:17:764 04 unsupported index (data type 0x24) 09:37:17:764 05 unsupported index (data type 0x21) 09:37:17:764 06 unsupported index (data type 0x20) 09:37:20:849 skip binding for attribute reporting of ep: 0x00 cluster 0x1000 (end-device might sleep) 09:37:25:849 skip binding for attribute reporting of ep: 0x00 cluster 0x0006 (end-device might sleep) 09:37:30:849 skip binding for attribute reporting of ep: 0x00 cluster 0x1000 (end-device might sleep) 09:37:35:110 ZCL attribute report 0x000B57FFFEEB6D74 for cluster: 0x0000, ep: 0x01, frame control: 0x08, mfcode: 0x0000 09:37:35:849 skip binding for attribute reporting of ep: 0x00 cluster 0x0006 (end-device might sleep) 09:37:40:849 skip binding for attribute reporting of ep: 0x00 cluster 0x1000 (end-device might sleep) 09:37:45:849 skip binding for attribute reporting of ep: 0x00 cluster 0x0006 (end-device might sleep) 09:37:47:506 ZCL attribute report 0x7CB03EAA00B035A2 for cluster: 0x0300, ep: 0x03, frame control: 0x18, mfcode: 0x0000 09:37:47:516 ZCL attribute report 0x7CB03EAA00B035A2 for cluster: 0x0300, ep: 0x03, frame control: 0x18, mfcode: 0x0000 09:37:50:850 skip binding for attribute reporting of ep: 0x00 cluster 0x1000 (end-device might sleep) 09:37:54:349 remove outdated neighbor 0xB6E6 09:37:55:850 skip binding for attribute reporting of ep: 0x00 cluster 0x0006 (end-device might sleep) 09:38:00:016 ZCL attribute report 0x90FD9FFFFE6FC60E for cluster: 0x0000, ep: 0x01, frame control: 0x08, mfcode: 0x0000 09:38:00:850 skip binding for attribute reporting of ep: 0x00 cluster 0x1000 (end-device might sleep) 09:38:05:849 skip binding for attribute reporting of ep: 0x00 cluster 0x0006 (end-device might sleep) 09:38:09:365 Current channel 25

foxpyre commented 4 years ago

I have the exact same issue so there's something weird going on.

Kane610 commented 4 years ago

So you mean you notice other stuff get locked up as well?

ASNNetworks commented 4 years ago

So you mean you notice other stuff get locked up as well?

Yes, I noticed the entire system was hanging briefly. I couldn't open settings for instance.

Since I've read that SD cards can become corrupt pretty quickly (especially if you write/change a lot with Lovelace/automations) it can go hard. I've already replaced the SD card twice and also noticed those times stability issues at random places. I figured, perhaps the SD card has something to do with it as well. I had a thought that maybe the entire system froze sometimes because the recorder couldn't write it off on time. I also noticed huge CPU spikes lately. So I created a snapshop, used a fresh (never before used, still sealed) sd card with a clean HA install on it. I also re-installed clean HA on the old SD card to test them both. I immediately noticed that the new SD card was flashed WAY faster (or should I say, the old one WAY slower). So I had high hopes the SD card was going bad.

After that I installed Deconz (latest) and did not encounter those weird issues. Then I restored my snapshot, without selecting Deconz. So I ended up with a clean Deconz that did work, on the latest update and my old snapshot back. I had to re-pair all my Zigbee devices (around 36) and had some issues with Zigbee going down when I activated my Zwave.me stick again. But I changed some settings and right now all devices are working with the latest Deconz (though I can't seem to update the Conbee II, but that's a common Deconz issue). So the first issue (losing all connections after even a few seconds with latest Deconz) is now gone. I'll keep my eye on my sensor states to see if they lose connections briefly.

Perhaps my issue was the sd card going bad, resulting recorder going haywire, causing huge cpu spikes, freezing evertything and maybe that caused the sockets to close/disconnect and re-connect. Also CPU load is way more stable now. You can clearly see that it went haywire, then a few hours flat (because of re-installing HA), then it came back but a lot more stable (then it went back flat because I accidentally removed it from recorder integration). chrome_UOdqxuf3VP

I'll update my post tomorrow and I hope all is good now. Not saying that everyone has SD card problems here, but it's worth checking out. In the meantime I changed the recorder integration to limit the writings.

Kane610 commented 4 years ago

If this starts up again. Try disabling the deConz integration to see if that changes anything and if not also remove deconz addon

tleegaard commented 4 years ago

Same issue for me with 0.112.4. Multiple disconnects (status unavailable) with Deconz. I don't believe I had the issue with 0.111. RPI4 running both HA and Raspbee. Screenshot_20200713-133948

ASNNetworks commented 4 years ago

Same issue for me with 0.112.4. Multiple disconnects (status unavailable) with Deconz. I don't believe I had the issue with 0.111. RPI4 running both HA and Raspbee. Screenshot_20200713-133948

Have you tried removing the Deconz integration and adding it again? Keep in mind that you have to change entity id’s if you have manually changed those.

Since I reset everything I haven’t had any issues (over 48 hours now). knocks on wood

I’d advise making a backup first, then removing Deconz integration and re-adding it. If it’s solved, then great! (Still have to manually change entity id’s you edited, but finally fixed) Please share so others can try and maybe that’s the fix. If not: restore your snapshot so you don’t have to manually change entity id’s.

digidude commented 4 years ago

Intermittent unavailability (exactly 15 seconds) on a clean test rig.

To attempt to eliminate or recreate the problem took the following steps:

All Philip Hue bulbs and led strips (all types) All Xiaomi sensors (none configured with the test rig) (motion, multi-sensor, switches) deconz II usb

zero scripts, automation, etc

HA went from hero to zero with the significant other. (Motion triggered lighting has become the new "normal") When it comes to light controls 90% is 10% failure -- that's "apple maps bad."

Not sure what to try next. Or reinstall the hue hub. Oh please no. (Might try z2m on a separate device)

ASNNetworks commented 4 years ago

Intermittent unavailability (exactly 15 seconds) on a clean test rig.

To attempt to eliminate or recreate the problem took the following steps:

  • formatted a brand new SDCard (to eliminate the possibility of bad I/O)
  • used a different Pi (Pi 4 w/4Gb)
  • added an extension cable to the deconz USB stick and moved it 6' away from the pi, power supplies, cables, etc.
  • installed the ZHA and ran for 24 hours -- no dropouts. (A lot of other issues with groups and scenes, for me, with ZHA. I really wanted this to work... open-source python, no proprietary UIs...maybe someday, but then again, I'm not helping so shame on me)
  • Removed the ZHA integrations
  • Added deconz integration, configured and added devices through phoscon
  • one dropout already occurred running overnight (way better than the 1 per hour I was seeing)

All Philip Hue bulbs and led strips (all types) All Xiaomi sensors (none configured with the test rig) (motion, multi-sensor, switches) deconz II usb

zero scripts, automation, etc

HA went from hero to zero with the significant other. (Motion triggered lighting has become the new "normal") When it comes to light controls 90% is 10% failure -- that's "apple maps bad."

Not sure what to try next. Or reinstall the hue hub. Oh please no. (Might try z2m on a separate device)

Are you using Conbee II? Did you also update the firmware of the stick? I updated the stick via Windows since Dezons keeps failing the upgrade (it says succesul but then after a while it show update is available, common issue).

digidude commented 4 years ago

@ASNNetworks Yes...Conbee II No...I don't have a windows box running anyplace. Firmware 264A0700 "Update to 26580700" (button showing, but doesn't upgrade)

ASNNetworks commented 4 years ago

@ASNNetworks Yes...Conbee II No...I don't have a windows box running anyplace. Firmware 264A0700 "Update to 26580700" (button showing, but doesn't upgrade)

Up until now I haven’t had anymore disconnects (last 60 hours). I did upgrade the firmware though, which is the only difference between you and me (I also removed Deconz, used a new sd etc). I’d advise you to upgrade the Conbee II firmware. You can try manually with other platforms than Windows as well: https://github.com/dresden-elektronik/deconz-rest-plugin/wiki/Update-deCONZ-manually

I would try that before you do anything else. Chances are the firmware upgrade fixes it.

CASATAJO commented 4 years ago

I do have exactly same issue, I updated formware manually using windows to 26580700 from 264A0700 but problem still remain, disconnection happends in all my zigbee devices very frequntly affecting sensors and lights

ricard0g0mes commented 4 years ago

I do have exactly same issue, I updated formware manually using windows to 26580700 from 264A0700 but problem still remain, disconnection happends in all my zigbee devices very frequntly affecting sensors and lights

I also still have the same problem and i've also updated my conbeeII via windows Heeellppp....