ropg opened this issue 1 week ago
Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
zha documentation zha source (message by IssueLinks)
Thanks. Currently tracking that issue here:
I think this may be happening during disconnect because HA tasks now execute eagerly: calling create_task() on a coroutine that only does synchronous work runs it immediately, before create_task() returns, instead of scheduling it for later. This will require some changes in zigpy. We don't expect event listeners to modify the list of listeners like this.
Naturally this needs to be fixed, but as a workaround, the following automation reboots my instance if a particular Zigbee light becomes unavailable for 3 minutes.
```yaml
alias: Reboot when zigbee goes away
description: ""
trigger:
  - platform: state
    entity_id:
      - light.n1_light
    to: unavailable
    for:
      hours: 0
      minutes: 3
      seconds: 0
condition: []
action:
  - service: hassio.host_reboot
    data: {}
mode: single
```
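The for: option means the trigger only fires if the light stays unavailable for a full 3 minutes, so short dropouts that recover on their own do not reboot the host. The following is a hypothetical Python model of that rule over a list of recorded state changes; the function name and sample data are made up for illustration, and this is not how Home Assistant implements state triggers.

```python
from datetime import datetime, timedelta


def trigger_times(changes, until, hold=timedelta(minutes=3)):
    """Hypothetical model of a state trigger with 'to: unavailable' and 'for: 3 min'.

    changes: (timestamp, new_state) tuples, sorted by time, one per actual state change.
    until: end of the observation window.
    Returns the moments at which the trigger would fire.
    """
    fired = []
    for i, (ts, state) in enumerate(changes):
        if state != "unavailable":
            continue
        # The entity stays unavailable until the next recorded change (or the window ends)
        end = changes[i + 1][0] if i + 1 < len(changes) else until
        if end - ts >= hold:
            fired.append(ts + hold)
    return fired


t0 = datetime(2024, 1, 1)
history = [
    (t0, "unavailable"),                         # 1-minute blip: too short to fire
    (t0 + timedelta(minutes=1), "on"),
    (t0 + timedelta(minutes=5), "unavailable"),  # stays down: fires 3 minutes later
]
print(trigger_times(history, until=t0 + timedelta(minutes=10)))  # one firing, at t0 + 8 min
```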
(I see it executing about 1-2 times a day at unpredictable and seemingly random times when nothing particular is happening.)
The error you're seeing shouldn't cause any noticeable issues. You're likely only seeing that because your dongle is disconnecting or crashing.
Whatever happens, I need a host reboot for it to clear, not just a restart of HA. So it's quite possible that my dongle is crashing, indeed. It's a Nabu Casa SkyConnect:
```yaml
DEVLINKS: >-
  /dev/serial/by-id/usb-Nabu_Casa_SkyConnect_v1.0_680ca9e8b496ed11aad0c998a7669f5d-if00-port0
  /dev/serial/by-path/platform-fd500000.pcie-pci-0000:01:00.0-usb-0:1.3:1.0-port0
  /dev/serial/by-path/platform-fd500000.pcie-pci-0000:01:00.0-usbv2-0:1.3:1.0-port0
DEVNAME: /dev/ttyUSB0
DEVPATH: >-
  /devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.3/1-1.3:1.0/ttyUSB0/tty/ttyUSB0
ID_BUS: usb
ID_MODEL: SkyConnect_v1.0
ID_MODEL_ENC: SkyConnect\x20v1.0
ID_MODEL_ID: ea60
ID_PATH: platform-fd500000.pcie-pci-0000:01:00.0-usb-0:1.3:1.0
ID_PATH_TAG: platform-fd500000_pcie-pci-0000_01_00_0-usb-0_1_3_1_0
ID_PATH_WITH_USB_REVISION: platform-fd500000.pcie-pci-0000:01:00.0-usbv2-0:1.3:1.0
ID_REVISION: '0100'
ID_SERIAL: Nabu_Casa_SkyConnect_v1.0_680ca9e8b496ed11aad0c998a7669f5d
ID_SERIAL_SHORT: 680ca9e8b496ed11aad0c998a7669f5d
ID_TYPE: generic
ID_USB_DRIVER: cp210x
ID_USB_INTERFACES: ':ff0000:'
ID_USB_INTERFACE_NUM: '00'
ID_USB_MODEL: SkyConnect_v1.0
ID_USB_MODEL_ENC: SkyConnect\x20v1.0
ID_USB_MODEL_ID: ea60
ID_USB_REVISION: '0100'
ID_USB_SERIAL: Nabu_Casa_SkyConnect_v1.0_680ca9e8b496ed11aad0c998a7669f5d
ID_USB_SERIAL_SHORT: 680ca9e8b496ed11aad0c998a7669f5d
ID_USB_TYPE: generic
ID_USB_VENDOR: Nabu_Casa
ID_USB_VENDOR_ENC: Nabu\x20Casa
ID_USB_VENDOR_ID: 10c4
ID_VENDOR: Nabu_Casa
ID_VENDOR_ENC: Nabu\x20Casa
ID_VENDOR_ID: 10c4
MAJOR: '188'
MINOR: '0'
SUBSYSTEM: tty
TAGS: ':systemd:'
USEC_INITIALIZED: '3329682'
```
Are you running it with normal ZigBee firmware or with multiprotocol?
Multiprotocol. I don't have any Thread devices, though, so I could turn that off.
Normal ZigBee firmware doesn't crash like this so you'd benefit from migrating off of multi-PAN: https://skyconnect.home-assistant.io/procedures/disable-multiprotocol/
Turned it off; I'll let you know if this resolves my issue. I'm still curious why this started happening out of the blue about a week ago.
Just to add to this, I'm also having the identical issue reported here (also with a SkyConnect), but mine is flashed only for Zigbee duties (not MultiPAN). This has been happening for me for several weeks and is only cleared with a server reboot (HA restart doesn't fix it, VM reboot doesn't fix it, needs a complete server reboot that HA is hosted on).
Sometimes it happens 3-4 times a day, sometimes 3-4 times a week. I'll raise a new issue with other logs, but it does appear to occur with the normal Zigbee firmware as well.
> HA restart doesn't fix it, VM reboot doesn't fix it, needs a complete server reboot that HA is hosted on
This is really odd. For anything other than multi-PAN, any state is completely reset when you reload just the integration, let alone anything else. Requiring a physical reset of the hardware means that the stick needs to be reset as well. This isn't a known issue with anything other than multi-PAN.
Can you record a debug log of this happening? Could you also include in the log what happens when you reload the integration, when the stick is in this state?
> Can you record a debug log of this happening?
Zigbee crashed again about 20 minutes ago. Initiated a host restart and now HA is no longer accessible (sigh). Here's the same batch of logs from a few days ago:
First log:
```
Logger: bellows.uart
Source: /usr/local/lib/python3.12/site-packages/bellows/uart.py:252
First occurred: 01:40:53 (1 occurrences)
Last logged: 01:40:53
Lost serial connection: ConnectionResetError('Failed to transmit ASH frame after 4 retries')
```
Second log:
```
Logger: bellows.ezsp
Source: /usr/local/lib/python3.12/site-packages/bellows/ezsp/__init__.py:496
First occurred: 01:40:53 (1 occurrences)
Last logged: 01:40:53
Exception running handler
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/bellows/ezsp/__init__.py", line 494, in handle_callback
    handler(*args)
  File "/usr/local/lib/python3.12/site-packages/bellows/zigbee/application.py", line 626, in ezsp_callback_handler
    self.connection_lost(args[0])
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 688, in connection_lost
    self.listener_event("connection_lost", exc)
  File "/usr/local/lib/python3.12/site-packages/zigpy/util.py", line 50, in listener_event
    for listener, include_context in self._listeners.values():
RuntimeError: dictionary changed size during iteration
```
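The RuntimeError at the bottom of this traceback is Python's generic failure for changing a dict's size while iterating over it. The sketch below reproduces it with a hypothetical, simplified listener registry: Listenable and SelfRemovingListener are made-up names modeled on the traceback, not zigpy's actual classes (real zigpy stores (listener, include_context) tuples). It also shows one common remedy, iterating over a snapshot copy; whether zigpy adopted this exact change is not stated in this thread.

```python
class Listenable:
    """Hypothetical, simplified listener registry; not zigpy's actual code."""

    def __init__(self) -> None:
        self._listeners: dict[int, object] = {}
        self._next_id = 0

    def add_listener(self, listener: object) -> int:
        self._next_id += 1
        self._listeners[self._next_id] = listener
        return self._next_id

    def remove_listener(self, listener_id: int) -> None:
        self._listeners.pop(listener_id, None)

    def listener_event_live(self, method_name: str, *args) -> None:
        # Iterates the live dict view: if a listener removes itself here,
        # the next step of the loop raises RuntimeError
        for listener in self._listeners.values():
            getattr(listener, method_name)(*args)

    def listener_event_snapshot(self, method_name: str, *args) -> None:
        # Iterates over a copy, so listeners may add or remove listeners safely
        for listener in list(self._listeners.values()):
            getattr(listener, method_name)(*args)


class SelfRemovingListener:
    """Hypothetical listener that unregisters itself synchronously on disconnect,
    which is what an eagerly started cleanup task effectively does."""

    def __init__(self, app: Listenable) -> None:
        self.app = app
        self.listener_id = app.add_listener(self)

    def connection_lost(self, exc: Exception) -> None:
        self.app.remove_listener(self.listener_id)


def demo(use_snapshot: bool) -> str:
    app = Listenable()
    SelfRemovingListener(app)
    SelfRemovingListener(app)
    try:
        if use_snapshot:
            app.listener_event_snapshot("connection_lost", ConnectionResetError())
        else:
            app.listener_event_live("connection_lost", ConnectionResetError())
    except RuntimeError as err:
        return f"RuntimeError: {err}"
    return f"ok, {len(app._listeners)} listeners left"


print(demo(use_snapshot=False))  # RuntimeError: dictionary changed size during iteration
print(demo(use_snapshot=True))   # ok, 0 listeners left
```

Copying the values costs one small list allocation per event, which is usually negligible next to the work the listeners themselves do.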
Third log:
```
Logger: homeassistant.util.executor
Source: util/executor.py:30
First occurred: 01:40:54 (2 occurrences)
Last logged: 01:40:55
Thread[SyncWorker_0] is still running at shutdown:
  File "/usr/local/lib/python3.12/threading.py", line 1030, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.12/site-packages/serial/serialposix.py", line 673, in flush
    termios.tcdrain(self.fd)
```
I'm not near the server at the moment, but will try and get more logs when able. That said, the above sequence of logs occurs every time Zigbee crashes.
So far, my problem seems to have disappeared after flashing my SkyConnect back to Zigbee-only. I'm not sure whether that means I should close this, since others in the conversation still have issues with the stick on Zigbee-only firmware. I'll leave it open for now, but feel free to close.
The problem
My Home Assistant (Core 2024.5.1, Supervisor 2024.04.4, Operating System 12.2, Frontend 20240501.0) reports:
about once a day now, leading to a loss of all my Zigbee lights and switches until I reboot, while causing (or being caused by, IDK) my Zigbee Home Automation integration to report "failed setup, will retry". (It periodically retries and fails; only a reboot helps.)
What version of Home Assistant Core has the issue?
2024.5.1
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
Maybe ZHA, but the error shows up in the Core logs
Link to integration documentation on our website
No response
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response