home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
74.18k stars 31.14k forks source link

ZHA fails to initializing after ZHA crashes #108283

Closed freemann closed 6 months ago

freemann commented 10 months ago

The problem

Randomly ZHA crashes. After the crash it fails to initialize and only a full reboot fixes the issue for X hours. Here HA itself doesn't crash and keeps running normal I enabled debug log but the logfile is 87MB and can't add it.

Ha version: Core 2024.1.3 Supervisor 2023.12.1 Operating System 11.4

What version of Home Assistant Core has the issue?

2024.1.3

What was the last working version of Home Assistant Core?

2023.1.2

What type of installation are you running?

Home Assistant Supervised

Integration causing the issue

ZHA

Link to integration documentation on our website

No response

Diagnostics information

config_entry-zha-d74ff06558e3c8e562710d33dd0e67f2.json

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 10 months ago

Hey there @dmulcahey, @adminiuga, @puddly, @thejulianjes, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `zha` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign zha` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

fredd589 commented 10 months ago

The problem After some hours zha stops working and fails to initialise. After restarting HA it works again for some hours.

What version of Home Assistant Core has the issue? Core 2024.1.3 Supervisor 2023.12.1 Operating System 11.4 Frontend 20240104.0

What was the last working version of Home Assistant Core? core-2023.12.1

What type of installation are you running? HA OS on Pi4

Integration causing the issue ZHA, using HA Sky Connect Stick

Debug logs; config_entry-zha-3730e59617a2f4a9c117811e431d639a.json.txt home-assistant_zha_2024-01-18T09-11-29.985Z.log

fredd589 commented 10 months ago

This just happened again and the log showed the below;

2024-01-18 13:51:14.482 ERROR (MainThread) [bellows.uart] Lost serial connection: ConnectionResetError('Remote server closed connection')

And also

2024-01-18 13:51:14.485 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart

jaccolo commented 10 months ago

Same issue for me. Recently I upgraded from HA Core 2024.1.2 to 2024.1.3. Since then every few hours the Zigbee integration fails. Reloading doesn't help. I have to restart HA Core or reboot the VM (HA OS on a Proxmox virtual machine, x86_64). I downgraded to HA Core 2024.1.2 and the issue is gone.

puddly commented 10 months ago

@jaccolo There was not a single ZHA code or dependency change between Home Assistant Core 2024.1.2 and 2024.1.3 so that is very surprising. If you can, please generate debug logs with both versions so that they can be compared.

jaccolo commented 10 months ago

@jaccolo There was not a single ZHA code or dependency change between Home Assistant Core 2024.1.2 and 2024.1.3 so that is very surprising. If you can, please generate debug logs with both versions so that they can be compared.

I will do that this weekend (it's not very convenient when HA controlled lights aren't working when it's dark, and when all lights in the home are HA controlled :-) )

fredd589 commented 10 months ago

@jaccolo I've just downgraded to 2024.1.2 and run debug logs,- see below for both versions of core.

debug log - 2024.1.2.log debug log - 2024.1.3.log

puddly commented 10 months ago

@fredd589 You're using the multiprotocol addon. When you downgrade to 2024.1.2, are you also downgrading the Multiprotocol addon to a lower version along with Core?

jaccolo commented 10 months ago

ZHA crashed again tonight, with Core 2024.1.2. So downgrading does not solve it.

syssi commented 10 months ago

Downgrading to 2023.12.4 did help here (virtualenv + zha + zb-gw03 (EFR32MG21 with ncp-uart firmware).

freemann commented 10 months ago

I downgraded to 2023.12.3 (because I now that 2023.12.4 also had this issue, when thinking back). We were on vacation from 28-12-2023 and HA was on that moment on the most recent version. On vacation I noticed that ZHA was crashed and the Light Ghost didn't work because of the crash.

Now on 2023.12.3 ZHA is complety ** it's not starting at all. Maybe that's because of the SkyConnect updates which are not backwards compatible??

Attached 2 logfiles for; Core 2023.12.3 Supervisor 2023.12.1 Operating System 11.4 Frontend 20231208.2

home-assistant_zha_2024-01-19T10-20-32.381Z.log home-assistant_zha_2024-01-19T10-34-09.315Z.log

freemann commented 10 months ago

2023.12.4 also not working... From boot the ZHA integration fails to setup; "Failed setup, will retry"

Edit: Upgraded to 2024.1.1, ZHA is up and running... for how long...

freemann commented 10 months ago

2024.1.1 crashed. At 21:24:55 it worked, now it's crashed. Tried to restart... it's not working..

Logfile won't upload... will try tomorrow

updating to 2024.1.2...

fredd589 commented 10 months ago

@fredd589 You're using the multiprotocol addon. When you downgrade to 2024.1.2, are you also downgrading the Multiprotocol addon to a lower version along with Core?

I did not downgrade the multi protocol, I have also not had a crash since downgrading core.

dmulcahey commented 10 months ago

Try todays release please

freemann commented 10 months ago

Upgrading to 2024.1.4 Thanks for your time and effort!

freemann commented 10 months ago

2024.1.4 is still going strong

SaltyBart commented 10 months ago

Update 2024.1.4 did not solve the problem, after a few hours it loses all devices again unfortunately.

01CaSu commented 10 months ago

Hello, same issues here. After update, the same error again :( need to restart HA to work again....for few hours...

freemann commented 10 months ago

No crash here after ~7 hours

freemann commented 10 months ago

error_log-5.txt It failed...

[edit 15:30] downgrade to 2024.1.2

SaltyBart commented 10 months ago

Only a compleet reboot will fix this temporally, until it fails again.

Wagner545 commented 10 months ago

I have experience this all week.

But then today, I tried to disable the ZHA integration and Enable it again a few minutes later. Now ZHA works and have been stable for the passed 5 hours. However now my Ikea Integration is failing to start... Not sure if there is an underlying Home Assistant problem which is causing the integrations not to start correctly.

freemann commented 10 months ago

I just disabled the ZHA integration waited a few minutes and enabled it. The strange thing was that first it gave a failed initialized and the second time it started ok. Lets see whats going to happen

syssi commented 10 months ago

2024.1.4 feels promising. No crash since ~8 hours.

freemann commented 10 months ago

Here 2024.1.4 crashes after 7,5 hours. But the crashes seems to occur randomly.... Sometimes a few hours, then a day.

puddly commented 10 months ago

@freemann It looks like you are using the Multiprotocol addon. This isn't an issue with ZHA: https://github.com/home-assistant/addons/issues/3408

jaccolo commented 10 months ago

Updated this morning to latest Core (2024.1.4). Worked fine for about 10 hours. Now crashed again. Added debug logging of ZHA to this message (gzipped, because it is quite large). home-assistant_zha_2024-01-20T18-40-23.118Z.log.gz

System:

➜  ~ dmesg  | grep -i 'usb 2-1'
[    1.197162] usb 2-1: new full-speed USB device number 2 using xhci_hcd
[    1.332412] usb 2-1: New USB device found, idVendor=10c4, idProduct=ea60, bcdDevice= 1.00
[    1.333495] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    1.334422] usb 2-1: Product: SkyConnect v1.0
[    1.335126] usb 2-1: Manufacturer: Nabu Casa
[    1.335818] usb 2-1: SerialNumber: 581e2249c596ed11820ec698a7669f5d
[    2.624959] usb 2-1: cp210x converter now attached to ttyUSB0
➜  ~ ha core info   
arch: amd64
audio_input: null
audio_output: null
backups_exclude_database: false
boot: true
image: ghcr.io/home-assistant/qemux86-64-homeassistant
ip_address: 172.30.32.1
machine: qemux86-64
port: 8123
ssl: false
update_available: false
version: 2024.1.4
version_latest: 2024.1.4
watchdog: true

What I think might be relevant is this (from the logging):

2024-01-20 19:06:57.276 DEBUG (MainThread) [bellows.uart] Connection lost: ConnectionResetError('Remote server closed connection')
2024-01-20 19:06:57.277 ERROR (MainThread) [bellows.uart] Lost serial connection: ConnectionResetError('Remote server closed connection')
2024-01-20 19:06:57.277 DEBUG (MainThread) [bellows.ezsp] socket://core-silabs-multiprotocol:9999 connection lost unexpectedly: Remote server closed connection
2024-01-20 19:06:57.277 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-20 19:06:57.278 DEBUG (MainThread) [bellows.zigbee.application] Received _reset_controller_application frame with ("Serial connection loss: ConnectionResetError('Remote server closed connection')",)
2024-01-20 19:06:57.278 DEBUG (MainThread) [zigpy.application] Connection to the radio has been lost: "Serial connection loss: ConnectionResetError('Remote server closed connection')"
2024-01-20 19:06:57.278 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Connection to the radio was lost: "Serial connection loss: ConnectionResetError('Remote server closed connection')"
2024-01-20 19:06:57.278 DEBUG (MainThread) [bellows.uart] Connection lost: None
2024-01-20 19:06:57.278 DEBUG (MainThread) [bellows.uart] Closed serial connection
2024-01-20 19:06:57.279 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Shutting down ZHA ControllerApplication
puddly commented 10 months ago

@jaccolo As mentioned above, it looks like you are using the Multiprotocol addon. This isn't an issue with ZHA, nor will this be fixed by a Core update: https://github.com/home-assistant/addons/issues/3408

jaccolo commented 10 months ago

@jaccolo As mentioned above, it looks like you are using the Multiprotocol addon. This isn't an issue with ZHA, nor will this be fixed by a Core update: home-assistant/addons#3408

@puddly The logging seems to tell me that ZHA loses the connection with the Skyconnect USB device (UART = serial). When the connection is lost, is seems logical that Multiprotocol isn't working too, because that uses the same device. So I don't think it's multiprotocol related, but USB/serial/device related. Is everyone experiencing this issue, using the Skyconnect USB device?

puddly commented 10 months ago

The logging seems to tell me that ZHA loses the connection with the Skyconnect USB device

ZHA does not communicate with the SkyConnect at all when you're using the multiprotocol addon. The multiprotocol addon communicates with the SkyConnect and then exposes a TCP server for ZHA to communicate with. The "serial" protocol is used with the raw TCP socket so the "UART" disconnect is the multiprotocol addon's TCP server shutting down and resetting.

jaccolo commented 10 months ago

@puddly Thanks for explaining. Unfortunately the logging of "Silicon Labs Multiprotocol" add-on is gone after restarting Core. I will wait for the next crash and check that logging. Perhaps it isn't related at all to ZHA, and is the Multiprotocol addon causing issues.

SaltyBart commented 10 months ago

I'm going now to disable the Multiprotocol, lets see!

freemann commented 10 months ago

I'm going now to disable the Multiprotocol, lets see!

Me to, lets see!

done!

Success! Options successfully saved.

For others, here's a step by step howto disable MultiProtocol; https://skyconnect.home-assistant.io/procedures/disable-multiprotocol/

juleztb commented 10 months ago

Same problem for me and yes, I'm also using the SkyConnect. ZHA crashes every few hours with a zigpy error.

dmulcahey commented 10 months ago

Same problem for me and yes, I'm also using the SkyConnect. ZHA crashes every few hours with a zigpy error.

Are you using multi protocol?

juleztb commented 10 months ago

Yes I did. Just deactivated it after I read @freemann s post. As I don't use any thread devices at the moment it's no problem to do so. For the future that won't be a solution, though. I also can't say if this fixes the problem yet, as I just deactivated it like 20 minutes ago, after posting.

dmulcahey commented 10 months ago

Is no one reading the giant warning when enabling multiprotocol? You legit have to agree to to potentially break things to even enable it…

freemann commented 10 months ago

No, sorry. Even dont know how and when I turned it on...

OK, it could break thing... But you wont expect it hangs HA and kills ZHA by enabling this. But I dont need so disabled it.

Thanks!

SaltyBart commented 10 months ago

Is no one reading the giant warning when enabling multiprotocol? You legit have to agree to to potentially break things to even enable it…

But it worked fine, just since one week its causes issues. If you never test, you will never know :P

juleztb commented 10 months ago

Same as the others. If there was a giant warning, I don't remember it. I own the SkyConnect since it's original release, tested it and it worked until a few days ago. As the log mentions zha and zigpy errors, I wouldn't have thought about the possibility of multiprotocol being the real error.

syssi commented 10 months ago

2024.1.4 crashed here too after half a day. I'm back on 2023.12.4. I don't use a multiprotocol setup but my zigbee modem is attached using a network connection.

tdalbo92 commented 10 months ago

I'm having the same issue as well. Restarting HA fixes it, but it inevitably starts dying within a few hours.

Does anybody have a known stable version to revert to? Or is this seemingly a SkyConnect internal issue?

freemann commented 10 months ago

@tdalbo92 Please check if you are using multi protocol and if so, disable it; https://skyconnect.home-assistant.io/procedures/disable-multiprotocol/

freemann commented 10 months ago

After disabling MultiProtocol and updating to 2024.1.5 everything looks stable again.

tdalbo92 commented 10 months ago

I can attempt this for debugging, but this disables Thread, correct?

freemann commented 10 months ago

I can attempt this for debugging, but this disables Thread, correct?

Yes, you should only disable of you don't use Thread. 2024.1.5 should have some fixes and there was a add-on update for OpenThread Border Router. Also see; https://github.com/home-assistant/addons/issues/3408

tdalbo92 commented 10 months ago

I'll migrate my Thread devices to my Amazon Echo for the time being, and follow the progress in that thread for updates.

Thank you!

SaltyBart commented 10 months ago

At my side everything is also stable now, thanks @dmulcahey!

jaccolo commented 10 months ago

Disabled multiprotocol, and running stable for 20 hours now