home-assistant / addons

:heavy_plus_sign: Docker add-ons for Home Assistant
https://home-assistant.io/hassio/
Apache License 2.0
1.52k stars 1.47k forks source link

Multiprotocol (Thread) Boot Loop issue #2905

Closed dcmeglio closed 10 months ago

dcmeglio commented 1 year ago

Describe the issue you are experiencing

I had a power failure today and noticed the addon is now bootlooping. Had this before with my SkyConnect and was working with @puddly. Just hoping maybe the new logs provide something useful.

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

Which add-on are you reporting an issue with?

Silicon Labs Multiprotocol

What is the version of the add-on?

1.0.0

Steps to reproduce the issue

Start addon after a power failure

System Health information

System Information

version core-2023.3.1
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.10.10
os_name Linux
os_version 5.15.84-v8
arch aarch64
timezone America/New_York
config_dir /config
Home Assistant Community Store GitHub API | ok -- | -- GitHub Content | ok GitHub Web | ok GitHub API Calls Remaining | 4956 Installed Version | 1.30.1 Stage | running Available Repositories | 1254 Downloaded Repositories | 27 HACS Data | ok
AccuWeather can_reach_server | ok -- | -- remaining_requests | 31
Home Assistant Cloud logged_in | true -- | -- subscription_expiration | June 25, 2023 at 8:00 PM relayer_connected | true remote_enabled | true remote_connected | true alexa_enabled | true google_enabled | false remote_server | us-east-1-4.ui.nabu.casa can_reach_cert_server | ok can_reach_cloud_auth | ok can_reach_cloud | ok
Home Assistant Supervisor host_os | Home Assistant OS 9.5 -- | -- update_channel | beta supervisor_version | supervisor-2023.03.1 agent_version | 1.4.1 docker_version | 20.10.22 disk_total | 109.3 GB disk_used | 15.9 GB healthy | true supported | true board | rpi4-64 supervisor_api | ok version_api | ok installed_addons | Z-Wave JS UI (1.7.1), File editor (5.5.0), WeatherFlow to MQTT (3.1.6), Mosquitto broker (6.1.3), Emulated HUE (dev) (0.3.0), Samba Backup (5.2.0), ESPHome (2023.2.4), SSH & Web Terminal (13.0.3), Ring-MQTT with Video Streaming (5.1.3), Silicon Labs Multiprotocol (1.0.0), Matter Server (4.1.0)
Dashboards dashboards | 2 -- | -- resources | 11 views | 16 mode | storage
Recorder oldest_recorder_run | February 24, 2023 at 12:49 AM -- | -- current_recorder_run | March 7, 2023 at 8:25 PM estimated_db_size | 747.33 MiB database_engine | sqlite database_version | 3.38.5

Anything in the Supervisor logs that might be useful for us?

No response

Anything in the add-on logs that might be useful for us?

[20:36:07] INFO: The otbr-web is disabled.
s6-rc: info: service mdns: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service mdns successfully started
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/check-cpcd-shm.sh
[20:36:07] INFO: Starting mDNS Responder...
Default: mDNSResponder (Engineering Build) (Mar  1 2023 17:49:54) starting
Default: mDNS_AddDNSServer: Lock not held! mDNS_busy (0) mDNS_reentrancy (0)
cont-init: info: /etc/cont-init.d/check-cpcd-shm.sh exited 0
cont-init: info: running /etc/cont-init.d/config.sh
[20:36:09] INFO: Generating cpcd configuration.
cont-init: info: /etc/cont-init.d/config.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service banner: starting
-----------------------------------------------------------
 Add-on: Silicon Labs Multiprotocol
 Zigbee and OpenThread multiprotocol add-on
-----------------------------------------------------------
 Add-on version: 1.0.0
 You are running the latest version of this add-on.
 System: Home Assistant OS 9.5  (aarch64 / raspberrypi4-64)
 Home Assistant Core: 2023.3.1
 Home Assistant Supervisor: 2023.03.1
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[20:36:12] INFO: Checking /dev/ttyUSB2 identifying SkyConnect v1.0 from Nabu Casa.
[20:36:12] INFO: Starting universal-silabs-flasher with /dev/ttyUSB2 (baudrate 115200)
2023-03-07 20:36:13 core-silabs-multiprotocol universal_silabs_flasher.flash[239] INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version=<AwesomeVersion SemVer '4.2.1'>, ezsp_version=None, fw_type=<FirmwareImageType.RCP_UART_802154: 'rcp-uart-802154'>)
2023-03-07 20:36:13 core-silabs-multiprotocol universal_silabs_flasher.flasher[239] INFO Probing ApplicationType.GECKO_BOOTLOADER
2023-03-07 20:36:15 core-silabs-multiprotocol universal_silabs_flasher.flasher[239] INFO Probing ApplicationType.CPC
2023-03-07 20:36:20 core-silabs-multiprotocol universal_silabs_flasher.flasher[239] INFO Probing ApplicationType.EZSP
Error: Failed to probe running application type
2023-03-07 20:36:25 core-silabs-multiprotocol concurrent.futures[239] ERROR exception calling callback for <Future at 0x7fa2f46550 state=finished returned NoneType>
Traceback (most recent call last):
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 329, in _invoke_callbacks
    callback(self)
  File "/usr/lib/python3.9/asyncio/futures.py", line 398, in _call_set_state
    dest_loop.call_soon_threadsafe(_set_state, destination, source)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 791, in call_soon_threadsafe
    self._check_closed()
  File "/usr/lib/python3.9/asyncio/base_events.py", line 510, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
s6-rc: warning: unable to start service universal-silabs-flasher: command exited 1
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.
/run/s6/basedir/scripts/rc.init: fatal: stopping the container.
s6-rc: info: service mdns: stopping
Default: mDNSResponder (Engineering Build) (Mar  1 2023 17:49:54) stopping
s6-rc: info: service banner: stopping
s6-rc: info: service banner successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
[20:36:25] INFO: mDNS ended with exit code 4 (signal 0)...
s6-rc: info: service mdns successfully stopped

Additional information

No response

puddly commented 1 year ago

Does the issue resolve after you physically unplug the SkyConnect and plug it back in?

dcmeglio commented 1 year ago

Power down first or just yank it?

puddly commented 1 year ago

No need to power down, just unplug it and plug it back in.

dcmeglio commented 1 year ago

Yup. Works after that

wienans commented 1 year ago

I had the same issue, and reconnecting the SkyConnect worked but power off and power on does not.

dcmeglio commented 1 year ago

Happened again. Seems to be related to having a power flash. I've had this happen 4 times, 3 times I can 100% confirm were during a power flash.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

dcmeglio commented 1 year ago

I don’t think this should be closed. I still continue to have this issue any time there is a power flash.

agners commented 1 year ago

By power flash you refer to a power outage?

Does a regular reboot reproduces the problem too? (maybe try 3-4 reboots, as it doesn't happen always it seems?)

agners commented 1 year ago

Also, are you sure that there is no other integration or add-on trying to accessing the same UART?

dcmeglio commented 1 year ago

@agners sorry I missed your questions. Yes I mean a power outage (happened again yesterday). I have not reproduced it with a regular reboot. It happens 100% of the time though during a power outage or if I unplug the power cable. No, no other addons using the same UART.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

dcmeglio commented 1 year ago

This is not stale it’s just not receiving any attention. The issue is consistently reproducible

puddly commented 1 year ago

Can you provide some more info about what you mean by "if I unplug the power cable"? What exact hardware are you using (hardware/SBC, USB hubs, power supply, etc.) and what steps do you follow to trigger this outside of a power flash event?

dcmeglio commented 1 year ago

Sure. If I unplug the power cable to my rpi4 I can reproduce it consistently. The solution is then to unplug the skyconnect, plug it back in, then restart the multiprotocol addon.

Hardware: rpi4 Model B 8GB using the supplied power cable SkyConnect is connected to a powered USB hub, https://www.amazon.com/gp/product/B00SCX6I8A/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

Let me know if you need more details.

ekobres commented 1 year ago

I agree - I also have a powered hub. My theory is that the Skyconnect doesn’t handle losing connection to the Pi while still powered. With a short power glitch, the Pi may reboot and the USB hub stays powered for a few seconds. Whatever the case if the Pi loses power and the Skyconnect doesn’t - multiprotocol firmware hangs until you power cycle the Skyconnect.The thread firmware doesn’t seem to have this problem.On Jul 15, 2023, at 12:33 PM, dcmeglio @.***> wrote: Sure. If I unplug the power cable to my rpi4 I can reproduce it consistently. The solution is then to unplug the skyconnect, plug it back in, then restart the multiprotocol addon. Hardware: rpi4 Model B 8GB using the supplied power cable SkyConnect is connected to a powered USB hub, https://www.amazon.com/gp/product/B00SCX6I8A/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1 Let me know if you need more details.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

agners commented 1 year ago

@dcmeglio is this still reproducible with the latest add-on and firmware?

I've previously tested with a powered hub, and rebooted my host, but I wasn't able to reproduce. However, I did not use the same USB Hub/Raspberry Pi combination.

ekobres commented 1 year ago

Happened to me just yesterday.On Aug 16, 2023, at 5:16 PM, Stefan Agner @.***> wrote: @dcmeglio is this still reproducible with the latest add-on and firmware? I've previously tested with a powered hub, and rebooted my host, but I wasn't able to reproduce. However, I did not use the same USB Hub/Raspberry Pi combination.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>

puddly commented 1 year ago

I'll pick up that specific hub and see if I can reproduce your issue.

There were some recent bugfixes in the multiprotocol firmware that related to crashes so if you haven't upgraded to the latest version of the addon, try it out.

puddly commented 1 year ago

@ekobres I'm testing with:

The SkyConnect is plugged into a USB extension cable and into one of the hub's ports, and the hub's USB 3.0 connector is plugged into a blue USB port on the Pi. If I unplug the Pi's USB-C connection for a second and plug it back in to simulate power loss, everything reboots after a minute or two. Multi-PAN continues to work without issues.

If you have a reliable way to reproduce this problem with your setup, let me know! I'd like to figure out the root cause and fix this problem.

ekobres commented 1 year ago

@puddly That’s not my hub or procedure to re-create - you may have me confused with another user. I switched to the thread-only firmware and have only had to reset the SkyConnect or OTBR once in 60 days versus almost daily.On Aug 28, 2023, at 4:38 PM, puddly @.***> wrote: @ekobres I'm testing with:

The official Raspberry Pi power supply 8GB Raspberry Pi 4 The hub you linked to: https://www.amazon.com/gp/product/B00SCX6I8A

The SkyConnect is plugged into a USB extension cable and into one of the hub's ports, and the hub's USB 3.0 connector is plugged into a blue USB port on the Pi. If I unplug the Pi's USB-C connection for a second and plug it back in to simulate power loss, everything reboots after a minute or two. Multi-PAN continues to work without issues. If you have a reliable way to reproduce this problem with your setup, let me know! I'd like to figure out the root cause and fix this problem.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

dcmeglio commented 1 year ago

@puddly I am the one with that hub. I haven't had an issue in a few weeks even after a couple power flashes. I'll keep an eye on it.

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

dcmeglio commented 11 months ago

Still experiencing this.

puddly commented 11 months ago

I'm trying to reproduce this issue and provide enough information to Silicon Labs to fix it.

Can you describe what specific Thread devices you have on your network? I suspect that the crash is happening on the Thread side of things, as I have been completely unable to reproduce it with just Zigbee.

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.