home-assistant / addons

:heavy_plus_sign: Docker add-ons for Home Assistant
https://home-assistant.io/hassio/
Apache License 2.0
1.56k stars 1.51k forks source link

openthread border router add-on consuming 100% CPU. #3490

Closed kluner closed 8 months ago

kluner commented 8 months ago

Describe the issue you are experiencing

Since the last update of HAOS and Homeassistant to below versions, the open thread border router add on has started using 100% CPU. Turning overal VM cpu usage from 3% to 27%. (enough to trigger the fans).

-----------------------------------------------------------
 Add-on: OpenThread Border Router
 OpenThread Border Router add-on
-----------------------------------------------------------
 Add-on version: 2.4.7
 You are running the latest version of this add-on.
 System: Home Assistant OS 12.0  (amd64 / qemux86-64)
 Home Assistant Core: 2024.2.4
 Home Assistant Supervisor: 2024.02.0

logs show no remarkable information:

s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[09:11:58] INFO: Checking /dev/ttyUSB0 identifying SkyConnect v1.0 from Nabu Casa.
[09:11:58] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2024-02-28 09:11:59.070 homeassistant universal_silabs_flasher.flash INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.4.0', ezsp_version=None, ot_rcp_version='SL-OPENTHREAD/2.4.0.0_GitHub-7074a43e4' (2.4.0.0), cpc_version=None, fw_type=<FirmwareImageType.OT_RCP: 'ot-rcp'>, baudrate=460800)
2024-02-28 09:11:59.070 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2024-02-28 09:12:01.073 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.SPINEL at 460800 baud
2024-02-28 09:12:05.381 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.CPC at 460800 baud

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

Which add-on are you reporting an issue with?

CEC Scanner

What is the version of the add-on?

2.4.7

Steps to reproduce the issue

  1. start or reboot the whole instance
  2. cpu usage for add-on goes to 100% ...

System Health information

System Information

version core-2024.2.4
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.1
os_name Linux
os_version 6.6.16-haos
arch x86_64
timezone Europe/Amsterdam
config_dir /config
Home Assistant Cloud logged_in | true -- | -- subscription_expiration | 9 January 2025 at 01:00 relayer_connected | true relayer_region | eu-central-1 remote_enabled | true remote_connected | true alexa_enabled | true google_enabled | true remote_server | eu-central-1-13.ui.nabu.casa certificate_status | ready instance_id | 3b1746964c94474db743e9ce4d6d388e can_reach_cert_server | ok can_reach_cloud_auth | ok can_reach_cloud | ok
Home Assistant Supervisor host_os | Home Assistant OS 12.0 -- | -- update_channel | stable supervisor_version | supervisor-2024.02.0 agent_version | 1.6.0 docker_version | 24.0.7 disk_total | 30.8 GB disk_used | 8.4 GB healthy | true supported | true board | ova supervisor_api | ok version_api | ok installed_addons | Terminal & SSH (9.9.0), File editor (5.8.0), Glances (0.21.0), Silicon Labs Multiprotocol (2.4.4), Matter Server (5.2.0), OpenThread Border Router (2.4.7), Node-RED (17.0.7)
Dashboards dashboards | 5 -- | -- resources | 0 views | 4 mode | storage
Recorder oldest_recorder_run | 22 February 2024 at 10:39 -- | -- current_recorder_run | 28 February 2024 at 09:11 estimated_db_size | 1329.48 MiB database_engine | sqlite database_version | 3.44.2

Anything in the Supervisor logs that might be useful for us?

24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.interface] Attaching to ghcr.io/home-assistant/amd64-hassio-multicast with version 2023.06.2
24-02-28 09:11:57 INFO (MainThread) [supervisor.plugins.multicast] Starting Multicast plugin
24-02-28 09:11:57 INFO (SyncWorker_0) [supervisor.docker.manager] Cleaning hassio_multicast application
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.multicast] Starting Multicast ghcr.io/home-assistant/amd64-hassio-multicast with version 2023.06.2 - Host
24-02-28 09:11:57 INFO (MainThread) [supervisor.homeassistant.secrets] Loaded 1 Home Assistant secrets
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.interface] Attaching to ghcr.io/home-assistant/qemux86-64-homeassistant with version 2024.2.4
24-02-28 09:11:57 WARNING (MainThread) [supervisor.homeassistant.core] Watchdog found Home Assistant failed, restarting...
24-02-28 09:11:57 INFO (MainThread) [supervisor.os.manager] Detect Home Assistant Operating System 12.0 / BootSlot B
24-02-28 09:11:57 INFO (SyncWorker_0) [supervisor.docker.manager] Starting homeassistant
24-02-28 09:11:57 INFO (MainThread) [supervisor.homeassistant.core] Wait until Home Assistant is ready
24-02-28 09:11:57 INFO (MainThread) [supervisor.store.git] Loading add-on /data/addons/git/a0d7b954 repository
24-02-28 09:11:57 INFO (MainThread) [supervisor.store.git] Loading add-on /data/addons/core repository
24-02-28 09:11:57 INFO (MainThread) [supervisor.store.git] Loading add-on /data/addons/git/5c53de3b repository
24-02-28 09:11:57 INFO (MainThread) [supervisor.store] Loading add-ons from store: 77 all - 77 new - 0 remove
24-02-28 09:11:57 INFO (MainThread) [supervisor.addons.manager] Found 7 installed add-ons
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.interface] Attaching to ghcr.io/hassio-addons/glances/amd64 with version 0.21.0
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.interface] Attaching to homeassistant/amd64-addon-ssh with version 9.9.0
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.interface] Attaching to homeassistant/amd64-addon-matter-server with version 5.2.0
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.interface] Attaching to homeassistant/amd64-addon-otbr with version 2.4.7
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.interface] Attaching to homeassistant/amd64-addon-configurator with version 5.8.0
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.interface] Attaching to homeassistant/amd64-addon-silabs-multiprotocol with version 2.4.4
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.interface] Attaching to ghcr.io/hassio-addons/node-red/amd64 with version 17.0.7
24-02-28 09:11:57 INFO (MainThread) [supervisor.backups.manager] Found 1 backup files
24-02-28 09:11:57 INFO (MainThread) [supervisor.discovery] Loaded 2 messages
24-02-28 09:11:57 INFO (MainThread) [supervisor.ingress] Loaded 0 ingress sessions
24-02-28 09:11:57 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state setup
24-02-28 09:11:57 INFO (MainThread) [supervisor.resolution.check] System checks complete
24-02-28 09:11:57 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state setup
24-02-28 09:11:57 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
24-02-28 09:11:57 INFO (MainThread) [supervisor.jobs] 'ResolutionFixup.run_autofix' blocked from execution, system is not running - setup
24-02-28 09:11:57 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state setup
24-02-28 09:11:57 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
24-02-28 09:11:57 INFO (MainThread) [__main__] Running Supervisor
24-02-28 09:11:57 INFO (MainThread) [supervisor.os.manager] Rauc: B - marked slot kernel.1 as good
24-02-28 09:11:57 INFO (MainThread) [supervisor.addons.manager] Phase 'initialize' starting 0 add-ons
24-02-28 09:11:57 INFO (MainThread) [supervisor.addons.manager] Phase 'system' starting 0 add-ons
24-02-28 09:11:57 INFO (MainThread) [supervisor.addons.manager] Phase 'services' starting 5 add-ons
24-02-28 09:11:57 INFO (SyncWorker_6) [supervisor.docker.manager] Cleaning addon_core_ssh application
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.addon] Starting Docker add-on homeassistant/amd64-addon-ssh with version 9.9.0
24-02-28 09:11:57 WARNING (MainThread) [supervisor.docker.addon] Glances running with disabled protected mode!
24-02-28 09:11:57 INFO (SyncWorker_0) [supervisor.docker.manager] Cleaning addon_a0d7b954_glances application
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.addon] Starting Docker add-on ghcr.io/hassio-addons/glances/amd64 with version 0.21.0
24-02-28 09:11:57 INFO (SyncWorker_1) [supervisor.docker.manager] Cleaning addon_core_silabs_multiprotocol application
24-02-28 09:11:57 INFO (MainThread) [supervisor.docker.addon] Starting Docker add-on homeassistant/amd64-addon-silabs-multiprotocol with version 2.4.4
24-02-28 09:11:57 INFO (SyncWorker_3) [supervisor.docker.manager] Cleaning addon_core_matter_server application
24-02-28 09:11:58 INFO (MainThread) [supervisor.docker.addon] Starting Docker add-on homeassistant/amd64-addon-matter-server with version 5.2.0
24-02-28 09:11:58 INFO (SyncWorker_3) [supervisor.docker.manager] Cleaning addon_core_openthread_border_router application
24-02-28 09:11:58 INFO (MainThread) [supervisor.docker.addon] Starting Docker add-on homeassistant/amd64-addon-otbr with version 2.4.7
24-02-28 09:11:58 INFO (MainThread) [supervisor.api.middleware.security] /network/info access from core_matter_server
24-02-28 09:11:59 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state startup
24-02-28 09:11:59 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
24-02-28 09:12:01 INFO (MainThread) [supervisor.api.middleware.security] /network/info access from core_silabs_multiprotocol
24-02-28 09:12:02 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
24-02-28 09:12:02 INFO (MainThread) [supervisor.homeassistant.core] Home Assistant Core state changed to NOT_RUNNING
24-02-28 09:12:04 WARNING (MainThread) [supervisor.addons.addon] core_silabs_multiprotocol is already running!
24-02-28 09:12:07 INFO (MainThread) [supervisor.core] Skipping start of Home Assistant
24-02-28 09:12:07 INFO (MainThread) [supervisor.addons.manager] Phase 'application' starting 2 add-ons
24-02-28 09:12:07 INFO (SyncWorker_5) [supervisor.docker.manager] Cleaning addon_core_configurator application
24-02-28 09:12:08 INFO (MainThread) [supervisor.docker.addon] Starting Docker add-on homeassistant/amd64-addon-configurator with version 5.8.0
24-02-28 09:12:08 INFO (SyncWorker_4) [supervisor.docker.manager] Cleaning addon_a0d7b954_nodered application
24-02-28 09:12:08 INFO (MainThread) [supervisor.docker.addon] Starting Docker add-on ghcr.io/hassio-addons/node-red/amd64 with version 17.0.7
24-02-28 09:12:12 INFO (MainThread) [supervisor.homeassistant.core] Home Assistant Core state changed to RUNNING
24-02-28 09:12:12 INFO (MainThread) [supervisor.homeassistant.core] Detect a running Home Assistant instance
24-02-28 09:12:38 INFO (MainThread) [supervisor.misc.tasks] All core tasks are scheduled
24-02-28 09:12:38 INFO (MainThread) [supervisor.core] Supervisor is up and running
24-02-28 09:12:38 INFO (MainThread) [supervisor.host.info] Updating local host information
24-02-28 09:12:38 INFO (MainThread) [supervisor.updater] Fetching update data from https://version.home-assistant.io/stable.json
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state running
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.checks.base] Run check for free_space/system
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.checks.base] Run check for trust/supervisor
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.checks.base] Run check for multiple_data_disks/system
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.checks.base] Run check for security/core
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.checks.base] Run check for dns_server_failed/dns_server
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.checks.base] Run check for dns_server_ipv6_error/dns_server
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.checks.base] Run check for docker_config/system
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.checks.base] Run check for pwned/addon
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.checks.base] Run check for ipv4_connection_problem/system
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.checks.base] Run check for no_current_backup/system
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.check] System checks complete
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state running
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state running
24-02-28 09:12:38 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
24-02-28 09:12:38 INFO (MainThread) [supervisor.host.services] Updating service information
24-02-28 09:12:38 INFO (MainThread) [supervisor.host.network] Updating local network information
24-02-28 09:12:38 INFO (MainThread) [supervisor.host.sound] Updating PulseAudio information
24-02-28 09:12:38 INFO (MainThread) [supervisor.host.manager] Host information reload completed
24-02-28 09:42:10 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
24-02-28 09:51:48 INFO (MainThread) [supervisor.store.git] Update add-on https://github.com/esphome/home-assistant-addon repository
24-02-28 09:51:48 INFO (MainThread) [supervisor.store.git] Update add-on https://github.com/home-assistant/addons repository
24-02-28 09:51:48 INFO (MainThread) [supervisor.store.git] Update add-on https://github.com/hassio-addons/repository repository
24-02-28 09:51:48 INFO (MainThread) [supervisor.updater] Fetching update data from https://version.home-assistant.io/stable.json
24-02-28 09:51:49 INFO (MainThread) [supervisor.store] Loading add-ons from store: 77 all - 0 new - 0 remove
24-02-28 09:51:49 INFO (MainThread) [supervisor.store] Loading add-ons from store: 77 all - 0 new - 0 remove
24-02-28 10:00:45 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
24-02-28 10:00:45 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
24-02-28 10:00:45 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
24-02-28 10:00:45 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
24-02-28 10:00:45 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None
24-02-28 10:00:45 WARNING (MainThread) [supervisor.api.ingress] No valid ingress session None

Anything in the add-on logs that might be useful for us?

[09:11:58] INFO: The otbr-web is disabled.
s6-rc: info: service mdns: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service banner: starting
s6-rc: info: service mdns successfully started
[09:11:58] INFO: Starting mDNS Responder...
Default: mDNSResponder (Engineering Build) (Feb 17 2024 11:16:43) starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started

-----------------------------------------------------------
 Add-on: OpenThread Border Router
 OpenThread Border Router add-on
-----------------------------------------------------------
 Add-on version: 2.4.7
 You are running the latest version of this add-on.
 System: Home Assistant OS 12.0  (amd64 / qemux86-64)
 Home Assistant Core: 2024.2.4
 Home Assistant Supervisor: 2024.02.0
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[09:11:58] INFO: Checking /dev/ttyUSB0 identifying SkyConnect v1.0 from Nabu Casa.
[09:11:58] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2024-02-28 09:11:59.070 homeassistant universal_silabs_flasher.flash INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.4.0', ezsp_version=None, ot_rcp_version='SL-OPENTHREAD/2.4.0.0_GitHub-7074a43e4' (2.4.0.0), cpc_version=None, fw_type=<FirmwareImageType.OT_RCP: 'ot-rcp'>, baudrate=460800)
2024-02-28 09:11:59.070 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2024-02-28 09:12:01.073 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.SPINEL at 460800 baud
2024-02-28 09:12:05.381 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.CPC at 460800 baud

Additional information

No response

agners commented 8 months ago

Hm, so this is the last line of the startup log?

2024-02-28 09:12:05.381 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.CPC at 460800 baud

Sounds like the universal-silabs-flasher is hanging then :thinking: Is this reproducible, as in when you restart the add-on it hangs on the same line and CPU goes to 100% on one CPU?

@puddly do you happen to have an idea?

kluner commented 8 months ago

Hi,

Yeah, that was the last line.

I had already rebooted my HA VM, but no to no avail, it spun up to 100% (1 of 4 cores assigned to it) right away after HA had finished starting.

I disabled the addon for now, but let me check again.

kluner commented 8 months ago

Check, I just spun up the addon again, and it immediately goes to 100%


Add-on version: 2.4.7
 You are running the latest version of this add-on.
 System: Home Assistant OS 12.0  (amd64 / qemux86-64)
 Home Assistant Core: 2024.2.4
 Home Assistant Supervisor: 2024.02.0
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[13:26:39] INFO: Checking /dev/ttyUSB0 identifying SkyConnect v1.0 from Nabu Casa.
[13:26:39] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2024-02-29 13:26:39.551 homeassistant universal_silabs_flasher.flash INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.4.0', ezsp_version=None, ot_rcp_version='SL-OPENTHREAD/2.4.0.0_GitHub-7074a43e4' (2.4.0.0), cpc_version=None, fw_type=<FirmwareImageType.OT_RCP: 'ot-rcp'>, baudrate=460800)
2024-02-29 13:26:39.551 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2024-02-29 13:26:41.556 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.SPINEL at 460800 baud
2024-02-29 13:26:45.866 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.CPC at 460800 baud
puddly commented 8 months ago

The flasher has explicit timeouts for every CPC command so it should never stall. The only thing I can imagine is if it's being overwhelmed by a continuous stream of data, which I think I've seen happen once.

Do we have a way to enable verbose logging for the flasher within the addon?

agners commented 8 months ago

@puddly I don't think so. :cry:

@kluner what happens if you remove the stick at that point?

agners commented 8 months ago

@kluner from the system console (I assume you have access to it on your VM), can you use login to get OS shell access, then use top, press f, s, q to sort by CPU usage, and verify that it is indeed the universal-silabs-flasher appearing at the top?

kluner commented 8 months ago

@kluner from the system console (I assume you have access to it on your VM), can you use login to get OS shell access, then use top, press f, s, q to sort by CPU usage, and verify that it is indeed the universal-silabs-flasher appearing at the top?

yep, confirmed:

image

kluner commented 8 months ago

@puddly I don't think so. 😢

@kluner what happens if you remove the stick at that point?

no change. I had a tail on the docker container (docker logs -f ) and it did not show anything, the flasher is still doing 100%.

the kernel spewed 2 lines of USB device error due to the unplug, and that's it.

agners commented 8 months ago

Can you try run it manually with verbose option?

docker exec -it addon_core_openthread_border_router /bin/bash
kill $(pidof python3)
universal-silabs-flasher --verbose --device /dev/ttyUSB0 flash --ensure-exact-version --allow-cross-flashing --firmware "/root/NabuCasa_SkyConnect_OpenThread_RCP_v2.4.0.0_ot-rcp_hw_460800.gbl"
kluner commented 8 months ago

and unfortunately no strace in haos, or I could have a peak at what is happening at the syscall level.

kluner commented 8 months ago

Can you try run it manually with verbose option?

docker exec -it addon_core_openthread_border_router /bin/bash
kill $(pidof python3)
universal-silabs-flasher --verbose --device /dev/ttyUSB0 flash --ensure-exact-version --allow-cross-flashing --firmware "/root/NabuCasa_SkyConnect_OpenThread_RCP_v2.4.0.0_ot-rcp_hw_460800.gbl"

yeah, but give me a bit to see if I can get a real ssh into haos going, so I can copy/paste.

agners commented 8 months ago

yeah, but give me a bit to see if I can get a real ssh into haos going, so I can copy/paste.

:+1: , fwiw, there is a guide how to do this in our developer docs: https://developers.home-assistant.io/docs/operating-system/debugging#ssh-access-to-the-host

kluner commented 8 months ago

oooh, great suggestion.

ok, so I half anticipate that kill on python would kill the whole container and cause it to restart, but it seems something did happen:

s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[19:55:15] INFO: Checking /dev/ttyUSB0 identifying SkyConnect v1.0 from Nabu Casa.
[19:55:15] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2024-02-29 19:55:15.766 homeassistant universal_silabs_flasher.flash INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.4.0', ezsp_version=None, ot_rcp_version='SL-OPENTHREAD/2.4.0.0_GitHub-7074a43e4' (2.4.0.0), cpc_version=None, fw_type=<FirmwareImageType.OT_RCP: 'ot-rcp'>, baudrate=460800)
2024-02-29 19:55:15.766 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2024-02-29 19:55:18.388 homeassistant universal_silabs_flasher.flasher INFO Detected bootloader version '2.1.1'
2024-02-29 19:55:18.389 homeassistant universal_silabs_flasher.flasher INFO Detected ApplicationType.GECKO_BOOTLOADER, version '2.1.1' at 115200 baudrate (bootloader baudrate 115200)
2024-02-29 19:55:18.389 homeassistant universal_silabs_flasher.flash INFO Firmware baudrate 115200 differs from expected baudrate 460800
NabuCasa_SkyConnect_OpenThread_RCP_v2.4.0.0_ot-rcp_hw_460800.gbl
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/gecko_bootloader.py", line 71, in probe
    return await self.ebl_info()
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/gecko_bootloader.py", line 81, in ebl_info
    await self._state_machine.wait_for_state(State.IN_MENU)
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/common.py", line 115, in wait_for_state
    return await future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/universal-silabs-flasher", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/flash.py", line 40, in inner
    return asyncio.run(f(*args, **kwargs))
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/flash.py", line 423, in flash
    await flasher.flash_firmware(
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/flasher.py", line 289, in flash_firmware
    await gecko.probe()
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/gecko_bootloader.py", line 71, in probe
    return await self.ebl_info()
  File "/usr/local/lib/python3.9/dist-packages/async_timeout/__init__.py", line 141, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.9/dist-packages/async_timeout/__init__.py", line 228, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
s6-rc: warning: unable to start service universal-silabs-flasher: command exited 1
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.
/run/s6/basedir/scripts/rc.init: fatal: stopping the container.
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service banner: stopping
s6-rc: info: service mdns: stopping
s6-rc: info: service banner successfully stopped
Default: mDNSResponder (Engineering Build) (Feb 17 2024 11:16:43) stopping
[19:55:20] INFO: mDNS ended with exit code 4 (signal 0)...
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service mdns successfully stopped
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
kluner commented 8 months ago

followed by

[19:56:32] INFO: The otbr-web is disabled.
s6-rc: info: service mdns: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service mdns successfully started
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service banner: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
[19:56:32] INFO: Starting mDNS Responder...
Default: mDNSResponder (Engineering Build) (Feb 17 2024 11:16:43) starting

-----------------------------------------------------------
 Add-on: OpenThread Border Router
 OpenThread Border Router add-on
-----------------------------------------------------------
 Add-on version: 2.4.7
 You are running the latest version of this add-on.
 System: Home Assistant OS 12.0  (amd64 / qemux86-64)
 Home Assistant Core: 2024.2.4
 Home Assistant Supervisor: 2024.02.1
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[19:56:33] INFO: Checking /dev/ttyUSB0 identifying SkyConnect v1.0 from Nabu Casa.
[19:56:33] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2024-02-29 19:56:33.342 homeassistant universal_silabs_flasher.flash INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.4.0', ezsp_version=None, ot_rcp_version='SL-OPENTHREAD/2.4.0.0_GitHub-7074a43e4' (2.4.0.0), cpc_version=None, fw_type=<FirmwareImageType.OT_RCP: 'ot-rcp'>, baudrate=460800)
2024-02-29 19:56:33.342 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2024-02-29 19:56:34.358 homeassistant universal_silabs_flasher.flasher INFO Detected bootloader version '2.1.1'
2024-02-29 19:56:34.358 homeassistant universal_silabs_flasher.flasher INFO Detected ApplicationType.GECKO_BOOTLOADER, version '2.1.1' at 115200 baudrate (bootloader baudrate 115200)
2024-02-29 19:56:34.358 homeassistant universal_silabs_flasher.flash INFO Firmware baudrate 115200 differs from expected baudrate 460800
NabuCasa_SkyConnect_OpenThread_RCP_v2.4.0.0_ot-rcp_hw_460800.gbl

which seems to run stable currently, at no excessive cpu consumption.

kluner commented 8 months ago

possible (likely) side-effect: zigbee integration is stuck initialising. let's see if I can shake it loose.

agners commented 8 months ago

Uh, do you use the same device in the ZHA integration maybe? :thinking: The ZHA integration should not point to the that serial port. This would create havoc.

We currently discover the device still as ZHA device, but that will change in the future. Currently you have to explicitly ignore the discovered ZHA entry (see https://skyconnect.home-assistant.io/procedures/enable-thread/).

kluner commented 8 months ago

Uh, do you use the same device in the ZHA integration maybe? 🤔 The ZHA integration should not point to the that serial port. This would create havoc.

We currently discover the device still as ZHA device, but that will change in the future. Currently you have to explicitly ignore the discovered ZHA entry (see https://skyconnect.home-assistant.io/procedures/enable-thread/).

You know, that would make sense in it causing absolute chaos. But no, I use the socket in ZHA.

IMG_1102

kluner commented 8 months ago

btw, it came back. It’s doing 100% again.

agners commented 8 months ago

Do you have the Silicon Labs Multiprotocol add-on enabled at the same time? If that accesses the serial port at the same time it would explain the problem as well...

kluner commented 8 months ago

Actually,I do.

Funny thing: I have had this configuration since I got the skyconnect in. It never caused problems.

So what is the recommended configuration here?

Only multi, and matter server and ZHA to handle the protocol stacks? I do not have matter things at the moment, but I do expect that to change shortly.

agners commented 8 months ago

Unfortunately, the Silicon Labs Multiprotocol add-on showed problems for a lot of folks especially when they started to add devices to the Thread side :cry:

So currently we only recommend dedicated setups: Use a radio for Zigbee and one radio for Thread. Maybe you already have a second radio available? :thinking:

Alternatively, if you have Google or Apple BR, our Matter stack can make use of those as well (the devices still will be directly associated with Home Assistant on the application/Matter level).

kluner commented 8 months ago

Well, I can easily turn off all the thread and matter stuff currently, it's not really getting used. It's just an obstacle for the future.

What do you mean with using the Apple BR? The HomeKit bridge? Can the matter addon use that? That would kinda fix the whole problem anyway.

agners commented 8 months ago

What do you mean with using the Apple BR? The HomeKit bridge? Can the matter addon use that? That would kinda fix the whole problem anyway.

No, HomePod and such, see https://www.home-assistant.io/integrations/thread#list-of-thread-border-router-devices.

kluner commented 8 months ago

ah, right check. Well I was looking for an excuse to buy a matter capable one anyway. ;-)

I reset the firmware on skyconnect to zigbee, and hooked ZHA to it again. Seems to work fine again now.