Closed kluner closed 8 months ago
Hm, so this is the last line of the startup log?
2024-02-28 09:12:05.381 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.CPC at 460800 baud
Sounds like the universal-silabs-flasher is hanging then :thinking: Is this reproducible, as in when you restart the add-on it hangs on the same line and CPU goes to 100% on one CPU?
@puddly do you happen to have an idea?
Hi,
Yeah, that was the last line.
I had already rebooted my HA VM, but to no avail: it spun up to 100% (1 of the 4 cores assigned to it) right after HA had finished starting.
I disabled the addon for now, but let me check again.
Check, I just spun up the addon again, and it immediately goes to 100%
Add-on version: 2.4.7
You are running the latest version of this add-on.
System: Home Assistant OS 12.0 (amd64 / qemux86-64)
Home Assistant Core: 2024.2.4
Home Assistant Supervisor: 2024.02.0
-----------------------------------------------------------
Please, share the above information when looking for help
or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[13:26:39] INFO: Checking /dev/ttyUSB0 identifying SkyConnect v1.0 from Nabu Casa.
[13:26:39] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2024-02-29 13:26:39.551 homeassistant universal_silabs_flasher.flash INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.4.0', ezsp_version=None, ot_rcp_version='SL-OPENTHREAD/2.4.0.0_GitHub-7074a43e4' (2.4.0.0), cpc_version=None, fw_type=<FirmwareImageType.OT_RCP: 'ot-rcp'>, baudrate=460800)
2024-02-29 13:26:39.551 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2024-02-29 13:26:41.556 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.SPINEL at 460800 baud
2024-02-29 13:26:45.866 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.CPC at 460800 baud
The flasher has explicit timeouts for every CPC command so it should never stall. The only thing I can imagine is if it's being overwhelmed by a continuous stream of data, which I think I've seen happen once.
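For anyone following along, here is a minimal sketch of what "an explicit timeout per command" looks like in asyncio. This is not the flasher's actual code: `send_command`, `probe` and `run_probe` are made-up names, the sleep only stands in for a device reply, and stdlib `asyncio.wait_for` is used here rather than the `async_timeout` package the flasher appears to use.

```python
import asyncio


async def send_command(reply_delay: float) -> str:
    """Hypothetical stand-in for one request/response exchange with the radio."""
    await asyncio.sleep(reply_delay)  # simulates waiting for the device to answer
    return "ok"


async def probe(reply_delay: float, timeout: float) -> str:
    """Return the reply, or "timeout" if the device does not answer in time."""
    try:
        return await asyncio.wait_for(send_command(reply_delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "timeout"


def run_probe(reply_delay: float, timeout: float) -> str:
    """Synchronous helper so the sketch can be tried from a plain script."""
    return asyncio.run(probe(reply_delay, timeout))
```

With this pattern a silent device just produces a timeout after the given interval. The sketch does not model a continuous stream of incoming data, so it says nothing about whether that is what happens here.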
Do we have a way to enable verbose logging for the flasher within the addon?
@puddly I don't think so. :cry:
@kluner what happens if you remove the stick at that point?
@kluner from the system console (I assume you have access to it on your VM), can you use `login` to get OS shell access, then use `top`, press `f`, `s`, `q` to sort by CPU usage, and verify that it is indeed the `universal-silabs-flasher` appearing at the top?
yep, confirmed:
@kluner what happens if you remove the stick at that point?
no change. I had a tail on the docker container (docker logs -f
the kernel spewed 2 lines of USB device error due to the unplug, and that's it.
Can you try running it manually with the verbose option?
docker exec -it addon_core_openthread_border_router /bin/bash
kill $(pidof python3)
universal-silabs-flasher --verbose --device /dev/ttyUSB0 flash --ensure-exact-version --allow-cross-flashing --firmware "/root/NabuCasa_SkyConnect_OpenThread_RCP_v2.4.0.0_ot-rcp_hw_460800.gbl"
And unfortunately there is no strace in HAOS, or I could have a peek at what is happening at the syscall level.
yeah, but give me a bit to see if I can get a real ssh into haos going, so I can copy/paste.
:+1: FWIW, there is a guide on how to do this in our developer docs: https://developers.home-assistant.io/docs/operating-system/debugging#ssh-access-to-the-host
oooh, great suggestion.
OK, so I half anticipated that killing python would kill the whole container and cause it to restart, and it seems something like that did happen:
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[19:55:15] INFO: Checking /dev/ttyUSB0 identifying SkyConnect v1.0 from Nabu Casa.
[19:55:15] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2024-02-29 19:55:15.766 homeassistant universal_silabs_flasher.flash INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.4.0', ezsp_version=None, ot_rcp_version='SL-OPENTHREAD/2.4.0.0_GitHub-7074a43e4' (2.4.0.0), cpc_version=None, fw_type=<FirmwareImageType.OT_RCP: 'ot-rcp'>, baudrate=460800)
2024-02-29 19:55:15.766 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2024-02-29 19:55:18.388 homeassistant universal_silabs_flasher.flasher INFO Detected bootloader version '2.1.1'
2024-02-29 19:55:18.389 homeassistant universal_silabs_flasher.flasher INFO Detected ApplicationType.GECKO_BOOTLOADER, version '2.1.1' at 115200 baudrate (bootloader baudrate 115200)
2024-02-29 19:55:18.389 homeassistant universal_silabs_flasher.flash INFO Firmware baudrate 115200 differs from expected baudrate 460800
NabuCasa_SkyConnect_OpenThread_RCP_v2.4.0.0_ot-rcp_hw_460800.gbl
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/gecko_bootloader.py", line 71, in probe
    return await self.ebl_info()
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/gecko_bootloader.py", line 81, in ebl_info
    await self._state_machine.wait_for_state(State.IN_MENU)
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/common.py", line 115, in wait_for_state
    return await future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/universal-silabs-flasher", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/flash.py", line 40, in inner
    return asyncio.run(f(*args, **kwargs))
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/flash.py", line 423, in flash
    await flasher.flash_firmware(
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/flasher.py", line 289, in flash_firmware
    await gecko.probe()
  File "/usr/local/lib/python3.9/dist-packages/universal_silabs_flasher/gecko_bootloader.py", line 71, in probe
    return await self.ebl_info()
  File "/usr/local/lib/python3.9/dist-packages/async_timeout/__init__.py", line 141, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.9/dist-packages/async_timeout/__init__.py", line 228, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
s6-rc: warning: unable to start service universal-silabs-flasher: command exited 1
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.
/run/s6/basedir/scripts/rc.init: fatal: stopping the container.
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service banner: stopping
s6-rc: info: service mdns: stopping
s6-rc: info: service banner successfully stopped
Default: mDNSResponder (Engineering Build) (Feb 17 2024 11:16:43) stopping
[19:55:20] INFO: mDNS ended with exit code 4 (signal 0)...
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service mdns successfully stopped
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
followed by
[19:56:32] INFO: The otbr-web is disabled.
s6-rc: info: service mdns: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service mdns successfully started
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service banner: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
[19:56:32] INFO: Starting mDNS Responder...
Default: mDNSResponder (Engineering Build) (Feb 17 2024 11:16:43) starting
-----------------------------------------------------------
Add-on: OpenThread Border Router
OpenThread Border Router add-on
-----------------------------------------------------------
Add-on version: 2.4.7
You are running the latest version of this add-on.
System: Home Assistant OS 12.0 (amd64 / qemux86-64)
Home Assistant Core: 2024.2.4
Home Assistant Supervisor: 2024.02.1
-----------------------------------------------------------
Please, share the above information when looking for help
or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[19:56:33] INFO: Checking /dev/ttyUSB0 identifying SkyConnect v1.0 from Nabu Casa.
[19:56:33] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2024-02-29 19:56:33.342 homeassistant universal_silabs_flasher.flash INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.4.0', ezsp_version=None, ot_rcp_version='SL-OPENTHREAD/2.4.0.0_GitHub-7074a43e4' (2.4.0.0), cpc_version=None, fw_type=<FirmwareImageType.OT_RCP: 'ot-rcp'>, baudrate=460800)
2024-02-29 19:56:33.342 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2024-02-29 19:56:34.358 homeassistant universal_silabs_flasher.flasher INFO Detected bootloader version '2.1.1'
2024-02-29 19:56:34.358 homeassistant universal_silabs_flasher.flasher INFO Detected ApplicationType.GECKO_BOOTLOADER, version '2.1.1' at 115200 baudrate (bootloader baudrate 115200)
2024-02-29 19:56:34.358 homeassistant universal_silabs_flasher.flash INFO Firmware baudrate 115200 differs from expected baudrate 460800
NabuCasa_SkyConnect_OpenThread_RCP_v2.4.0.0_ot-rcp_hw_460800.gbl
which seems to be running stably at the moment, with no excessive CPU consumption.
possible (likely) side-effect: zigbee integration is stuck initialising. let's see if I can shake it loose.
Uh, do you use the same device in the ZHA integration maybe? :thinking: The ZHA integration should not point to that serial port. This would create havoc.
We currently still discover the device as a ZHA device, but that will change in the future. For now you have to explicitly ignore the discovered ZHA entry (see https://skyconnect.home-assistant.io/procedures/enable-thread/).
You know, that would make sense in it causing absolute chaos. But no, I use the socket in ZHA.
btw, it came back. It’s doing 100% again.
Do you have the Silicon Labs Multiprotocol add-on enabled at the same time? If that accesses the serial port at the same time it would explain the problem as well...
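If it helps to check this directly, below is a rough sketch that lists which processes have a given device node open by walking the `fd` symlinks under `/proc` on Linux. The `holders_of` helper is a made-up name, not part of any Home Assistant tool. It assumes a Python interpreter is available wherever you run it, and enough privileges to read other processes' `fd` directories. Add-ons run in separate containers, so it would most likely need to run on the host to see processes from more than one add-on.

```python
import os


def holders_of(device: str, proc_root: str = "/proc") -> list:
    """Return sorted PIDs that have `device` open, based on <proc_root>/<pid>/fd."""
    target = os.path.realpath(device)  # resolves /dev/serial/by-id/... style links
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited meanwhile, or permission denied
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed between listdir and readlink
            if os.path.realpath(link) == target:
                pids.append(int(entry))
                break  # one match per process is enough
    return sorted(pids)


if __name__ == "__main__":
    # /dev/ttyUSB0 is the port named in the logs of this thread.
    print(holders_of("/dev/ttyUSB0"))
```

More than one PID in the result would suggest that two programs are talking to the same serial port at once.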
Actually, I do.
Funny thing: I have had this configuration since I got the skyconnect in. It never caused problems.
So what is the recommended configuration here?
Only multi, and matter server and ZHA to handle the protocol stacks? I do not have matter things at the moment, but I do expect that to change shortly.
Unfortunately, the Silicon Labs Multiprotocol add-on showed problems for a lot of folks especially when they started to add devices to the Thread side :cry:
So currently we only recommend dedicated setups: Use a radio for Zigbee and one radio for Thread. Maybe you already have a second radio available? :thinking:
Alternatively, if you have Google or Apple BR, our Matter stack can make use of those as well (the devices still will be directly associated with Home Assistant on the application/Matter level).
Well, I can easily turn off all the thread and matter stuff currently, it's not really getting used. It's just an obstacle for the future.
What do you mean with using the Apple BR? The HomeKit bridge? Can the matter addon use that? That would kinda fix the whole problem anyway.
No, HomePod and such, see https://www.home-assistant.io/integrations/thread#list-of-thread-border-router-devices.
ah, right check. Well I was looking for an excuse to buy a matter capable one anyway. ;-)
I reset the firmware on skyconnect to zigbee, and hooked ZHA to it again. Seems to work fine again now.
Describe the issue you are experiencing
Since the last update of HAOS and Home Assistant to the versions below, the OpenThread Border Router add-on has started using 100% CPU, raising overall VM CPU usage from 3% to 27% (enough to trigger the fans).
logs show no remarkable information:
What type of installation are you running?
Home Assistant OS
Which operating system are you running on?
Home Assistant Operating System
Which add-on are you reporting an issue with?
CEC Scanner
What is the version of the add-on?
2.4.7
Steps to reproduce the issue
System Health information
System Information
Home Assistant Cloud
logged_in | true
-- | --
subscription_expiration | 9 January 2025 at 01:00
relayer_connected | true
relayer_region | eu-central-1
remote_enabled | true
remote_connected | true
alexa_enabled | true
google_enabled | true
remote_server | eu-central-1-13.ui.nabu.casa
certificate_status | ready
instance_id | 3b1746964c94474db743e9ce4d6d388e
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor
host_os | Home Assistant OS 12.0
-- | --
update_channel | stable
supervisor_version | supervisor-2024.02.0
agent_version | 1.6.0
docker_version | 24.0.7
disk_total | 30.8 GB
disk_used | 8.4 GB
healthy | true
supported | true
board | ova
supervisor_api | ok
version_api | ok
installed_addons | Terminal & SSH (9.9.0), File editor (5.8.0), Glances (0.21.0), Silicon Labs Multiprotocol (2.4.4), Matter Server (5.2.0), OpenThread Border Router (2.4.7), Node-RED (17.0.7)

Dashboards
dashboards | 5
-- | --
resources | 0
views | 4
mode | storage

Recorder
oldest_recorder_run | 22 February 2024 at 10:39
-- | --
current_recorder_run | 28 February 2024 at 09:11
estimated_db_size | 1329.48 MiB
database_engine | sqlite
database_version | 3.44.2

Anything in the Supervisor logs that might be useful for us?
Anything in the add-on logs that might be useful for us?
Additional information
No response