home-assistant / addons

:heavy_plus_sign: Docker add-ons for Home Assistant
https://home-assistant.io/hassio/
Apache License 2.0
1.57k stars 1.51k forks source link

OTBR: 2.9.0 breaks nrf RPC spinel radio #3683

Open Teagan42 opened 4 months ago

Teagan42 commented 4 months ago

Describe the issue you are experiencing

I have been successfully utilizing a nrf52840 MDK USB dongle, flashed with OTBR RPC firmware using west/zephyr as described by NRF52/openthread.io documentation https://openthread.io/codelabs/openthread-hardware#1, for the last year. Upon upgrading the OTBR addon to 2.9.0, the OTBR agent fails to start due to spinel_driver error/timeout (see addon logs). I was able to work around this breaking change by downloading the 2.8.0 OTBR addon and installing it as a local addon.

I also tried using a spare SkyConnect dongle (configured the add-on to flash firmware), which resulted in the same errors.

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

Which add-on are you reporting an issue with?

OpenThread Border Router

What is the version of the add-on?

2.9.0

Steps to reproduce the issue

  1. Flash OTBR RPC firmware to nrf52xxx device
  2. Install OTBR addon version 2.9.0
  3. Configure OTBR addon to use this serial device
  4. Start OTBR addon ..

System Health information

System Information

version core-2024.7.1
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.4
os_name Linux
os_version 6.6.33-haos
arch x86_64
timezone America/Denver
config_dir /config
Home Assistant Community Store GitHub API | ok -- | -- GitHub Content | ok GitHub Web | ok GitHub API Calls Remaining | 4999 Installed Version | 1.34.0 Stage | running Available Repositories | 1398 Downloaded Repositories | 52 HACS Data | ok
Home Assistant Cloud logged_in | false -- | -- can_reach_cert_server | ok can_reach_cloud_auth | ok can_reach_cloud | ok
Home Assistant Supervisor host_os | Home Assistant OS 12.4 -- | -- update_channel | stable supervisor_version | supervisor-2024.06.2 agent_version | 1.6.0 docker_version | 26.1.4 disk_total | 110.5 GB disk_used | 52.2 GB healthy | true supported | true host_connectivity | true supervisor_connectivity | true ntp_synchronized | true virtualization | kvm board | ova supervisor_api | ok version_api | ok installed_addons | Whisper (2.1.2), Piper (1.5.0), Advanced SSH & Web Terminal (18.0.0), ESPHome (2024.6.6), Studio Code Server (5.15.0), openWakeWord (1.10.0), Matter Server (6.2.1), EMQX (0.6.0), OpenThread Border Router (2.8.0)
Dashboards dashboards | 6 -- | -- resources | 34 views | 15 mode | storage
Recorder oldest_recorder_run | June 5, 2024 at 5:41 PM -- | -- current_recorder_run | July 6, 2024 at 2:33 AM estimated_db_size | 8841.11 MiB database_engine | postgresql database_version | 16.2

Anything in the Supervisor logs that might be useful for us?

No response

Anything in the add-on logs that might be useful for us?

-----------------------------------------------------------
 Add-on: OpenThread Border Router
 OpenThread Border Router add-on
-----------------------------------------------------------
 Add-on version: 2.9.0
 You are running the latest version of this add-on.
 System: Home Assistant OS 12.4  (amd64 / qemux86-64)
 Home Assistant Core: 2024.7.1
 Home Assistant Supervisor: 2024.06.2
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[03:00:25] INFO: Flashing firmware is disabled
s6-rc: info: service universal-silabs-flasher successfully started
s6-rc: info: service otbr-agent: starting
[03:00:26] INFO: Starting otbr-agent...
[NOTE]-AGENT---: Running 0.3.0-41474ce-dirty
[NOTE]-AGENT---: Thread version: 1.3.0
[NOTE]-AGENT---: Thread interface: wpan0
[NOTE]-AGENT---: Radio URL: spinel+hdlc+uart:///dev/ttyACM0?uart-baudrate=115200
[NOTE]-AGENT---: Radio URL: trel://enp6s18
[NOTE]-ILS-----: Infra link selected: enp6s18
52d.23:00:28.002 [W] P-SpinelDrive-: Wait for response timeout
52d.23:00:30.004 [W] P-SpinelDrive-: Wait for response timeout
52d.23:00:32.006 [W] P-SpinelDrive-: Wait for response timeout
52d.23:00:32.006 [C] Platform------: Init() at spinel_driver.cpp:82: Failure
52d.23:00:34.009 [W] P-SpinelDrive-: Wait for response timeout
[03:01:05] WARNING: otbr-agent exited with code 1 (by signal 0).
[03:01:05] INFO: OTBR firewall teardown completed.
s6-svlisten1: fatal: /run/s6-rc/servicedirs/otbr-agent failed permanently or its supervisor died
s6-rc: warning: unable to start service otbr-agent: command exited 1
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service universal-silabs-flasher: stopping
s6-rc: info: service mdns: stopping
Default: mDNSResponder (Engineering Build) (Jul  2 2024 09:04:20) stopping
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.
/run/s6/basedir/scripts/rc.init: fatal: stopping the container.
s6-rc: info: service universal-silabs-flasher successfully stopped
s6-rc: info: service banner: stopping
[03:01:05] INFO: mDNS ended with exit code 4 (signal 0)...
s6-rc: info: service banner successfully stopped
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service mdns successfully stopped
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
-----------------------------------------------------------
 Add-on: OpenThread Border Router
 OpenThread Border Router add-on
-----------------------------------------------------------
 Add-on version: 2.8.0
 You are running the latest version of this add-on.
 System: Home Assistant OS 12.4  (amd64 / qemux86-64)
 Home Assistant Core: 2024.7.1
 Home Assistant Supervisor: 2024.06.2
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[03:00:25] INFO: Flashing firmware is disabled
s6-rc: info: service universal-silabs-flasher successfully started
s6-rc: info: service otbr-agent: starting
[04:08:18] INFO: Setup OTBR firewall...
[04:08:18] INFO: Starting otbr-agent...
[NOTE]-AGENT---: Running 0.3.0-41474ce-dirty
[NOTE]-AGENT---: Thread version: 1.3.0
[NOTE]-AGENT---: Thread interface: wpan0
[NOTE]-AGENT---: Radio URL: spinel+hdlc+uart:///dev/ttyACM0?uart-baudrate=115200
[NOTE]-AGENT---: Radio URL: trel://enp6s18
[NOTE]-ILS-----: Infra link selected: enp6s18
[INFO]-NCP-----: OpenThread log level changed to 5
53d.00:08:17.817 [D] P-SpinelDrive-: Sent spinel frame, flg:0x2, iid:0, tid:0, cmd:RESET
53d.00:08:17.817 [D] P-SpinelDrive-: Waiting response: key=0
53d.00:08:17.827 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:0, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:RESET_POWER_ON
53d.00:08:17.827 [I] P-SpinelDrive-: co-processor reset: RESET_POWER_ON
53d.00:08:17.827 [C] P-SpinelDrive-: Software reset co-processor successfully
53d.00:08:17.827 [D] P-SpinelDrive-: Waiting response: key=1
53d.00:08:17.827 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:1, cmd:PROP_VALUE_IS, key:PROTOCOL_VERSION, major:4, minor:3
53d.00:08:17.827 [D] P-SpinelDrive-: Waiting response: key=2
53d.00:08:17.828 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:1, cmd:PROP_VALUE_IS, key:NCP_VERSION, version:OPENTHREAD/thread-reference-20230119-545-g992be2781; NRF52840; Aug  3 2023 01:02:17
53d.00:08:17.828 [D] P-SpinelDrive-: Waiting response: key=5
53d.00:08:17.828 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:1, cmd:PROP_VALUE_IS, key:CAPS, caps:COUNTERS UNSOL_UPDATE_FILTER 802_15_4_2450MHZ_OQPSK CONFIG_RADIO MAC_RAW RCP_API_VERSION RCP_MIN_HOST_API_VERSION 
53d.00:08:17.829 [D] P-RadioSpinel-: Wait response: tid=1 key=8
53d.00:08:17.829 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:1, cmd:PROP_VALUE_IS, key:HWADDR, eui64:f4ce365868d42a60
53d.00:08:17.829 [D] P-RadioSpinel-: Wait response: tid=2 key=176
53d.00:08:17.830 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:2, cmd:PROP_VALUE_IS, key:RCP_API_VERSION, version:9
53d.00:08:17.830 [D] P-RadioSpinel-: Wait response: tid=3 key=177
53d.00:08:17.830 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:3, cmd:PROP_VALUE_IS, key:RCP_MIN_HOST_API_VERSION, min-host-version:4
53d.00:08:17.830 [D] P-RadioSpinel-: Wait response: tid=4 key=4619
53d.00:08:17.830 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:4, cmd:PROP_VALUE_IS, key:RADIO_CAPS, caps:255
53d.00:08:17.830 [D] P-Trel--------: platformTrelInit(aTrelUrl:"trel://enp6s18")
[DEBG]-TrelDns-: Initialized on netif "enp6s18"
[DEBG]-TrelDns-: Netif enp6s18 is ready: index = 2
53d.00:08:17.831 [I] P-Netif-------: Sent request#1 to set addr_gen_mode to 1
00:00:00.000 [D] P-RadioSpinel-: Wait response: tid=5 key=34
00:00:00.001 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:5, cmd:PROP_VALUE_IS, key:PHY_CHAN_SUPPORTED, channelMask:0x07fff800
00:00:00.001 [D] P-RadioSpinel-: Wait response: tid=6 key=32
00:00:00.001 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:6, cmd:PROP_VALUE_IS, key:PHY_ENABLED, enabled:1
00:00:00.001 [D] P-RadioSpinel-: Wait response: tid=7 key=54
00:00:00.001 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:7, cmd:PROP_VALUE_IS, key:MAC_15_4_PANID, panid:0xffff
00:00:00.001 [D] P-RadioSpinel-: Wait response: tid=8 key=53
00:00:00.002 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:8, cmd:PROP_VALUE_IS, key:MAC_15_4_SADDR, saddr:0x0000
00:00:00.002 [D] P-RadioSpinel-: Wait response: tid=9 key=39
00:00:00.002 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:9, cmd:PROP_VALUE_IS, key:PHY_RX_SENSITIVITY, sensitivity:-100
00:00:00.002 [D] P-RadioSpinel-: Wait response: tid=10 key=2048
00:00:00.002 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:10, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
00:00:00.002 [D] P-RadioSpinel-: Wait response: tid=11 key=52
00:00:00.003 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:11, cmd:PROP_VALUE_IS, key:MAC_15_4_LADDR, laddr:8a3e7f2a66308cca
00:00:00.003 [D] P-RadioSpinel-: Wait response: tid=12 key=53
00:00:00.003 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:12, cmd:PROP_VALUE_IS, key:MAC_15_4_SADDR, saddr:0xfffe
00:00:00.003 [D] P-RadioSpinel-: Wait response: tid=13 key=4868
00:00:00.003 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:13, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
00:00:00.003 [D] P-RadioSpinel-: Wait response: tid=14 key=4869
00:00:00.004 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:14, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
00:00:00.004 [I] ChildSupervsn-: Timeout: 0 -> 190
00:00:00.004 [D] P-RadioSpinel-: Wait response: tid=15 key=2048
00:00:00.005 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:15, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
00:00:00.005 [D] P-RadioSpinel-: Wait response: tid=1 key=54
00:00:00.005 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:1, cmd:PROP_VALUE_IS, key:MAC_15_4_PANID, panid:0xd8cf
00:00:00.005 [I] Settings------: Read NetworkInfo {rloc:0xf400, extaddr:823092acf5b4adac, role:leader, mode:0x0f, version:4, keyseq:0x9, ...
00:00:00.005 [I] Settings------: ... pid:0x4cdd0775, mlecntr:0x3e9, maccntr:0x3e8, mliid:b331dd61476553b7}
00:00:00.005 [D] P-RadioSpinel-: Wait response: tid=2 key=2048
00:00:00.005 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:2, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
00:00:00.006 [D] P-RadioSpinel-: Wait response: tid=3 key=2049
00:00:00.006 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:3, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
00:00:00.006 [D] P-RadioSpinel-: Wait response: tid=4 key=2049
00:00:00.006 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:4, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
00:00:00.006 [D] P-RadioSpinel-: Wait response: tid=5 key=53
00:00:00.006 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:5, cmd:PROP_VALUE_IS, key:MAC_15_4_SADDR, saddr:0xf400
00:00:00.006 [D] P-RadioSpinel-: Wait response: tid=6 key=52
00:00:00.007 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:6, cmd:PROP_VALUE_IS, key:MAC_15_4_LADDR, laddr:823092acf5b4adac
00:00:00.007 [D] P-Trel--------: PrepareSocket()
[DEBG]-TrelDns-: Start browsing _trel._udp services ...
00:00:00.007 [I] TrelInterface-: Enabled interface, local port:50576
00:00:00.007 [I] RoutingManager: Initializing - InfraIfIndex:2
00:00:00.007 [I] InfraIf-------: Init infra netif 2
00:00:00.007 [I] Settings------: Read BrUlaPrefix fd8c:39b:a267::/48
00:00:00.007 [N] RoutingManager: BR ULA prefix: fd8c:39b:a267::/48 (loaded)
00:00:00.007 [I] RoutingManager: Generated local OMR prefix: fd8c:39b:a267:1::/64
00:00:00.007 [I] RoutingManager: Generated local NAT64 prefix: fd8c:39b:a267:2:0:0::/96
00:00:00.007 [N] RoutingManager: Local on-link prefix: fd63:75ee:6746:c180::/64
00:00:00.007 [I] InfraIf-------: State changed: NOT RUNNING -> RUNNING
00:00:00.007 [I] RoutingManager: Enabling
00:00:00.007 [I] Nat64---------: IPv4 CIDR for NAT64: 192.168.255.0/24 (actual address pool: 192.168.255.1 - 192.168.255.254, 254 addresses)
00:00:00.007 [I] Nat64---------: NAT64 translator is now Disabled
00:00:00.007 [I] P-Resolver----: Got nameserver #0: 172.30.32.3
[INFO]-UTILS---: Set state callback: OK
[DEBG]-TrelDns-: Stop browsing _trel._udp service.
[DEBG]-TrelDns-: mDNS Publisher is Ready
[INFO]-TrelDns-: TREL DNS-SD Is Now Ready: Netif=enp6s18(2), SubscriberId=0, Register=!
[INFO]-BA------: Start Thread Border Agent
[INFO]-BA------: Publish meshcop service Home Assistant OpenThread Border Router #ADAC._meshcop._udp.local.
00:00:00.008 [I] Settings------: Read BorderAgentId {id:40bc64a2dfef1ec6774b377c89bb57f5}
[INFO]-MDNS----: Registering service Home Assistant OpenThread Border Router #ADAC._meshcop._udp
00:00:00.008 [I] BbrLocal------: Add Domain Prefix: ::/0, NotFound
00:00:00.008 [I] BbrLocal------: Add BBR Service: seqno (89), delay (5s), timeout (3600s), InvalidState
[INFO]-ADPROXY-: Started
[INFO]-DPROXY--: Started
[INFO]-APP-----: Thread Border Router started on AIL enp6s18.

Additional information

No response

rwkeane commented 4 months ago

I received the same error (Platform------: Init() at spinel_driver.cpp:82: Failure) when running natively on an RPI (tried with both RPI3 and RPI4) with linux when using the SkyConnect stick flashed with the OTBR firmware (i.e. not on Home Assistant) . So I suspect the issue might be the firmware on the stick rather than something HA side

Was there a new version of the firmware released recently? It could be a firmware bug if its not OS specific - I'd be happy to load an old version of the firmware and give it a try if you give me a link to where I can download it

akarabach commented 4 months ago

I have the same issue, but I'm using OTBR in docker

AdriSchmi commented 4 months ago

I have this issue since today too. All my Matter devices are offline

eground2000 commented 4 months ago

I have the same issue. I use a SLZB-06M [Kernel Version: 2.3.6 | Radio Version: 20241105 | Matter-over-Thread (Beta)]


----------------------------------------------------------- Add-on: OpenThread Border Router OpenThread Border Router add-on ----------------------------------------------------------- Add-on version: 2.9.0 You are running the latest version of this add-on. s6-rc: info: service socat-otbr-tcp successfully started System: Home Assistant OS 12.4 (amd64 / qemux86-64) Home Assistant Core: 2024.7.3 Home Assistant Supervisor: 2024.06.2 ----------------------------------------------------------- Please, share the above information when looking for help or support in, e.g., GitHub, forums or the Discord chat. ----------------------------------------------------------- s6-rc: info: service banner successfully started s6-rc: info: service universal-silabs-flasher: starting [15:23:02] INFO: Flashing firmware is disabled s6-rc: info: service universal-silabs-flasher successfully started s6-rc: info: service otbr-agent: starting [15:23:02] INFO: Starting otbr-agent... tiocmget: Inappropriate ioctl for device [NOTE]-AGENT---: Running 0.3.0-41474ce-dirty [NOTE]-AGENT---: Thread version: 1.3.0 [NOTE]-AGENT---: Thread interface: wpan0 [NOTE]-AGENT---: Radio URL: spinel+hdlc+uart:///tmp/ttyOTBR?uart-baudrate=460800 [NOTE]-AGENT---: Radio URL: trel://enp2s1 [NOTE]-ILS-----: Infra link selected: enp2s1 [INFO]-NCP-----: OpenThread log level changed to 5 53d.19:34:45.686 [D] P-SpinelDrive-: Sent spinel frame, flg:0x2, iid:0, tid:0, cmd:RESET 53d.19:34:45.686 [D] P-SpinelDrive-: Waiting response: key=0 53d.19:34:47.688 [W] P-SpinelDrive-: Wait for response timeout 53d.19:34:47.688 [I] P-SpinelDrive-: co-processor self reset successfully 53d.19:34:47.688 [D] P-SpinelDrive-: Waiting response: key=1 53d.19:34:49.690 [W] P-SpinelDrive-: Wait for response timeout 53d.19:34:49.690 [D] P-SpinelDrive-: Waiting response: key=1 53d.19:34:51.692 [W] P-SpinelDrive-: Wait for response timeout 53d.19:34:51.692 [C] Platform------: Init() at spinel_driver.cpp:82: Failure 53d.19:34:51.692 [D] P-SpinelDrive-: Waiting response: key=1 53d.19:34:53.694 [W] P-SpinelDrive-: Wait for response timeout [15:23:10] WARNING: otbr-agent exited with code 1 (by signal 0). [15:23:10] INFO: OTBR firewall teardown completed. s6-svlisten1: fatal: /run/s6-rc/servicedirs/otbr-agent failed permanently or its supervisor died s6-rc: warning: unable to start service otbr-agent: command exited 1 s6-rc: info: service legacy-cont-init: stopping s6-rc: info: service universal-silabs-flasher: stopping s6-rc: info: service socat-otbr-tcp: stopping s6-rc: info: service mdns: stopping /run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information. /run/s6/basedir/scripts/rc.init: fatal: stopping the container. s6-rc: info: service universal-silabs-flasher successfully stopped s6-rc: info: service banner: stopping 2024/07/23 15:23:10 socat[74] W exiting on signal 15 Default: mDNSResponder (Engineering Build) (Jul 2 2024 09:04:20) stopping s6-rc: info: service socat-otbr-tcp successfully stopped s6-rc: info: service banner successfully stopped s6-rc: info: service legacy-cont-init successfully stopped s6-rc: info: service fix-attrs: stopping s6-rc: info: service fix-attrs successfully stopped s6-rc: info: service s6rc-oneshot-runner: stopping s6-rc: info: service s6rc-oneshot-runner successfully stopped [15:23:10] INFO: mDNS ended with exit code 4 (signal 0)... s6-rc: info: service mdns successfully stopped`

rwkeane commented 4 months ago

So after digging through the manufacturer documentation, I think I figured it out!

When the baud rate is set to the default, I get this error. After updating it to 460800, the error resolved itself

akarabach commented 4 months ago

So after digging through the manufacturer documentation, I think I figured it out!

When the baud rate is set to the default, I get this error. After updating it to 460800, the error resolved itself

didnt help for me

AdriSchmi commented 4 months ago

So after digging through the manufacturer documentation, I think I figured it out!

When the baud rate is set to the default, I get this error. After updating it to 460800, the error resolved itself

me neither

Electron1007 commented 4 months ago

I have the same issue and proposed solution didn't help

Benschi99 commented 4 months ago

I have the same error!

Is there any way to downgrade to 2.8.0. without a backup? Sadly I just did a fresh installation of HA.

Leah-Anderson commented 3 months ago

I just wanted to throw this here because I came across the same thing running the openthread container with the nrf52840 MDK. The fix for me was to add "uart-baudrate=1000000&uart-flow-control". Credit goes to this issue: https://github.com/openthread/openthread/issues/10538.

tl-sl commented 3 months ago

I also tried using a spare SkyConnect dongle (configured the add-on to flash firmware), which resulted in the same errors.

SkyConnect requires you enable flow control. It appears that nRf dongle may also require this.

The fix for me was to add "uart-baudrate=1000000&uart-flow-control"

For those using the addon, just enable the flow control button in configuration.

IngmarStein commented 3 months ago

As of recently, I found myself in the same situation after it has been working fine for a while.

After a reboot of my Synology NAS, the serial USB driver for a ZBT-1 is correctly loaded:

$ dmesg | grep -Fi usb
[52258.716119] usb 1-3: cp210I x converter now attached to ttyUSB0
$ lsusb -cui
|__usb1          1d6b:0002:0404 09  2.00  480MBit/s 0mA 1IF  (Linux 4.4.302+ xhci-hcd xHCI Host Controller 0000:09:00.3) hub
  |__1-1         f400:f400:0100 00  2.00  480MBit/s 200mA 1IF  (Synology DiskStation 650026E98013F214)
  1-1:1.0         (IF) 08:06:50 2EPs () usb-storage host7 (synoboot)
  |__1-3         10c4:ea60:0100 00  2.00   12MBit/s 100mA 1IF  (Nabu Casa SkyConnect v1.0 8eb63fd4089bed1184547708a8669f5d)
  1-3:1.0         (IF) ff:00:00 2EPs () cp210x ttyUSB0 

This device is attached to a virtual machine running HAOS with the otbr addon. When I start OTRB, I get the Init() at spinel_driver.cpp:82: Failure error:

[13:55:48] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2024-08-16 13:55:48.915 homeassistant universal_silabs_flasher.flash INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.4.0', ezsp_version=None, ot_rcp_version='SL-OPENTHREAD/2.4.0.0_GitHub-7074a43e4' (2.4.0.0), cpc_version=None, fw_type=<FirmwareImageType.OT_RCP: 'ot-rcp'>, baudrate=460800)
2024-08-16 13:55:48.915 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2024-08-16 13:55:50.958 homeassistant universal_silabs_flasher.flasher INFO Probing ApplicationType.SPINEL at 460800 baud
2024-08-16 13:55:50.999 homeassistant universal_silabs_flasher.flasher INFO Detected ApplicationType.SPINEL, version 'SL-OPENTHREAD/2.4.0.0_GitHub-7074a43e4' (2.4.0.0) at 460800 baudrate (bootloader baudrate None)
2024-08-16 13:55:50.999 homeassistant universal_silabs_flasher.flash INFO Firmware version 'SL-OPENTHREAD/2.4.0.0_GitHub-7074a43e4' (2.4.0.0) is flashed, not re-installing
s6-rc: info: service universal-silabs-flasher successfully started
s6-rc: info: service otbr-agent: starting
[13:55:51] INFO: Setup OTBR firewall...
[13:55:51] INFO: Starting otbr-agent...
[NOTE]-AGENT---: Running 0.3.0-41474ce-dirty
[NOTE]-AGENT---: Thread version: 1.3.0
[NOTE]-AGENT---: Thread interface: wpan0
[NOTE]-AGENT---: Radio URL: spinel+hdlc+uart:///dev/ttyUSB0?uart-baudrate=115200&uart-flow-control
[NOTE]-AGENT---: Radio URL: trel://enp0s3
[NOTE]-ILS-----: Infra link selected: enp0s3
49d.20:26:11.560 [W] P-SpinelDrive-: Wait for response timeout
49d.20:26:13.562 [W] P-SpinelDrive-: Wait for response timeout
49d.20:26:15.565 [W] P-SpinelDrive-: Wait for response timeout
49d.20:26:15.565 [C] Platform------: Init() at spinel_driver.cpp:82: Failure
49d.20:26:17.567 [W] P-SpinelDrive-: Wait for response timeout
[13:55:59] WARNING: otbr-agent exited with code 1 (by signal 0).

This seems to cause a USB disconnect:

$ dmesg | grep -Fi usb
[52300.582799] cp210x ttyUSB0: cp210x converter now disconnected from ttyUSB0

And the serial module is no longer used for the ZBT-1 (changed to usbfs):

$ lsusb -cui
|__usb1          1d6b:0002:0404 09  2.00  480MBit/s 0mA 1IF  (Linux 4.4.302+ xhci-hcd xHCI Host Controller 0000:09:00.3) hub
  |__1-1         f400:f400:0100 00  2.00  480MBit/s 200mA 1IF  (Synology DiskStation 650026E98013F214)
  1-1:1.0         (IF) 08:06:50 2EPs () usb-storage host7 (synoboot)
  |__1-3         10c4:ea60:0100 00  2.00   12MBit/s 100mA 1IF  (Nabu Casa SkyConnect v1.0 8eb63fd4089bed1184547708a8669f5d)
  1-3:1.0         (IF) ff:00:00 2EPs () usbfs 
IngmarStein commented 3 months ago

I was able to work around the issue by setting autoflash_firmware: false.

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

akarabach commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Still a problem

oilid commented 1 month ago

I also tried using a spare SkyConnect dongle (configured the add-on to flash firmware), which resulted in the same errors.

SkyConnect requires you enable flow control. It appears that nRf dongle may also require this.

The fix for me was to add "uart-baudrate=1000000&uart-flow-control"

For those using the addon, just enable the flow control button in configuration.

Thanks a lot, this helped me. I'm using a slzb-06 with Matter-over-Thread (beta). Before enabling the setting "Hardware flow control" after few restarts the OTBR always stopped running with the last message "service mdns successfully stopped" in the logs. But enabling "Hardware flow control" helps although the official documentation on https://smlight.tech/manual/slzb-06/guide/thread-matter/ recommend:

Hardware flow control: click No.

I don't know why?!

Update: After few minutes it stopped working again... :-(

00:00:00.092 [N] RoutingManager: BR ULA prefix: fd06:befa:3b94::/48 (loaded)
00:00:00.092 [N] RoutingManager: Local on-link prefix: fd9a:1d1f:969e:48fa::/64
00:00:00.130 [N] Mle-----------: Role disabled -> detached
00:00:00.163 [N] P-Netif-------: Changing interface state to up.
00:00:00.186 [W] P-Netif-------: Failed to process request#2: No such process
00:00:00.187 [W] P-Netif-------: Failed to process request#6: No such process
s6-rc: info: service otbr-agent successfully started
s6-rc: info: service otbr-agent-rest-discovery: starting
s6-rc: info: service otbr-agent-configure: starting
00:00:00.307 [N] Mle-----------: Role detached -> router
00:00:00.307 [N] Mle-----------: Partition ID 0x3ed00eba
00:00:00.349 [W] P-Netif-------: Successfully added an external route fd9c:1945:69a7:ffff:0:0::/96 in kernel
00:00:00.349 [W] P-Netif-------: Successfully added an external route ::/0 in kernel
Done

s6-rc: info: service otbr-agent-configure successfully started
[13:42:13] INFO: Successfully sent discovery information to Home Assistant.
s6-rc: info: service otbr-agent-rest-discovery successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
00:00:02.088 [W] DuaManager----: Failed to perform next registration: NotFound
00:00:15.241 [W] P-HdlcIntface-: Error decoding hdlc frame: Parse
00:00:21.624 [N] MeshForwarder-: Dropping rx frag frame, error:Drop, len:88, src:7235eb0cb7769c4b, dst:0xffff, sec:yes, tag:62139, offset:128, dglen:233
00:00:23.122 [W] Mle-----------: Failed to process Data Request: Security
00:00:29.319 [N] Mle-----------: Different partition (peer:1100402483, local:1053822650)
00:00:45.191 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:00:53.920 [N] Mle-----------: Different partition (peer:1100402483, local:1053822650)
00:01:15.541 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:01:38.228 [N] Mle-----------: Different partition (peer:1100402483, local:1053822650)
00:01:52.152 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:02:00.843 [N] Mle-----------: Different partition (peer:1100402483, local:1053822650)
00:02:48.150 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:03:01.541 [N] Mle-----------: Different partition (peer:1100402483, local:1053822650)
00:03:29.480 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:03:52.257 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:04:16.531 [W] Mle-----------: Failed to process Child ID Request: Security
00:04:29.199 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:04:41.474 [N] Mle-----------: Different partition (peer:1100402483, local:1053822650)
00:05:27.860 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:06:05.291 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:06:36.883 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:07:17.512 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:07:45.585 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:07:53.765 [W] Mle-----------: Failed to process Child ID Request: Security
00:08:01.730 [N] Mle-----------: Different partition (peer:1100402483, local:1053822650)
00:08:23.766 [N] Mle-----------: Different partition (peer:1100402483, local:1053822650)
00:08:53.292 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:09:16.496 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:09:30.305 [N] Mle-----------: Different partition (peer:1100402483, local:1053822650)
00:09:57.751 [W] Mle-----------: Failed to process Child ID Request: Security
00:10:09.792 [N] Mle-----------: Different partition (peer:1100402483, local:1053822650)
00:10:24.510 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:10:32.426 [N] Mle-----------: Different partition (peer:1100402483, local:1053822650)
00:11:01.014 [N] Mle-----------: Different partition (peer:823648658, local:1053822650)
00:11:04.539 [W] P-HdlcIntface-: Error decoding hdlc frame: Parse
00:11:09.432 [W] P-RadioSpinel-: radio tx timeout
00:11:09.432 [C] P-RadioSpinel-: Failed to communicate with RCP - no response from RCP during initialization
00:11:09.432 [C] P-RadioSpinel-: This is not a bug and typically due a config error (wrong URL parameters) or bad RCP image:
00:11:09.432 [C] P-RadioSpinel-: - Make sure RCP is running the correct firmware
00:11:09.432 [C] P-RadioSpinel-: - Double check the config parameters passed as `RadioURL` input
00:11:09.432 [C] Platform------: HandleRcpTimeout() at radio_spinel.cpp:1991: RadioSpinelNoResponse
[13:53:22] WARNING: otbr-agent exited with code 6 (by signal 0).
Chain OTBR_FORWARD_INGRESS (0 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             PKTTYPE = unicast
DROP       all  --  anywhere             anywhere             match-set otbr-ingress-deny-src src
ACCEPT     all  --  anywhere             anywhere             match-set otbr-ingress-allow-dst dst
DROP       all  --  anywhere             anywhere             PKTTYPE = unicast
ACCEPT     all  --  anywhere             anywhere            
otbr-ingress-deny-src
otbr-ingress-deny-src-swap
otbr-ingress-allow-dst
otbr-ingress-allow-dst-swap
Chain OTBR_FORWARD_EGRESS (0 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
[13:53:22] INFO: OTBR firewall teardown completed.
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service otbr-agent-rest-discovery: stopping
s6-rc: info: service otbr-agent-configure: stopping
s6-rc: info: service otbr-agent-rest-discovery successfully stopped
s6-rc: info: service otbr-agent-configure successfully stopped
s6-rc: info: service otbr-agent: stopping
s6-rc: info: service otbr-agent successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service universal-silabs-flasher: stopping
s6-rc: info: service socat-otbr-tcp: stopping
s6-rc: info: service mdns: stopping
s6-rc: info: service universal-silabs-flasher successfully stopped
Default: mDNSResponder (Engineering Build) (Oct  7 2024 16:49:24) stopping
2024/10/08 13:53:22 socat[79] W exiting on signal 15
s6-rc: info: service banner: stopping
s6-rc: info: service banner successfully stopped
s6-rc: info: service socat-otbr-tcp successfully stopped
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
[13:53:22] INFO: mDNS ended with exit code 4 (signal 0)...
s6-rc: info: service mdns successfully stopped
alexpfau commented 4 weeks ago

Same issue here with SLZB-06M on Matter-over-Thread firmware

oilid commented 4 weeks ago

after the upgrade from 20240401 to thread firmware 20240727 it works now for me!!!

akarabach commented 4 weeks ago

Same issue here with SLZB-06M on Matter-over-Thread firmware

I have SLZB-06 and it works smoothly, may you share how have you configured it ? You need to follow official guide, connect the stick via usb to the HA instance, connect stick to ethernet, configure OTBR to work over ethernet (connect stick directly to HA via usb is required and important for stability, there were a bunch of issues and OTBR shutdown for me)

akarabach commented 4 weeks ago

It works for me for 20240401 as well

tl-sl commented 4 weeks ago

If you are having issues with SMLIGHT devices please file a new issue, rather than hijacking an unrelated issue.