home-assistant / addons


OTBR addon can't connect to network thread radio #3722

Open rodalpho opened 1 month ago

rodalpho commented 1 month ago

Describe the issue you are experiencing

I have a SLZB-06 (not the SLZB-06M or any other variant), a PoE network-connected device (i.e., not USB), successfully flashed with their BT proxy + Thread ESP32 firmware from the Thread config halfway down on this URL. The BT proxy works great, but I can't get the OTBR add-on to connect to the device; it gives timeout errors.

The OTBR addon is configured as follows:

device: /dev/ttyS0
baudrate: "460800"
flow_control: false
autoflash_firmware: false
otbr_log_level: info
firewall: true
nat64: false
network_device: 10.10.20.190:6638

I tried other baud rates and also tried flow control on (the default). It's off above because I was following another GitHub response saying it didn't like flow control, but that didn't fix the issue. My Home Assistant VM can connect to the device on that port:

~ # nmap -p 6638 10.10.20.190
Starting Nmap 7.95 ( https://nmap.org ) at 2024-08-13 13:56 EDT
Nmap scan report for btproxy (10.10.20.190)
Host is up (0.00074s latency).

PORT     STATE SERVICE
6638/tcp open  unknown
MAC Address: 78:E3:6D:E4:21:13 (Espressif)

Nmap done: 1 IP address (1 host up) scanned in 0.13 seconds
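
The same reachability check can be scripted. The following is a minimal sketch, not part of the add-on; the helper name `tcp_port_open` is hypothetical. Like the nmap scan, it only shows that something on the ESP32 accepts TCP connections on the port. It says nothing about whether the radio chip behind the serial bridge answers Spinel commands.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration. A True result only proves that a
    listener accepted the connection, not that the device behind it responds.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

With the values from this report, the call would be `tcp_port_open("10.10.20.190", 6638)`.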

Full debug logs from the addon below:

[14:01:28] INFO: The otbr-web is disabled.
[14:01:28] INFO: Enabled socat-otbr-tcp.
s6-rc: info: service socat-otbr-tcp: starting
s6-rc: info: service mdns: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service mdns successfully started
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service banner: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
[14:01:28] INFO: Starting mDNS Responder...
Default: mDNSResponder (Engineering Build) (Aug  8 2024 07:23:12) starting
s6-rc: info: service legacy-cont-init successfully started
[14:01:28] INFO: Starting socat TCP client for OTBR daemon...

-----------------------------------------------------------
 Add-on: OpenThread Border Router
 OpenThread Border Router add-on
-----------------------------------------------------------
 Add-on version: 2.9.1
 You are running the latest version of this add-on.
s6-rc: info: service socat-otbr-tcp successfully started
 System: Home Assistant OS 12.4  (amd64 / qemux86-64)
 Home Assistant Core: 2024.8.1
 Home Assistant Supervisor: 2024.08.0
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[14:01:29] INFO: Flashing firmware is disabled
s6-rc: info: service universal-silabs-flasher successfully started
s6-rc: info: service otbr-agent: starting
[14:01:29] INFO: Setup OTBR firewall...
[14:01:29] INFO: Starting otbr-agent...
tiocmget: Inappropriate ioctl for device
[NOTE]-AGENT---: Running 0.3.0-41474ce-dirty
[NOTE]-AGENT---: Thread version: 1.3.0
[NOTE]-AGENT---: Thread interface: wpan0
[NOTE]-AGENT---: Radio URL: spinel+hdlc+uart:///tmp/ttyOTBR?uart-baudrate=460800
[NOTE]-AGENT---: Radio URL: trel://enp0s18
[NOTE]-ILS-----: Infra link selected: enp0s18
[INFO]-NCP-----: OpenThread log level changed to 5
54d.13:54:37.349 [D] P-SpinelDrive-: Sent spinel frame, flg:0x2, iid:0, tid:0, cmd:RESET
54d.13:54:37.349 [D] P-SpinelDrive-: Waiting response: key=0
54d.13:54:39.351 [W] P-SpinelDrive-: Wait for response timeout
54d.13:54:39.351 [I] P-SpinelDrive-: co-processor self reset successfully
54d.13:54:39.351 [D] P-SpinelDrive-: Waiting response: key=1
54d.13:54:41.352 [W] P-SpinelDrive-: Wait for response timeout
54d.13:54:41.352 [D] P-SpinelDrive-: Waiting response: key=1
54d.13:54:43.352 [W] P-SpinelDrive-: Wait for response timeout
54d.13:54:43.352 [C] Platform------: Init() at spinel_driver.cpp:82: Failure
54d.13:54:43.352 [D] P-SpinelDrive-: Waiting response: key=1
54d.13:54:45.354 [W] P-SpinelDrive-: Wait for response timeout
[14:02:03] WARNING: otbr-agent exited with code 1 (by signal 0).
Chain OTBR_FORWARD_INGRESS (0 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             PKTTYPE = unicast
DROP       all  --  anywhere             anywhere             match-set otbr-ingress-deny-src src
ACCEPT     all  --  anywhere             anywhere             match-set otbr-ingress-allow-dst dst
DROP       all  --  anywhere             anywhere             PKTTYPE = unicast
ACCEPT     all  --  anywhere             anywhere            
otbr-ingress-deny-src
otbr-ingress-deny-src-swap
otbr-ingress-allow-dst
otbr-ingress-allow-dst-swap
Chain OTBR_FORWARD_EGRESS (0 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
[14:02:03] INFO: OTBR firewall teardown completed.
s6-svlisten1: fatal: /run/s6-rc/servicedirs/otbr-agent failed permanently or its supervisor died
s6-rc: warning: unable to start service otbr-agent: command exited 1
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service universal-silabs-flasher: stopping
s6-rc: info: service socat-otbr-tcp: stopping
s6-rc: info: service mdns: stopping
s6-rc: info: service universal-silabs-flasher successfully stopped
Default: mDNSResponder (Engineering Build) (Aug  8 2024 07:23:12) stopping
2024/08/13 14:02:03 socat[81] W exiting on signal 15
s6-rc: info: service banner: stopping
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.
/run/s6/basedir/scripts/rc.init: fatal: stopping the container.
s6-rc: info: service banner successfully stopped
s6-rc: info: service socat-otbr-tcp successfully stopped
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
[14:02:03] INFO: mDNS ended with exit code 4 (signal 0)...
s6-rc: info: service mdns successfully stopped
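
For readers unfamiliar with the "Sent spinel frame ... cmd:RESET" and "Wait for response timeout" lines above: the agent sends a small Spinel command wrapped in HDLC-lite framing and waits for the radio co-processor to reply. The sketch below shows roughly what such a frame looks like on the wire. It is an illustration based on the public Spinel framing rules, not the otbr-agent code, and all function names are hypothetical. Real RESET frames may carry additional arguments.

```python
FLAG, ESC, XOR = 0x7E, 0x7D, 0x20
# Bytes that HDLC-lite escapes: flag, escape, XON, XOFF, vendor-specific.
RESERVED = {0x7E, 0x7D, 0x11, 0x13, 0xF8}
CMD_RESET = 1  # Spinel command id for RESET


def crc16_x25(data: bytes) -> int:
    """CRC-16/X-25, the frame check sequence used by HDLC-lite."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF


def spinel_header(flg: int = 2, iid: int = 0, tid: int = 0) -> int:
    """Pack the header byte: 2-bit flag, 2-bit interface id, 4-bit transaction id."""
    return (flg << 6) | (iid << 4) | tid


def hdlc_encode(payload: bytes) -> bytes:
    """Append the FCS (little-endian), escape reserved bytes, add flag bytes."""
    body = payload + crc16_x25(payload).to_bytes(2, "little")
    out = bytearray([FLAG])
    for b in body:
        if b in RESERVED:
            out += bytes([ESC, b ^ XOR])
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)


def hdlc_decode(frame: bytes) -> bytes:
    """Reverse hdlc_encode and verify the FCS; raise ValueError on bad input."""
    if len(frame) < 2 or frame[0] != FLAG or frame[-1] != FLAG:
        raise ValueError("frame must start and end with 0x7E")
    body = bytearray()
    escaped = False
    for b in frame[1:-1]:
        if escaped:
            body.append(b ^ XOR)
            escaped = False
        elif b == ESC:
            escaped = True
        else:
            body.append(b)
    if len(body) < 2:
        raise ValueError("frame too short to contain an FCS")
    payload = bytes(body[:-2])
    if crc16_x25(payload) != int.from_bytes(body[-2:], "little"):
        raise ValueError("FCS mismatch")
    return payload


# flg:0x2, iid:0, tid:0, cmd:RESET, matching the fields in the log line above.
RESET_FRAME = hdlc_encode(bytes([spinel_header(2, 0, 0), CMD_RESET]))
```

If the chip on the other end of the serial link is not running RCP (Thread) firmware, or the UART baud rate differs, a frame like this gets no valid reply, which would show up as the repeated timeouts in the log.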

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

Which add-on are you reporting an issue with?

OpenThread Border Router

What is the version of the add-on?

2.9.1

Steps to reproduce the issue

See above

System Health information

System Information

version core-2024.8.1
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.4
os_name Linux
os_version 6.6.33-haos
arch x86_64
timezone America/New_York
config_dir /config

Home Assistant Community Store

GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 5000
Installed Version 1.34.0
Stage running
Available Repositories 1390
Downloaded Repositories 22
HACS Data ok

Home Assistant Cloud

logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok

Home Assistant Supervisor

host_os Home Assistant OS 12.4
update_channel stable
supervisor_version supervisor-2024.08.0
agent_version 1.6.0
docker_version 26.1.4
disk_total 30.8 GB
disk_used 13.0 GB
healthy true
supported true
host_connectivity true
supervisor_connectivity true
ntp_synchronized true
virtualization kvm
board ova
supervisor_api ok
version_api ok
installed_addons Terminal & SSH (9.14.0), Studio Code Server (5.15.0), Samba Backup (5.2.0), Advanced SSH & Web Terminal (18.0.0), ESPHome (2024.7.3), Cloudflared (5.1.17), Samba share (12.3.2), Portainer Agent (linux-ppc64le-2.20.3-alpine), Home Assistant Google Drive Backup (0.112.1), Scrypted (v0.114.0-jammy-full), Matter Server (6.4.1), OpenThread Border Router (2.9.1)

Dashboards

dashboards 4
resources 12
views 9
mode storage

Recorder

oldest_recorder_run August 3, 2024 at 6:27 PM
current_recorder_run August 13, 2024 at 1:02 PM
estimated_db_size 163.50 MiB
database_engine sqlite
database_version 3.45.3

Anything in the Supervisor logs that might be useful for us?

Not really, but just in case:

2024-08-13 14:06:58.925 INFO (SyncWorker_3) [supervisor.docker.manager] Cleaning addon_core_openthread_border_router application
2024-08-13 14:06:59.178 INFO (MainThread) [supervisor.docker.addon] Starting Docker add-on homeassistant/amd64-addon-otbr with version 2.9.1
2024-08-13 14:07:00.292 INFO (MainThread) [supervisor.api.middleware.security] /network/info access from core_openthread_border_router
2024-08-13 14:07:12.082 ERROR (MainThread) [asyncio] Task exception was never retrieved
future: <Task finished name='Task-405950' coro=<Addon.watchdog_container() done, defined at /usr/src/supervisor/supervisor/addons/addon.py:1429> exception=AddonsJobError('Rate limit exceeded, more than 10 calls in 0:30:00')>
Traceback (most recent call last):
  File "/usr/src/supervisor/supervisor/addons/addon.py", line 1443, in watchdog_container
    await self._restart_after_problem(event.state)
  File "/usr/src/supervisor/supervisor/jobs/decorator.py", line 290, in wrapper
    raise on_condition(
supervisor.exceptions.AddonsJobError: Rate limit exceeded, more than 10 calls in 0:30:00
2024-08-13 14:08:12.561 ERROR (SyncWorker_1) [supervisor.docker.manager] Container addon_core_openthread_border_router is not running

Anything in the add-on logs that might be useful for us?

Found in description above

Additional information

No response

agners commented 1 month ago

This could be triggered by the timing sensitivity of the RCP protocol (as mentioned back when this feature got introduced, see https://github.com/home-assistant/addons/pull/3532#issuecomment-2076781028).

@tl-sl thoughts? Have you seen this on your end?

tl-sl commented 1 month ago

I have never tested this config via ESPHome myself (but I assume others must have). However, timeout errors usually indicate a failure to connect at all, which is often caused by mismatched baud rates or some other network issue.

@rodalpho I suggest you double-check that the UART baud rate is set to 460800 in the ESPHome config. Just in case, make sure there are no Wi-Fi links between the HA server and the SLZB-06. Finally, can you test with the latest version of the stock SMLIGHT firmware, just to rule out any issues with the ESPHome serial bridge?

rodalpho commented 1 month ago

That may actually be the issue-- checking their ESP config from the URL I linked above it specifies 115200 baud. So it looks like they provided an incorrect config. Unfortunately their flasher no longer works now so I'll have to contact the vendor. Will update if that fixes the issue.

Edit: I managed to get it to flash via the esptool.py CLI with the custom firmware generated from that YAML file with changed baudrate, but no connectivity. My guess is nobody has ever used this before and it's just broken. Guess I'll stick with a BT proxy unless they respond to my support request. Either way, not your problem!


## MDNS service settings
mdns:
  services:
    - service: "_slzb-06"
      protocol: "_tcp"
      port: 6638
      txt:
        version: 1.0
        name: SMLIGHT SLZB-06
        radio_type: znp
        baud_rate: 115200
        data_flow_control: software

Tarik2142 commented 4 weeks ago

> That may actually be the issue-- checking their ESP config from the URL I linked above it specifies 115200 baud. So it looks like they provided an incorrect config. Unfortunately their flasher no longer works now so I'll have to contact the vendor. Will update if that fixes the issue.
>
> Edit: I managed to get it to flash via the esptool.py CLI with the custom firmware generated from that YAML file with changed baudrate, but no connectivity. My guess is nobody has ever used this before and it's just broken. Guess I'll stick with a BT proxy unless they respond to my support request. Either way, not your problem!
>
> ## MDNS service settings
> mdns:
>   services:
>     - service: "_slzb-06"
>       protocol: "_tcp"
>       port: 6638
>       txt:
>         version: 1.0
>         name: SMLIGHT SLZB-06
>         radio_type: znp
>         baud_rate: 115200
>         data_flow_control: software

You are looking at the mDNS settings, but they have no effect on the OTBR add-on because it doesn't support them. The UART settings are above, in the config image.
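
To illustrate the distinction: in an ESPHome-style config, a `baud_rate` under `mdns` is only advertised metadata in a TXT record, while a `baud_rate` under `uart` is what sets the serial speed. The sketch below uses a made-up YAML sample (the `uart` values are assumptions, not the vendor's actual config) and a hypothetical helper that reports which top-level section each `baud_rate` belongs to.

```python
import re

# Hypothetical sample: the mdns block mirrors the snippet quoted in this
# thread; the uart block is an assumed example, not SMLIGHT's real config.
SAMPLE = """\
uart:
  id: uart_bus
  baud_rate: 460800

mdns:
  services:
    - service: "_slzb-06"
      protocol: "_tcp"
      port: 6638
      txt:
        radio_type: znp
        baud_rate: 115200
"""


def section_baud_rates(yaml_text: str) -> dict[str, int]:
    """Map each top-level YAML key to the last baud_rate found beneath it.

    A deliberately naive line scanner for illustration only; a real tool
    should use a YAML parser.
    """
    rates: dict[str, int] = {}
    section = None
    for line in yaml_text.splitlines():
        # A key starting in column 0 opens a new top-level section.
        top = re.match(r"^([A-Za-z_][\w-]*):", line)
        if top:
            section = top.group(1)
            continue
        found = re.match(r"^\s+baud_rate:\s*(\d+)\s*$", line)
        if found and section is not None:
            rates[section] = int(found.group(1))
    return rates
```

On the sample, `section_baud_rates(SAMPLE)` reports 460800 for `uart` and 115200 for `mdns`, so only the `uart` entry needs to match the add-on's `baudrate` option.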

rodalpho commented 4 weeks ago

Ahh. So that wasn’t the problem then?

Tarik2142 commented 4 weeks ago

> Ahh. So that wasn’t the problem then?

I think you need to connect the OTBR add-on in network mode, not USB.

rodalpho commented 4 weeks ago

I did, as specified in my post opening this issue. If I configured it wrong, I would appreciate a correction.

Tarik2142 commented 4 weeks ago

> I did, as specified in my post opening this issue. If I configured it wrong would appreciate a correction.

Did you change the mode to Thread before flashing the BT Proxy?

rodalpho commented 4 weeks ago

There's no way to do that; the default config is Zigbee-only, with no Thread or BT proxy. I followed their exact instructions to flash new firmware from the site I linked up top.

tl-sl commented 4 weeks ago

@rodalpho Did you flash the CC2652 Zigbee chip with the Thread firmware before installing ESPHome on the ESP32? It can't work if the Thread firmware hasn't been installed.

rodalpho commented 4 weeks ago

I followed their instructions linked below (and in the OP), which did not tell me to flash anything separately. I have no way to go back to the stock firmware: their web flasher doesn't work, nor does the normal ESPHome web flasher. Only esptool.py on Linux actually flashed anything, and I can't find the stock firmware file anywhere.

I get the feeling I'm the first person to ever actually do this, maybe outside of internal testing, and it isn't really supported. It's all kinda janky, as the Home Assistant ESPHome add-on can't adopt it properly (it fails with an error on their th-bt GitHub repository), but at least it works well as a PoE-connected Bluetooth proxy.

They haven't responded to my support request yet but to be fair, the company is located in Ukraine.

https://smlight.tech/manual/slzb-06/guide/bluetooth-proxy/

tl-sl commented 4 weeks ago

Thread support is still fairly new, so maybe that step is missing from their instructions!

The web flasher should still work even with ESPHome installed; see the FAQ section if it's not working: https://smlight.tech/flasher/#SLZB-06

Failing that, you can try installing this firmware with esptool and then updating OTA: https://github.com/smlight-dev/slzb-06-firmware/releases

> They haven't responded to my support request yet but to be fair, the company is located in Ukraine.

Yet they have responded in this thread.

rodalpho commented 4 weeks ago

Ahh didn't realize you worked there, thank you!

Web flasher through USB definitely didn't work, and that was using the same cable that worked with esptool. I followed the FAQ, other than the steps requiring cracking the case open. Updated drivers, held the tiny button down when booting, held it down while I clicked flash, etc.

Thanks for the link to the stock firmware. I suggest documenting that on the site, or making it clearer if it's there already and I missed it. I'll sit down tomorrow, grab that, flash Thread to the Zigbee chip, then flash the ESP32 and cross my fingers.

Edit: Ahh I see that's the old open-source firmware, I didn't think of trying that and was looking for the newest one, 2.3.6.

3oris commented 2 weeks ago

(For the use case of matter:) If only SL provided an otbr-agent to run on their stick! We could just add the thing to our preferred network and operate stuff on the "right" layer instead of wrapping a lower OSI layer (with everything on top) into a higher one... (Also, obviously not considering multi-protocol here)