openthread / ot-br-posix

OpenThread Border Router, a Thread border router for POSIX-based platforms.
https://openthread.io/
BSD 3-Clause "New" or "Revised" License

Border router disappears randomly from home assistant thread network #2216

Closed paolofaz closed 4 months ago

paolofaz commented 6 months ago

I'm trying the latest official OpenThread Docker image with a Sonoff-E dongle flashed with the latest OpenThread firmware (2.4.1).

I've activated the OTBR integration in Home Assistant (http://my-ip:8081) and it connects fine.

The Thread integration is auto-discovered by Home Assistant, and it creates a network with a name like "ha-thread-XXXX", as you can see in the screenshot below:

[Screenshot: the "ha-thread-XXXX" network in the Home Assistant Thread integration]

After a few minutes (it varies, roughly 5), the border router disappears from the Thread network with the error "no border routers were found", as you can see in the screenshot below:

[Screenshot: Home Assistant reporting "no border routers were found"]

If I click "Reset border router", Home Assistant creates a new Thread network with another name (ha-thread-YYYY), and after a few minutes I get the same problem: the border router disappears again, and so on.

The border router web page is always reachable.

The logs don't help because I don't see any errors.

This is my Compose YAML:

```yaml
otbr:
  container_name: otbr
  image: openthread/otbr:latest
  ports:
    # ...

networks:
  rete_otbr:
    driver: bridge
    driver_opts:
      com.docker.network.bridge.name: "otbr0"
    enable_ipv6: true
    ipam:
      config:
        # ...
```

agners commented 6 months ago

The border router list shown in Home Assistant is based on the mDNS/DNS-SD _meshcop._udp service. The timeout is usually 30 minutes or so, so if OTBR/mDNSResponder crashes hard, the border router would only disappear after that period. If it disappears after just a few minutes, it sounds more like the service is being removed gracefully with an announcement.
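A simplified model of the distinction drawn above, as a sketch: a querier caches each record until its TTL runs out, so a responder that dies silently stays listed for the whole TTL, while a "goodbye" (the record re-sent with TTL 0, per RFC 6762 section 10.1) removes it about a second later. The `MdnsCache` class is hypothetical and tracks one record per instance; the 30-minute TTL is the rough figure from the comment, not a measured OTBR value.

```python
class MdnsCache:
    """Toy model of a querier's cache: one record per service instance (simplification)."""

    def __init__(self):
        self.expires = {}  # instance name -> absolute expiry time in seconds

    def on_record(self, name, ttl, now):
        if ttl == 0:
            # Goodbye: RFC 6762 section 10.1 says keep it one more second, then drop it.
            self.expires[name] = now + 1
        else:
            self.expires[name] = now + ttl  # a fresh announcement restarts the timer

    def visible(self, now):
        """Instance names still listed at time `now`."""
        return sorted(n for n, t in self.expires.items() if t > now)


TTL = 30 * 60  # the rough "30 min" figure above; real TTLs depend on the record type

crash = MdnsCache()
crash.on_record("ha-thread-XXXX", TTL, now=0)   # announced, then the responder dies silently
print(crash.visible(now=5 * 60))                # still listed after 5 minutes
print(crash.visible(now=TTL + 1))               # gone only after the TTL runs out

bye = MdnsCache()
bye.on_record("ha-thread-XXXX", TTL, now=0)
bye.on_record("ha-thread-XXXX", 0, now=5 * 60)  # graceful removal after 5 minutes
print(bye.visible(now=5 * 60 + 2))              # gone about a second later
```

This is why a disappearance after about 5 minutes points at a removal announcement rather than a crash.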

Anything in the logs of the otbr container (docker logs otbr)?

paolofaz commented 6 months ago

@agners I'm copying only the relevant sections of the otbr container logs:

```
...
++ RESOLV_CONF_HEAD=/etc/resolvconf/resolv.conf.d/head

### and then... the border router dropped out at 08:27:25, these are the logs:

Mar 14 08:27:14 6cd2c7eaf122 otbr-agent[152]: 00:06:34.595 [D] P-RadioSpinel-: ... csmaCaEnabled:1, isHeaderUpdated:0, isARetx:0, skipAes:0, txDelay:0, txDelayBase:0
Mar 14 08:27:14 6cd2c7eaf122 otbr-agent[152]: 00:06:34.602 [D] P-RadioSpinel-: Received spinel frame, flg:0x2, iid:0, tid:12, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
Mar 14 08:27:14 6cd2c7eaf122 otbr-agent[152]: 00:06:34.602 [I] MeshForwarder-: Sent IPv6 UDP msg, len:90, chksum:cec9, ecn:no, to:0xffff, sec:no, prio:net
Mar 14 08:27:14 6cd2c7eaf122 otbr-agent[152]: 00:06:34.603 [I] MeshForwarder-: src:[fe80:0:0:0:b8db:8a8:e4a6:6dc7]:19788
Mar 14 08:27:14 6cd2c7eaf122 otbr-agent[152]: 00:06:34.603 [I] MeshForwarder-: dst:[ff02:0:0:0:0:0:0:1]:19788
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.384 [I] Mle-----------: Send Advertisement (ff02:0:0:0:0:0:0:1)
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.384 [D] P-RadioSpinel-: Sent spinel frame, flg:0x2, iid:0, tid:13, cmd:PROP_VALUE_SET, key:STREAM_RAW, len:69, channel:15, maxbackoffs:4, maxretries:15 ...
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.384 [D] P-RadioSpinel-: ... csmaCaEnabled:1, isHeaderUpdated:0, isARetx:0, skipAes:0, txDelay:0, txDelayBase:0
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.393 [D] P-RadioSpinel-: Received spinel frame, flg:0x2, iid:0, tid:13, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.393 [I] MeshForwarder-: Sent IPv6 UDP msg, len:90, chksum:db5f, ecn:no, to:0xffff, sec:no, prio:net
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.393 [I] MeshForwarder-: src:[fe80:0:0:0:b8db:8a8:e4a6:6dc7]:19788
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.393 [I] MeshForwarder-: dst:[ff02:0:0:0:0:0:0:1]:19788
```
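With radio debug logging on, the interesting lines are easy to miss among the spinel frames. A rough triage sketch, assuming the syslog line format visible in the excerpt above (inferred from this post, not from OTBR documentation): it hides routine `P-RadioSpinel`/`MeshForwarder` chatter unless it is logged at a severe level, and flags lines containing hypothetical keywords such as "Withdraw" (the word agners mentions in the next reply). The keyword and module lists are assumptions to adjust.

```python
import re

# "Mar 14 08:27:30 <host> <process>[<pid>]: <message>"  (format inferred from the excerpt)
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<proc>[^\[:\s]+)(?:\[\d+\])?: (?P<msg>.*)$"
)
# otbr-agent message body: "<uptime> [<level letter>] <Module---->: <text>"
OT_RE = re.compile(r"^[\d:.]+ \[(?P<level>[A-Z-])\] (?P<module>[^:]+): (?P<text>.*)$")


def triage(lines, keywords=("Withdraw",), hide=("P-RadioSpinel", "MeshForwarder")):
    """Yield (stamp, process, message, flagged) for log lines worth reading."""
    for line in lines:
        m = SYSLOG_RE.match(line.strip())
        if not m:
            continue  # e.g. shell trace output from the entrypoint script
        msg = m["msg"]
        ot = OT_RE.match(msg)
        if ot and ot["module"].rstrip("-") in hide and ot["level"] not in "CWN":
            continue  # routine radio/forwarding chatter below Note level
        flagged = any(k in msg for k in keywords)
        yield m["stamp"], m["proc"], msg, flagged


def print_triage(path):
    """Print the triaged lines of a saved log file, marking keyword hits with '!!'."""
    with open(path, errors="replace") as f:
        for stamp, proc, msg, flagged in triage(f):
            print(("!! " if flagged else "   ") + f"{stamp} {proc}: {msg}")
```

For example, save the output with `docker logs otbr > otbr.log 2>&1` and call `print_triage("otbr.log")`.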

agners commented 6 months ago

Uh, I'm a bit confused: is that container running mDNSResponder and avahi-daemon at the same time? 🤔
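That question can be answered from inside the running container. A minimal sketch, assuming a Linux `/proc` filesystem, that the image ships `python3` (so it can be run via `docker exec`), and that the daemon built from mDNSResponder's POSIX sources appears with the process name `mdnsd` (an assumption; adjust the names if your image differs).

```python
import os

# Process names to look for. "avahi-daemon" is Avahi; "mdnsd" is assumed to be
# the daemon built from Apple's mDNSResponder POSIX sources.
MDNS_DAEMONS = ("avahi-daemon", "mdnsd")


def find_mdns_daemons(proc_root="/proc"):
    """Return {daemon_name: [pid, ...]} for the mDNS daemons found under proc_root."""
    found = {name: [] for name in MDNS_DAEMONS}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited meanwhile or is not readable
        if comm in found:
            found[comm].append(int(entry))
    return found


def report(proc_root="/proc"):
    """Print which mDNS daemons are running and warn if there is more than one."""
    found = find_mdns_daemons(proc_root)
    running = [name for name, pids in found.items() if pids]
    for name in running:
        print(f"{name}: pids {sorted(found[name])}")
    if len(running) > 1:
        print("WARNING: more than one mDNS responder is running")
    return running
```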

It does seem to me that something is wrong with the OTBR container, since it logs these "Withdrawing" messages. Could that bridge configuration be causing problems?

The official docs on how to use the Docker container don't appear to use a bridged setup; maybe the container is not designed to be used that way?

I'd suggest monitoring mDNS announcements from outside the container to see whether there are removal announcements. That would explain the behavior you see in Home Assistant.

Then the question becomes: why is OTBR sending removal announcements?
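The outside monitoring suggested above can be sketched with only the Python standard library: join the IPv4 mDNS group 224.0.0.251 on port 5353, parse each response, and print every _meshcop._udp record, marking TTL-0 records as goodbyes. Assumptions: IPv4 only, no handling of truncated (TC) responses, and the bind may fail if another responder on the host holds port 5353 exclusively. The helper names are hypothetical, not part of any OTBR or Home Assistant tool.

```python
import socket
import struct
import time

MDNS_GROUP, MDNS_PORT = "224.0.0.251", 5353
SERVICE = "_meshcop._udp.local"


def read_name(data, offset):
    """Decode a (possibly compressed) DNS name; return (name, offset just after it)."""
    labels, end, hops = [], None, 0
    while True:
        length = data[offset]
        if length & 0xC0 == 0xC0:  # compression pointer to an earlier name
            if end is None:
                end = offset + 2
            offset = ((length & 0x3F) << 8) | data[offset + 1]
            hops += 1
            if hops > 32:
                raise ValueError("compression loop")
        elif length == 0:
            return ".".join(labels), (end if end is not None else offset + 1)
        else:
            labels.append(data[offset + 1:offset + 1 + length].decode("utf-8", "replace"))
            offset += 1 + length


def parse_records(data):
    """Return (name, rtype, ttl, ptr_target) for every resource record in a DNS message."""
    qdcount, ancount, nscount, arcount = struct.unpack("!4H", data[4:12])
    offset = 12
    for _ in range(qdcount):
        _, offset = read_name(data, offset)
        offset += 4  # QTYPE + QCLASS
    records = []
    for _ in range(ancount + nscount + arcount):
        name, offset = read_name(data, offset)
        rtype, _rclass, ttl, rdlength = struct.unpack("!HHIH", data[offset:offset + 10])
        offset += 10
        target = read_name(data, offset)[0] if rtype == 12 else None  # 12 = PTR
        records.append((name, rtype, ttl, target))
        offset += rdlength
    return records


def meshcop_records(data, service=SERVICE):
    """Records of the given service type found in an mDNS *response*."""
    (flags,) = struct.unpack("!H", data[2:4])
    if not flags & 0x8000:  # QR bit clear: this is a query, not a response
        return []
    def matches(name):
        name = name.lower()
        return name == service or name.endswith("." + service)
    return [r for r in parse_records(data) if matches(r[0])]


def monitor(service=SERVICE):
    """Print every announcement and goodbye (TTL 0) for `service` seen on IPv4 mDNS."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MDNS_PORT))
    mreq = struct.pack("4s4s", socket.inet_aton(MDNS_GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, (src, _port) = sock.recvfrom(9000)
        try:
            records = meshcop_records(data, service)
        except (IndexError, ValueError, struct.error):
            continue  # truncated or malformed packet
        for name, rtype, ttl, target in records:
            kind = "GOODBYE " if ttl == 0 else "announce"
            print(f"{time.strftime('%H:%M:%S')} {src} {kind} {name} type={rtype} ttl={ttl} {target or ''}")
```

Run `monitor()` on the Home Assistant host and leave it running until the border router disappears. A GOODBYE line for the border router's instance at that moment would confirm the graceful-removal theory.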

If you want an easy setup that just works, I can recommend Home Assistant OS. It offers the OpenThread Border Router add-on, which is built from the upstream OpenThread repositories (see https://github.com/home-assistant/addons/tree/master/openthread_border_router). The whole stack of Home Assistant OS + Home Assistant Core + OTBR add-on is well tested and known to work well.

lineumaciel commented 6 months ago

It all depends on your type of installation. Can you tell us more? You have the typical symptoms of OTBR in bridge mode with Home Assistant in host mode. In such a configuration you need to build a completely new OTBR image. What on the host is responsible for mDNS?
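Part of the reasoning behind the bridge-mode diagnosis can be made concrete. mDNS uses the multicast groups 224.0.0.251 and ff02::fb (RFC 6762), and both are link-local, which routers do not forward; so, without an mDNS reflector or similar, announcements from a bridged container typically stay on the Docker bridge segment. A small sketch using Python's `ipaddress` module to classify a group's scope; the helper name is hypothetical, and the scope table follows RFC 4291 (IPv6) and RFC 5771 (IPv4 Local Network Control Block).

```python
import ipaddress

# IPv6 multicast scope field values (RFC 4291, section 2.7).
IPV6_SCOPES = {1: "interface-local", 2: "link-local", 4: "admin-local",
               5: "site-local", 8: "organization-local", 14: "global"}


def multicast_scope(addr: str) -> str:
    """Classify how far a multicast group is meant to travel."""
    ip = ipaddress.ip_address(addr)
    if not ip.is_multicast:
        return "not multicast"
    if ip.version == 6:
        scope = (int(ip) >> 112) & 0xF  # scope nibble of the ff<flags><scope> prefix
        return IPV6_SCOPES.get(scope, f"scope {scope}")
    # 224.0.0.0/24 is the IPv4 Local Network Control Block: never forwarded off-link.
    if ip in ipaddress.ip_network("224.0.0.0/24"):
        return "link-local"
    return "beyond link-local"


# mDNS groups, plus the all-nodes group seen in the OTBR log excerpt above.
for group in ("224.0.0.251", "ff02::fb", "ff02::1"):
    print(group, multicast_scope(group))
```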

OTBR works in host mode without major problems but, as I mentioned, you need to prepare the image for it. Below you will find a Dockerfile to build OTBR for host mode. I personally use Avahi instead of mDNSResponder.

My Dockerfile:

```dockerfile
ARG BASE_IMAGE=ubuntu:bionic
FROM ${BASE_IMAGE}

ARG INFRA_IF_NAME
ARG BORDER_ROUTING
ARG BACKBONE_ROUTER
ARG OT_BACKBONE_CI
ARG OTBR_OPTIONS
ARG DNS64
ARG NAT64
ARG NAT64_SERVICE
ARG NAT64_DYNAMIC_POOL
ARG REFERENCE_DEVICE
ARG RELEASE
ARG REST_API
ARG WEB_GUI
ARG MDNS
ARG FIREWALL

ENV INFRA_IF_NAME=${INFRA_IF_NAME:-eth0}
ENV BORDER_ROUTING=${BORDER_ROUTING:-1}
ENV BACKBONE_ROUTER=${BACKBONE_ROUTER:-1}
ENV OT_BACKBONE_CI=${OT_BACKBONE_CI:-0}
ENV OTBR_MDNS=${MDNS:-avahi}
ENV OTBR_OPTIONS=${OTBR_OPTIONS:-"-DOT_THREAD_VERSION=1.3 -DOT_FULL_LOGS=ON -DOT_DUA=ON -DOT_MLR=ON -DOTBR_DBUS=OFF -DOTBR_TREL=ON -DOT_DIAGNOSTIC=1 -DOT_LINK_RAW=1 -DOTBR_VENDOR_NAME=HomeAssistant -DOTBR_PRODUCT_NAME=OpenThreadBorderRouter -DBUILD_TESTING=OFF -DCMAKE_INSTALL_PREFIX=/usr -DOTBR_FEATURE_FLAGS=ON -DOTBR_DNSSD_DISCOVERY_PROXY=ON -DOTBR_SRP_ADVERTISING_PROXY=ON -DOTBR_MDNS=avahi -DOTBR_WEB=ON -DOTBR_BORDER_ROUTING=ON -DOTBR_REST=ON -DOTBR_BACKBONE_ROUTER=ON -DOTBR_NAT64=ON -DOT_POSIX_NAT64_CIDR="192.168.255.0/24" -DOTBR_DNS_UPSTREAM_QUERY=ON -DOT_CHANNEL_MONITOR=ON -DOT_COAP=OFF -DOT_COAPS=OFF"}
ENV DEBIAN_FRONTEND noninteractive
ENV PLATFORM ubuntu
ENV REFERENCE_DEVICE=${REFERENCE_DEVICE:-0}
ENV RELEASE=${RELEASE:-1}
ENV NAT64=${NAT64:-1}
ENV NAT64_SERVICE=${NAT64_SERVICE:-openthread}
ENV NAT64_DYNAMIC_POOL=${NAT64_DYNAMIC_POOL:-192.168.255.0/24}
ENV DNS64=${DNS64:-0}
ENV WEB_GUI=${WEB_GUI:-1}
ENV REST_API=${REST_API:-1}
ENV FIREWALL=${FIREWALL:-1}
ENV DOCKER 1

RUN env

ENV OTBR_DOCKER_REQS sudo python3

ENV OTBR_DOCKER_DEPS git ca-certificates

ENV OTBR_BUILD_DEPS apt-utils build-essential psmisc ninja-build cmake wget ca-certificates \
  libreadline-dev libncurses-dev libcpputest-dev libdbus-1-dev libavahi-common-dev \
  libavahi-client-dev libboost-dev libboost-filesystem-dev libboost-system-dev \
  libnetfilter-queue-dev

ENV OTBR_OT_BACKBONE_CI_DEPS curl lcov wget build-essential python3-dbus python3-zeroconf

ENV OTBR_NORELEASE_DEPS \
  cpputest-dev

RUN apt-get update \
  && apt-get install --no-install-recommends -y $OTBR_DOCKER_REQS $OTBR_DOCKER_DEPS \
  && ([ "${OT_BACKBONE_CI}" != "1" ] || apt-get install --no-install-recommends -y $OTBR_OT_BACKBONE_CI_DEPS) \
  && ln -fs /usr/share/zoneinfo/UTC /etc/localtime

COPY ./script /app/script
COPY ./third_party/mDNSResponder /app/third_party/mDNSResponder
WORKDIR /app

RUN ./script/bootstrap
COPY . .
RUN ./script/setup

RUN ([ "${DNS64}" = "0" ] || chmod 644 /etc/bind/named.conf.options) \
  && ([ "${OT_BACKBONE_CI}" = "1" ] || ( \
    mv ./script /tmp \
    && mv ./etc /tmp \
    && find . -delete \
    && rm -rf /usr/include \
    && mv /tmp/script . \
    && mv /tmp/etc . \
    && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false $OTBR_DOCKER_DEPS \
    && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false $OTBR_BUILD_DEPS \
    && ([ "${RELEASE}" = 1 ] || apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false "$OTBR_NORELEASE_DEPS";) \
    && rm -rf /var/lib/apt/lists/ \
    && rm -rf /tmp/ \
  ))

ENTRYPOINT ["/app/etc/docker/docker_entrypoint.sh"]

EXPOSE 80
```
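This Dockerfile selects the mDNS implementation in two places: `ENV OTBR_MDNS=${MDNS:-avahi}` and `-DOTBR_MDNS=avahi` inside the default `OTBR_OPTIONS`. Overriding the `MDNS` build argument without also editing `OTBR_OPTIONS` makes the two disagree. A hypothetical consistency check, sketched with Python's `shlex`, parses the `-D` flags and compares them; how the build scripts combine the two values is not shown in this thread, so treat a mismatch as something to investigate rather than a confirmed bug.

```python
import shlex


def parse_cmake_defines(options: str) -> dict:
    """Map each -DNAME=VALUE (or -DNAME:TYPE=VALUE) flag to {NAME: VALUE}."""
    defines = {}
    for token in shlex.split(options):  # honors quotes such as CIDR="192.168.255.0/24"
        if token.startswith("-D") and "=" in token:
            key, value = token[2:].split("=", 1)
            defines[key.split(":", 1)[0]] = value  # drop an optional :TYPE suffix
    return defines


def check_mdns_choice(otbr_mdns: str, otbr_options: str) -> list:
    """Hypothetical check: warn when OTBR_MDNS and -DOTBR_MDNS name different providers."""
    cmake_value = parse_cmake_defines(otbr_options).get("OTBR_MDNS")
    if cmake_value is not None and cmake_value != otbr_mdns:
        return [f"OTBR_MDNS={otbr_mdns} but OTBR_OPTIONS has -DOTBR_MDNS={cmake_value}"]
    return []


opts = '-DOT_THREAD_VERSION=1.3 -DOTBR_MDNS=avahi -DOT_POSIX_NAT64_CIDR="192.168.255.0/24"'
print(check_mdns_choice("avahi", opts))          # both say avahi
print(check_mdns_choice("mDNSResponder", opts))  # MDNS build arg changed, OTBR_OPTIONS not
```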

My Docker Compose file:

```yaml
version: "3.4"
services:
  openthread_border_router:
    container_name: openthread
    image: openthread-trel-test:latest #openthread/otbr-trel
    restart: unless-stopped
    network_mode: host
    privileged: true
    devices:
      # ...
```


jwhui commented 4 months ago

Closing stale issue.