Closed OlivierGre closed 3 months ago
I used to see the similar 'netif' logs on a well-functioning BR so it may just be a false alarm. I was planning to fix the logging but haven't got some time.
To confirm this, you may just try listing the IP addresses on wpan0
(ifconfig wpan0
) and compare it with ot-ctl ipaddr
. If they have the same set of addresses, then it's fine.
I didn't succeed to get the OTBR running
The log file looks normal so far. Could you elaborate how you noticed the failure? Did the otbr-agent
service fail to start? Or you observed disconnectivity of devices?
The OTBR agent is not working properly, that why I have looked at the syslog. For example this command is failing:
pi@raspberrypi:~ $ sudo ot-ctl dataset active
Error 23: NotFound
In my previous message, I had filtered the syslog to keep only the things related to the otbr-agent. Here is the full log. syslog.log
I can see the following errors:
Jun 6 14:54:41 raspberrypi lightdm[714]: Error getting user list from org.freedesktop.Accounts: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.Accounts was not provided by any .service files
Jun 6 14:54:41 raspberrypi tayga[768]: error: unexpectedly disconnected from boot status daemon
Here is the otbr-agent status
pi@raspberrypi:~ $ sudo service otbr-agent status
● otbr-agent.service - OpenThread Border Router Agent
Loaded: loaded (/lib/systemd/system/otbr-agent.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2024-06-06 14:54:41 CEST; 12min ago
Process: 429 ExecStartPre=/usr/sbin/service mdns start (code=exited, status=0/SUCCESS)
Main PID: 772 (otbr-agent)
Tasks: 1 (limit: 3933)
CPU: 432ms
CGroup: /system.slice/otbr-agent.service
└─772 /usr/sbin/otbr-agent -I wpan0 -B wlan0 spinel+hdlc+uart:///dev/ttyUSB0 trel://wlan0
Jun 06 14:54:42 raspberrypi otbr-agent[772]: 00:00:00.296 [I] RouterTable---: Route table
Jun 06 14:54:42 raspberrypi otbr-agent[772]: 00:00:00.300 [I] Platform------: [netif] Host netif is down
Jun 06 14:54:42 raspberrypi otbr-agent[772]: 00:00:00.300 [I] Platform------: [netif] Succeeded to process request#1
Jun 06 14:54:42 raspberrypi otbr-agent[772]: 00:00:00.300 [W] Platform------: [netif] Failed to process request#2: No such process
Jun 06 14:54:42 raspberrypi otbr-agent[772]: [INFO]-MDNS----: Received reply for service OpenThread BorderRouter #25A0._meshcop._>
Jun 06 14:54:42 raspberrypi otbr-agent[772]: [INFO]-MDNS----: Successfully registered service OpenThread BorderRouter #25A0._mesh>
Jun 06 14:54:42 raspberrypi otbr-agent[772]: [INFO]-BA------: Result of publish meshcop service OpenThread BorderRouter #25A0._me>
Jun 06 14:54:42 raspberrypi otbr-agent[772]: [INFO]-BA------: Result of publish meshcop service OpenThread BorderRouter #25A0._me>
Jun 06 14:54:46 raspberrypi otbr-agent[772]: 00:00:04.952 [I] InfraIf-------: State changed: NOT RUNNING -> RUNNING
Jun 06 15:02:21 raspberrypi otbr-agent[772]: 00:07:21.141 [I] Platform------: Session socket is ready
What's the output of ot-ctl ifconfig
and ot-ctl state
?
pi@raspberrypi:~ $ sudo ot-ctl dataset active Error 23: NotFound
From this log it indicates that you haven't set active dataset. The logs it also indicates your OTBR didn't attempt to join any network.
You can either:
ot-ctl dataset set active <dataset hex>
ot-ctl dataset init new
ot-ctl dataset commit active
Please also ensure you have started thread by ot-ctl ifconfig up
and ot-ctl thread start
.
pi@raspberrypi:~ $ sudo ot-ctl ifconfig
up
Done
pi@raspberrypi:~ $ sudo ot-ctl state
detached
Done
I have a script which configures the dataset:
#!/bin/bash
sudo ot-ctl thread stop
sudo ot-ctl ifconfig down
sudo ot-ctl channel 15
sudo ot-ctl extpanid #################
sudo ot-ctl networkkey #################
sudo ot-ctl networkname #################
sudo ot-ctl panid #################
sudo ot-ctl pskc #################
sudo ot-ctl ifconfig up
sudo ot-ctl thread start
echo "Please wait..."
echo ""
sleep 5
sudo ot-ctl dataset active
echo ""
sudo ot-ctl netdata show
Do I miss some dataset command to set this as the active dataset? Thanks
pi@raspberrypi:~ $ sudo ot-ctl dataset init new
Done
pi@raspberrypi:~ $
pi@raspberrypi:~ $ sudo ot-ctl dataset commit active
Done
Then "sudo ot-ctl dataset active" is working but is not using my settings.
If I call my script after that, the commands are OK but I still get "Error 23: NotFound" when I ask to display the active dataset.
I probably have missunderstood something about the use of this active dataset. As indicated on my first message, I seldom have to look at the OTBR.
Thread requires configuring the full Operational Dataset to work properly.
See README_DATASET.md for more information on how to properly configure the Operational Dataset.
I don't know why what we were doing till now to set the dataset is no more working. Nevermind...
I can see that the Active Operational Dataset contains:
Is it possible to init those fields with default values and then only update the ones that we want to force? (channel, extpanid, networkkey, networkname, panid, pskc)
I'm also wondering if a Matter Commissioner needs to know the other fields:
You can initialize the Dataset first and then change specific fields. To initialize:
ot-ctl dataset init new
A Matter Commissioner needs to know the full Operational Dataset. It doesn't really need to know the individual fields, it just needs to have the TLV-encoded Operational Dataset as a binary object, which is what Matter does.
Thank you Jonathan.
A Matter Commissioner needs to know the full Operational Dataset. It doesn't really need to know the individual fields, it just needs to have the TLV-encoded Operational Dataset as a binary object, which is what Matter does.
On Commissioner side, we are setting the individual fields mainly to be able to change of channel (if a channel is too busy). We of course update the OTBR channel accordingly.
By the way, I have activated the "channel monitor" feature on my OTBR but I'm getting strange results. I'm at home today and my OTBR is configured with Channel 15. When I call the "channel monitor" command, it says ch 11 is busy (with a % varying around 16 to 33) and all the other channels (including channel 15) are at 0% busy. I don't understand why and I'm wondering if this monitoring is reliable.
By the way, I have activated the "channel monitor" feature on my OTBR but I'm getting strange results. I'm at home today and my OTBR is configured with Channel 15. When I call the "channel monitor" command, it says ch 11 is busy (with a % varying around 16 to 33) and all the other channels (including channel 15) are at 0% busy. I don't understand why and I'm wondering if this monitoring is reliable.
Channel monitor runs in background taking RSSI samples infrequently and over time builds a picture of noise/inference level on channels. Suggest at least 8 hours of it running. It is intended as a utility tool for debugging.
I can refer you to its related configs and its API documentations which can help you understand what it does better.
A Matter Commissioner needs to know the full Operational Dataset. It doesn't really need to know the individual fields, it just needs to have the TLV-encoded Operational Dataset as a binary object, which is what Matter does.
I'm coming back to this point because it is not clear to me: Matter's CHIPTool application is setting the dataset from the following individual fields:
Screenshot:
During a Commissioning, Chiptool is concatenating those 4 individual fields into a byte array and it is passing it to Matter SDK, which will pass it to the commissioned device. Is it enough to pass those 4 fields to the commissioned device? It seems to be in contradiction with your sentence "A Matter Commissioner needs to know the full Operational Dataset", isn't it? I'm aware that I'm now out of OTBR scope :)
@OlivierGre
Matter's CHIPTool application is setting the dataset from the following individual fields:
This is an implementation detail. Matter Specification defines the Network Commissioning Cluster
. The AddOrUpdateThreadCommand
has an OperationalDataset
field and states:
The encoding for the OperationalDataset field is defined in the Thread specification.
The client SHALL pass the OperationalDataset as an opaque octet string.
Thanks for the explanation. I have opened Thread 1.3 specification to see the content of the OperationalDataset field.
With Chiptool source code open in parallel, I can see what is put in the OperationalDataset TLV and it corresponds to "8.10.1 Network Managements TLVs":
At that stage, it was not obvious to me to understand what "operational dataset" we are talking about. I have looked at the active and pending operational datasets and it didn't match. I think I have finally found the right one. I see it in 8.4.6.2 Secure Out of band Commissioning. 8.4.6.3.3 defines the "Network join request" and it requires the following parameters:
I see it in 8.4.6.2 Secure Out of band Commissioning.
The interfaces here are purely conceptual. Matter does not use this.
As I mentioned, Matter uses the Network Commissioning Cluster
which contains a OperationalDataset
field. That refers to the Active Operational Dataset in the Thread Specification.
As I mentioned, Matter uses the
Network Commissioning Cluster
which contains aOperationalDataset
field. That refers to the Active Operational Dataset in the Thread Specification.
It is getting clearer but some information is still missing:
As you said, Matter uses the Network Commissioning Cluster
which contains a OperationalDataset
field.
The encoding for this OperationalDataset field is defined in the Thread specification.
I'm looking at Thread 1.3 specification to see the content of the Active Operational Dataset. "4.4.23 Active Operational Dataset TLV" indicates to look at "§8.10.1 Network management TLVs" which lists the possible fields. I'm trying to understand which ones are mandatory and which ones are optional. Right now I haven't found this information in Thread specification.
In a previous message, you said that a "Full Operational Dataset" should be transmitted. So I would expect all the TLV fields defined in 8.10.1 are sent but this is not what I observe when I look at the content of Matter Network Commissioning Cluster
sent by ChipTool. I see those 4 fields only:
And I can see that the Matter device (which is receiving this information) successfully joins the Thread network so that's seems sufficient.
Can you please point me the paragraph in Thread specification indicating the fields that should be present in the Active Operational Dataset?
Thank you
While doing some tests, I’m observing something Weird! I have noticed that, if I set some incorrect values for Channel, PAN ID and Extended PAN ID in ChipTool, the Commissioning still works. If I set a wrong Network Key, the commissioning fails (I can see that the device doesn’t succeed to join the network).
Could it be due to the Security Policy used on my Border Router?
On my Border Router, I can see :
Security Policy : 672 onrc 0
@OlivierGre , OpenThread can support commissioning with only the Network Key. In this case, OpenThread will scan across all channels and send request to the broadcast PAN ID. Extended PAN ID and Network Name provide no value during the attach process.
Providing channel is more efficient because the device does not have to scan across channels.
Providing PAN ID is more efficient because the device does not have to send requests to the broadcast PAN ID.
A Thread device must have the full Operational Dataset to fully participate in a Thread network. As a result, when only providing a partial Operational Dataset, the Thread device must first attach to a neighboring Thread device so that it can retrieve the full Operational Dataset. For example, when only providing a partial dataset to a router-capable device, that device cannot serve as a router/leader until it can fully attach to a neighboring Thread device.
However, when providing the full Operational Dataset, then the device can operate as a full router/leader without having to first attach to a neighboring Thread device. This is especially useful in scenarios like replacing a broken device.
Thank you so much Jonathan for this very interesting and useful information!
I feel a bit sorry to ask you some advices but I'm stuck.
I've been working on Thread/Matter for a couple of years and I seldom have to "touch" at the OTBR. Yesterday I have decided to rebuild it to add the option to monitor the Thread channels usage.
Since that time, I fail to get the OTBR up and running again and it is making me crazy!
Yesterday I have changed a few things simultaneously (I was maybe too confident!):
I'm attaching the syslog when the Raspberry Pi is booting and starting the otbr-agent.
Would you know what could cause this error?
Is it normal that netif is down at that time? Thank you, regards
syslog_2024_06_06.log