openthread / ot-br-posix

OpenThread Border Router, a Thread border router for POSIX-based platforms.
https://openthread.io/
BSD 3-Clause "New" or "Revised" License

Has anyone enabled ot-br-posix / openthread in Android Pie? #1739

Open AlanLCollins opened 1 year ago

AlanLCollins commented 1 year ago

I have a border router running on Android Pie. All link-local messages can be transmitted and received with no issues, but when it comes to a routable address I get a network unreachable error from the libc socket call sendmsg(). (Speculation starts here.) I am following a clue that adding new routes might be a process that crosses the boundary between the system and vendor partitions, which normally requires a HIDL interface. Please share your thoughts or ideas on this matter. I do see the routes using:

ip route list table all | grep ot0                                                                                                        
fd1b:991:d321:2730::/64 dev ot0 table 1011 proto kernel metric 256 pref medium
fd51:3770:df33:1::/64 dev ot0 table 1011 proto kernel metric 256 pref medium
fe80::/64 dev ot0 table 1011 proto kernel metric 256 pref medium
ff00::/8 dev ot0 table local metric 256 pref medium

but ip -6 route show returns nothing. So I tried adding a route manually, but I still get error 101 (ENETUNREACH) from sendmsg():

ip -6 route add fdde:ad00:beef::/64 dev ot0 metric 1

I might be using an incorrect approach, or perhaps I am following the wrong clue.

wgtdkp commented 1 year ago

First of all, Android lacks native support for Thread. The good news is that we are working on it.

  1. I suppose you are hacking on Android and trying to send a message from the Android host to a Thread device attached to this Android host, right?
  2. I suppose ot0 is the Thread tunnel interface, so can you ping an address in fd1b:991:d321:2730::/64?

Switch to the root user before continuing with the commands below:

adb root
adb shell

> but ip -6 route show returns empty. So I tried adding routes manually, but still get the 101 error from sendmsg(). ip -6 route add fdde:ad00:beef::/64 dev ot0 metric 1

This shows the routes in the main routing table, but Android uses policy routing to manage multiple networks, and the main table is never used.

If ot0 is the Thread tunnel interface, then 1011 is the routing table for Thread. You should add your routes to table 1011:

# In other cases (maybe other Android versions), the table ID will be just the interface name
ip -6 route add fdde:ad00:beef::/64 dev ot0 metric 1 table 1011

Then check the routing table and make sure the route is there:

ip -6 route show table 1011

Check whether routing table 1011 is referenced by the routing policy rules:

ip -6 rule

Most importantly, make sure there is a rule from all iif lo oif ot0 lookup 1011, and add it if there is none:

# You may need to change the priority (a smaller number means a higher priority)
ip -6 rule add from all iif lo oif ot0 lookup 1011 priority 17000

You can check the selected route for a destination address with the command below:

ip -6 route get <the-address-you-want-to-sendmsg-to>

Make sure the result goes via dev ot0 table 1011.

Now you should be able to ping your Thread mesh-local address.
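The route check in the last two steps can also be scripted. The following is only a minimal sketch, not part of otbr-agent: route_dev_and_table is a hypothetical helper name, and it assumes the one-line output format of ip -6 route get that appears in this thread. It prints the dev and table fields, or - for a field that is absent.

```shell
#!/bin/sh
# route_dev_and_table (hypothetical helper): read one line of
# `ip -6 route get` output on stdin and print "<dev> <table>".
# A field that does not appear in the line is printed as "-".
route_dev_and_table() {
    dev="-"
    table="-"
    IFS= read -r line || [ -n "$line" ] || return 1
    set -f              # no glob expansion while splitting the line
    set -- $line        # split the line into positional parameters
    set +f
    while [ "$#" -gt 1 ]; do
        case "$1" in
            dev)   dev=$2 ;;
            table) table=$2 ;;
        esac
        shift
    done
    echo "$dev $table"
}
```

Run as root on the device, ip -6 route get <the-address-you-want-to-sendmsg-to> | route_dev_and_table gives a quick way to see whether a destination resolves through ot0 and the Thread table or through some other network.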

AlanLCollins commented 1 year ago

Hello @wgtdkp, thank you for the guidance. I added the ip rule as you suggested, and now I can send SRP server responses. Now I am having trouble with OMR addresses: traffic to them is routed to the Wi-Fi interface instead of ot0. Your assumptions are correct: the Android Pie host runs a service that uses otbr-agent to bring up ot0 as the Thread interface, which connects to the RCP over a serial interface.

service otservice /system/bin/otbr-agent -I ot0 -B wlan0 spinel+hdlc+uart://${mux.ttyOpenThread}

These are the addresses the Android host has:

/ # ot-ctl ipaddr
fd3a:303:44bf:1:3aec:f3a9:9c34:4ec3
fdde:ad00:beef:0:0:ff:fe00:fc10
fdde:ad00:beef:0:0:ff:fe00:fc00
fdde:ad00:beef:0:0:ff:fe00:9800
fdde:ad00:beef:0:2524:155b:766c:9f4e
fe80:0:0:0:a8a3:2da:b276:bd64
Done

The routes:

/ # ip -6 route show table 1011
fd3a:303:44bf:1::/64 dev ot0 proto kernel metric 256 pref medium
fdde:ad00:beef::/64 dev ot0 proto kernel metric 256 pref medium
fe80::/64 dev ot0 proto kernel metric 256 pref medium

The rules (I manually added the 17000 entry):

/ # ip -6 rule | grep -E "wlan0|ot0"                                                                                                                                                                       
10500:  from all iif lo oif wlan0 uidrange 0-0 lookup wlan0 
13000:  from all fwmark 0x10066/0x1ffff iif lo lookup wlan0 
14000:  from all iif lo oif wlan0 lookup wlan0 
17000:  from all iif lo oif ot0 lookup 1011 
19000:  from all fwmark 0x66/0x1ffff iif lo lookup wlan0 
22000:  from all fwmark 0x0/0xffff iif lo lookup wlan0 

A CHILD joins the network with the following addresses:

[2023-02-04 18:09:40] D: 72524 [DL]   Device Role: CHILD
[2023-02-04 18:09:40] D: 72527 [DL]   Network Name: hello
[2023-02-04 18:09:40] D: 72530 [DL]   PAN Id: 0x1234
[2023-02-04 18:09:40] D: 72533 [DL]   Extended PAN Id: 0x1122334455667788
[2023-02-04 18:09:40] D: 72538 [DL]   Channel: 11
[2023-02-04 18:09:40] D: 72540 [DL]   Mesh Prefix: fdde:ad00:beef::/64
[2023-02-04 18:09:40] D: 72545 [DL]   Partition Id: 0x361BC9D3
[2023-02-04 18:09:40] D: 72550 [DL]OpenThread State Changed (Flags: 0x00000001)
[2023-02-04 18:09:40] D: 72555 [DL]   Thread Unicast Addresses:
[2023-02-04 18:09:40] D: 72559 [DL]        fd3a:303:44bf:1:e4b4:66da:4205:cf50/64 valid preferred
[2023-02-04 18:09:40] D: 72566 [DL]        fdde:ad00:beef::ff:fe00:9802/64 valid rloc
[2023-02-04 18:09:40] D: 72572 [DL]        fdde:ad00:beef:0:b371:446a:8dbb:ebc2/64 valid
[2023-02-04 18:09:40] D: 72578 [DL]        fe80::9818:13cb:c2c5:961d/64 valid preferred

Extended discovery fails when trying to resolve the OMR address. From the IP rules, it looks like the Android host is sending the message over the Wi-Fi interface. I will run a few sniffer captures (on ot0 and wlan0) to confirm. I also want to check whether there is any over-the-air CoAP activity to map the EID to an RLOC.

/ # ip -6 route get fd3a:303:44bf:1:e4b4:66da:4205:cf50                                                                                                                  
fd3a:303:44bf:1:e4b4:66da:4205:cf50 from :: via fe80::c294:35ff:fe58:2982 dev wlan0 table wlan0 proto ra src 2601:647:5b80:59d0:58fa:2d0c:fb0e:a66c metric 1024 hoplimit 64 pref medium

wgtdkp commented 1 year ago

  1. Please share the routes in the Wi-Fi routing table:
    ip -6 route show table wlan0
  2. Try a lower priority number (that is, a higher priority) for the Thread routing table. The Thread routing table must be consulted before the Wi-Fi routing table, because the Wi-Fi network may be the default network, which has a default route. So try 13000 for the Thread routing table (the Wi-Fi routing table has priority 14000):
    ip -6 rule add from all iif lo oif ot0 lookup 1011 priority 13000

wgtdkp commented 1 year ago

> Extended discovery fails when trying to resolve the OMR addr.

Is this DNS-SD service discovery? Please share the failure log.

AlanLCollins commented 1 year ago

This is a Matter CASE session establishment between the Android host and a Matter light bulb. The Android host, which is both the Matter controller and the Thread BR, queries for the _matter._tcp operational service (see the attached log, CASE_Sigma1_fails.txt).

The mDNS query resolves to the OMR address fd3a:303:44bf:1:656a:6592:f4d0:86ea:

02-06 07:57:41.799  2073  2097 I CHIP DIS: UDP:[fd3a:303:44bf:1:656a:6592:f4d0:86ea%wlan0]:5540: new best score: 6

So the Matter SDK starts sending Sigma1 but never gets a response. Nothing is logged by the otbr-agent service, I suspect because the message is going out over the Wi-Fi interface.

There is other interesting behavior in the logs, such as [netif] Failed to transmit, error:Parse appearing right after the CHIP SDK minimal-mDNS logs, as if the CHIP SDK were trying to send multicast over all available interfaces (including ot0).

The wlan0 routing table:

/ # ip -6 route show table wlan0
2601:647:5b80:59d0::/64 dev wlan0 proto kernel metric 256 expires 345597sec pref medium
2601:647:5b80:59d0::/64 dev wlan0 proto static metric 1024 pref medium
fd2e:a63:3281:1::/64 via fe80::2f3:61ff:fe3d:90f4 dev wlan0 proto ra metric 1024 expires 1308sec pref medium
fe80::/64 dev wlan0 proto kernel metric 256 pref medium
fe80::/64 dev wlan0 proto static metric 1024 pref medium
default via fe80::c294:35ff:fe58:2982 dev wlan0 proto ra metric 1024 expires 177sec hoplimit 64 pref medium

I tried lowering the priority number for ot0, but the flow still fails in the Sigma1 handshake.

wgtdkp commented 1 year ago

> I tried lowering the priority of ot0, but the flow still fails in Sigma1 handshake.

What does

ip -6 route get fd3a:303:44bf:1:e4b4:66da:4205:cf50 

give now?

AlanLCollins commented 1 year ago

I can confirm the message is going over Wi-Fi. New attempt, destination address fd3a:303:44bf:1:cf6d:c34b:8bc3:2f69 [image]

I tried the lowest priority number, 10000, in the rules table (so I now have three ot0 entries), but it still fails.

/ # ip -6 rule | grep -E "wlan0|ot0"                                                                                                                                                                       
10000:  from all iif lo oif ot0 lookup 1011 
10500:  from all iif lo oif wlan0 uidrange 0-0 lookup wlan0 
13000:  from all fwmark 0x10069/0x1ffff iif lo lookup wlan0 
13000:  from all iif lo oif ot0 lookup 1011 
14000:  from all iif lo oif wlan0 lookup wlan0 
17000:  from all iif lo oif ot0 lookup 1011 
19000:  from all fwmark 0x69/0x1ffff iif lo lookup wlan0 
22000:  from all fwmark 0x0/0xffff iif lo lookup wlan0 

And these are the routes for the destination addresses of the last three attempts:

:/ # ip -6 route get fd3a:303:44bf:1:e4b4:66da:4205:cf50 
fd3a:303:44bf:1:e4b4:66da:4205:cf50 from :: via fe80::c294:35ff:fe58:2982 dev wlan0 table wlan0 proto ra src 2601:647:5b80:59d0:6c4e:4898:5e79:6906 metric 1024 hoplimit 64 pref medium
:/ # ip -6 route get fd3a:303:44bf:1:656a:6592:f4d0:86ea                                                                                                                                                    
fd3a:303:44bf:1:656a:6592:f4d0:86ea from :: via fe80::c294:35ff:fe58:2982 dev wlan0 table wlan0 proto ra src 2601:647:5b80:59d0:6c4e:4898:5e79:6906 metric 1024 hoplimit 64 pref medium
:/ # ip -6 route get fd3a:303:44bf:1:cf6d:c34b:8bc3:2f69                                                                                                                                                    
fd3a:303:44bf:1:cf6d:c34b:8bc3:2f69 from :: via fe80::c294:35ff:fe58:2982 dev wlan0 table wlan0 proto ra src 2601:647:5b80:59d0:6c4e:4898:5e79:6906 metric 1024 hoplimit 64 pref medium

wgtdkp commented 1 year ago

What about adding this?

ip -6 rule add from all fwmark 0x0/0xffff iif lo lookup 1011 priority 10000

or

ip -6 rule add from all iif lo lookup 1011 priority 10000

wgtdkp commented 1 year ago

ip -6 rule add from all iif lo oif ot0 lookup 1011 priority 13000 probably doesn't work if the native Matter socket is not bound to the network interface, because a rule with an oif selector only matches traffic from local sockets that are bound to that device.
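One way to see which of the existing rules an unbound socket can hit is to drop every rule that has an oif selector, since (per the ip-rule documentation) oif only matches packets from local sockets bound to a device. A minimal sketch: rules_for_unbound_socket is a hypothetical helper name, and it deliberately ignores the uidrange and fwmark selectors, which restrict matching further.

```shell
#!/bin/sh
# rules_for_unbound_socket (hypothetical helper): filter `ip -6 rule`
# output on stdin, keeping only the rules without an "oif" selector.
# Rules with "oif <dev>" are not matched by sockets that are not bound
# to <dev>, so they are removed from the listing.
rules_for_unbound_socket() {
    grep -v ' oif '
}
```

Applied to the rule list posted earlier in this thread (ip -6 rule | rules_for_unbound_socket), the ot0 rule at priority 17000 is filtered out and only wlan0 lookups remain, which is consistent with ip -6 route get choosing wlan0.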

AlanLCollins commented 1 year ago

@wgtdkp, Matter over Thread is working now on the Android host. I used the command ip -6 rule add from all iif lo lookup 1011 priority 10300. Thank you very much for the awesome support!

Do you know whether the problem is that OpenThread is not writing the routing policies, or that OpenThread is trying to write them but does not have the correct permissions?

wgtdkp commented 1 year ago

> @wgtdkp , Matter over Thread is working now on the Android host. I used the ip -6 rule add from all iif lo lookup 1011 priority 10300 command. Thank you very much for the awesome support !

Great!

> Do you know if the problem is that openthread is not writing the routing policies, or that openthread is trying to write but does not have the correct permissions ?

It may be failing to write them if you didn't configure specific sepolicy rules for OpenThread, in which case you should see warning or error logs when routes are added. But even if the sepolicy rules are added, OpenThread adds routes to the main table, which Android does not use (as you can see, there is no main table in the output of ip -6 rule).

AlanLCollins commented 1 year ago

Well, the routing policy is the only change I need to make it work. From your response, I understand that routing policies are the responsibility of the integrator, similar to the makefile for building the solution. That said, I need to find the most elegant way to inject those policies during my service start-up.

It looks like OpenThread is successfully writing the routes to table 1011 on its own. Can you share a pointer to the OpenThread code that modifies the routing tables? I'd like to dive deep into the details.

Thank you again for jumping into this topic and guiding me!

jwhui commented 1 year ago

> Looks that openthread is being successful in writing the routes in that 1011 table on its own. Can you share a pointer to the openthread code that modifies the route tables? I'd like to deep dive into the details.

https://github.com/openthread/openthread/blob/main/src/posix/platform/netif.cpp

wgtdkp commented 1 year ago

> Looks that openthread is being successful in writing the routes in that 1011 table on its own.

Oh yes, you are right, and I missed the routes in table 1011. So all you need is ip -6 rule add from all iif lo lookup 1011 priority 10000, correct?

I noticed that the table ID for other networks is simply their network interface name, for example wlan0 for the Wi-Fi network. It's not clear why it's a number for ot0. You could reboot and check whether the table ID changes to ot0 or stays constant at 1011. In either case, we can simply add a script that runs when otbr starts up.

maxminard commented 1 year ago

@wgtdkp We've done some more experimenting with the 1011 table ID, and it looks like it neither changes to ot0 nor stays constant. The value increments if you re-establish a new network without rebooting. This makes it hard to write a boot script that installs the routing policy above. We also can't find where table 1011 is created in the netif.cpp file linked above. Do you know where else that table creation might happen, or whether there is a way to statically assign this table ID to the ot0 prefixes? Thanks.

wgtdkp commented 1 year ago

I think Android modifies the Linux kernel to always create a new routing table for each new network, and all routes are added to that new table.

If you are just looking to build a demo: the table ID is constructed as 1000 + interface index; see RouteController.h (I confirmed this is true for wlan0 on my Pixel 6).
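Given that formula, a start-up script could derive the table ID at run time instead of hard-coding 1011. The following is a minimal sketch, assuming the 1000 + interface index relationship holds on the target build and reusing the rule and priority that worked earlier in this thread. All function names are hypothetical, and thread_rule_cmd only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# All function names here are hypothetical; nothing below is part of otbr-agent.

# table_id_for_ifindex: per-network routing table ID, assumed to be
# 1000 + interface index as described in this thread.
table_id_for_ifindex() {
    echo $((1000 + $1))
}

# ifindex_of: read an interface's kernel index from sysfs.
ifindex_of() {
    cat "/sys/class/net/$1/ifindex"
}

# thread_rule_cmd: print (do not run) the policy rule for a given
# interface index. The priority defaults to 10300, the value reported
# as working earlier in this thread.
thread_rule_cmd() {
    echo "ip -6 rule add from all iif lo lookup $(table_id_for_ifindex "$1") priority ${2:-10300}"
}
```

For example, thread_rule_cmd "$(ifindex_of ot0)" prints the rule for whatever table the current ot0 interface maps to; an interface index of 11 corresponds to table 1011.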

If you are looking for a production solution, it is better to raise a feature request with additional details of your use cases. The Thread team is working on porting Thread to the Android system.

maxminard commented 1 year ago

@wgtdkp Thank you for the follow-up. Until Thread is fully ported to Android, we have developed a temporary solution.

I noticed that OMR routes use a netlink file descriptor (message type RTM_NEWROUTE) within the AddRoute function in netif.cpp to add their prefixes to a routing policy rule. So, in order for all of our routes to do the same, I:

  1. Enabled OPENTHREAD_POSIX_CONFIG_INSTALL_OMR_ROUTES_ENABLE to give access to the AddRoute function
  2. Edited the AddRoute function to pass a specific table ID to the netlink fd
  3. Called the AddRoute function from UpdateUnicastLinux for all on-mesh routes

Now we generate two routes for each on-mesh address: 1) one in the table that the kernel assigns (ID = 1000 + interface index), and 2) one in the table we passed to AddRoute (we used table ID 100).
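For comparison, a similar effect can be approximated from the shell without modifying netif.cpp. This is only a sketch of the idea, not the patch described above: fixed_table_cmds is a hypothetical helper, table 100 is the ID mentioned above, and the accompanying rule and its priority are assumptions borrowed from earlier in the thread. It prints the commands rather than running them.

```shell
#!/bin/sh
# fixed_table_cmds (hypothetical helper): print the commands that copy an
# on-mesh prefix into a fixed routing table and point a policy rule at it.
# Usage: fixed_table_cmds PREFIX [IFACE] [TABLE] [PRIORITY]
fixed_table_cmds() {
    prefix=$1
    iface=${2:-ot0}      # Thread interface name used in this thread
    table=${3:-100}      # fixed table ID, independent of the interface index
    prio=${4:-10300}     # assumed priority, taken from the earlier rule
    echo "ip -6 route add $prefix dev $iface table $table"
    echo "ip -6 rule add from all iif lo lookup $table priority $prio"
}
```

Because the table ID is fixed, the rule does not need to be recomputed when the interface index changes after the network is re-established.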

jacqueshsu commented 1 year ago

@AlanLCollins, may I know how to enable ot-br-posix on Android? Your feedback is appreciated.

wgtdkp commented 1 year ago

I will leave that to @AlanLCollins, but here is a heads-up on the official Android Thread support:

  1. You can find the ot-br-posix code for Android in https://cs.android.com/android/platform/superproject/main/+/main:external/ot-br-posix/
  2. You can find the new Thread HAL API and default implementation in https://cs.android.com/android/platform/superproject/main/+/main:hardware/interfaces/threadnetwork/
  3. Later, a ThreadNetworkService will be added to the Android framework to register the Thread network with the system, and an Android system API will be provided to control the Thread network. The code will be in https://cs.android.com/android/platform/superproject/main/+/main:packages/modules/ThreadNetwork/

It doesn't currently run on Android, but you should be able to run it in the coming months.

AlanLCollins commented 1 year ago

Hello @jacqueshsu, a few items to consider:

jacqueshsu commented 1 year ago

@AlanLCollins, thank you for the feedback. It's helpful to me. It seems this is not easy for an Android rookie. I will study your comments and do more research, and I will get back to you if I have any news.

Tristin-9527 commented 9 months ago

> @AlanLCollins , Thank you for the feedback. It's helpful to me. It seems it's not easy for the rookie of android. I will study your comments to do more researches, will back to you if any news from me.

So, is there any progress?

jacqueshsu commented 9 months ago

Only chip-tool is working, and it supports Matter over Wi-Fi and Ethernet but not over OpenThread. After that I had no time to continue.

jwhui commented 9 months ago

> Only the chip-tool is working to support matter over WiFi, and Ethernet, can’t over overthread.

@jacqueshsu, chip-tool definitely supports commissioning Thread devices.

https://github.com/project-chip/connectedhomeip/blob/master/docs/guides/chip_tool_guide.md

jacqueshsu commented 9 months ago

Yes, but it's an app. I rebuilt it for the Android toolchain, and also built dbus and bluez for Android.