openthread / ot-br-posix

OpenThread Border Router, a Thread border router for POSIX-based platforms.
https://openthread.io/
BSD 3-Clause "New" or "Revised" License

Problems with setup script and OTBR setup guide #269

Closed llane12 closed 5 years ago

llane12 commented 5 years ago

I am trying to set up an OpenThread Border Router using a Raspberry Pi 3B with the latest version of Raspbian Stretch with desktop (2019-04-08) and a Nordic nRF52840 connected via USB. I am following the guide on the OpenThread website (https://openthread.io/guides/border-router).

I have been able to get the Border Router running, form a Thread network and join a second nRF52840 to the Thread network. The problem is I cannot get any communication between the Thread devices and a Windows computer connected to the Wi-Fi BorderRouter-AP.

I realise that I will need to do something with the prefix to enable communication with a private IPv4 network. The documentation on how to do this is terrible, but I basically can't get to the point where this is a problem.

I am seeing various problems using the setup script (./script/setup) and following the Wi-Fi Access Point Setup guide:

Following the guide:

I have stopped following the guide at this point as I have encountered so many problems I don't know how to continue.

I tried using the docker approach initially, but no Wi-Fi AP was created so I couldn't test communication between the Thread device and Windows computer. I have tried re-installing Raspbian from scratch several times now and keep running into these problems.

I have experience with software development on Windows but a lot of things in this area are new to me, so please don't assume any knowledge on my part. I apologise if I am missing something obvious.

bukepo commented 5 years ago

Hi @llane12, are you using a desktop version of the Raspberry Pi image? If so, could you please try the Raspbian Stretch Lite version? Unfortunately, we only tested on the Lite version.

bukepo commented 5 years ago

To disable NETWORK_MANAGER, you may try the following command after applying #270.

NETWORK_MANAGER=0 ./script/setup

llane12 commented 5 years ago

Hi @bukepo, thanks for the quick response. Initially, I was using the Lite version. Then when I switched to trying Docker I used the Desktop version. I missed that I should change back to Lite to follow the manual guide.

I will try again with a fresh install of Raspbian Lite and using your latest code.

If I want to follow the manual setup steps for the Wi-Fi Access Point, should I also set these flags to 0: NAT64, DNS64, DHCPV6_PD? They are defined in https://github.com/openthread/ot-br-posix/blob/master/examples/platforms/raspbian/default

bukepo commented 5 years ago

I don't think NAT64 will affect the manual setup. And yes, you could try disabling DNS64 and DHCPV6_PD.

llane12 commented 5 years ago

I flashed an SD card with Raspbian Stretch Lite 2019-04-08 and booted the Raspberry Pi.

Checked out your modified code and ran the bootstrap script.

Ran the setup script like this: NETWORK_MANAGER=0 NAT64=0 DNS64=0 DHCPV6_PD=0 ./script/setup

After rebooting, I noticed that the NetworkManager service was running. I am assuming that this should not be present if the NETWORK_MANAGER=0 flag is set:

├─NetworkManager.service
             │ ├─412 /usr/sbin/NetworkManager --no-daemon
             │ └─503 /sbin/dhclient -d -q -sf /usr/lib/NetworkManager/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-69efb644-c21c-3589-b1ca-9fad3af7cdc3-eth0.lease -cf /var/lib/NetworkManager/dhclient

I had the same two problems as before following the Wi-Fi AP setup guide:

Step 1 dnsmasq failed to start

Apr 25 14:35:19 raspberrypi systemd[1]: Starting dnsmasq - A lightweight DHCP and caching DNS server...
Apr 25 14:35:19 raspberrypi dnsmasq[1037]: dnsmasq: syntax check OK.
Apr 25 14:35:20 raspberrypi dnsmasq[1040]: dnsmasq: failed to create listening socket for port 53: Address alr… in use
Apr 25 14:35:20 raspberrypi systemd[1]: dnsmasq.service: Control process exited, code=exited status=2
Apr 25 14:35:20 raspberrypi systemd[1]: Failed to start dnsmasq - A lightweight DHCP and caching DNS server.
Apr 25 14:35:20 raspberrypi systemd[1]: dnsmasq.service: Unit entered failed state.
Apr 25 14:35:20 raspberrypi systemd[1]: dnsmasq.service: Failed with result 'exit-code'.
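The "failed to create listening socket for port 53" message means some other process has already bound that port. As a rough diagnostic, the following Python sketch tries to bind a port and reports whether the bind is refused because the address is in use. The helper name port_in_use is hypothetical and is not part of OTBR or dnsmasq. Tools such as ss or netstat can then show which process holds the port.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0",
                kind: int = socket.SOCK_STREAM) -> bool:
    """Return True if binding host:port fails with EADDRINUSE.

    Pass kind=socket.SOCK_DGRAM to test the UDP side of the port.
    Binding ports below 1024 (such as 53) normally requires root;
    without it a PermissionError is raised instead of a result.
    """
    with socket.socket(socket.AF_INET, kind) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # any other error (e.g. EACCES) is not "in use"
    return False


# Example (run as root on the Raspberry Pi):
#   port_in_use(53)                       # TCP listener check
#   port_in_use(53, kind=socket.SOCK_DGRAM)  # UDP listener check
```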

Step 3.4: Same problem modifying the hostapd.service file. I was getting the same message that it is not a normal file and couldn't save the changes. I found this:

lrwxrwxrwx 1 root root 9 Apr 25 14:35 hostapd.service -> /dev/null

So I deleted it with rm -f then created it as per the guide.
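A unit file that is a symlink to /dev/null is how systemd represents a masked unit, and "systemctl unmask hostapd" is the usual way to reverse that instead of deleting the link by hand. The following is a minimal Python sketch for spotting this state. The function name is_masked is hypothetical, and the directory in the usage comment is an assumption, since the listing above does not show where the file lives.

```python
import os


def is_masked(unit_path: str) -> bool:
    """Return True if the unit file is a symlink pointing at /dev/null."""
    return os.path.islink(unit_path) and os.readlink(unit_path) == "/dev/null"


# Example (path is an assumption, adjust to where the unit file is found):
#   is_masked("/etc/systemd/system/hostapd.service")
```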

Carrying on, I didn't encounter any more errors. However, it was not successful.

I can see the BorderRouter-AP SSID and can connect to it, but can't connect to the router on 192.168.1.2.

dnsmasq service can't start

  UNIT            LOAD   ACTIVE SUB    DESCRIPTION
● dnsmasq.service loaded failed failed dnsmasq - A lightweight DHCP and caching DNS server

Recommended Troubleshooting

./script/server NETWORK_MANAGER=0
Current platform is raspbian
* Applying /etc/sysctl.d/60-otbr-ip-forward.conf ...
net.ipv6.conf.all.forwarding = 1
net.ipv4.ip_forward = 1
* Applying /etc/sysctl.d/98-rpi.conf ...
kernel.printk = 3 4 1 3
vm.min_free_kbytes = 16384
* Applying /etc/sysctl.d/99-sysctl.conf ...
net.ipv4.ip_forward = 1
* Applying /etc/sysctl.conf ...
net.ipv4.ip_forward = 1
Failed to start otbr-nat44.service: Unit otbr-nat44.service not found.
 *** ERROR:  Failed to start NAT44!

I am going to repeat the process one more time. I will let you know if the outcome is any different.

jwhui commented 5 years ago

@llane12, I wanted to step back to understand what you are trying to achieve, so that we can determine the best/easiest path to get there. Are you simply trying to communicate between a Windows host on Wi-Fi and Thread devices? Do you require IPv4 connectivity on your private network?

llane12 commented 5 years ago

I ran through the setup again with a fresh install of Raspbian Stretch Lite. Basically all the same problems as before, but with the difference that at the end, running ./script/server NETWORK_MANAGER=0 didn't give any errors. Still, the Web GUI is not accessible over the Wi-Fi and there is no internet access.

Here are logs of the sessions: log1.txt log2.txt log3.txt

Hi @jwhui, all we want to do is route messages from Thread devices to an application hosted on a LAN. Ideally, we would use an Ethernet connection from the OTBR to the LAN, but seeing as the setup guide uses the Wi-Fi I wanted to at least get that working first.

jwhui commented 5 years ago

@llane12, if you just want to demonstrate connectivity, can you use the default setup process with Network Manager enabled? You can ignore the "Wi-Fi Access Point Setup" guide when Network Manager is enabled.

I just tried with Raspbian Stretch Lite and followed the standard setup flow:

$ ./script/bootstrap
$ ./script/setup

Everything works for me. I am able to add an on-mesh prefix and communicate with IPv6/4 hosts external to the Thread network.

llane12 commented 5 years ago

Demonstrating connectivity is the first step. After that we want to reconfigure the router to route between the Thread network and its Ethernet interface, rather than Wi-Fi. But, continuing with this test for now.

I have installed a fresh Raspbian Stretch Lite, run the default bootstrap and setup scripts and done the wpantund configuration to use the correct interface. I connected a tablet to the Wi-Fi BorderRouter-AP, accessed the Web GUI on 10.42.0.1 and formed a Thread network.

I can now ping between a separate Thread device and the device attached to the router using the IPv6 Link Local addresses. When forming the Thread network I set the On-Mesh Prefix to fd11:2345:6789:: which is intended to be a 48-bit prefix (https://en.wikipedia.org/wiki/Unique_local_address). The two Thread devices get an address with this prefix and can ping each other using those addresses.

But I cannot ping either Thread device from the tablet using their fd11 addresses, or ping the tablet from the Thread device (using its translated IPv4 address, 10.42.0.2 -> fd11:2345:6789:a2a:0:200:: but I may be calculating that incorrectly)
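The translated address can be checked against the RFC 6052 address format, in which the IPv4 bits are written directly after the prefix and bits 64 to 71 are left as zero. The following is a minimal Python sketch of that layout. It is not part of OTBR or TAYGA, and the function name embed_ipv4 is hypothetical.

```python
import ipaddress


def embed_ipv4(prefix: str, ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in an IPv6 prefix following RFC 6052."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen not in (32, 40, 48, 56, 64, 96):
        raise ValueError("RFC 6052 allows only /32, /40, /48, /56, /64 and /96")
    out = bytearray(net.network_address.packed)  # 16 bytes, host bits zero
    pos = net.prefixlen // 8                     # first byte after the prefix
    for byte in ipaddress.IPv4Address(ipv4).packed:
        if pos == 8:                             # byte 8 (bits 64-71) stays zero
            pos += 1
        out[pos] = byte
        pos += 1
    return ipaddress.IPv6Address(bytes(out))


# Example with the prefix and tablet address from this comment:
#   embed_ipv4("fd11:2345:6789::/48", "10.42.0.2")
```

With the /48 prefix and tablet address used here, the sketch returns fd11:2345:6789:a2a:0:200::, which agrees with the address calculated by hand.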

Thank you for your help

jwhui commented 5 years ago

@llane12, the default TAYGA config and firewall rules are probably getting in your way when trying to communicate between devices connected via the RPI's Thread and Wi-Fi networks. I just tested this on a fresh Raspbian Stretch Lite install using ./script/bootstrap && ./script/setup and it worked well for me.

1) Add on-mesh prefix with default route to the Thread network. For example:

sudo wpanctl add-prefix --stable --preferred --slaac --default-route --on-mesh fd11:2233:0:1::

2) Change NAT64 prefix to something other than the Well-Known Prefix (64:ff9b::/96). To do this, edit the /etc/tayga.conf file and search for the prefix configuration. You can use something like fd11:2233::/96.

3) Restart TAYGA

$ sudo service tayga restart

4) Remove iptables forwarding rule that rejects output via wlan0

$ sudo iptables -D FORWARD -o wlan0 -j REJECT --reject-with icmp-port-unreachable

5) From attached Thread device, issue ping to the NAT64-encoded IPv4 address. For example, for 10.42.0.2:

> ping fd11:2233::0a2a:0002

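With a /96 NAT64 prefix, the IPv4 address simply occupies the last 32 bits of the IPv6 address. The following is a minimal Python sketch of that conversion in both directions. The function names are hypothetical, and fd11:2233::/96 is the example prefix from step 2.

```python
import ipaddress


def nat64_96(prefix: str, ipv4: str) -> ipaddress.IPv6Address:
    """Place an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 96:
        raise ValueError("this shortcut only applies to /96 prefixes")
    return net.network_address + int(ipaddress.IPv4Address(ipv4))


def extract_ipv4(nat64_addr: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 address from the low 32 bits of a /96 NAT64 address."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(nat64_addr)) & 0xFFFFFFFF)


# Example: nat64_96("fd11:2233::/96", "10.42.0.2") is the same address
# as fd11:2233::0a2a:0002, printed in compressed form.
```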
Hope that helps.

llane12 commented 5 years ago

@jwhui Thank you, that's very helpful. I can now ping Wi-Fi devices connected to the SoftAP from a Thread device.

How would I modify this configuration to be able to ping the Thread device from a computer connected to the Wi-Fi AP? Is that possible?

I tried the following:

  1. Remove the on-mesh prefix
  2. Add an on-mesh prefix of fd11:2233:0:0:: with the same flags as suggested
  3. Change the NAT64 prefix in tayga.conf to fd11:2233::/64
  4. Restart tayga
  5. Renew the tablet's DHCP lease
  6. Try to ping the tablet from a Thread device using its IPv4-embedded IPv6 address with 64-bit prefix: 10.42.0.2 -> fd11:2233::a:2a00:200:0

jwhui commented 5 years ago

Stateful NAT makes it difficult to ping from an external host to a Thread device. With dynamic NAT, the stateful mapping is only created when an internal host (Thread device) initiates communication to an external host.
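As a toy illustration of that behaviour, the Python sketch below models a dynamic NAT table: a mapping only exists after an inside host has sent traffic, so unsolicited inbound packets match nothing. It is a simplification (real NAT tracks the full connection tuple and expires entries) and does not reflect how TAYGA or iptables are implemented. All names and addresses in it are hypothetical.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (address, port)


class DynamicNat:
    """Toy stateful NAT: mappings appear only after inside-initiated traffic."""

    def __init__(self, public_addr: str, first_port: int = 40000) -> None:
        self.public_addr = public_addr
        self.next_port = first_port
        self.out_map: Dict[Endpoint, Endpoint] = {}  # inside -> public
        self.in_map: Dict[Endpoint, Endpoint] = {}   # public -> inside

    def outbound(self, inside: Endpoint) -> Endpoint:
        """An inside host sends a packet; create a mapping if none exists."""
        if inside not in self.out_map:
            public = (self.public_addr, self.next_port)
            self.next_port += 1
            self.out_map[inside] = public
            self.in_map[public] = inside
        return self.out_map[inside]

    def inbound(self, public: Endpoint) -> Optional[Endpoint]:
        """An outside host sends a packet; None means it is dropped."""
        return self.in_map.get(public)
```

In this model, inbound() returns None for any public endpoint until outbound() has been called for the corresponding inside endpoint, which is why a ping started from the Wi-Fi side finds no mapping to follow.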

If you want to initiate communication from an external host, what you can do is set up UDP and/or TCP static port mappings.

For example:

  1. Add a map configuration to the end of /etc/tayga.conf to map a TAYGA IPv4 address to the Thread device's IPv6 address. Something like:
    map 192.168.255.250 fd11:2233:0:a:d362:8d8a:4822:3ca0
  2. Restart TAYGA
    sudo service tayga restart
  3. Add a NAT44 static map to iptables
    sudo iptables -t nat -A PREROUTING -p udp -i wlan0 -d 10.42.0.1 --dport 1003 -j DNAT --to-destination 192.168.255.250:1234

With the above, any UDP message sent to 10.42.0.1:1003 on wlan0 will then be forwarded to port 1234 on fd11:2233:0:a:d362:8d8a:4822:3ca0.
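One way to exercise such a mapping from a host on the Wi-Fi side is to send a single UDP datagram to the mapped address and port. The following is a minimal Python sketch. The function name send_udp_probe is hypothetical, and it assumes the Thread device has a UDP listener on port 1234, which is not shown in this thread.

```python
import socket


def send_udp_probe(payload: bytes, host: str, port: int) -> int:
    """Send one UDP datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


# Example, run from a computer connected to the Wi-Fi AP:
#   send_udp_probe(b"hello", "10.42.0.1", 1003)
```

UDP gives no delivery confirmation, so success has to be observed on the Thread device itself.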

Hope that helps.

llane12 commented 5 years ago

@jwhui Thanks again for your help, I will try this out next week.

llane12 commented 5 years ago

@jwhui I am able to get communication from a computer connected to the WiFi AP to the Thread devices, which is enough for now to demonstrate that it is possible. Thank you for your help.