openwrt / routing

OpenWrt Routing Packages

[batman-adv]: B.A.T.M.A.N does not transfer data until you restart the interfaces set to batadv and batadv_hardif. #1054

Open PussAzuki opened 5 months ago

PussAzuki commented 5 months ago

Maintainer: @simonwunderlich

Environment: mediatek/filogic, Xiaomi Redmi Router AX6000 (OpenWrt U-Boot layout), OpenWrt SNAPSHOT r25589-f84ed09d2c

Description: My config is below. I don't use VLANs, and I turned off DHCP on the non-master nodes (by disabling dnsmasq and odhcpd).

For /etc/config/network

...
config device
    option name 'br-lan'
    option type 'bridge'
    list ports 'bat0'
    list ports 'lan2'
    list ports 'lan3'
    list ports 'lan4'
...
config interface 'lan'
    option device 'br-lan'
    option proto 'static'
    option ipaddr '192.168.x.2'
    option netmask '255.255.255.0'
    option ip6assign '60'
...
config interface 'bat0'
    option proto 'batadv'
    option routing_algo 'BATMAN_IV'
    option aggregated_ogms '1'
    option bridge_loop_avoidance '1'
    option gw_mode 'client'
    option hop_penalty '30'
    option network_coding '0'

config interface 'bat0_mesh0'
    option proto 'batadv_hardif'
    option master 'bat0'
    option mtu '2304'
...

For /etc/config/wireless

...
config wifi-iface 'mesh0'
    option device 'radio1'
    option mode 'mesh'
    option mesh_id 'xxxxxxxx'
    option encryption 'sae'
    option key 'xxxxxxxx'
    option mesh_fwding '0'
    option mesh_rssi_threshold '1'
    option network 'bat0_mesh0'
...

After booting, my nodes connect to their mesh peers, but the RX rate always stays at 0 and I can't access the internet. I tested with batctl ping MAC on the master node and got Destination Host Unreachable. I then set the gateway IP, connected manually to the non-master node, and restarted bat0 and bat0_mesh0; after that the network came up!
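One way to narrow this down is to compare what `batctl if` reports for the mesh device right after boot and again after the restart. The sketch below assumes the `<ifname>: <status>` line format that `batctl if` prints and uses the device name phy1-mesh0 that appears in the kernel log later in this thread; the helper name `hardif_state` is hypothetical.

```shell
#!/bin/sh
# Sketch: classify a hard interface from `batctl if` style output.
# Assumed input format, one line per hard interface: "<ifname>: <status>"

# hardif_state IFNAME
# Reads the listing on stdin and prints the status reported for IFNAME,
# or "missing" when IFNAME is not attached to the soft interface at all.
hardif_state() {
    awk -v want="$1" -F': ' '
        $1 == want { state = $2; found = 1 }
        END { print (found ? state : "missing") }
    '
}

# On the router (batctl must be installed; bat0 is batctl's default):
#   batctl if | hardif_state phy1-mesh0
```

If this prints "missing" after boot and "active" only after bat0_mesh0 is restarted, the hard interface was never attached to bat0 during boot, which would be consistent with the unreachable batctl ping.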

I haven't used batman for a long time, because the batman + dawn combination I had set up back then often made wireless unusable, with this error: br-lan: received packet on bat0 with own address as source address

I thought batman was supposed to wait by default for the mesh to come up before starting?

PussAzuki commented 5 months ago

Update: I see these and similar lines on all mesh nodes ONLY after I click the restart button on the bat0_mesh0 interface.

Fri Apr  5 19:12:16 2024 daemon.notice wpa_supplicant[1590]: Set new config for phy phy1
Fri Apr  5 19:12:16 2024 daemon.notice hostapd: Set new config for phy phy1: /var/run/hostapd-phy1.conf
Fri Apr  5 19:12:16 2024 daemon.notice hostapd: Reloaded settings for phy phy1
Fri Apr  5 19:12:16 2024 daemon.notice wpa_supplicant[1590]: Set new config for phy phy1
Fri Apr  5 19:12:16 2024 daemon.notice netifd: Wireless device 'radio1' is now up
Fri Apr  5 19:12:16 2024 daemon.notice netifd: Interface 'bat0_mesh0' is enabled
Fri Apr  5 19:12:16 2024 daemon.notice netifd: Interface 'bat0_mesh0' has link connectivity
Fri Apr  5 19:12:16 2024 daemon.notice netifd: Interface 'bat0_mesh0' is setting up now
Fri Apr  5 19:12:16 2024 kern.info kernel: [   66.351627] batman_adv: bat0: Adding interface: phy1-mesh0
Fri Apr  5 19:12:16 2024 kern.info kernel: [   66.357415] batman_adv: bat0: Interface activated: phy1-mesh0

I think this should be useful information

ecsv commented 5 months ago

If you only see these lines when you click restart, then it looks like you have a problem with your wireless interface.

Btw. your mesh_rssi_threshold looks bogus and should often prevent other nodes from connecting. This can cause problems because no link will be established, so batman-adv cannot communicate via your phy1-mesh0 interface either. (According to the kernel, the only valid range is -255 to 0, so this 1 shouldn't have any effect.)

PussAzuki commented 5 months ago

option mesh_rssi_threshold '1' is what I set in LuCI; you can see what 1 means there.

They can connect to each other when only using 802.11s mode. I do not know whether this setting does the wrong thing.

ecsv commented 5 months ago

> They can connect to each other when only using 802.11s mode.

But if you don't see the messages that phy1-mesh0 is added to bat0 (and is activated), it looks like netifd didn't recognize phy1-mesh0 as a valid device (with carrier/link) and therefore didn't add it to bat0. batman-adv doesn't have any control over that, because without the help of netifd it doesn't have any knowledge about phy1-mesh0. You first need to investigate why netifd doesn't seem to add it to bat0.
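A quick way to check for those messages after a fresh boot is to filter the system log for them. This is a minimal sketch that matches the wording of the two kernel lines quoted earlier in this thread; the helper name `hardif_log_events` is hypothetical, and feeding it from `logread` assumes the boot-time messages are still in the log buffer.

```shell
#!/bin/sh
# Sketch: find the batman_adv kernel messages for one hard interface.

# hardif_log_events IFNAME
# Reads syslog text on stdin and prints the "Adding interface" and
# "Interface activated" lines for IFNAME.
# The exit status is non-zero when no such line is present.
hardif_log_events() {
    grep -E "batman_adv: .*(Adding interface|Interface activated): $1\$"
}

# On the router, right after boot and before restarting anything:
#   logread | hardif_log_events phy1-mesh0 || echo "phy1-mesh0 was never added"
```

No output after boot, followed by both lines after a manual restart of bat0_mesh0, would point at the attach step during boot and not at batman-adv itself.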

PussAzuki commented 5 months ago

So should this question be sent to openwrt/openwrt?

I do not see phy1-mesh0 being added to bat0 when the router boots up... I will check whether some commit broke it.