openNDS / mesh11sd

Mesh11sd is a dynamic parameter configuration daemon for 802.11s mesh networks.
GNU General Public License v2.0

Mesh nodes hijacking all IPs over time #88

Closed tgolsson closed 1 month ago

tgolsson commented 2 months ago

Hey!

I forgot to say thanks for the help with my previous issue, so thanks! I'm back with another one that might belong here or on the main OpenWRT forums, but I'm erring on a mesh11sd misconfiguration on my end still. Recently (with no changes at all) I've started having issues with my wired devices getting routed over the mesh. This seems to be caused by some form of misconfiguration of the mesh gateway. For example, right now this is my ARP table:

arp -a

Interface: 10.0.1.36 --- 0x19
  Internet Address      Physical Address      Type
  10.0.0.1              aa-ae-b0-8f-f0-7f     dynamic
  10.0.0.2              aa-ae-b0-8f-f0-7f     dynamic
  10.0.0.3              2a-db-e9-df-6b-27     dynamic
  10.0.0.27             2a-db-e9-df-6b-27     dynamic
  10.0.0.161            aa-ae-b0-8f-f0-7f     dynamic
  10.0.0.182            aa-ae-b0-8f-f0-7f     dynamic
  10.0.0.202            aa-ae-b0-8f-f0-7f     dynamic
  10.0.1.19             b4-e2-65-14-a0-ed     dynamic
  10.0.1.20             ec-b5-fa-2b-8e-69     dynamic
  10.0.1.29             2a-db-e9-df-6b-27     dynamic
  10.0.1.44             2a-db-e9-df-6b-27     dynamic
  10.0.255.255          ff-ff-ff-ff-ff-ff     static
  224.0.0.22            01-00-5e-00-00-16     static
  224.0.0.251           01-00-5e-00-00-fb     static
  239.255.255.235       01-00-5e-7f-ff-eb     static
  239.255.255.250       01-00-5e-7f-ff-fa     static

Note that both aa-ae-b0-8f-f0-7f and 2a-db-e9-df-6b-27 are mesh nodes. The only correct mappings here are 10.0.1.44 and 10.0.1.27... all the other ones are "hijacked". 10.0.0.1, my true router, should have MAC fc:ec:da:40:ff:96. It doesn't seem like a DNS/DHCP problem, but I'm honestly quite confused as to what is happening here.
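For anyone who wants to spot this pattern without reading the table by eye, here is a minimal sketch that groups the dynamic entries of Windows `arp -a` output by MAC address and reports any MAC that answers for several IPs. The function name `macs_claiming_many_ips` and the `min_ips` threshold are made up for illustration; the sample rows are taken from the table above.

```python
import re
from collections import defaultdict

# One row of Windows `arp -a` output: IPv4 address, dash-separated MAC, entry type.
ROW = re.compile(
    r"^\s*(\d+\.\d+\.\d+\.\d+)\s+((?:[0-9a-f]{2}-){5}[0-9a-f]{2})\s+(\w+)\s*$",
    re.IGNORECASE,
)

def macs_claiming_many_ips(arp_text, min_ips=2):
    """Return {mac: [ips]} for MACs that own at least min_ips dynamic entries."""
    by_mac = defaultdict(list)
    for line in arp_text.splitlines():
        m = ROW.match(line)
        # Static rows (broadcast, multicast) are skipped on purpose.
        if m and m.group(3).lower() == "dynamic":
            by_mac[m.group(2).lower()].append(m.group(1))
    return {mac: ips for mac, ips in by_mac.items() if len(ips) >= min_ips}

sample = """\
  10.0.0.1              aa-ae-b0-8f-f0-7f     dynamic
  10.0.0.2              aa-ae-b0-8f-f0-7f     dynamic
  10.0.0.3              2a-db-e9-df-6b-27     dynamic
  10.0.1.19             b4-e2-65-14-a0-ed     dynamic
  10.0.255.255          ff-ff-ff-ff-ff-ff     static
"""

for mac, ips in macs_claiming_many_ips(sample).items():
    print(mac, ips)
```

A MAC owning several IPs is not wrong in itself (a host can have more than one address), so the result is only a list of candidates to check against the known addresses of each device.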

The network isn't at all complex:

flowchart LR
    subgraph infraroom
        drg[Fiber Converter/DRG]
        r[opnSense]
        s[L2 Switch]
        n0[Asus Lyra MAP-AC2200]
        desktop[Desktop Computer]
        laptop[Laptop]
    end
    subgraph upstairs
        n1[Asus Lyra MAP-AC2200]
        n2[Asus Lyra MAP-AC2200]
    end

    drg --> r
    r --> s
    s --> desktop
    s --> laptop

    s --> n0
    n0 -.- n1
    n0 -.- n2
    n1 -.- n2
    n2 -.- n1

And the two wired devices, Desktop and Laptop, are showing these duplicate entries. It seems to me like something is wrong with the mesh configuration for these devices to pretend to have all these IPs, not sure how though. The router isn't reachable through the mesh at all (from the perspective of the desktop computer), so I'm not sure why it'd ever claim to be responsible for it.

mesh11sd status from the gateway:

{
  "setup":{
    "version":"4.0.1",
    "enabled":"1",
    "procd_status":"running",
    "portal_detect":"1",
    "portal_detect_threshold":"0",
    "portal_channel":"3",
    "channel_tracking_checkinterval":"30",
    "mesh_basename":"m-11s-",
    "auto_config":"1",
    "auto_mesh_network":"lan",
    "auto_mesh_band":"2g40",
    "auto_mesh_id":"360c5420260e3e90c9c0f07aba7723",
    "mesh_gate_enable":"1",
    "mesh_leechmode_enable":"0",
    "mesh_gate_encryption":"3",
    "txpower":"20",
    "mesh_path_cost":"10",
    "mesh_path_stabilisation":"1",
    "checkinterval":"10",
    "interface_timeout":"10",
    "ssid_suffix_enable":"0",
    "debuglevel":"1"
  },
  "interfaces":{
    "m-11s-1":{
      "mesh_retry_timeout":"100",
      "mesh_confirm_timeout":"100",
      "mesh_holding_timeout":"100",
      "mesh_max_peer_links":"16",
      "mesh_max_retries":"3",
      "mesh_ttl":"31",
      "mesh_element_ttl":"31",
      "mesh_auto_open_plinks":"0",
      "mesh_hwmp_max_preq_retries":"4",
      "mesh_path_refresh_time":"1000",
      "mesh_min_discovery_timeout":"100",
      "mesh_hwmp_active_path_timeout":"5000",
      "mesh_hwmp_preq_min_interval":"10",
      "mesh_hwmp_net_diameter_traversal_time":"50",
      "mesh_hwmp_rootmode":"2",
      "mesh_hwmp_rann_interval":"5000",
      "mesh_gate_announcements":"1",
      "mesh_fwding":"1",
      "mesh_sync_offset_max_neighor":"50",
      "mesh_rssi_threshold":"-65",
      "mesh_hwmp_active_path_to_root_timeout":"6000",
      "mesh_hwmp_root_interval":"5000",
      "mesh_hwmp_confirmation_interval":"2000",
      "mesh_power_mode":"active",
      "mesh_awake_window":"10",
      "mesh_plink_timeout":"0",
      "mesh_connected_to_gate":"1",
      "mesh_nolearn":"0",
      "mesh_connected_to_as":"0",
      "mesh_id":"360c5420260e3e90c9c0f07aba7723",
      "device":"radio1",
      "channel":"1",
      "tx_packets":"56793705",
      "tx_bytes":"29654974937",
      "rx_packets":"68390434",
      "rx_bytes":"38663045589",
      "this_node":"2a:db:e9:df:6b:27",
      "active_peers":"3",
      "peers":{
        "e6:65:0b:6b:60:ae":{
          "next_hop":"e6:65:0b:6b:60:ae",
          "hop_count":"1",
          "path_change_count":"3",
          "metric":"28"
        },
        "aa:ae:b0:8f:f0:7f":{
          "next_hop":"00:00:00:00:00:00",
          "hop_count":"0",
          "path_change_count":"0",
          "metric":"0"
        },
        "e2:b3:2e:8a:81:4a":{
          "next_hop":"e2:b3:2e:8a:81:4a",
          "hop_count":"1",
          "path_change_count":"18",
          "metric":"28"
        }
      },
      "active_stations":"14",
      "stations":{
        "c8:89:f3:b2:1a:f6":{
          "proxy_node":"2a:db:e9:df:6b:27"
        },
        "4c:4f:ee:dc:79:96":{
          "proxy_node":"2a:db:e9:df:6b:27"
        },
        "e8:ca:c8:bf:09:bf":{
          "proxy_node":"2a:db:e9:df:6b:27"
        },
        "c0:e7:bf:fd:d4:31":{
          "proxy_node":"2a:db:e9:df:6b:27"
        },
        "38:01:46:1d:e3:a1":{
          "proxy_node":"2a:db:e9:df:6b:27"
        },
        "44:07:0b:8e:8c:1f":{
          "proxy_node":"2a:db:e9:df:6b:27"
        },
        "dc:fe:23:c2:43:fc":{
          "proxy_node":"2a:db:e9:df:6b:27"
        },
        "2c:4c:c6:61:33:80":{
          "proxy_node":"2a:db:e9:df:6b:27"
        },
        "ac:67:84:23:e0:4b":{
          "proxy_node":"2a:db:e9:df:6b:27"
        },
        "64:90:c1:11:ff:d7":{
          "proxy_node":"2a:db:e9:df:6b:27"
        },
        "ec:a9:07:04:ab:d1":{
          "proxy_node":"2a:db:e9:df:6b:27"
        },
        "fe:05:1c:a8:c1:5a":{
          "proxy_node":"e2:b3:2e:8a:81:4a"
        },
        "e0:51:d8:3f:07:3d":{
          "proxy_node":"e2:b3:2e:8a:81:4a"
        },
      }
    }
  }
}

Anything that sticks out as an obvious issue here?

bluewavenet commented 2 months ago

@tgolsson

I'm erring on a mesh11sd misconfiguration

It would be useful if you shared your config....... ;-)

What is the output of ip neigh ?

tgolsson commented 2 months ago

My bad;

# uci show mesh11sd
mesh11sd.setup=mesh11sd
mesh11sd.setup.auto_config='1'
mesh11sd.setup.auto_mesh_id='aaa'
mesh11sd.setup.auto_mesh_key='xxx'
mesh11sd.setup.portal_channel='3'
mesh11sd.setup.mesh_gate_encryption='3'
mesh11sd.setup.mesh_gate_key='xxx'
mesh11sd.setup.ssid_suffix_enable='0'
mesh11sd.mesh_params=mesh11sd
mesh11sd.mesh_params.mesh_fwding='1'
mesh11sd.mesh_params.mesh_rssi_threshold='-65'
mesh11sd.mesh_params.mesh_gate_announcements='1'
mesh11sd.mesh_params.mesh_hwmp_rootmode='2'
mesh11sd.mesh_params.mesh_hwmp_rann_interval='5000'
mesh11sd.mesh_params.mesh_hwmp_root_interval='5000'
mesh11sd.mesh_params.mesh_hwmp_active_path_timeout='5000'
mesh11sd.mesh_params.mesh_hwmp_active_path_to_root_timeout='6000'
mesh11sd.mesh_params.mesh_max_peer_links='16'
mesh11sd.mesh_params.mesh_connected_to_as='0'
mesh11sd.mesh_params.mesh_connected_to_gate='1'

ip neigh on the mesh gateway node:

# ip neigh
10.0.1.33 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 1 STALE
10.0.1.45 dev br-lan  used 0/0/0 probes 3 FAILED
10.0.1.6 dev br-lan lladdr c0:e7:bf:fd:d4:31 used 0/0/0 probes 1 STALE
10.0.1.30 dev br-lan lladdr fe:05:1c:a8:c1:5a ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.42 dev br-lan lladdr aa:ae:b0:8f:f0:7f used 0/0/0 probes 1 STALE
10.0.0.44 dev br-lan lladdr aa:ae:b0:8f:f0:7f used 0/0/0 probes 4 STALE
10.0.1.7 dev br-lan lladdr e8:ca:c8:bf:09:bf used 0/0/0 probes 1 STALE
10.0.1.19 dev br-lan lladdr b4:e2:65:14:a0:ed used 0/0/0 probes 1 STALE
10.0.1.31 dev br-lan lladdr 00:bf:af:36:77:4c ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.43 dev br-lan lladdr aa:ae:b0:8f:f0:7f used 0/0/0 probes 4 STALE
10.0.1.4 dev br-lan lladdr 64:90:c1:11:ff:d7 used 0/0/0 probes 1 STALE
10.0.1.16 dev br-lan lladdr e0:51:d8:3f:07:3d used 0/0/0 probes 1 STALE
10.0.0.6 dev br-lan lladdr 64:31:50:7b:60:9d used 0/0/0 probes 1 STALE
10.0.1.40 dev br-lan lladdr e6:65:0b:6b:60:ae used 0/0/0 probes 1 STALE
169.254.103.75 dev br-lan lladdr d8:43:ae:09:ce:15 used 0/0/0 probes 0 STALE
10.0.255.175 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 1 STALE
10.0.1.5 dev br-lan lladdr 38:01:46:1d:e3:a1 used 0/0/0 probes 1 STALE
192.168.1.80 dev br-lan lladdr dc:fe:23:c2:43:fc used 0/0/0 probes 0 STALE
10.0.0.194 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 1 STALE
10.0.1.17 dev br-lan lladdr 68:ec:8a:05:4e:57 used 0/0/0 probes 1 STALE
10.0.0.7 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 1 STALE
10.0.0.206 dev br-lan lladdr 38:ea:a7:86:86:ea used 0/0/0 probes 0 STALE
10.0.1.41 dev br-lan  used 0/0/0 probes 3 FAILED
169.254.229.10 dev br-lan lladdr 30:05:05:dd:58:42 used 0/0/0 probes 0 STALE
10.0.1.2 dev br-lan lladdr 6c:02:e0:44:f5:4a used 0/0/0 probes 1 STALE
10.0.1.14 dev br-lan lladdr ac:ca:54:01:ea:eb used 0/0/0 probes 1 STALE
10.0.0.4 dev br-lan  used 0/0/0 probes 3 FAILED
10.0.1.26 dev br-lan  used 0/0/0 probes 3 FAILED
10.0.1.38 dev br-lan lladdr d8:43:ae:09:ce:15 used 0/0/0 probes 1 STALE
10.0.0.121 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 1 STALE
10.0.1.15 dev br-lan lladdr ac:67:84:23:e0:4b used 0/0/0 probes 1 STALE
10.0.0.5 dev br-lan lladdr 2c:41:38:17:79:ca used 0/0/0 probes 1 STALE
10.0.1.27 dev br-lan lladdr e6:65:0b:6b:60:ae used 0/0/0 probes 4 STALE
10.0.1.39 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 1 STALE
10.0.0.29 dev br-lan lladdr e6:65:0b:6b:60:ae used 0/0/0 probes 4 STALE
192.168.255.249 dev br-lan lladdr 44:07:0b:8e:8c:1f used 0/0/0 probes 0 STALE
10.0.1.12 dev br-lan lladdr e6:65:0b:6b:60:ae used 0/0/0 probes 1 STALE
10.0.0.2 dev br-lan lladdr e6:65:0b:6b:60:ae ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.24 dev br-lan lladdr e2:b3:2e:8a:81:4a used 0/0/0 probes 6 STALE
10.0.1.36 dev br-lan lladdr 30:23:03:e2:48:2a ref 1 used 0/0/0 probes 1 REACHABLE
10.0.0.26 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 4 STALE
10.0.1.1 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 1 STALE
10.0.0.119 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 1 STALE
10.0.1.13 dev br-lan lladdr ce:16:b9:43:28:5b used 0/0/0 probes 1 STALE
10.0.0.3 dev br-lan lladdr e6:65:0b:6b:60:ae ref 1 used 0/0/0 probes 1 REACHABLE
10.0.0.202 dev br-lan  used 0/0/0 probes 3 FAILED
10.0.1.37 dev br-lan lladdr 30:05:05:dd:58:42 used 0/0/0 probes 1 STALE
10.0.0.27 dev br-lan  used 0/0/0 probes 3 FAILED
10.0.0.161 dev br-lan lladdr e6:65:0b:6b:60:ae ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.10 dev br-lan lladdr 44:07:0b:8e:8c:1f used 0/0/0 probes 1 STALE
10.0.0.146 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 1 STALE
10.0.1.34 dev br-lan lladdr 22:85:1c:b1:72:c1 used 0/0/0 probes 0 STALE
10.0.0.24 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 1 STALE
10.0.1.46 dev br-lan lladdr aa:ae:b0:8f:f0:7f used 0/0/0 probes 1 STALE
10.0.0.182 dev br-lan lladdr e6:65:0b:6b:60:ae ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.11 dev br-lan lladdr 8a:01:63:41:d0:8e used 0/0/0 probes 1 STALE
10.0.0.1 dev br-lan lladdr fc:ec:da:40:ff:96 ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.23 dev br-lan lladdr c8:89:f3:b2:1a:f6 ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.35 dev br-lan lladdr 12:e1:e0:28:37:b8 used 0/0/0 probes 1 STALE
10.0.1.47 dev br-lan lladdr e6:65:0b:6b:60:ae used 0/0/0 probes 6 STALE
10.0.1.8 dev br-lan lladdr dc:fe:23:c2:43:fc used 0/0/0 probes 1 STALE
10.0.1.20 dev br-lan lladdr ec:b5:fa:2b:8e:69 used 0/0/0 probes 1 STALE
169.254.105.158 dev br-lan lladdr c8:89:f3:b2:1a:f6 used 0/0/0 probes 0 STALE
10.0.1.32 dev br-lan lladdr ec:a9:07:04:ab:d1 used 0/0/0 probes 1 STALE
10.0.1.44 dev br-lan lladdr e6:65:0b:6b:60:ae ref 1 used 0/0/0 probes 1 REACHABLE
169.254.209.18 dev br-lan lladdr ec:a9:07:04:ab:d1 used 0/0/0 probes 0 STALE
10.0.1.9 dev br-lan lladdr 4c:4f:ee:dc:79:96 ref 1 used 0/0/0 probes 1 REACHABLE
fe80::92ff:5653:f649:e9af dev br-lan lladdr 4c:4f:ee:dc:79:96 used 0/0/0 probes 0 STALE
fe80::5852:ab0c:7ed2:7963 dev br-lan lladdr ac:67:84:23:e0:4b router used 0/0/0 probes 0 STALE
fe80::809d:daff:fe31:4f77 dev br-lan lladdr 82:9d:da:31:4f:77 used 0/0/0 probes 0 STALE
fe80::e0b3:2eff:fe8a:814a dev br-lan lladdr e2:b3:2e:8a:81:4a used 0/0/0 probes 0 STALE
fe80::28db:e9ff:fedf:6b27 dev br-lan lladdr 2a:db:e9:df:6b:27 used 0/0/0 probes 0 STALE
fe80::c9f:74da:440d:21a9 dev br-lan lladdr ec:a9:07:04:ab:d1 router used 0/0/0 probes 0 STALE
fe80::4607:bff:fe8e:8c1f dev br-lan lladdr 44:07:0b:8e:8c:1f used 0/0/0 probes 0 STALE
fe80::32:db6e:e6c5:6d8c dev br-lan lladdr fe:05:1c:a8:c1:5a used 0/0/0 probes 0 STALE
fe80::7879:a7e6:3f96:161d dev br-lan lladdr d8:43:ae:09:ce:15 used 0/0/0 probes 0 STALE
fe80::6af6:3ac5:c1ca:3efd dev br-lan lladdr 30:05:05:dd:58:42 used 0/0/0 probes 0 STALE
fe80::37aa:a4b1:8f8c:fa08 dev br-lan lladdr 00:bf:af:36:77:4c used 0/0/0 probes 0 STALE
fe80::2085:1cff:feb1:72c1 dev br-lan lladdr 22:85:1c:b1:72:c1 used 0/0/0 probes 0 STALE
fe80::1889:dcc0:1024:ed2b dev br-lan lladdr ce:16:b9:43:28:5b used 0/0/0 probes 0 STALE
fe80::4f5:c2cf:790e:31a9 dev br-lan lladdr f2:81:78:a4:94:76 used 0/0/0 probes 0 STALE
fe80::429:ca5d:9faf:f6c3 dev br-lan lladdr c8:89:f3:b2:1a:f6 used 0/0/0 probes 0 STALE
fe80::eeb5:faff:fe2b:8e69 dev br-lan lladdr ec:b5:fa:2b:8e:69 used 0/0/0 probes 0 STALE
fe80::e465:bff:fe6b:60ae dev br-lan lladdr e6:65:0b:6b:60:ae used 0/0/0 probes 0 STALE
fe80::6a38:97af:fba0:6c39 dev br-lan lladdr 30:23:03:e2:48:2a used 0/0/0 probes 0 STALE
fe80::a8ae:b0ff:fe8f:f07f dev br-lan lladdr aa:ae:b0:8f:f0:7f used 0/0/0 probes 0 STALE
fe80::d8d3:6eff:fefa:9c6a dev br-lan lladdr da:d3:6e:fa:9c:6a used 0/0/0 probes 0 STALE
fe80::18c5:b732:fa2d:1db4 dev br-lan lladdr ce:16:b9:43:28:5b used 0/0/0 probes 0 STALE

And for good measure, the equivalent from the Windows side on the "misdirected" machine:

Get-NetNeighbor

ifIndex IPAddress                                          LinkLayerAddress      State       PolicyStore
------- ---------                                          ----------------      -----       -----------
64      ff12::8384                                         33-33-00-00-83-84     Permanent   ActiveStore
64      ff02::1:ffe1:8e6a                                  33-33-FF-E1-8E-6A     Permanent   ActiveStore
64      ff02::1:ffe1:8d63                                  33-33-FF-E1-8D-63     Permanent   ActiveStore
64      ff02::1:ffdc:8368                                  33-33-FF-DC-83-68     Permanent   ActiveStore
64      ff02::1:ffab:cd9a                                  33-33-FF-AB-CD-9A     Permanent   ActiveStore
64      ff02::1:ff7b:a2d9                                  33-33-FF-7B-A2-D9     Permanent   ActiveStore
64      ff02::1:ff20:72f5                                  33-33-FF-20-72-F5     Permanent   ActiveStore
64      ff02::1:3                                          33-33-00-01-00-03     Permanent   ActiveStore
64      ff02::1:2                                          33-33-00-01-00-02     Permanent   ActiveStore
64      ff02::fe                                           33-33-00-00-00-FE     Permanent   ActiveStore
64      ff02::fb                                           33-33-00-00-00-FB     Permanent   ActiveStore
64      ff02::16                                           33-33-00-00-00-16     Permanent   ActiveStore
64      ff02::c                                            33-33-00-00-00-0C     Permanent   ActiveStore
64      ff02::2                                            33-33-00-00-00-02     Permanent   ActiveStore
64      ff02::1                                            33-33-00-00-00-01     Permanent   ActiveStore
64      fe80::c321:fa8f:a87b:a2d9                          00-00-00-00-00-00     Unreachable ActiveStore
64      fe80::af3e:e053:e020:72f5                          00-00-00-00-00-00     Unreachable ActiveStore
64      fe80::7af1:ccaf:313f:f33a                          00-00-00-00-00-00     Unreachable ActiveStore
64      fe80::6a38:97af:fba0:6c39                          00-00-00-00-00-00     Unreachable ActiveStore
12      ff12::8384                                         33-33-00-00-83-84     Permanent   ActiveStore
12      ff02::1:ffdc:8368                                  33-33-FF-DC-83-68     Permanent   ActiveStore
12      ff02::1:ff7b:a2d9                                  33-33-FF-7B-A2-D9     Permanent   ActiveStore
12      ff02::1:ff20:72f5                                  33-33-FF-20-72-F5     Permanent   ActiveStore
12      ff02::1:3                                          33-33-00-01-00-03     Permanent   ActiveStore
12      ff02::1:2                                          33-33-00-01-00-02     Permanent   ActiveStore
12      ff02::fe                                           33-33-00-00-00-FE     Permanent   ActiveStore
12      ff02::fb                                           33-33-00-00-00-FB     Permanent   ActiveStore
12      ff02::16                                           33-33-00-00-00-16     Permanent   ActiveStore
12      ff02::2                                            33-33-00-00-00-02     Permanent   ActiveStore
12      fe80::c321:fa8f:a87b:a2d9                          00-00-00-00-00-00     Unreachable ActiveStore
12      fe80::af3e:e053:e020:72f5                          00-00-00-00-00-00     Unreachable ActiveStore
12      fe80::7af1:ccaf:313f:f33a                          00-00-00-00-00-00     Unreachable ActiveStore
12      fe80::6a38:97af:fba0:6c39                          00-00-00-00-00-00     Unreachable ActiveStore
1       ff12::8384                                                               Permanent   ActiveStore
1       ff02::1:ffe1:8e6a                                                        Permanent   ActiveStore
1       ff02::1:ffe1:8d63                                                        Permanent   ActiveStore
1       ff02::1:ffdc:8368                                                        Permanent   ActiveStore
1       ff02::1:ffc9:aded                                                        Permanent   ActiveStore
1       ff02::1:ffb0:23be                                                        Permanent   ActiveStore
1       ff02::1:ff7b:a2d9                                                        Permanent   ActiveStore
1       ff02::1:ff20:72f5                                                        Permanent   ActiveStore
1       ff02::1:3                                                                Permanent   ActiveStore
1       ff02::1:2                                                                Permanent   ActiveStore
1       ff02::fe                                                                 Permanent   ActiveStore
1       ff02::fb                                                                 Permanent   ActiveStore
1       ff02::16                                                                 Permanent   ActiveStore
1       ff02::c                                                                  Permanent   ActiveStore
1       ff02::2                                                                  Permanent   ActiveStore
25      239.255.255.250                                    01-00-5E-7F-FF-FA     Permanent   ActiveStore
25      239.255.255.235                                    01-00-5E-7F-FF-EB     Permanent   ActiveStore
25      224.0.0.252                                        01-00-5E-00-00-FC     Permanent   ActiveStore
25      224.0.0.251                                        01-00-5E-00-00-FB     Permanent   ActiveStore
25      224.0.0.22                                         01-00-5E-00-00-16     Permanent   ActiveStore
25      10.0.255.255                                       FF-FF-FF-FF-FF-FF     Permanent   ActiveStore
25      10.0.1.47                                          E6-65-0B-6B-60-AE     Stale       ActiveStore
25      10.0.1.46                                          AA-AE-B0-8F-F0-7F     Unreachable ActiveStore
25      10.0.1.45                                          2A-DB-E9-DF-6B-27     Stale       ActiveStore
25      10.0.1.44                                          2A-DB-E9-DF-6B-27     Reachable   ActiveStore
25      10.0.1.43                                          AA-AE-B0-8F-F0-7F     Unreachable ActiveStore
25      10.0.1.42                                          AA-AE-B0-8F-F0-7F     Unreachable ActiveStore
25      10.0.1.41                                          2A-DB-E9-DF-6B-27     Stale       ActiveStore
25      10.0.1.38                                          2A-DB-E9-DF-6B-27     Reachable   ActiveStore
25      10.0.1.31                                          00-BF-AF-36-77-4C     Reachable   ActiveStore
25      10.0.1.30                                          2A-DB-E9-DF-6B-27     Stale       ActiveStore
25      10.0.1.29                                          2A-DB-E9-DF-6B-27     Reachable   ActiveStore
25      10.0.1.27                                          2A-DB-E9-DF-6B-27     Stale       ActiveStore
25      10.0.1.26                                          2A-DB-E9-DF-6B-27     Stale       ActiveStore
25      10.0.1.24                                          00-00-00-00-00-00     Unreachable ActiveStore
25      10.0.1.20                                          EC-B5-FA-2B-8E-69     Reachable   ActiveStore
25      10.0.1.19                                          B4-E2-65-14-A0-ED     Reachable   ActiveStore
25      10.0.0.202                                         AA-AE-B0-8F-F0-7F     Stale       ActiveStore
25      10.0.0.182                                         2A-DB-E9-DF-6B-27     Reachable   ActiveStore
25      10.0.0.161                                         2A-DB-E9-DF-6B-27     Reachable   ActiveStore
25      10.0.0.44                                          2A-DB-E9-DF-6B-27     Stale       ActiveStore
25      10.0.0.29                                          E6-65-0B-6B-60-AE     Stale       ActiveStore
25      10.0.0.27                                          2A-DB-E9-DF-6B-27     Stale       ActiveStore
25      10.0.0.24                                          00-00-00-00-00-00     Unreachable ActiveStore
25      10.0.0.3                                           2A-DB-E9-DF-6B-27     Reachable   ActiveStore
25      10.0.0.2                                           2A-DB-E9-DF-6B-27     Reachable   ActiveStore
25      10.0.0.1                                           2A-DB-E9-DF-6B-27     Reachable   ActiveStore
64      239.255.255.250                                    01-00-5E-7F-FF-FA     Permanent   ActiveStore
64      239.255.255.235                                    01-00-5E-7F-FF-EB     Permanent   ActiveStore
64      224.0.0.251                                        01-00-5E-00-00-FB     Permanent   ActiveStore
64      224.0.0.22                                         01-00-5E-00-00-16     Permanent   ActiveStore
64      172.24.143.255                                     FF-FF-FF-FF-FF-FF     Permanent   ActiveStore
64      172.24.135.112                                     00-00-00-00-00-00     Unreachable ActiveStore
64      169.254.169.254                                    00-00-00-00-00-00     Unreachable ActiveStore
12      239.255.255.235                                    01-00-5E-7F-FF-EB     Permanent   ActiveStore
1       239.255.255.235                                                          Permanent   ActiveStore
1       224.0.0.22                                                               Permanent   ActiveStore

bluewavenet commented 2 months ago

@tgolsson ip neigh looks ok?

Your router: 10.0.0.1 dev br-lan lladdr fc:ec:da:40:ff:96 ref 1 used 0/0/0 probes 1 REACHABLE

What are the mac/ip addresses of the desktop and laptop?

I've started having issues with my wired devices getting routed over the mesh.

What do you mean by this - can you give some detail? Arp happens at layer 2 so should flood everywhere, if it didn't it would not work. It might be worth looking at ip neigh on the other two meshnodes.

tgolsson commented 2 months ago

The desktop is the one primarily having problems, though it's also the one I use the most. 10.0.1.36, 30-23-03-E2-48-2A. Laptop is off atm, but shows exactly the same symptoms, and all debugging I've done so far has shown identical results. 10.0.1.38 and d8:43:ae:09:ce:15 would be the last IP/port it was connected with.

What do you mean by this - can you give some detail? Arp happens at layer 2 so should flood everywhere, if it didn't it would not work. It might be worth looking at ip neigh on the other two meshnodes.

So it all started out with (~2 weeks ago) intermittent connection drops, and occasionally poor performance. Speedtests kept showing anywhere between 900/900 Mbps down to ~100/2, most of the time download being supposedly OK; and upload being <2 Mbps. I've slowly been isolating components, and at this point I know the following things:


ip neigh on satellite 1:

10.0.1.23 dev br-lan lladdr c8:89:f3:b2:1a:f6 used 0/0/0 probes 0 STALE
10.0.1.30 dev br-lan lladdr fe:05:1c:a8:c1:5a used 0/0/0 probes 0 STALE
fe80::32:db6e:e6c5:6d8c dev br-lan lladdr fe:05:1c:a8:c1:5a used 0/0/0 probes 0 STALE
fe80::37aa:a4b1:8f8c:fa08 dev br-lan lladdr 00:bf:af:36:77:4c used 0/0/0 probes 0 STALE
fe80::429:ca5d:9faf:f6c3 dev br-lan lladdr c8:89:f3:b2:1a:f6 used 0/0/0 probes 0 STALE
fe80::e465:bff:fe6b:60ae dev br-lan lladdr e6:65:0b:6b:60:ae used 0/0/0 probes 0 STALE
fe80::a8ae:b0ff:fe8f:f07f dev br-lan lladdr aa:ae:b0:8f:f0:7f used 0/0/0 probes 0 STALE
fe80::c9f:74da:440d:21a9 dev br-lan lladdr ec:a9:07:04:ab:d1 router used 0/0/0 probes 3 STALE
fe80::28db:e9ff:fedf:6b27 dev br-lan lladdr 2a:db:e9:df:6b:27 router ref 1 used 0/0/0 probes 4 REACHABLE
fe80::92ff:5653:f649:e9af dev br-lan lladdr 4c:4f:ee:dc:79:96 used 0/0/0 probes 0 STALE
fe80::e0b3:2eff:fe8a:814a dev br-lan lladdr e2:b3:2e:8a:81:4a used 0/0/0 probes 0 STALE

ip neigh on satellite 2:

10.0.1.38 dev br-lan lladdr d8:43:ae:09:ce:15 used 0/0/0 probes 1 STALE
10.0.1.17 dev br-lan lladdr 68:ec:8a:05:4e:57 used 0/0/0 probes 4 STALE
10.0.1.36 dev br-lan lladdr 30:23:03:e2:48:2a used 0/0/0 probes 1 STALE
10.0.1.23 dev br-lan lladdr c8:89:f3:b2:1a:f6 used 0/0/0 probes 1 STALE
10.0.0.3 dev br-lan lladdr 2a:db:e9:df:6b:27 ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.40 dev br-lan lladdr 2a:db:e9:df:6b:27 used 0/0/0 probes 1 STALE
10.0.1.27 dev br-lan lladdr 2a:db:e9:df:6b:27 used 0/0/0 probes 4 STALE
10.0.0.1 dev br-lan lladdr fc:ec:da:40:ff:96 used 0/0/0 probes 1 STALE
10.0.1.10 dev br-lan lladdr 44:07:0b:8e:8c:1f used 0/0/0 probes 1 STALE
10.0.1.44 dev br-lan lladdr 2a:db:e9:df:6b:27 ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.31 dev br-lan lladdr 00:bf:af:36:77:4c ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.29 dev br-lan lladdr 2a:db:e9:df:6b:27 used 0/0/0 probes 0 STALE
10.0.1.12 dev br-lan lladdr 2a:db:e9:df:6b:27 used 0/0/0 probes 1 STALE
10.0.1.16 dev br-lan lladdr e0:51:d8:3f:07:3d used 0/0/0 probes 1 STALE
10.0.0.182 dev br-lan lladdr 2a:db:e9:df:6b:27 ref 1 used 0/0/0 probes 1 REACHABLE
10.0.0.161 dev br-lan lladdr 2a:db:e9:df:6b:27 ref 1 used 0/0/0 probes 1 REACHABLE
10.0.0.19 dev br-lan lladdr 2a:db:e9:df:6b:27 used 0/0/0 probes 4 STALE
10.0.1.20 dev br-lan lladdr ec:b5:fa:2b:8e:69 used 0/0/0 probes 4 STALE
10.0.0.2 dev br-lan lladdr 2a:db:e9:df:6b:27 ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.41 dev br-lan lladdr 2a:db:e9:df:6b:27 used 0/0/0 probes 4 STALE
10.0.1.30 dev br-lan lladdr fe:05:1c:a8:c1:5a used 0/0/0 probes 1 STALE
10.0.1.9 dev br-lan lladdr 4c:4f:ee:dc:79:96 used 0/0/0 probes 1 STALE
10.0.255.175 dev br-lan lladdr 2a:db:e9:df:6b:27 used 0/0/0 probes 4 STALE
10.0.1.15 dev br-lan lladdr 2a:db:e9:df:6b:27 used 0/0/0 probes 1 STALE
10.0.1.32 dev br-lan lladdr ec:a9:07:04:ab:d1 used 0/0/0 probes 1 STALE
10.0.0.29 dev br-lan lladdr 2a:db:e9:df:6b:27 used 0/0/0 probes 4 STALE
10.0.1.19 dev br-lan lladdr b4:e2:65:14:a0:ed used 0/0/0 probes 1 STALE
fe80::37aa:a4b1:8f8c:fa08 dev br-lan lladdr 00:bf:af:36:77:4c used 0/0/0 probes 0 STALE
fe80::e465:bff:fe6b:60ae dev br-lan lladdr e6:65:0b:6b:60:ae used 0/0/0 probes 0 STALE
fe80::28db:e9ff:fedf:6b27 dev br-lan lladdr 2a:db:e9:df:6b:27 router ref 1 used 0/0/0 probes 5 REACHABLE
fe80::c9f:74da:440d:21a9 dev br-lan lladdr ec:a9:07:04:ab:d1 router used 0/0/0 probes 1 STALE
fe80::429:ca5d:9faf:f6c3 dev br-lan lladdr c8:89:f3:b2:1a:f6 used 0/0/0 probes 0 STALE

bluewavenet commented 2 months ago

@tgolsson There is nothing so far that would indicate anything was wrong, although clearly there is something amiss.

The desktop appears in the ip output for "satellite 2" and is correct: 10.0.1.36 dev br-lan lladdr 30:23:03:e2:48:2a used 0/0/0 probes 1 STALE

This will happen if something connected to satellite 2 accessed Desktop eg pinged it, so this is normal looking.
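Comparing neighbour tables between nodes by eye is error-prone, so here is a rough sketch of one way to do it: parse two `ip neigh` dumps into IP-to-MAC maps and list the addresses on which they disagree. The helper names `parse_ip_neigh` and `disagreements` are invented for this example; the sample rows are copied from the gateway and satellite 2 outputs earlier in the thread.

```python
def parse_ip_neigh(text):
    """Return {ip: mac} for `ip neigh` rows that carry an lladdr (FAILED rows have none)."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if "lladdr" in fields:
            idx = fields.index("lladdr")
            if idx + 1 < len(fields):
                table[fields[0]] = fields[idx + 1].lower()
    return table

def disagreements(a, b):
    """IPs present in both tables whose MAC differs, as {ip: (mac_in_a, mac_in_b)}."""
    return {ip: (a[ip], b[ip]) for ip in sorted(a.keys() & b.keys()) if a[ip] != b[ip]}

gateway = parse_ip_neigh("""\
10.0.0.2 dev br-lan lladdr e6:65:0b:6b:60:ae ref 1 used 0/0/0 probes 1 REACHABLE
10.0.0.1 dev br-lan lladdr fc:ec:da:40:ff:96 ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.45 dev br-lan  used 0/0/0 probes 3 FAILED
""")
satellite2 = parse_ip_neigh("""\
10.0.0.2 dev br-lan lladdr 2a:db:e9:df:6b:27 ref 1 used 0/0/0 probes 1 REACHABLE
10.0.0.1 dev br-lan lladdr fc:ec:da:40:ff:96 used 0/0/0 probes 1 STALE
""")

for ip, (mac_gw, mac_sat) in disagreements(gateway, satellite2).items():
    print(f"{ip}: gateway sees {mac_gw}, satellite 2 sees {mac_sat}")
```

A mismatch only shows that two nodes resolved the same IP differently; it does not say which of the two answers is the real owner.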

Show, from satellite 2, the outputs of ip neigh and mesh11sd status, running the commands as close as possible to the same time.

I haven't used Windows at all for many years so can't remember exactly, but is the network status given by the ipconfig command? It would be interesting to see this on the Desktop.

So it all started out with (~2 weeks ago) intermittent connection drops, and occasionally poor performance.

So what happened 2 weeks ago? A Windows update?

tgolsson commented 2 months ago

After sleeping on it, let me have a thought and jump to a conclusion:

https://github.com/openNDS/mesh11sd/blob/0aa4bcdf09dfd45ac4e92279d1ce0d7bcfc3b4ba/src/mesh11sd#L1872-L1877

Could this be related? I'm not sure if I'd need it when running a flat network with a single subnet, and seems like it'd cause the symptoms I'm observing.

Edit: Decided to download Wireshark, and first ARP capture...:

1642    42.357824   BelkinIntern_e2:48:2a   Broadcast       ARP 42  Who has 10.0.0.1? Tell 10.0.1.36
1643    42.358454   Ubiquiti_40:ff:96   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at fc:ec:da:40:ff:96
1663    42.858894   2a:db:e9:df:6b:27   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at 2a:db:e9:df:6b:27
1687    43.133190   e6:65:0b:6b:60:ae   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at e6:65:0b:6b:60:ae

I think two of those are lying...
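The same check can be done mechanically on a larger capture. Below is a small sketch that scans text lines exported from Wireshark in the format shown above and reports every IP for which more than one MAC sent an "is at" reply. The function name `conflicting_arp_replies` is hypothetical, and the parsing assumes the Info column text looks exactly like the lines in this capture.

```python
import re
from collections import defaultdict

# Matches the Info column of an ARP reply: "<ip> is at <mac>".
REPLY = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3}) is at ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)

def conflicting_arp_replies(capture_text):
    """Return {ip: [macs]} for IPs claimed by more than one MAC, in order of first reply."""
    claims = defaultdict(list)
    for line in capture_text.splitlines():
        m = REPLY.search(line)  # "Who has ...?" request lines do not match
        if m:
            ip, mac = m.group(1), m.group(2).lower()
            if mac not in claims[ip]:
                claims[ip].append(mac)
    return {ip: macs for ip, macs in claims.items() if len(macs) > 1}

capture = """\
1642    42.357824   BelkinIntern_e2:48:2a   Broadcast       ARP 42  Who has 10.0.0.1? Tell 10.0.1.36
1643    42.358454   Ubiquiti_40:ff:96   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at fc:ec:da:40:ff:96
1663    42.858894   2a:db:e9:df:6b:27   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at 2a:db:e9:df:6b:27
1687    43.133190   e6:65:0b:6b:60:ae   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at e6:65:0b:6b:60:ae
"""

for ip, macs in conflicting_arp_replies(capture).items():
    print(ip, "claimed by", ", ".join(macs))
```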

bluewavenet commented 2 months ago

@tgolsson

Could this be related?

Yes it could, but why did it only start to be a problem two weeks ago?

On the "infraroom" meshnode, try : echo 0 > /proc/sys/net/ipv4/conf/br-lan/proxy_arp

It is there to help layer 3 re-establish connections within the mesh backhaul if there is a mesh path change. It is not essential but speeds up the process dramatically.

tgolsson commented 2 months ago

sysctl -a  | grep ipv4.*proxy_arp\ =
net.ipv4.conf.all.proxy_arp = 0
net.ipv4.conf.br-lan.proxy_arp = 0
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.eth0.proxy_arp = 0
net.ipv4.conf.lan.proxy_arp = 0
net.ipv4.conf.lo.proxy_arp = 0
net.ipv4.conf.m-11s-1.proxy_arp = 0
net.ipv4.conf.phy0-ap0.proxy_arp = 0
net.ipv4.conf.phy1-ap0.proxy_arp = 0
net.ipv4.conf.phy2-ap0.proxy_arp = 0
net.ipv4.conf.wan.proxy_arp = 0

After arp -d it still replies:

68471   1207.913006 BelkinIntern_e2:48:2a   Broadcast   ARP 42  Who has 10.0.0.1? Tell 10.0.1.36
68472   1207.913699 Ubiquiti_40:ff:96   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at fc:ec:da:40:ff:96
68480   1208.214657 e6:65:0b:6b:60:ae   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at e6:65:0b:6b:60:ae
68502   1208.770644 2a:db:e9:df:6b:27   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at 2a:db:e9:df:6b:27

BUT I noticed there's also the _pvlan variant (proxy_arp_pvlan), and disabling proxy ARP there solves it, though it looks like I'll have to do it on all three nodes.

75742   1364.146219 BelkinIntern_e2:48:2a   Broadcast   ARP 42  Who has 10.0.0.1? Tell 10.0.1.36
75743   1364.146904 Ubiquiti_40:ff:96   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at fc:ec:da:40:ff:96
75756   1364.299935 e6:65:0b:6b:60:ae   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at e6:65:0b:6b:60:ae

The name indicates something related to vlans, but I've never had a VLAN configured anywhere.
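
The grep used earlier in this comment only matches `proxy_arp =`, so the pvlan flags do not show up in that listing. A rough sketch for listing them per interface, assuming the usual `/proc/sys/net/ipv4/conf/<interface>/proxy_arp_pvlan` layout; `show_pvlan` and the `CONF_DIR` override are hypothetical names introduced only for this example.

```shell
# Prints "<interface> <value>" for every proxy_arp_pvlan flag found.
# CONF_DIR is a hypothetical override so the sketch can be pointed at a
# scratch directory; by default it reads the live /proc tree.
show_pvlan() {
    conf_dir=${CONF_DIR:-/proc/sys/net/ipv4/conf}
    for f in "$conf_dir"/*/proxy_arp_pvlan; do
        # Skip the unexpanded glob if nothing matched.
        [ -r "$f" ] || continue
        dir=${f%/proxy_arp_pvlan}
        iface=${dir##*/}
        echo "$iface $(cat "$f")"
    done
}
```

Running `show_pvlan` on each node should show which interfaces, including `all` and `default`, still have the flag set.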

bluewavenet commented 2 months ago

@tgolsson proxy_arp_pvlan is used to support vlan tunnelling on the mesh backhaul (eg gretap, vxlan).

Also note that in Linux, arp, route, iptunnel and nameif have been deprecated for some years, replaced by the ip utility. There is no guarantee that the arp command gives the correct result. It would be useful to see the output of ip neigh for the examples you gave above.

I have simulated your network with a router and 3 meshnodes, and a Linux laptop connecting in the same way as your Windows Desktop (I do not have a windows machine). I do not see the same problem at all.

I think this points to the problem being a Windows problem. Your Wireshark output is what I would expect to see, but the output of ip neigh correctly drops the more expensive "mesh" paths from the arp table. It looks quite likely that Windows (after whatever happened two weeks ago) uses the most expensive mac-route...

Can you confirm that if you leave proxy_arp enabled but disable proxy_arp_pvlan that your problem is solved?

If so, it would make sense to have proxy_arp_pvlan enabled as a config option in future (defaulting to disabled?).

tgolsson commented 2 months ago

I'm on baby duty for a few hours but will fill in more details later. Just a few notes while reading your response: my arp output is from windows, just like the Wireshark dumps. I assume the protocol itself should still work, or is that also superseded?

Fwiw I don't understand why the Wireshark output makes sense. The mesh node or mesh isn't on the path at all, so why is it claiming to reach it? Can you elaborate on why this makes sense to you? :)

(Also, from some experimentation with arp -d and Wireshark, arp just picks the last response as the MAC for the IP. I'll redo this with the PowerShell variant later.)

bluewavenet commented 2 months ago

@tgolsson

arp just picks the last response as the MAC for the IP

My OpenSuse Tumbleweed laptop picks the first.....

I'll do more testing this evening and elaborate a little too.

tgolsson commented 2 months ago

I missed your question about the timing, so to start with: it was the first time in a very long while that I disconnected or even fully shut down the PC. That's the only thing I can think of. I also removed a dumb L2 switch it was plugged into (so I had L2 -> L2 -> Desktop). That dumb switch is now my main switch, as I originally thought that was failing. So it didn't do anything magic unless the daisy chaining itself did something, which is unlikely. I'm leaning more towards it having worked by pure luck, or maybe I didn't notice the issues that much. I've been on parental leave for a while, so my time at the computer is quite limited.

With that said, a few extra data points. I've unplugged my two satellites, and the issue can still be reproduced. So the issue, whatever it is, doesn't depend on multiple nodes interacting. This is after rebooting the gateway, so we're back to "default" mesh11sd settings. Unfortunately, it also has a new random MAC, now ...:06.

This is the wireshark capture on my Windows machine:

12947   127.605116  BelkinIntern_e2:48:2a   Broadcast   ARP 42  Who has 10.0.0.1? Tell 10.0.1.36
12948   127.605719  Ubiquiti_40:ff:96   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at fc:ec:da:40:ff:96
12971   127.883248  16:83:ed:2b:13:06   BelkinIntern_e2:48:2a   ARP 60  10.0.0.1 is at 16:83:ed:2b:13:06

And this is the arp table:

arp -a

Interface: 10.0.1.36 --- 0x19
  Internet Address      Physical Address      Type
  10.0.0.1              16-83-ed-2b-13-06     dynamic

And Powershell agrees:

Get-NetNeighbor | Where-Object IPAddress -EQ 10.0.0.1

ifIndex IPAddress                                          LinkLayerAddress      State       PolicyStore
------- ---------                                          ----------------      -----       -----------
25      10.0.0.1                                           16-83-ED-2B-13-06     Reachable   ActiveStore

On the gateway itself:

# mesh11sd status
{
  "setup":{
    "version":"4.0.1",
    "enabled":"1",
    "procd_status":"running",
    "portal_detect":"1",
    "portal_detect_threshold":"0",
    "portal_channel":"3",
    "channel_tracking_checkinterval":"30",
    "mesh_basename":"m-11s-",
    "auto_config":"1",
    "auto_mesh_network":"lan",
    "auto_mesh_band":"2g40",
    "auto_mesh_id":"360c5420260e3e90c9c0f07aba7723",
    "mesh_gate_enable":"1",
    "mesh_leechmode_enable":"0",
    "mesh_gate_encryption":"3",
    "txpower":"20",
    "mesh_path_cost":"10",
    "mesh_path_stabilisation":"1",
    "checkinterval":"10",
    "interface_timeout":"10",
    "ssid_suffix_enable":"0",
    "debuglevel":"1"
  },
  "interfaces":{
    "m-11s-1":{
      "mesh_retry_timeout":"100",
      "mesh_confirm_timeout":"100",
      "mesh_holding_timeout":"100",
      "mesh_max_peer_links":"16",
      "mesh_max_retries":"3",
      "mesh_ttl":"31",
      "mesh_element_ttl":"31",
      "mesh_auto_open_plinks":"0",
      "mesh_hwmp_max_preq_retries":"4",
      "mesh_path_refresh_time":"1000",
      "mesh_min_discovery_timeout":"100",
      "mesh_hwmp_active_path_timeout":"5000",
      "mesh_hwmp_preq_min_interval":"10",
      "mesh_hwmp_net_diameter_traversal_time":"50",
      "mesh_hwmp_rootmode":"2",
      "mesh_hwmp_rann_interval":"5000",
      "mesh_gate_announcements":"1",
      "mesh_fwding":"1",
      "mesh_sync_offset_max_neighor":"50",
      "mesh_rssi_threshold":"-65",
      "mesh_hwmp_active_path_to_root_timeout":"6000",
      "mesh_hwmp_root_interval":"5000",
      "mesh_hwmp_confirmation_interval":"2000",
      "mesh_power_mode":"active",
      "mesh_awake_window":"10",
      "mesh_plink_timeout":"0",
      "mesh_connected_to_gate":"1",
      "mesh_nolearn":"0",
      "mesh_connected_to_as":"0",
      "mesh_id":"360c5420260e3e90c9c0f07aba7723",
      "device":"radio1",
      "channel":"1",
      "tx_packets":"4130",
      "tx_bytes":"1145519",
      "rx_packets":"0",
      "rx_bytes":"0",
      "this_node":"16:83:ed:2b:13:06",
      "active_peers":"0",
      "peers":{
      },
      "active_stations":"11",
      "stations":{
        "ec:a9:07:04:ab:d1":{
          "proxy_node":"16:83:ed:2b:13:06"
        },
        "38:01:46:1d:e3:a1":{
          "proxy_node":"16:83:ed:2b:13:06"
        },
        "c0:e7:bf:fd:d4:31":{
          "proxy_node":"16:83:ed:2b:13:06"
        },
        "64:90:c1:11:ff:d7":{
          "proxy_node":"16:83:ed:2b:13:06"
        },
        "e0:51:d8:3f:07:3d":{
          "proxy_node":"16:83:ed:2b:13:06"
        },
        "44:07:0b:8e:8c:1f":{
          "proxy_node":"16:83:ed:2b:13:06"
        },
        "ac:67:84:23:e0:4b":{
          "proxy_node":"16:83:ed:2b:13:06"
        },
        "e8:ca:c8:bf:09:bf":{
          "proxy_node":"16:83:ed:2b:13:06"
        },
        "c8:89:f3:b2:1a:f6":{
          "proxy_node":"16:83:ed:2b:13:06"
        },
        "2c:4c:c6:61:33:80":{
          "proxy_node":"16:83:ed:2b:13:06"
        },
        "dc:fe:23:c2:43:fc":{
          "proxy_node":"16:83:ed:2b:13:06"
        }
      }
    }
  }
}
# ip neigh
10.0.1.40 dev br-lan  used 0/0/0 probes 6 FAILED
10.0.1.7 dev br-lan lladdr e8:ca:c8:bf:09:bf used 0/0/0 probes 0 STALE
10.0.1.8 dev br-lan lladdr dc:fe:23:c2:43:fc ref 1 used 0/0/0 probes 1 REACHABLE
10.0.0.1 dev br-lan lladdr fc:ec:da:40:ff:96 ref 1 used 0/0/0 probes 1 REACHABLE
10.0.0.230 dev br-lan lladdr 2c:4c:c6:61:33:80 used 0/0/0 probes 1 STALE
10.0.1.15 dev br-lan lladdr ac:67:84:23:e0:4b used 0/0/0 probes 1 STALE
10.0.1.10 dev br-lan lladdr 44:07:0b:8e:8c:1f ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.30 dev br-lan  ref 1 used 0/0/0 probes 5 INCOMPLETE
10.0.1.16 dev br-lan lladdr e0:51:d8:3f:07:3d used 0/0/0 probes 1 STALE
10.0.0.3 dev br-lan  ref 1 used 0/0/0 probes 5 INCOMPLETE
10.0.0.182 dev br-lan  ref 1 used 0/0/0 probes 5 INCOMPLETE
10.0.1.36 dev br-lan lladdr 30:23:03:e2:48:2a ref 1 used 0/0/0 probes 1 DELAY
10.0.0.49 dev br-lan  used 0/0/0 probes 6 FAILED
10.0.0.161 dev br-lan  ref 1 used 0/0/0 probes 5 INCOMPLETE
10.0.1.23 dev br-lan lladdr c8:89:f3:b2:1a:f6 ref 1 used 0/0/0 probes 1 REACHABLE
10.0.1.29 dev br-lan  used 0/0/0 probes 6 FAILED
10.0.1.6 dev br-lan lladdr c0:e7:bf:fd:d4:31 used 0/0/0 probes 1 STALE
10.0.1.12 dev br-lan  used 0/0/0 probes 6 FAILED
10.0.1.32 dev br-lan lladdr ec:a9:07:04:ab:d1 used 0/0/0 probes 1 STALE
10.0.1.5 dev br-lan lladdr 38:01:46:1d:e3:a1 used 0/0/0 probes 0 STALE
10.0.1.20 dev br-lan lladdr ec:b5:fa:2b:8e:69 used 0/0/0 probes 4 STALE
10.0.1.19 dev br-lan lladdr b4:e2:65:14:a0:ed used 0/0/0 probes 1 STALE
10.0.0.2 dev br-lan  ref 1 used 0/0/0 probes 5 INCOMPLETE
fe80::80f0:6027:73af:570e dev br-lan lladdr b4:e2:65:14:a0:ed ref 1 used 0/0/0 probes 1 DELAY
fd42:7461:fcbd:0:993c:f58f:8b24:50f3 dev br-lan lladdr b4:e2:65:14:a0:ed ref 1 used 0/0/0 probes 1 REACHABLE
fe80::c9f:74da:440d:21a9 dev br-lan lladdr ec:a9:07:04:ab:d1 router ref 1 used 0/0/0 probes 1 REACHABLE
fe80::5852:ab0c:7ed2:7963 dev br-lan lladdr ac:67:84:23:e0:4b router used 0/0/0 probes 0 STALE
fe80::1483:edff:fe2e:1306 dev br-lan  used 0/0/0 probes 6 FAILED
fe80::1483:edff:fe2b:1306 dev br-lan lladdr 16:83:ed:2b:13:06 used 0/0/0 probes 0 STALE
fe80::429:ca5d:9faf:f6c3 dev br-lan lladdr c8:89:f3:b2:1a:f6 used 0/0/0 probes 0 STALE
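
To pull a single entry out of a dump like the one above, a small sketch may help. `neigh_mac` is a made-up helper that reads `ip neigh` output on stdin and prints the `lladdr` recorded for one address; it prints nothing for entries without one (for example INCOMPLETE or FAILED).

```shell
# Prints the link-layer address recorded for the given IP in "ip neigh"
# style output read from stdin. Prints nothing if the entry has no lladdr.
neigh_mac() {
    awk -v ip="$1" '
        $1 == ip {
            for (i = 2; i < NF; i++)
                if ($i == "lladdr") {
                    print $(i + 1)
                    exit
                }
        }
    '
}
```

For example, `ip neigh | neigh_mac 10.0.0.1` on the gateway would be expected to print the router's real MAC rather than a meshnode's.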

I can confirm that disabling only pvlan solves the double ARP reply, so it's related to that option indeed.


I've searched a bit about proxy arp; and this sounds eerily similar: https://community.ui.com/questions/Duplicate-IPs-Ubiquiti-device-responds-to-EVERY-arp-request/b73ab060-047c-4285-98e3-379f085537c6

I've also found mentions of restricted vs unrestricted ARP, but it doesn't seem to be a universal distinction either. Also not quite sure how the VLAN part plays into this, or why it causes the issues.

bluewavenet commented 2 months ago

@tgolsson

The term "vlan" is a bit misleading in this context, it is more to do with "switching"....

From https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt


proxy_arp_pvlan - BOOLEAN
    Private VLAN proxy arp.
    Basically allow proxy arp replies back to the same interface
    (from which the ARP request/solicitation was received).

    This is done to support (ethernet) switch features, like RFC
    3069, where the individual ports are NOT allowed to
    communicate with each other, but they are allowed to talk to
    the upstream router.  As described in RFC 3069, it is possible
    to allow these hosts to communicate through the upstream
    router by proxy_arp'ing. Don't need to be used together with
    proxy_arp.

    This technology is known by different names:
      In RFC 3069 it is called VLAN Aggregation.
      Cisco and Allied Telesyn call it Private VLAN.
      Hewlett-Packard call it Source-Port filtering or port-isolation.
      Ericsson call it MAC-Forced Forwarding (RFC Draft).

The name "MAC-Forced Forwarding" is probably more appropriate for a mesh.

The mesh backhaul behaves somewhat like a virtual switch, with each node being a switch port. proxy_arp_pvlan is what allows devices connected to mesh gates to talk to each other.

The problem arises when there is no mesh portal; instead there is a cabled link to a non-mesh router, and devices are connected on the same "cabled" link. (This is what you have.)

The wireshark output makes sense because it is on your desktop connected outside the backhaul but bridged to it (on that cabled link). The proxy_arp_pvlan replies of the meshnodes are all bridged into this link.

Linux devices seem to be able to sort this out and use the non-aliased/proxied arp, but Windows, it seems, does not, at least after whatever changed two weeks ago.

I can confirm that disabling only pvlan solves the double ARP reply

This is the answer then. We make proxy_arp_pvlan a configurable option.

bluewavenet commented 2 months ago

@tgolsson You are using a different terminology that is confusing. To clarify:

Meshnode Types:

The mesh can have four types of meshnodes.

  1. Peer Node - the basic mesh peer - capable of mac-routing layer 2 packets in the mesh network.
  2. Gateway Node - a peer node that also hosts an access point (AP) radio for normal client devices to connect to. Also known as a gate. A gate can also function as a CPE (Customer [or Client] Premises Equipment), hosting a downstream layer 3 network with its own unique ipv4 subnet.
  3. Gateway Leech Node - a special type of Gateway Node that connects to the mesh backhaul but neither contributes to it nor advertises itself on it.
  4. Portal Node - a peer node that also hosts a layer 3 routed upstream connection (eg an Internet feed)

It is possible for a Portal node to also be a Gateway node (ie it hosts an AP as well as an upstream connection).

In your setup, all your meshnodes are Gateway nodes. One of them has a bridged and cabled, non-mesh connection to your upstream router.

Now, is the problem confined to the two cabled devices (Desktop and Laptop)? I cannot replicate the issue here, but then I do not have a Windows machine.

I should let you complete your tests ;-)

tgolsson commented 2 months ago

Sorry for my previous message - pvlan is the thing. I just had to also set it to 0 for the all group; if that was 1 and br-lan was 0, it was still ARPing. Seems to behave like proxy_arp, in the same document you linked:

proxy_arp - BOOLEAN
    Do proxy arp.
    proxy_arp for the interface will be enabled if at least one of
    conf/{all,interface}/proxy_arp is set to TRUE,
    it will be disabled otherwise
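
Spelled out as a sketch, the quoted rule is a logical OR of the `all` value and the per-interface value. That `proxy_arp_pvlan` combines the same way is only inferred from the behaviour reported in this comment, not from the quoted documentation. `effective_flag` and `disable_pvlan` are illustrative names, and `CONF_DIR` is a hypothetical override so the sketch can be tried against a scratch directory rather than the live `/proc` tree.

```shell
# Model of the quoted rule: the feature is active on an interface when
# either conf/all or conf/<interface> is non-zero.
effective_flag() {
    all_value=$1
    iface_value=$2
    if [ "$all_value" != "0" ] || [ "$iface_value" != "0" ]; then
        echo 1
    else
        echo 0
    fi
}

# Writes 0 to proxy_arp_pvlan for both "all" and the named interface,
# since (by the assumption above) clearing only one of them is not enough.
disable_pvlan() {
    iface=$1
    conf_dir=${CONF_DIR:-/proc/sys/net/ipv4/conf}
    for scope in all "$iface"; do
        echo 0 > "$conf_dir/$scope/proxy_arp_pvlan"
    done
}
```

On a node this would be called as `disable_pvlan br-lan`. The change is not persistent: as noted earlier in the thread, a reboot brings back the default mesh11sd settings.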

I'll delete the previous message, it's just wrong. :)

Re: terminology. Sorry - I'm trying to differentiate between the node connecting to my wired network, vs the two satellites. The first is what I've called the gateway, but I guess the distinction doesn't exist on this layer since I could connect something on the LAN ports on the satellites and they'd behave like the currently wired node.

My tests are done now that I've RTFM. :)

bluewavenet commented 2 months ago

@tgolsson

I'll delete the previous message, it's just wrong. :)

Best to leave it as it shows a genuine train of thought. It is all good!

bluewavenet commented 2 months ago

@tgolsson

I just had to also set it to 0 for the all group

Yes it's confusing. So if "all" is set to 1, it overrides any other settings? I thought it set the default...... OoH!

tgolsson commented 2 months ago

Too late. Also, to this question:

Now, is the problem confined to the two cabled devices (Desktop and Laptop)? I cannot replicate the issue here, but then I do not have a Windows machine.

Switching my laptop to WiFi and disconnecting the ethernet shows the same double ARP reply in Wireshark and the same incorrect MAC assignment to 10.0.0.1 (sometimes).

bluewavenet commented 2 months ago

@tgolsson

Switching my laptop to WiFi and disconnecting the ethernet shows the same double ARP reply in Wireshark and the same incorrect MAC assignment to 10.0.0.1 (sometimes).

Is this the wifi on a meshnode? I don't see that at all when connected to a meshnode.... Did you clear the arp table on the laptop .... or reboot it?

Why does this not affect all devices on your network?

bluewavenet commented 2 months ago

@tgolsson Is the "workaround" for the issue to disable proxy_arp_pvlan everywhere? Perhaps the way forward is to implement that config option. I am leaning towards having it on by default.

It would, though, be nice to really understand what is happening with your setup.....

tgolsson commented 2 months ago

Is this the wifi on a meshnode? I don't see that at all when connected to a meshnode.... Did you clear the arp table on the laptop .... or reboot it?

Yes, the mesh nodes are my only WiFi. Laptop is next to the wired node, so probably that one. Clean boot without the cable, then for good measure arp -d to repro it a few times. I only have one of the nodes with "bad" settings now, two of them with pvlan off.

Why does this not affect all devices on your network?

Ok; so I think it does, with hindsight.

Part of my descent into madness (replacing all switches, cables, the router, ...) has been that removing things seems to fix it - for the moment. And it's been random - one day the baby cam is dying every second, next day good! TV's buffering or low res, restart it - magically 900/900 on Speedtest. So I've told my wife every day "it's fixed now" because I've unplugged or replugged something and speedtest got better for a bit, and then next day something is misbehaving. And so if my PC randomly decides to update a Steam game and ends up being routed over the "furthest" meshnode, it'll demolish the mesh bouncing all the traffic through there.

So one interesting thing I found with iperf3 was that performance wasn't symmetric: host A to B was very different to B to A - and sometimes the speeds would be equal up down, sometimes skewed to one side. Absolute madness. I guess if routes are messed up (and potentially being updated randomly) it'd also show up like that. In the end, the nodes might have different ideas about which MAC to send the traffic to. I wonder if the traffic could technically end up in a circle with very bad luck...

With that said, there is a question that you're raising: which devices are affected and how? I've tested Windows wired + Windows wireless, and both are affected. I have a bunch of laptops running various Linux distros and maybe something with a BSD, so if I find time over the weekend I'll sample them. And my wife has a Macbook, so would be interesting to see if that one can repro it.

My counter question for you at this point: in your test setup, do you not even get the multiple ARP replies? If you do, I'm wondering if we're just seeing Windows being bad at ARP collisions where Linux isn't. I'm betting most "embedded" devices are some form of Linux or BSD derivative if running an OS of some sort.

And on that final thought: the most noticeable culprit beyond my PC has been our babycam, which I'm imagining as a very dumb embedded device - it doesn't handle WiFi regions, has broken DHCP, and sucky networking in general. I'll have to look at that in Wireshark tomorrow and see if it interacts with ARP at all.

bluewavenet commented 2 months ago

@tgolsson I have gone back to a setup as close as possible to yours, but with OpenWrt on the ISP router. My laptop is connected to the switch between the router and the cabled meshnode.

  1. I get similar wireshark results
  2. ip neigh on my laptop does not show incorrect ip/mac for the router
  3. arp command no longer exists on OpenSuse Linux, so cannot try it.
  4. no issues with the connection

I do not want to jump to any conclusions....... I will look at any test results you can provide.

Significant differences between mine and yours:

  1. I have OpenWrt on my router
  2. I have a Linux laptop

bluewavenet commented 1 month ago

@tgolsson Commit https://github.com/openNDS/mesh11sd/commit/c763553eccdc47ef9953f0bebdf54841a5ccb0bf should resolve the problem.