Why is mesh_hwmp_rootmode="2" used instead of 0?

biboc commented 1 month ago

Hi,

Why do you set all non-portal nodes to mesh_hwmp_rootmode="2" instead of 0? What are the advantages? From what I understand, any "simple" node should be a no-root node, and only the one with "service" (Internet) should be PROACTIVE_RANN.

Thanks,

/**
 * enum ieee80211_root_mode_identifier - root mesh STA mode identifier
 *
 * These attributes are used by dot11MeshHWMPRootMode to set the root mesh STA mode
 *
 * @IEEE80211_ROOTMODE_NO_ROOT: the mesh STA is not a root mesh STA (default)
 * @IEEE80211_ROOTMODE_ROOT: the mesh STA is a root mesh STA if greater than
 *  this value
 * @IEEE80211_PROACTIVE_PREQ_NO_PREP: the mesh STA is a root mesh STA that supports
 *  the proactive PREQ with proactive PREP subfield set to 0
 * @IEEE80211_PROACTIVE_PREQ_WITH_PREP: the mesh STA is a root mesh STA that
 *  supports the proactive PREQ with proactive PREP subfield set to 1
 * @IEEE80211_PROACTIVE_RANN: the mesh STA is a root mesh STA that supports
 *  the proactive RANN
 */
enum ieee80211_root_mode_identifier {
    IEEE80211_ROOTMODE_NO_ROOT = 0,
    IEEE80211_ROOTMODE_ROOT = 1,
    IEEE80211_PROACTIVE_PREQ_NO_PREP = 2,
    IEEE80211_PROACTIVE_PREQ_WITH_PREP = 3,
    IEEE80211_PROACTIVE_RANN = 4,
};

bluewavenet commented 1 month ago

@biboc You want all backhaul infrastructure peer nodes to advertise their presence and contribute to the HWMP. Such a node is a "root". Setting rootmode to 2 gives the best compromise between speed of path convergence and management traffic.

Compare this with a leech node: there, rootmode is set to 0, and so is mesh_fwding.
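To make the contrast concrete, here is a minimal sketch of the two per-node settings discussed above. The role names, struct, and function are hypothetical illustrations, not mesh11sd's code; only the values come from this thread (rootmode 2 for backhaul peers, rootmode 0 for leech nodes), and it assumes "along with mesh_fwding" means forwarding is also set to 0 on a leech.

```c
#include <stdbool.h>

/* Hypothetical node roles, for illustration only. */
enum node_role {
    ROLE_BACKHAUL_PEER, /* infrastructure node that carries mesh traffic */
    ROLE_LEECH          /* end node that uses the mesh but does not relay */
};

/* The two settings discussed in this thread. */
struct mesh_role_cfg {
    int  hwmp_rootmode; /* value for mesh_hwmp_rootmode */
    bool fwding;        /* value for mesh_fwding (true = 1) */
};

/* Sketch: map a role to its settings. Assumes a leech node also has
 * mesh_fwding disabled, as described above. */
struct mesh_role_cfg cfg_for_role(enum node_role role)
{
    struct mesh_role_cfg cfg;

    switch (role) {
    case ROLE_LEECH:
        cfg.hwmp_rootmode = 0; /* IEEE80211_ROOTMODE_NO_ROOT */
        cfg.fwding = false;
        break;
    case ROLE_BACKHAUL_PEER:
    default:
        cfg.hwmp_rootmode = 2; /* IEEE80211_PROACTIVE_PREQ_NO_PREP */
        cfg.fwding = true;
        break;
    }
    return cfg;
}
```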

biboc commented 1 month ago

Ok, thanks,

If I understand this correctly (https://elixir.bootlin.com/linux/v6.9/source/net/mac80211/mesh.c#L1749), all nodes (even rootmode 0 nodes) send a PREQ (https://elixir.bootlin.com/linux/v6.9/source/net/mac80211/mesh_hwmp.c#L1123) every HWMP_PREQ_MIN_INTERVAL (https://elixir.bootlin.com/linux/v6.9/source/include/uapi/linux/nl80211.h#L4770).

And with root mode 2 or 3, a node also sends a PREQ (https://elixir.bootlin.com/linux/v6.9/source/net/mac80211/mesh_hwmp.c#L1343), and no MPATH_RANN.

So what is the difference?
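As I read its nl80211 description, HWMP_PREQ_MIN_INTERVAL is the minimum time (in TUs) between path requests a node may send, so it behaves as a rate limit rather than a fixed sending period. A minimal sketch of such a limiter, with hypothetical names and a plain integer clock in TUs; this is not mac80211's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PREQ rate limiter. Times are in TUs (1 TU = 1024 us). */
struct preq_limiter {
    uint64_t last_preq_tu; /* time the last PREQ was sent */
    uint32_t min_interval; /* configured minimum spacing between PREQs */
    bool     sent_any;     /* false until the first PREQ goes out */
};

/* Returns true, and records the send, if a PREQ may go out at now_tu.
 * Assumes now_tu never goes backwards. */
bool preq_try_send(struct preq_limiter *l, uint64_t now_tu)
{
    if (l->sent_any && now_tu - l->last_preq_tu < l->min_interval)
        return false; /* too soon: the caller must wait or queue */
    l->last_preq_tu = now_tu;
    l->sent_any = true;
    return true;
}
```

With a minimum interval of 10 TUs, a PREQ at t=100 blocks another at t=105 but allows one at t=110.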

plink_timeout is not used in mesh11sd. Which value do you recommend? The Linux default value is 1800 s: https://elixir.bootlin.com/linux/v6.9/source/net/wireless/mesh.c#L26

biboc commented 1 month ago

And why do you set mesh_connected_to_as when it is a portal? What does "authentication server" mean for the mesh?

Thanks,

bluewavenet commented 1 month ago

@biboc

> What does "authentication server" mean for the mesh?

A mesh portal is a "portal" to some upstream network, usually an Internet feed. A portal advertises mesh_connected_to_as to the mesh backhaul to indicate that it is actually a portal. Access from the backhaul to the portal's upstream network is controlled by the authentication service, whatever that may be. At one extreme, the authentication service might be a captive portal; at the other, it might be a trivial "open access" link. Either way, it is still called the "authentication service".
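On the air, this advertisement is, to my understanding, the "Connected to AS" bit in the Mesh Formation Info field of the Mesh Configuration element carried in mesh beacons. A minimal sketch of packing and testing that byte, assuming the IEEE 802.11 layout (bit 0 Connected to Mesh Gate, bits 1 to 6 Number of Peerings capped at 63, bit 7 Connected to AS); the macro and function names are hypothetical, not mac80211's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the assumed Mesh Formation Info layout. */
#define FORM_CONNECTED_TO_GATE 0x01u /* bit 0 */
#define FORM_CONNECTED_TO_AS   0x80u /* bit 7 */
#define FORM_MAX_PEERINGS      63u   /* 6-bit count stored in bits 1..6 */

/* Pack the flags and the current number of peerings into one byte. */
uint8_t mesh_formation_info(bool to_gate, bool to_as, unsigned peerings)
{
    if (peerings > FORM_MAX_PEERINGS)
        peerings = FORM_MAX_PEERINGS; /* saturate so it fits in 6 bits */
    return (uint8_t)((to_as ? FORM_CONNECTED_TO_AS : 0u) |
                     (to_gate ? FORM_CONNECTED_TO_GATE : 0u) |
                     (peerings << 1));
}

/* A receiving node can test the bit to see which neighbours advertise it. */
bool formation_connected_to_as(uint8_t info)
{
    return (info & FORM_CONNECTED_TO_AS) != 0;
}
```

For example, a portal with three peerings and no gate flag would advertise 0x86.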

> plink_timeout is not used in mesh11sd. Which value do you recommend?

It is fully supported by mesh11sd. From https://openwrt.org/docs/guide-user/network/wifi/mesh/mesh11sd#list_of_available_parameters_and_their_function you can see:

> mesh_plink_timeout: If no tx activity is seen from a peered STA for longer than this time (in seconds), then remove it from the STA's list of peers. Default is 0, equating to 30 minutes

30 minutes seems to be a good general default value, but of course you can set whatever you need for a particular use case.

> All nodes (even rootmode 0 nodes) send a PREQ ...

> So what is the difference?

There are two types of PREQ (Path Request):

- Reactive PREQ: a node receives a RANN (Root Announcement) and replies with a PREQ.
- Proactive PREQ: a node proactively sends a PREQ in the hope of getting a peering from a root node.
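Relating this back to the root mode values in the enum above: as I read mac80211's mesh_hwmp.c, a node with rootmode 2 or 3 periodically broadcasts a proactive PREQ (mode 3 additionally sets the proactive PREP flag, asking receivers to answer with a PREP), mode 4 broadcasts a RANN instead, and modes 0 and 1 send no periodic root frame. The sketch below only encodes that mapping; the root_frame enum and root_frame_for() are hypothetical names.

```c
#include <stdbool.h>

enum ieee80211_root_mode_identifier {
    IEEE80211_ROOTMODE_NO_ROOT = 0,
    IEEE80211_ROOTMODE_ROOT = 1,
    IEEE80211_PROACTIVE_PREQ_NO_PREP = 2,
    IEEE80211_PROACTIVE_PREQ_WITH_PREP = 3,
    IEEE80211_PROACTIVE_RANN = 4,
};

/* Hypothetical: which frame a node broadcasts on its root interval. */
enum root_frame {
    ROOT_FRAME_NONE,
    ROOT_FRAME_PREQ,
    ROOT_FRAME_RANN,
};

/* Sets *prep_flag when receivers are asked to reply with a proactive PREP. */
enum root_frame root_frame_for(enum ieee80211_root_mode_identifier mode,
                               bool *prep_flag)
{
    *prep_flag = false;
    switch (mode) {
    case IEEE80211_PROACTIVE_RANN:
        return ROOT_FRAME_RANN;  /* announce the root; nodes reply with a PREQ */
    case IEEE80211_PROACTIVE_PREQ_WITH_PREP:
        *prep_flag = true;       /* broadcast PREQ and request a PREP */
        return ROOT_FRAME_PREQ;
    case IEEE80211_PROACTIVE_PREQ_NO_PREP:
        return ROOT_FRAME_PREQ;  /* broadcast PREQ, no PREP requested */
    default:
        return ROOT_FRAME_NONE;  /* modes 0 and 1: no periodic root frame */
    }
}
```

Under this reading, the practical difference between mode 2 and mode 4 is which frame floods the backhaul: a PREQ that builds paths towards the root directly, or a RANN that makes each node send its own PREQ back.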

biboc commented 1 month ago

Thanks for the detailed answer

By "not used", I meant that no default value is set, but yes, we can set the parameter.

> mesh_plink_timeout: If no tx activity is seen from a peered STA for longer than this time (in seconds), then remove it from the STA's list of peers. Default is 0, equating to 30 minutes

However, the Linux documentation says 0 means no removal, not 30 minutes: https://elixir.bootlin.com/linux/v6.9/source/include/uapi/linux/nl80211.h#L4825

I also have a question about mesh_max_peer_links. Is it the maximum number of stations around me that I can connect to, or the maximum number of nodes I can find a path for?

Let's say I have 40 routers in my mesh, and the portal router can see only 10 of the other routers directly. If I set mesh_max_peer_links to 20, will my portal find a path for the 39 other routers, or will it stop at 20? If I set mesh_max_peer_links to 5, will my portal connect to only 5 of the stations around it? Does the limit count only ESTAB links, or blocked ones too?

Thanks,

bluewavenet commented 1 month ago

@biboc

> However, the Linux documentation says 0 means no removal,

That may be so, but on OpenWrt, iw reports a default of 0, and plinks time out after 30 minutes. Any other specified time works as well. So maybe iw is getting it wrong... Stranger things have happened ;-)
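The two readings differ only in what 0 means, and the 1800 s Linux default mentioned earlier in the thread is exactly 30 minutes. A minimal sketch of the idle-expiry rule as the nl80211 documentation describes it (0 disables removal); the macro and function names are hypothetical, and this does not reproduce how mac80211 or iw actually handle the value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linux default quoted earlier in the thread: 1800 s = 30 min. */
#define DEFAULT_PLINK_TIMEOUT_S 1800u

/* Sketch of the documented semantics: remove a peer once it has shown no
 * tx activity for longer than timeout_s seconds; 0 means never remove. */
bool plink_should_expire(uint32_t idle_s, uint32_t timeout_s)
{
    if (timeout_s == 0)
        return false;
    return idle_s > timeout_s;
}
```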

mesh_max_peer_links is the max number of peer links, not the max number of peers.

> If I set mesh_max_peer_links to 20, will my portal find a path for the 39 other routers?

Yes
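The difference between peer links and reachable nodes can be illustrated with a small simulation: each node may hold at most max_links direct peer links, yet a breadth-first search over those links (standing in for HWMP path discovery) can still reach nodes the portal never peers with directly. The line topology, greedy link selection, and all names are hypothetical simplifications, not how mac80211 chooses peers.

```c
#include <stdbool.h>

#define N_NODES 8

/* in_range[i][j]: node j is within radio range of node i (symmetric).
 * Peer links are made greedily: i-j is linked only if both nodes are in
 * range and both still hold fewer than max_links links.
 * Returns how many nodes are reachable from node 0 (the portal). */
int reachable_from_portal(bool in_range[N_NODES][N_NODES], int max_links)
{
    bool link[N_NODES][N_NODES] = {{false}};
    int  nlinks[N_NODES] = {0};

    for (int i = 0; i < N_NODES; i++)
        for (int j = i + 1; j < N_NODES; j++)
            if (in_range[i][j] && nlinks[i] < max_links &&
                nlinks[j] < max_links) {
                link[i][j] = link[j][i] = true;
                nlinks[i]++;
                nlinks[j]++;
            }

    /* Breadth-first search from the portal over established links only. */
    int  queue[N_NODES], head = 0, tail = 0, count = 0;
    bool seen[N_NODES] = {false};
    seen[0] = true;
    queue[tail++] = 0;
    while (head < tail) {
        int u = queue[head++];
        count++;
        for (int v = 0; v < N_NODES; v++)
            if (link[u][v] && !seen[v]) {
                seen[v] = true;
                queue[tail++] = v;
            }
    }
    return count;
}
```

On a line of eight nodes with max_links = 2, node 0 holds a single peer link yet reaches all eight nodes. With max_links = 1 the line breaks into pairs and node 0 reaches only its neighbour, so the limit matters for connectivity only when it is set below what the topology needs.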

biboc commented 1 month ago

Ok thanks :)

OK, so it only limits the stations around the router that it can connect to.

Last question: you set the portal router as the DHCP server, so IPs in the mesh use a different subnet than the upstream network. Why? If the upstream network has a DHCP server, why not bridge the interface and let the upstream network assign IPs to the mesh? Would that generate latency?

Thanks again for your answers, it is now much clearer

bluewavenet commented 1 month ago

@biboc

> You set the portal router as the DHCP server, so IPs in the mesh use a different subnet than the upstream network. Why?

Well, it is because it is a portal to the upstream network, and the upstream network has a different IPv4 subnet.

_Asking this question means you have missed the point of the portal_detect function of auto_configure._

Now if you want, for example, to connect to the LAN of your ISP router so that the mesh is on the same subnet as the ISP LAN, then, if you are running in auto_config mode, all you need to do is connect an Ethernet LAN port on the mesh node to an Ethernet LAN port on the ISP router (LAN to LAN).

The node will then configure itself as a peer node with DHCP disabled.

Compare this with the previous case, where the node's WAN port was connected to a LAN port on the ISP router (WAN to LAN) and the node auto-configured itself as a portal.

If you choose to run in manual mode (auto_config = 0) then you have to configure it yourself, however you like.
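The two auto_config cases described above can be summarised as a small lookup. This is a minimal sketch with hypothetical names that covers only the WAN-to-LAN and LAN-to-LAN cases in this answer; mesh11sd's actual portal_detect logic may check more than this.

```c
#include <stdbool.h>

/* Hypothetical: which local port is cabled to a LAN port on the ISP router. */
enum uplink_port { UPLINK_WAN, UPLINK_LAN };

enum node_mode { MODE_PEER, MODE_PORTAL };

struct auto_result {
    enum node_mode mode;
    bool dhcp_server; /* does this node hand out addresses to the mesh? */
};

/* Sketch of the two auto_config outcomes described above. */
struct auto_result auto_configure(enum uplink_port uplink)
{
    struct auto_result r = { MODE_PEER, false };

    switch (uplink) {
    case UPLINK_WAN:  /* WAN to LAN: portal with its own subnet and DHCP */
        r.mode = MODE_PORTAL;
        r.dhcp_server = true;
        break;
    case UPLINK_LAN:  /* LAN to LAN: peer on the ISP subnet, DHCP disabled */
    default:
        break;
    }
    return r;
}
```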

> Would that generate latency?

I assume you mean: does double NAT increase latency? Well, yes, but the increase is undetectable for normal users. Perhaps an extreme gamer might be able to measure it.

biboc commented 1 month ago

@bluewavenet Thanks for the details

I see that you manage STP path cost in your daemon https://github.com/openNDS/mesh11sd/blob/master/src/mesh11sd#L3037

Can you explain how it works please?

bluewavenet commented 1 month ago

@biboc

> STP path cost

It is required when there are non-mesh segments of the backhaul. The path cost of the mesh vif in the bridge is set to the configured value, and all other vifs in the bridge are set to 64K minus the configured value. In this way, the default is "the mesh is the cheapest path".

This does not block bridge loops, though; that is done with nftables rules.
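The path cost rule described above is simple arithmetic. A minimal sketch, assuming "64K" means 65536 (mesh11sd may use a different constant); the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "64K" is taken literally as 65536. */
#define PATH_COST_SPAN 65536u

/* Cost for the mesh vif in the bridge: the configured value. */
uint32_t mesh_vif_cost(uint32_t cfg) { return cfg; }

/* Cost for every other vif in the bridge: 64K minus the configured value.
 * Assumes cfg <= PATH_COST_SPAN so the subtraction cannot wrap. */
uint32_t other_vif_cost(uint32_t cfg) { return PATH_COST_SPAN - cfg; }

/* True when STP would prefer the mesh vif over the other vifs. */
bool mesh_is_cheapest(uint32_t cfg)
{
    return mesh_vif_cost(cfg) < other_vif_cost(cfg);
}
```

For a configured value of 10, the mesh vif costs 10 and every other vif costs 65526. The "mesh is the cheapest path" default holds only while the configured value stays below 32768, half of 64K.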

openNDS / mesh11sd

Why mesh_hwmp_rootmode="2" is used instead of 0? #61