Alternative mesh routing protocol options

sudomesh / disaster-radio

A (paused) work-in-progress long-range, low-bandwidth wireless disaster recovery mesh network powered by the sun.

https://disaster.radio

1.05k stars 109 forks source link

Alternative mesh routing protocol options #57

Open X-Ryl669 opened 4 years ago

X-Ryl669 commented 4 years ago

If I understand your wiki completely, you're only dealing with 2 level of abstraction for the routing table. As I understand it, it almost impossible to reach a node in a large network (let's say on the 120-th hop), because:

The origin node will never be aware of it
The limited routing table size can not grow exponentially.
The metric are not static, it can fluctuate depending on the moon or the position of the node or a car passing by and so on. Thus, sometimes packet will flow into the network correctly, sometimes they'll just vanish.

Typically, let's say you have this topology:

  [A] => [B] => [C] => [D] => [E] => [F]
                    \=> [E]=> [F]

If node A wants to talk to node F, then it must know about it (this is impossible to solve by storing all possible nodes's MAC, BTW). Let's say that it has learnt about it previously, and now wants to talk to it. It sends a packet with DEST=F and in its routing table, it needs to go via NEXTHOP = B.

Similarly, B needs to have it in its routing table also (how could that possible scale if the number of node is very high ?). Let's say B is Rainman and remembers about all possible nodes.

When B receives it, it'll have to make a choice between C and E for its nexthop. The metric for C might be better because C is closer to B, even if there is an additional node D in the chain. So it chooses C. The conditions are bad, and the packet struggles to follow the chain that goes via D to reach E (or F) immediately.

After some time, A would resend the packet to F, and by the time, the metric in for F would increase so that B select E. D finally wakes up and send the packet to E and finally F.

Then F can/will receive 2 packets (one coming from D=>E, another from E). How can it replies ?

Now imagine that F moves closer to B, so that B could speak to it directly. Who is going to remove the old routing table value in C and D ? What it a packet is being sent to C for F and D is currently removing its entry for F in its routing table before C removed its own version ? In that case, the packet will be lost at D (when C sends it to D for F) and no feedback is sent to A about this packet loss. This is going very bad on the global network consumption because, to deal with this kind of issues, A must resend the packet to ensure it'll arrive at its destination (wasting the global network resources) or any intermediary layer before F must answer to A somehow to tell it they've failed delivery (which could even be worse since a single packet could create a spread of packets back to the origin, if D tells A it failed, then C tells A it failed, and so on).

The issue described above is not a theory, it's the main routing issue we observe on internet. I'm not even speaking of bad route or packet loss here.

As I understand it, any approach based on storing a dictionary with all the possible participant will fail since it grows exponentially in memory and resources consumption (image the above example if all nodes could see F except A).

We need some kind of path detection algorithm to figure out the best route to a destination.
We need some kind of logarithm abstraction somewhere to break the exponential grow (something like a tree or a forest algorithm) required by the current algorithm.
We need a way to avoid dual path sending (because it'll be almost impossible to keep the duty cycle to 1% if all paths starts sending the same packets twice)

Head2Wind commented 4 years ago

Would it be reasonable to port and or implementation the batman-adv to the ESP32 and leverage both or either of the wifi or Lora radios? https://openwrt.org/docs/guide-user/network/wifi/mesh/batman

On Thu, Feb 27, 2020, 10:07 AM X-Ryl669 notifications@github.com wrote:

If I understand your wiki completely, you're only dealing with 2 level of abstraction for the routing table. As I understand it, it almost impossible to reach a node in a large network (let's say on the 120-th hop), because:

The origin node will never be aware of it

The limited routing table size can not grow exponentially.

The metric are not static, it can fluctuate depending on the moon or the position of the node or a car passing by and so on. Thus, sometimes packet will flow into the network correctly, sometimes they'll just vanish.

Typically, let's say you have this topology:

[A] => [B] => [C] => [D] => [E] => [F] \=> [E]=> [F]

If node A wants to talk to node F, then it must know about it (this is impossible to solve by storing all possible nodes's MAC, BTW). Let's say that it has learnt about it previously, and now wants to talk to it. It sends a packet with DEST=F and in its routing table, it needs to go via NEXTHOP = B.

Similarly, B needs to have it in its routing table also (how could that possible scale if the number of node is very high ?). Let's say B is Rainman and remembers about all possible nodes.

When B receives it, it'll have to make a choice between C and E for its nexthop. The metric for C might be better because C is closer to B, even if there is an additional node D in the chain. So it chooses C. The conditions are bad, and the packet struggles to follow the chain that goes via D to reach E (or F) immediately.

After some time, A would resend the packet to F, and by the time, the metric in for F would increase so that B select E. D finally wakes up and send the packet to E and finally F.

Then F can/will receive 2 packets (one coming from D=>E, another from E). How can it replies ?

The issue described above is not a theory, it's the main routing issue we observe on internet. I'm not even speaking of bad route or packet loss here.

As I understand it, any approach based on storing a dictionary with all the possible participant will fail since it grows exponentially in memory and resources consumption (image the above example if all nodes could see F except A).

We need some kind of path detection algorithm to figure out the best route to a destination.

We need some kind of logarithm abstraction somewhere to break the exponential grow (something like a tree or a forest algorithm) required by the current algorithm.

We need a way to avoid dual path sending (because it'll be almost impossible to keep the duty cycle to 1% if all paths starts sending the same packets twice)

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/sudomesh/disaster-radio/issues/57?email_source=notifications&email_token=AHM3AJD4KTX7U4BI26TOPALRE76M7A5CNFSM4K5AILM2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IQ37XUA, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHM3AJE4D4VJUYKPFPGJDK3RE76M7ANCNFSM4K5AILMQ .

randomodbuild commented 4 years ago

Sorry I don't have much experience in this regard, but what about IPv6 on this? Maybe Yggdrasil (like cjdns). This maybe totally unrelated and not even possible, just throwing ideas around.

Juul commented 4 years ago

@Head2Wind @randomodbuild The maximum LoRa packet size is not large enough for even the smallest IPv6 packet. Currently babel uses IPv6. I'm not convinced that babel is appropriate for disaster.radio but it might be an interesting experiment if someone modified it to work with the smaller packet size.

@X-Ryl669

As I understand it, any approach based on storing a dictionary with all the possible participant will fail since it grows exponentially in memory and resources consumption

Do you have an example of a mesh protocol that does not store an entry for each participant? I'm certain the current protocol can be greatly improved and I'm interested in any previous work in this area. My only experience is with WiFi mesh protocols such as babel, batman-adv and OLSR and they all store entries for every node in the mesh.

X-Ryl669 commented 4 years ago

Internet ? (with all its protocols, ICMP / Spanning Tree Protocol / BGP). The way it works is because it's organized logarithmically (typically, you only need to store the base path to one one and when you need to send a packet to a node, you'll send it to this node, it's up to them to route the packet correctly) For example, if Bob (123.1.2.3) need to send a packet to Alice (65.6.4.2), one just need to know that 65.255.255.255 is handled by Eve, and Eve only need to know that 65.6.255.255 is handled by John and so on.

mwarning commented 4 years ago

@X-Ryl669 the limitations you describe is the same basic limitation that babel, batman-adv, olsr and other mobile ad-hoc mesh routing protocols have. But it is much more severe as LoRa allows only very small amounts of data to be transmitted. DNA seems to transmit complete routing tables (up to 30 routing entries in one transmission) and data packets also serve a hello packets.

There are approaches that do not need to transfer all MAC addresses, but they have other drawbacks or have not left research (yet).

Yggdrasil uses an approach using a spanning tree (a name-dependent routing layer) and a DHT on top to map MAC addresses to locations on the spanning tree (aka name-independent routing). This has great scalability possibilities on the cost of only using paths on the spanning tree, but not all paths on a real mesh structure.

There are other approaches that have not been tested in the real world...

samuk commented 4 years ago

If we can assume that something like 50% of nodes has access to GPS, either on-board or via the Bluetooth app, could we do something with that?

The nodes just worry about accurate routing within +/- 0.5 degrees lat/long of their location, and aside from that they just send the packet closer to its grid location, towards nodes that will have a more accurate idea of where it is.

mwarning commented 4 years ago

@samuk what you propose is geographic routing. The general problem with this is that you will have local minima where packets get stuck (like in a one way street). The usual approach to "fix" this problem is to use a fallback-algorithm to get unstuck (e.g. face routing). I think this is not reliable and does not always avoid loops.

Anyway, you do not need GPS. Some papers suggest randomly selected "light tower" nodes to form a virtual coordinate system. But imho, you can create virtual coordinates even without (e.g. based on Vivaldi Network Coordinates).

samuk commented 4 years ago

Thanks, I'd probably heard the idea somewhere but didn't realise geographic routing was a well-established concept. Even if it has a routing void problem, maybe it's useful to explore geographic routing further as it does seem to solve the scaling problem? Perhaps there's something useful in this geographic/ fuzzy logic approach?

I also spotted this 2015 paper which suggests Cost to the Progress Ratio Routing might be worth a look.

@X-Ryl669 Our current understanding is that there is a 10% duty cycle in the EU, is this not correct?

emunicio commented 4 years ago

@Head2Wind @randomodbuild The maximum LoRa packet size is not large enough for even the smallest IPv6 packet. Currently babel uses IPv6. I'm not convinced that babel is appropriate for disaster.radio but it might be an interesting experiment if someone modified it to work with the smaller packet size.

@X-Ryl669

As I understand it, any approach based on storing a dictionary with all the possible participant will fail since it grows exponentially in memory and resources consumption

Do you have an example of a mesh protocol that does not store an entry for each participant? I'm certain the current protocol can be greatly improved and I'm interested in any previous work in this area. My only experience is with WiFi mesh protocols such as babel, batman-adv and OLSR and they all store entries for every node in the mesh.

Hi, What about using 6LoWPAN Header Compression to reduce the packet size? And similarly, the RPL protocol for 6LoWPAN networks is virtual coordinate routing protocol (concretely a gradient-based routing protocol). I think it could solve the problem of having a next hop for every node in the network and could scale better.

Head2Wind commented 4 years ago

I wonder if perhaps we are trying to resolve a problem that is an outlying possibility. IMO IPv6 is overkill for what the primary use case is. IPv4 should be sufficient, unless I am missing something. Also, based upon this, the number of devices that need to be part of the Layer 2 mesh could be substantially reduced. I don't know the LoRa com protocol very well at all so cannot speak to a meaningful answer to my thoughts on the matter.

Personally, I like the possibility to of a long range L2 based 'wWAN' with LoRa, plus a near range L2 'wLAN' with a designated node for 'wWAN-route' with the WiFi (2.4ghz), then a BLE serial com link for app interactions. These are a lot of wants and its likely complex to make it this way.. Just thinking out loud.

On Mon, Mar 2, 2020 at 7:30 AM Esteban notifications@github.com wrote:

@Head2Wind https://github.com/Head2Wind @randomodbuild https://github.com/randomodbuild The maximum LoRa packet size is not large enough for even the smallest IPv6 packet. Currently babel uses IPv6. I'm not convinced that babel is appropriate for disaster.radio but it might be an interesting experiment if someone modified it to work with the smaller packet size.

@X-Ryl669 https://github.com/X-Ryl669

As I understand it, any approach based on storing a dictionary with all the possible participant will fail since it grows exponentially in memory and resources consumption

Do you have an example of a mesh protocol that does not store an entry for each participant? I'm certain the current protocol can be greatly improved and I'm interested in any previous work in this area. My only experience is with WiFi mesh protocols such as babel, batman-adv and OLSR and they all store entries for every node in the mesh.

Hi, What about using 6LoWPAN Header Compression to reduce the packet size? And similarly, the RPL protocol for 6LoWPAN networks is virtual coordinate routing protocol (concretely a gradient-based routing protocol). I think it could solve the problem of having a next hop for every node in the network and could scale better.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sudomesh/disaster-radio/issues/57?email_source=notifications&email_token=AHM3AJBDTXH36NK7KKUYTQ3RFPGJNA5CNFSM4K5AILM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENPXVBY#issuecomment-593459847, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHM3AJHNG77UG7Y4VUHWI5DRFPGJNANCNFSM4K5AILMQ .

-- ~~ Ken Nelson ~~ Voice: 360.389.ADVM (2386) “The mind is sharper and keener in seclusion and uninterrupted solitude. --- Be alone, that is the secret of invention; be alone, that is when ideas are born.” - Nikola Tesla ADVmachines™ Proven Solutions for your Worldwide Adventures www.advmachines.com ~ www.motolumen.com ~ www.motosherpa.com ~ www.xfile.ai http://www.advmachines.com www.advmod.com ~ www.eighteen650.com

mwarning commented 4 years ago

From what I read about 6LoWPAN, it looks like it does not support meshes with loops. It forms a tree and the root is one edge router/gateway. Maybe someone has a good source to research 6LoWPAN...

X-Ryl669 commented 4 years ago

Pros of 6LoWPAN:

IPV6 Header compression down to 2 bytes in a single neighbor node
Easy to use for connecting to internet (so we need to deal up to the first gateway that's connected to internet or any router)
Quite old and stable

Pros of RPL protocol:

Field tested
Seems reliable

Pros of LRP protocol for mesh routing:

Minimal traffic
Possible to search for a node (no need to store the complete network topology in the nodes)
No overload in any packet (works with existing data packets)
No routing loops
Low memory footprint

Cons of 6LoWPAN:

Implementation is a bit large and complex (but ESP32 is able to deal with it easily)
Only one of the layer of the network stack, still requires DNS/HTTP... (again available on ESP32 easily).
Typically made for simple UDP protocol on top.

Cons of RPL protocol:

Very very complex (159 pages + 4 RFC)
Requires large packets for multi layer mesh network.
Illogically though, information is always missing so need a lot of caching/memory in nodes
High network usage for routing packet exchange

Cons of LRP protocol:

Could create functional but non optimal routes

If you read french, this is a very good document comparing few mesh protocols. For english speakers, see here for presentation of protocol and a reference source code for ContikiOS

cyclomies commented 4 years ago

The team has done a nice job building the existing LL2 protocol.

What about not sending/shearing routing tables at all, and use packets header information instead? All nodes would be able to update the routing tables by reading the headers of all "heard" packets. This would keep the routing tables updated, even if changes in the mesh were fast (compared to periodic table updates).

The header could look like this, name (byte): ttl (1), source_node (4), destination_node (4), sender_node (4), prev_node (4), next_node (4), seq_num (1)

If used 4 byte names, the header would be in total of 22 bytes, for 3 byte names 18 bytes, and for 2 byte names 12 bytes in total. It is a increase for the header, compared to the current LL2 header, but there would be no need for forwarding costly routing tables.

The header is always added, and updated while forwarding, to all packets sent. The nodes would then update their neighbors and routing tables, according to the information they receive from the headers they "hear" (not only from packets they are the recipient or intended forwarder).

As a packet is received, the node sends an ACK to the source_node. As the messages and ACK's are passed, the nodes forwarding, or "listenig", can update the total hop counts to other nodes, and thereby count the metrics.

Reducing airtime by:

No control messages nor discovery packages
No beacons, alive, or hello messages
No routing tables sent
Only messages and ACK's sent
Only implicit ACK's on forwarding ("listenig")
Route info carried within packet headers

If all control packets are discarded, from the protocol, then how do we begin messaging if no routes exist? I do suggest flooding as a failower method:

--> Flood (broadcast) if no routes --> ACK when destination node is reached (stops flooding/routing) --> Send ACK to the sender --> Learn routes: update neighbors and 2-hop routing tables by listening --> Start routing per routing tables (unicast by hop&metric values) --> Flood if no route found (failower)

Small mesh 1-2 hop networks should shift to routing within first packets sent on the network. Larger networks might sift to routing by "zones", as 1-2 hop nodes are quickly learned, and the routes to the nodes on edges might take somewhat longer to learn.

Some flood optimizations to consider:

ACK immediately when msg rcw'd (stop reflooding)
Reflood weak signals "immediately" (wait for a possibe ACK)
Reflood intermediate signals if no weak's reflooded?
Reflood not strong signals (assume unnecessary)?
RSSI overlap on reflooding to minimize gaps
Transfer to routing if a fresh route exists within 2 hop range?

Some routing optimizations to consider:

ACK if message RCW'd (stop forwarding)
Failback to flood if no route
Implicit ACK's of forwards
Update routes from all heard packets
Reforward packet, if implicitely heard no ACK after a forward of a neighbor node (reinforcing TX)

Any comments?

paidforby commented 4 years ago

The team has done a nice job building the existing LL2 protocol.

Thanks!

What about not sending/shearing routing tables at all, and use packets header information instead? All nodes would be able to update the routing tables by reading the headers of all "heard" packets. This would keep the routing tables updated, even if changes in the mesh were fast (compared to periodic table updates).

I've wanted to roll a portion of routing table maintenance into normal operation and I agree that doing so makes sense.

The header could look like this, name (byte): ttl (1), source_node (4), destination_node (4), sender_node (4), prev_node (4), next_node (4), seq_num (1)

To make sure I'm understanding this correctly, if you have the simple network,
A -> B -> C -> D -> F

Then a example header for a message sent A to F at the C to D hop would read,

ttl, address of A, address of F, address of C, address of B, address of D, seq_number

This isn't so different from the existing packet/datagram combined header, it just adds two addresses, the source node and the previous node. However, whose sequence number is in the packet? How does the receiving node calculate a metric for three hops with only one sequence number?

Another issue is that, like MAC headers, I would want packet headers to only be used node-to-node. After one hop the packet header is stripped off and rewritten for the next hop. Only the datagram stays the same over the course of the route. So I would rather expand the datagram, like so,

ttl (1), sender_node (4), next_node (4), seq_num (1), destination_node (4), source_node (4), source_seq_num (1)

Notice that I added the sequence number of the source node. This would allow the receiving node to calculate the metric for the source node as well as the sender node. I'm still not sure this would work unless each source-destination pair has their own sequence number, though this adds another layer of complication where you are actually calculating the metrics of routes instead of destinations.

The only way you could reasonably include the prev_node is if you also provided a sequence number from that node. Otherwise you know nothing more than that you have received some packets from that intermediate node.

In this design, every node would need to calculate metrics for every route on it's own, rather than nodes sharing metrics with one another. You also don't have a good way of knowing how many hops you are from the source node without adding a hop counter byte to the datagram.

My main question is, how does a node make routing decisions? If a node has multiple routes to the same destination, how does the node decide now which route to use? If it's only based on metrics. How do you calculate accurate metrics for all destinations with limited knowledge of the route to get to them?

If used 4 byte names, the header would be in total of 22 bytes, for 3 byte names 18 bytes, and for 2 byte names 12 bytes in total. It is a increase for the header, compared to the current LL2 header, but there would be no need for forwarding costly routing tables.

This gives me the idea that we could provide the ability to choose how large you want your address to be? Might be a cool option. So if you want a small, high-efficiency network, you could choose 2 byte address, but then you would have to manually set node addresses because there are only 254 usable addresses. Or if you are deploying a larger, more dynamic network, you could use 4 byte addresses.

As a packet is received, the node sends an ACK to the source_node. As the messages and ACK's are passed, the nodes forwarding, or "listenig", can update the total hop counts to other nodes, and thereby count the metrics.

I'm hesitant about adding ACKs because of the limited duty cycle and introducing too much noise to networks.

No beacons, alive, or hello messages

I think we would still want occasional "i'm alive" messages sent when the network is quiet so that routes don't get stale if a node is not used for a while.

Small mesh 1-2 hop networks should shift to routing within first packets sent on the network. Larger networks might sift to routing by "zones", as 1-2 hop nodes are quickly learned, and the routes to the nodes on edges might take somewhat longer to learn.

The idea of routing by zones is interesting to me. Could be a good line of thought to investigate further. I have a hard time squaring it with the dynamic, ad-hoc design goal of LL2, but maybe I just need to think about it more deeply.

I like some of your ideas. It almost sounds like a source specific protocol. It may be worth simulating this to see how a larger network would deal with this alternative method of routing table management.

Any comments?

See above :)

cyclomies commented 4 years ago

@paidforby Interesting notes!

What about not sending/shearing routing tables at all, and use packets header information instead? All nodes would be able to update the routing tables by reading the headers of all "heard" packets. This would keep the routing tables updated, even if changes in the mesh were fast (compared to periodic table updates).

I've wanted to roll a portion of routing table maintenance into normal operation and I agree that doing so makes sense.

The header could look like this, name (byte): ttl (1), source_node (4), destination_node (4), sender_node (4), prev_node (4), next_node (4), seq_num (1)

To make sure I'm understanding this correctly, if you have the simple network, A -> B -> C -> D -> F

Then a example header for a message sent A to F at the C to D hop would read,

ttl, address of A, address of F, address of C, address of B, address of D, seq_number

Correct.

This isn't so different from the existing packet/datagram combined header, it just adds two addresses, the source node and the previous node. However, whose sequence number is in the packet? How does the receiving node calculate a metric for three hops with only one sequence number?

My mistake - my seq_number refers to total hops from source_node, to current node. In your example, it would be 2 hops (B and C). Therefore all nodes hearing the packet, can count hops to neighbors, previous sender, and to the source. Whereas you refer, with sequence number, to a increasing packet resend number (identifying packet loss)?

We shall change, to avoid further mistakes: seq_num meaning times sent, and hop_count meaning total hop count from source?

Another issue is that, like MAC headers, I would want packet headers to only be used node-to-node. After one hop the packet header is stripped off and rewritten for the next hop. Only the datagram stays the same over the course of the route. So I would rather expand the datagram, like so,

The headers should always be recreated or updated, as you have to update the sender_node, prev_node, next_node, and hop counter from source. Where:

ttl = max hops (or something else)
source_node = packet orgin
destination_node = end target
sender node = current forwarder (or source)
prev_node = node before sender (or source)
next_node = intended forwarder (or destination)
hop_count = total hop count from source
seq_num = how many times sent (packet loss)

The hop count could be counted from the ttl, if the ttl was fixed. This would limit the options, and thereby I suggest we keep the ttl and hop count separate.

ttl (1), sender_node (4), next_node (4), seq_num (1), destination_node (4), source_node (4), source_seq_num (1)

Notice that I added the sequence number of the source node. This would allow the receiving node to calculate the metric for the source node as well as the sender node. I'm still not sure this would work unless each source-destination pair has their own sequence number, though this adds another layer of complication where you are actually calculating the metrics of routes instead of destinations.

I believe this would work for updating headers:

ttl (update by X-1)
source_node (no update)
destination_node (no update)
sender_node (update to source or current forwarder)
prev_node (update to source or the node before sender)
next_node (update: destination or the next forwarder),
hop_count (update by +1, to total hops from source)
seq_num (source updates: total retry number from source)

The only way you could reasonably include the prev_node is if you also provided a sequence number from that node. Otherwise you know nothing more than that you have received some packets from that intermediate node.

In this design, every node would need to calculate metrics for every route on it's own, rather than nodes sharing metrics with one another. You also don't have a good way of knowing how many hops you are from the source node without adding a hop counter byte to the datagram.

See above (better description of hop counter vs. seq number).

My main question is, how does a node make routing decisions? If a node has multiple routes to the same destination, how does the node decide now which route to use? If it's only based on metrics. How do you calculate accurate metrics for all destinations with limited knowledge of the route to get to them?

The node makes decisions by:

am I the designed forwarder/destination
did I hear a ACK from the destination
will ttl be >0
if I have route to destination, update header and forward to next forwarder/destination
if I can't hear the intended next forwarder to forward (implicit ack by listening), then flood (or use another route to destination, if many routes are maintained)
if no route, flood until destination send ACK or ttl is 0 (reset the ttl, if starting to flood)
if I am not the designated forwarder, but I can't hear the designated forwarder forward, although I can hear the forward request, and I have a fresh route to the source, then I do the forwarding (before the previous sender floods the packet)?

In this model, the route is defined by min hop count.

The routes are built upon information gathered from the headers: neighbours, 2 hop neighbours and hop count to different destinations. You get, as a forwarder/listener, routes, not only, for the sources (by parsing the header of the messages), and for the destinations (by parsing the header of the ACKs), but also routes to your neighbors and neighbors 2-hops away (to both directions). Therefore the routing tables are established quickly as messaging starts (boosted during flooding as the node can parse all headers it "hears").

As described, the ACK (from destination_node), keeps the user happy, and helps building routes. I think sequence metrics are not as important. This is due to the fact, that by this method, the source gets the ACK if message passes, and "listeners" can update their routes. Therefore the routes are kept fresh. Some randomness, as the delivery with the same hop count can be done in many ways, for routes can be actually good, to not to overload the only route?

If used 4 byte names, the header would be in total of 22 bytes, for 3 byte names 18 bytes, and for 2 byte names 12 bytes in total. It is a increase for the header, compared to the current LL2 header, but there would be no need for forwarding costly routing tables.

This gives me the idea that we could provide the ability to choose how large you want your address to be? Might be a cool option. So if you want a small, high-efficiency network, you could choose 2 byte address, but then you would have to manually set node addresses because there are only 254 usable addresses. Or if you are deploying a larger, more dynamic network, you could use 4 byte addresses.

This is a good idea. In my opinion, all the settings/values regarding the protocol should be changeable by the user. This way the protocol could be used within many different applications. Off course, "best" default values should be in place.

As a packet is received, the node sends an ACK to the source_node. As the messages and ACK's are passed, the nodes forwarding, or "listenig", can update the total hop counts to other nodes, and thereby count the metrics.

I'm hesitant about adding ACKs because of the limited duty cycle and introducing too much noise to networks.

In this scenario, ACKs deliver great value, as they're carrying information of a) routing information to nodes (hops to destination, neighbours and 2-hop information), and b) users can confirm the message has been delivered.

There could be a setting to enable or disable automatic ACK from LL2?

No beacons, alive, or hello messages

I think we would still want occasional "i'm alive" messages sent when the network is quiet so that routes don't get stale if a node is not used for a while.

All messages are beacons. Therefore, a GPS update packet would keep the routes alive. If routes get stale, the message is flooded (to learn the routes again).

Could this route_delete_delay also be changeable by the user? As I tend to think there might be quite static networks, and even users wanting to have some of their nodes in a passive (listen only) mode, for saving battery.

There could also be a beacon_interval setting for time (and change of position)?

Small mesh 1-2 hop networks should shift to routing within first packets sent on the network. Larger networks might sift to routing by "zones", as 1-2 hop nodes are quickly learned, and the routes to the nodes on edges might take somewhat longer to learn.

The idea of routing by zones is interesting to me. Could be a good line of thought to investigate further. I have a hard time squaring it with the dynamic, ad-hoc design goal of LL2, but maybe I just need to think about it more deeply.

By zones, I tried to describe the differences formed within a real world wireless mesh: nearby nodes tend to hear lots of neighboring traffic (nodes within their range), and thereby build more accurate routing tables of their neighbors and their neighbors, whereas two nodes many hops away, tend to build good routing tables slower (as no messages might not been sent from one end of the mesh to another one). This causes a situation, especially in the beginning or during fast changes of the mesh, where nearby packets are routed, whereas packets to nodes far away are flooded (as no routes have been established yet). This behavior is ok and expected.

paidforby commented 4 years ago

Very interesting. Thanks for clarifying many of your points. It actually sounds very similar to a dynamic source routing protocol, which was also suggested by @geeksville here https://github.com/sudomesh/LoRaLayer2/issues/10#issuecomment-614936972 and here https://github.com/sudomesh/disaster-radio/issues/52#issuecomment-614938017. I'm going to think more about this DSR idea, i think it may be a good one. Also, this would give me a reason to revisit the simulator and sync it up with all the changes to LL2. I'll keep y'all posted if I make any progress toward a DSR fork.

cyclomies commented 4 years ago

Very interesting. Thanks for clarifying many of your points. It actually sounds very similar to a dynamic source routing protocol, which was also suggested by @geeksville here sudomesh/LoRaLayer2#10 (comment) and here #52 (comment). I'm going to think more about this DSR idea, i think it may be a good one. Also, this would give me a reason to revisit the simulator and sync it up with all the changes to LL2. I'll keep y'all posted if I make any progress toward a DSR fork.

You might find these findings interesting (broadcasting, routing, zero-control-packet, and reactive & pro-active protocols in mesh networking):

Mesh networking (2018): https://inthemesh.com/archive/whitepaper-connectivity-of-mesh-networks/
ECHO (2018): https://cdn.shopify.com/s/files/1/0046/7524/0005/files/echo.pdf?21683
VINE (2019): https://cdn.shopify.com/s/files/1/0046/7524/0005/files/goTenna_VINE_MILCOM_2019.pdf?15166
Aspen Groove (2019): https://cdn.shopify.com/s/files/1/0046/7524/0005/files/goTenna_Long-Range_Short-Burst_SECON.pdf?8213 and https://inthemesh.com/archive/long-range-short-burst-mobile-mesh-networking/
Percolation theory: https://inthemesh.com/archive/from-mocha-to-mesh-insights-from-percolation-theory/

Something similar could be done within LL2?

broadcast-->learn-->route-->broadcast (if no route)-->learn more..

Why only links refering to GoTenna? Because there have not so far been any other well known options..

paidforby commented 4 years ago

Cool stuff, too bad their code isn't open source.

I started working on a more DSR-like fork of the LL2 protocol here, https://github.com/sudomesh/LoRaLayer2/tree/DSR

My thinking, and what I believe @cyclomies is suggesting, is to compress the routing table into the LL2 packet header.

The new packet struct looks like this,

struct Packet {
    uint8_t ttl;
    uint8_t totalLength;
    uint8_t sender[ADDR_LENGTH];
    uint8_t receiver[ADDR_LENGTH];
    uint8_t sequence; // message count of packets tx'd by sender
    uint8_t source[ADDR_LENGTH]; // original sender of datagram
    uint8_t hopCount; // start 0, incremented with each retransmit
    uint8_t metric; // of source-sender link
    Datagram datagram;
};

The datagram struct remains the same, but is now defined in the LL2 code,

struct Datagram {
    uint8_t destination[ADDR_LENGTH];
    uint8_t type;
    uint8_t message[MESSAGE_LENGTH];
};

To initialize the routing process, nodes must begin by transmitting packets with the broadcast address as the destination and receiver.

Upon receiving a broadcast packet, neighbors inspect the header to see who was the sender, they then add the sender address to their routing table with a hop distance of 1 and a metric based on the sequence number.

Once nodes have neighbors, they can begin transmitting packets with a neighbor address as both destination and receiver.

Neighbor nodes then listen to all packets that are transmitted. Upon receiving they inspect the packet for three items,

sender address + sequence number - to update neighbor table and neighbor route metric
receiver address - to add as a possible route, two hops away via the sender address with unknown metric
source address + hopCount + metric - add as known route via the sender address, distance or hopCount+1 away, with a metric that is the weighted average of the metric for the source-sender link (included in packet) and the metric for sender-receiver link (calculated previously from sequence number).

Then after sending packets to all neighbors, a node can begin attempting to sending packets with a two hop destination via the neighbor. This will then allow multi-hop routes two start developing metrics based on the metrics of each hop.

The internal routing table management is almost exactly the same. which means that it is still very compatible with the original table sharing protocol. Really the only difference is that instead of sending routing tables in a single, large packet, routing table entries are sent one at a time inside of the header of every packet.

This should work well. Though, I will still provide the option to enable control messages that share the entire routing table, such that a network could converge faster at the expense of airtime and increased noise.

My next step is to port LL2 back to the simulator to test the validity of this protocol. And to write up full documentation of this modification on the wiki.

cyclomies commented 4 years ago

Cool stuff, too bad their code isn't open source.

Totally agree, and empathizes the value of this project!

I started working on a more DSR-like fork of the LL2 protocol here, https://github.com/sudomesh/LoRaLayer2/tree/DSR

My thinking, and what I believe @cyclomies is suggesting, is to compress the routing table into the LL2 packet header.

Correct. Sorry for my unclear text.

The new packet struct looks like this,
struct Packet {
    uint8_t ttl;
    uint8_t totalLength;
    uint8_t sender[ADDR_LENGTH];
    uint8_t receiver[ADDR_LENGTH];
    uint8_t sequence; // message count of packets tx'd by sender
    uint8_t source[ADDR_LENGTH]; // original sender of datagram
    uint8_t hopCount; // start 0, incremented with each retransmit
    uint8_t metric; // of source-sender link
    Datagram datagram;
};
I would suggest to add previous_sender (one sender before the last sender). This way a node could, not only get the last sender, but also the sender before the last sender. As such, the information of the header would let the node build a routing table (and hop counts..) of its neighbors, its neighbors neighbors, and routes to sources.

The datagram struct remains the same, but is now defined in the LL2 code,
struct Datagram {
    uint8_t destination[ADDR_LENGTH];
    uint8_t type;
    uint8_t message[MESSAGE_LENGTH];
};
To initialize the routing process, nodes must begin by transmitting packets with the broadcast address as the destination and receiver.

Destination address could be the node we are looking for?

Upon receiving a broadcast packet, neighbors inspect the header to see who was the sender, they then add the sender address to their routing table with a hop distance of 1 and a metric based on the sequence number.

..and if we had previous_sender (sender before the last sender), we could add 2 hop destinations to the table.

Once nodes have neighbors, they can begin transmitting packets with a neighbor address as both destination and receiver.

..and to nodes two hops away, if previous_sender would be added in the header, and node databases would be updated corresponding.

Neighbor nodes then listen to all packets that are transmitted. Upon receiving they inspect the packet for three items,

I suggest the nodes would use all packets they overhear for building the node database, but process only messages they are the intended sender or destination.

sender address + sequence number - to update neighbor table and neighbor route metric

receiver address - to add as a possible route, two hops away via the sender address with unknown metric

source address + hopCount + metric - add as known route via the sender address, distance or hopCount+1 away, with a metric that is the weighted average of the metric for the source-sender link (included in packet) and the metric for sender-receiver link (calculated previously from sequence number).

Correct. And if adding the previous_sender, we would learn the routes within our neighborhood (1-2 hop away) fast, as we overhear nodes transmitting around us.

Then after sending packets to all neighbors, a node can begin attempting to sending packets with a two hop destination via the neighbor. This will then allow multi-hop routes two start developing metrics based on the metrics of each hop.

And allow the node to route packets to nodes, that it has been heard as sources (as it knows the starting path and the hop count). If, sorry for repeating, we had previous_sender information received, we could route already to nodes three hops away. This would expand the the coverage for routing a lot, without a too large header.

The internal routing table management is almost exactly the same. which means that it is still very compatible with the original table sharing protocol. Really the only difference is that instead of sending routing tables in a single, large packet, routing table entries are sent one at a time inside of the header of every packet.

This should work well. Though, I will still provide the option to enable control messages that share the entire routing table, such that a network could converge faster at the expense of airtime and increased noise.

This would be a good option, especially used on static nodes and when the network has been quiet (otherwise, we might start to be unsure of our old routes?).

I do believe this could be left unused, if the intended destination nodes would send ACKs of recieved packets. This would, not only let the sender know the packet was received, but also update node databases for packets coming from the direction of the last destination (=learn routes with fewer hops to the destinations).

Furthermore, previous_node information could be used to determine shorter routes to nodes: if we overhear a packet sent from previous_node and hear the packet of the sender, we could correct our route for the source node to go straight through previous_node.

By adding this, we could always, when overhearing a packet, update our routing table for source, sender, and previous_sender. If ACKs where integrated, we would also learn the routes towards the destinations (as we would soon hear the ACK).

In a small 6-10 node max 4 hop network, we could learn the whole mesh maybe with 1-2 broadcast messages and 1-3 unicast messages (including the ACKs), as all packets reveals routes to three nodes.

My next step is to port LL2 back to the simulator to test the validity of this protocol. And to write up full documentation of this modification on the wiki.

Good luck!

paidforby commented 4 years ago

I think my issue with adding a previous_sender address is that it is another address of uncertain quality (the receiver address also being uncertain until you actually hear from it) and does not contain useful information until you have a five node (four hop) wide network. In a four node (three hop) wide network, the previous_sender would always be either empty or the source. Could be a useful option still. I will look into testing it in the simulator.

I suggest the nodes would use all packets they overhear for building the node database, but process only messages they are the intended sender or destination.

This is currently how I've written the protocol. It always parses the header for routes, but it only retransmits the packet if the receiver address matches the nodes local address and it only passes the packet to Layer3 if it if the destination matches your nodes local address or the broadcast address. See the receive() function through which all incoming packets are processed.

By adding this, we could always, when overhearing a packet, update our routing table for source, sender, and previous_sender. If ACKs where integrated, we would also learn the routes towards the destinations (as we would soon hear the ACK).

Good point, I understand what you are suggesting. Right now we are working under the idea of manual ACKing by the user of a node. Maybe could make automatic ACKs with an optional feature. I definitely see the value of ACKing with this style of protocol.

I will play around with these ideas in the simulator soon.

paidforby commented 4 years ago

Also if anyone is interested in playing with the this fork of the protocol, I've synced up the 1.0.0-rc.2 branch with the LL2 DSR branch.

cyclomies commented 4 years ago

I think my issue with adding a previous_sender address is that it is another address of uncertain quality (the receiver address also being uncertain until you actually hear from it) and does not contain useful information until you have a five node (four hop) wide network. In a four node (three hop) wide network, the previous_sender would always be either empty or the source. Could be a useful option still. I will look into testing it in the simulator.

You're right on point.

Previous_sender should help with larger mesh networks. There is, according to my understanding, one more benefit of that feature: if, and when, node A overhears several nodes (B-D) routing packets to B from source Z, A can recognize a shorter route to E trough D (previous_sender for packet forwarded by D), even if E was not the source. previous_sender should not left empty, but marked as source (hop counting logic for finding shortest route should be easier to do?).

This could not only shorten hop count of one node to another one (A-->E), but also optimize the whole "zone of overhead transmits".

I will play around with these ideas in the simulator soon.

I'm interested to hear your verdict of those simulations on both prev_sender and ACKs. Option for manual and automated ACKs sounds great (automated for one to one messaging, and reducing traffic by manual ACK for one to all room messages?).

A video/animation of the simulations could be fun to watch.

samuk commented 4 years ago

Cool stuff, too bad their code isn't open source.

I started working on a more DSR-like fork of the LL2 protocol here, https://github.com/sudomesh/LoRaLayer2/tree/DSR

@geeksville did you have a chance to check out the DSR fork? Might it meet your needs?

cyclomies commented 4 years ago

@paidforby, did you find time for those simulations?

paidforby commented 4 years ago

Not yet, I became incredibly distracted with updating the the simulator to make it more usable (namely introducing another websocket server at Layer 3 so I can send messages using the real web app). I should be able to test in another day or two.

geeksville commented 4 years ago

@samuk Alas I have not, I'll definitely check it out in the next couple of weeks. I still might end up doing my own DSR, based on cribbing from radiohead's version and RFC 4728 (because it is super mature and well studied). The radiohead guy seems to have just taken that RFC and only implemented the fourish minimum requirements. I'd love to share an implementation with ya'll but I also kinda love protocol buffers for this sort of stuff. If curious my current work queue is here.

paidforby commented 4 years ago

@cyclomies I have made significant progress in updating the protocol as we discussed. You can read a description of how routes are not built on the wiki, https://github.com/sudomesh/disaster-radio/wiki/Protocol#building-routes. It seems to be working well enough in the simulator, https://github.com/sudomesh/disaster-radio-simulator, which is also much more usable with the addition of serial clients for every simulated node that can be accessed using socat.

I still have some further generalization I need to make to the protocol. And I have yet to bring back proactive routing table packets as an option, but that should be easy to re-implement. I was thinking about how to format the routing table packets to more efficiently use the packet space. Currently, the routing table packets are formatted to include neighbor addresses multiple times with every route. I could remove this redundancy by only mentioning the neighbor once, followed by the all of it's possible routes. I'll think about this as I re-implement the routing table packets.

samuk commented 4 years ago

@paidforby is there code somewhere to test this new routing? Is it in the main LoRaLayer2 already?

Wondering if it resolves https://github.com/sudomesh/disaster-radio/issues/38

paidforby commented 4 years ago

@samuk Yes it is possible to test the new routing on a ESP32 dev board. Most, if not all, of the relevant changes have been made in LoRaLayer2 and since https://github.com/sudomesh/LoRaLayer2/commit/bc93f272ab2ee730b96bf91a4de521779f186411 it should be working, so you could point at that commit in platformio.ini. Though I have not fully tested it on a dev board myself. I'm still working on bringing back the proactive routing table packets before merging the DSR branch of LL2 back into master. When I do that I will also confirm that everything in the dev board firmware is working correctly.

I do not know if it will fix #38, since I do not entirely know what was causing that, however, I did come across some bugs and errors in logic in LL2 that could have been the cause. And since routing tables packets will not be transmitted by default, then the root cause has gone away.

cyclomies commented 4 years ago

@cyclomies I have made significant progress in updating the protocol as we discussed. You can read a description of how routes are not built on the wiki, https://github.com/sudomesh/disaster-radio/wiki/Protocol#building-routes.

Well done! Thank you for your reply.

It seems to be working well enough in the simulator, https://github.com/sudomesh/disaster-radio-simulator, which is also much more usable with the addition of serial clients for every simulated node that can be accessed using socat.

I still have some further generalization I need to make to the protocol. And I have yet to bring back proactive routing table packets as an option, but that should be easy to re-implement.

I did read the wiki description. Sounds good to have a option to mix both reactive and proactive routing. Nicely documented.

Can you specify more clearly, how the nodes start to operate when cold booting (no routing tables) the mesh? From the wiki: "WHen a node joins a LoRa mesh network it can begin sending messages to it's neighbors immeadiately, but it has to wait to hear messages sent to nodes more than 1 hop away before it can begin routing outside of it's immediate neighbors."

If a node does not know a route to a specific destination, shouldn't the node broadcast the message? The broadcast is changed to unicast (routing), when the broadcast reaches a node with sufficient routing information? Those nodes that "hear" the change, stop broadcasting?

I was thinking about how to format the routing table packets to more efficiently use the packet space. Currently, the routing table packets are formatted to include neighbor addresses multiple times with every route. I could remove this redundancy by only mentioning the neighbor once, followed by the all of it's possible routes. I'll think about this as I re-implement the routing table packets.

Is there a need for sending the "via node" (neighbor) information? Could the table be sent only with hops, destination and metric?

In the wiki, should "from" be replaced by "to", for indicating routes to destinations? Just a minor thought. 1 hops from 26a8f7dd via 26a8f7dd metric 195

jacquesalbert commented 4 years ago

I have just recently discovered this project and am very interested, but forgive me if I'm a little out-of-the-loop on previous avenues of discussion.

I see that the project has largely moved past the idea of georouting due to the possibility of reaching local dead-ends. I work in utility-level mesh networking and the main system I deal with uses georouting. I agree, it's a dangerous game even when the system is engineered, let alone an ad-hoc style mesh. The benefit, of course, is that it avoids the scalability problem of huge routing tables.

I have, however, wondered about the feasibility of a hybrid system. What if the mesh routed using the current routing table system LOCALLY, but all addresses were hierarchical enough to have another layer of routing that is roughly geographic? Spitballing: using some kind of Maidenhead-style address (I was thinking instead of alphanumeric, it'd just be a sort of lat/long based limited quadtree or something) augmented with a unique MAC-based address, the total address could tell a node if it is trying to talk to something in its own "subnet" or "submesh" and if so, it would use the routing table as normal. However, anything bound for something OUTSIDE its submesh would be assumed to NOT BE IN the routing table at all, and would instead be sent blind to nodes that are somehow flagged as interface nodes to another submesh. Submeshes address segments would be set by hand when the node is set up or automatically if GPS is available.

So imagine, you have a node on the edge of two submesh regions -- it is flagged as being an access node between those two submeshes because it sees nodes from both. You may also have a node on a mountain that happens to have reliable comms to several further submesh regions that aren't even contiguous. Fine: it advertises that it is an access node between those. A node within the submesh has a packet destined for a different submesh...it finds the ACCESS NODE in its own submesh that links to the closest submesh to the one it is trying to reach based on geographical calculations.

samuk commented 4 years ago

@jacquesalbert "LOCALLY, but all addresses were hierarchical enough to have another layer of routing that is roughly geographic? Spitballing: using some kind of Maidenhead-style address (I was thinking instead of alphanumeric, it'd just be a sort of lat/long based limited quadtree or something"

Sounds interesting, I think I understand this concept, let me restate to see.

So we add eight numbers, and two +/- symbols to give latitude/ longitude to one decimal place. This gives us 6.9mile squares covering the earth. We also add one (b)order flag to each packet. So this would cost 11 characters per packet?

When I add my first node in a network I either manually enter the lat/long, or the GPS does it. For this theoretical network, it is 35.1, -121.5

Each additional node brought into the network listens for 100 packets and uses the most frequent Lat/long it hears as it's 'main' network (eg 35.1, -121.5) If it hears additional lat/long then it also acts as a 'border node'.

Routing within the 35.1, -121.5 square happens in the reactive way outlined by Grant above.

If a 'normal node' receives a message originating outside of its main lat/long square (eg 35.1,-121.4) it just ignores/drops it.

If a Border node receives a message originating in 35.1,-121.4 that has an unset (b)order flag it edits the packet to 35.1,-121.5 and sets the (b) flag before forwarding.

Each packet should therefore only cross a geographic border once? Writing this I'm still not sure I understand it. Is that kind of what you meant?

paidforby commented 4 years ago

Can you specify more clearly, how the nodes start to operate when cold booting (no routing tables) the mesh? From the wiki: "WHen a node joins a LoRa mesh network it can begin sending messages to it's neighbors immeadiately, but it has to wait to hear messages sent to nodes more than 1 hop away before it can begin routing outside of it's immediate neighbors."

If a node does not know a route to a specific destination, shouldn't the node broadcast the message? The broadcast is changed to unicast (routing), when the broadcast reaches a node with sufficient routing information? Those nodes that "hear" the change, stop broadcasting?

@cyclomies to clarify on the cold boot scenario. Since this is reactive routing there is no automated process to build routes (though I suppose one could be implemented). A user of a node must begin by broadcasting a message. Neighboring nodes hear the broadcast, add the sender as a route and then could then begin sending unicast messages to that sender. Once the original sender hears a broadcast or unicast message and adds routes, it can then also begin transmitting unicast messages.

By saying "sending messages to it's neighbors immeadiately" I mean broadcasting messages. Broadcast messages are defined as messages addressed to all neighbors (1 hop destinations).

Is there a need for sending the "via node" (neighbor) information? Could the table be sent only with hops, destination and metric?

You are correct, there is no need to send the "via node" info in the route table packet, in fact I was already doing it in the most efficient way possible, see https://github.com/sudomesh/disaster-radio/wiki/Protocol#routing-table-packets. Thanks for pointing that out, I would have wasted a lot of time, trying to rethink something that was already as minimal as possible.

In the wiki, should "from" be replaced by "to", for indicating routes to destinations? Just a minor thought.

I suppose this is a matter of perspective.

jacquesalbert commented 4 years ago

@samuk Sounds interesting, I think I understand this concept, let me restate to see.

You've got the right idea, but let me clarify:

So we add eight numbers, and two +/- symbols to give latitude/ longitude to one decimal place. This gives us 6.9mile squares covering the earth. We also add one (b)order flag to each packet. So this would cost 11 characters per packet?

I think a more efficient system might be a bitwise geocode. For example, alternating E/W-N/S bits for increasing precision. Maybe the first bit is basically 0 for the west half of the globe, and 1 for the east half. Then the next bit is 0 for the North half and 1 for the South half. Then the next bits are 0 /1 for the West/East half of the west half of the globe, and the same for the N/S half of the North half of the globe, and so on. Using this method my napkin math tells me we should be able to get a precision of less than half a mile with 16 bits each, so a single int. Additionally, we don't WANT or NEED that level of precision, so either we can use fewer bits and pack them in, or always use an int and mask or shift it. Regarding the border node tag, see below.

When I add my first node in a network I either manually enter the lat/long, or the GPS does it. For this theoretical network, it is 35.1, -121.5

Each additional node brought into the network listens for 100 packets and uses the most frequent Lat/long it hears as it's 'main' network (eg 35.1, -121.5) If it hears additional lat/long then it also acts as a 'border node'.

Routing within the 35.1, -121.5 square happens in the reactive way outlined by Grant above.

I think this (barring the caveat above about the format of the address) is a great idea, for new nodes to potentially "learn" where they are based on surrounding nodes if they have not already been given a location. Other than that, yeah. Any node that handles a packet with a matching geocode just routes it using the mesh routing tables as discussed above.

If a 'normal node' receives a message originating outside of its main lat/long square (eg 35.1,-121.4) it just ignores/drops it.

If a Border node receives a message originating in 35.1,-121.4 that has an unset (b)order flag it edits the packet to 35.1,-121.5 and sets the (b) flag before forwarding.

Each packet should therefore only cross a geographic border once? Writing this I'm still not sure I understand it. Is that kind of what you meant?

In the system I was imagining, packets might have to cross a geographic border any number of times, and therefore I think some additional logic is necessary, but I see where you are coming from with regard to the changing of the border flag. I'm thinking no border flag is necessary on the packet because we have the geocode already in the address. Here's an example: GeoCodeHybridRouting

Nodes 1, 2, 3, and 4 are all in the same submesh (0000) so they all route between each other like normal. We need to find a way for the routing table to include nodes outside of the submesh, but ONLY if they are directly linked to nodes inside the submesh. Then, if the destination is in the submesh, find the next best submesh hop like normal, but if it's outside the submesh, find the next best submesh and THEN find the next best hop to the "intermediate" node in the table that is in that submesh.

I think choosing the right balance of submesh size (geocode precision) to routing table size would at least put a sort of limit on the total table size to the number of submesh nodes PLUS the number of border nodes in other submeshes. For low power mesh networks, this should be pretty low. I would think having a submesh size of something like a few miles might mean that most local comms within a town would be routed normally and georouting would be rare. I also suggested a Maidenhead style system of alternating lat/long with increasing precision because I have also wondered if there is a way to use a dynamic mask to allow submeshes to be "resized" in areas where that might be beneficial, since each node is handling this on a packet-by-packet basis instead of maintaining a full topology map outside of its submesh. After all, if there are only 5 nodes in a city, georouting is probably more of a detriment than a benefit, but if there are 5000, it might be a necessity. Alternately, if the submesh size is too small you might be able to detect somehow that you are running into the "dead end" problem and increase the size until you minimize that.

Regarding routing: Okay, now imagine a packet is coming from node 1 (full geocoded address is 00001) and it is intended for node 9 (full address 10019). Node 1 knows that 10019 is not in its submesh (1001 != 0000) so instead of looking for 10019 in its routing table, it looks for any nodes that are NOT in 0000 and compares all of them geographically to pick the next best hop. So in this case node 1 might have node 5 in its table also. The geo distance between 0010 (the geocode portion of node 5's full address) and the destination of 1001 is less than the distance between 0000 and 1001 so this is a good candidate for the "intermediate destination" of node 5. Obviously, Node 1 does not see node 5 directly, so it picks the next hop for this packet using normal mesh routing as you guys have discussed previously, choosing node 3.

Now, node 3 does the same calculation. It has a packet destined for 1001, it is in 0000 so it looks for the best off-mesh link and finds 5. Normal mesh routing causes it to pick 2 or 4, let's say 2 has a better link quality so it chooses 2. Node 2 does the same thing and it has a direct link to 5 so it sends it to 5.

Node 5 has the same problem as node 3: it has been handed a packet with a destination not in its own submesh, so it follows the same process and picks either Node 7 or C. Both have the same geo distance, so it probably picks by the next hop link quality for each of those possibilities. Let's say C. C hands it to A, and node A sees that this packet is destined for the submesh that matches its own, so it just looks it up in the table like normal (ignoring any geo addressing and skipping those calculations).

cyclomies commented 4 years ago

@cyclomies to clarify on the cold boot scenario. Since this is reactive routing there is no automated process to build routes (though I suppose one could be implemented). A user of a node must begin by broadcasting a message. Neighboring nodes hear the broadcast, add the sender as a route and then could then begin sending unicast messages to that sender. Once the original sender hears a broadcast or unicast message and adds routes, it can then also begin transmitting unicast messages.

By saying "sending messages to it's neighbors immeadiately" I mean broadcasting messages. Broadcast messages are defined as messages addressed to all neighbors (1 hop destinations).

@paidforby Understood, perfect. And if I understood correctly, broadcast messages will be re-broadcasted according to the TTL value the sender specify?

I suppose this is a matter of perspective.

Well put! ;)

To avoid collisions, there could be specific random delayed time slots for TX:

slot 1: 200-400ms ACK's
slot 2: 600-2000ms re-broadcasts
slot 3: 4000-8000ms routed packets and broadcasts

Slot 1 for ACK's of the intended receiver, as they will also end other nodes to continue broadcasting or routing to the intended recipient.

Slot 2 to prioritize re-broadcasted packets, as they are extremely handy for node discovery and updating node databases. Therefore, they will work for optimize routing.

Slot 3 for all routed packets and broadcasted packets from the originating node.

In short: messages (unicast or broadcast), and packets intended for routing, would be sent with a 4-8s delay. Any packets intended for re-broadcast will be sent before them, as they will most likely help in discovering new nodes and optimize the routes. ACK's are always sent "immediately", as they will end unnecessary broadcasting and unnecessary routing, and therefore free up airtime. ACK's would only be sent in slot 1 for 1-to-1 messaging only. Other ACK's would be sent in slot 3.

The ms used in slots are only examples, as I did not count the correct packet airtime or physical TX/RX delays. Those ms delays for slots should actually be specified with % (of max packet air time for 1 minute?), not hard coded, to work with different network settings, duty cycles and so on?

cyclomies commented 4 years ago

About those TX slots I mentioned: At first, I incorrectly thought, that the packets (with a maximum character length) should be possible to send, before the next TX slot. This is off course not the case. Sorry for the inconvenience!

Therefore, the TX timeslots could be as follows:

slot 1: 200-400ms (ACK's)
slot 2: 420-620ms (re-broadcasts)
slot 3: 640-840ms (all other packets)

As such, a routed packet would be sent in slot 3, if there are no other traffic on the air at that time. Moreover, if a broadcast is sent, and four nodes (including the intended recipient) hear the packet, the intended recipient node could send the ACK packet (slot 1) before any other node would have the time to re-broadcast (slot 2) the message. And as the ACK is heard by other nodes, they update their nodeDB and do not re-broadcast the message.

samuk commented 4 years ago

I think a more efficient system might be a bitwise geocode. For example, alternating E/W-N/S bits for increasing precision. Maybe the first bit is basically 0 for the west half of the globe, and 1 for the east half.

@paidforby are you interested in this supplemental geo approach suggested by @jacquesalbert ? Realise it's solving problems we don't yet have, but a globally scaleable network does appeal.

paidforby commented 4 years ago

I am interested in trying to implement it as optional feature. I need to think more deeply about how to actually integrate it into the LL2 library, not sure how soon i'll get to it. but I'll add it to my todo list.

jacquesalbert commented 4 years ago

I am interested in trying to implement it as optional feature. I need to think more deeply about how to actually integrate it into the LL2 library, not sure how soon i'll get to it. but I'll add it to my todo list.

@paidforby I have been playing around with this idea a little bit with the help of the simulator and while I think that something like what I described above could be a very effective "best of both worlds" system, I have struggled a bit to fit all the pieces of the idea together in a clean way.

However, I think that the most effective way to get some mileage out of it DOES actually solve a current problem with the mesh: convergence time, especially when the mesh is less than active. By implementing geohashed addresses (almost for free, data-wise, since you already need a fairly long unique address) you can implement simple directional georouting as a fallback for when the routing table does not yet contain the destination, or when the routing table has reached some maximum size.

Basically, if a node needs to send a packet to something it doesn't have a route to, it just routes it to the node in its table that is the closest geographically to the destination.

tgies commented 4 years ago

The S2 coordinate system might be useful for this geographic routing. Uses 64-bit coordinates which are the distance along a Hilbert curve superimposed on the globe, precision can be arbitrarily reduced, checking for containment of a point within a cell or adjacency of points are literally bitwise operations, etc. https://s2geometry.io/

cyclomies commented 4 years ago

@paidforby, Any major updates regarding LL2?

paidforby commented 4 years ago

Nothing much. I started a new j.o.b. so I've had less time to work on this. Most recent updates involved supporting dual lora modules (which will be on the new disaster radio dev boards). Next, I plan on updating/improving the simulator, but who knows when that will actually get done. I've mostly stopped revising the routing protocol because it requires too much time/energy. I encourage others to try implementing the geographic routing, which I still find very desirable.

samuk commented 3 years ago

I noticed that Reticulum has had a first release: https://unsigned.io/projects/reticulum/

There's a discussion of porting to C: https://github.com/markqvist/Reticulum/issues/2 but no code yet. Does anyone have the skills/interest to help with this?

X-Ryl669 commented 3 years ago

If I understand Reticulum correctly, it'll not fit LoRA radio (unlike what they claims), mainly because it requires a MTU of 500 bytes (which can not be done reliably on LoRAWAN since it implies multiple packet other a short time, if you want to respect the law that's a no-no). Also, I don't understand how they deal with exponential routing table issue (I think they don't deal with it in any way), so it'll require a lot of memory on the client to store the routes & Link.

mwarning commented 3 years ago

Recticulum seems to send announcements as floods. This does not scale very well.

markqvist commented 3 years ago

Reticulum is perfectly well suited to LoRa, but probably not LoRaWAN (which is a bit purposeless any way). It is correct that Reticulum has a fixed MTU of 500 bytes, which LoRa can accomodate just fine, and without any reliability impact. LoRaWAN probably can't - I haven't actually made any attempts at this, so that is just my guess. In no jurisdiction is "multiple packets over a short time" viewed as a "no-no". That is not what duty cycle means ;)

Announcements are not floods in Reticulum. They are prioritised and forwarded in a way that scales quite well. Some more info: https://markqvist.github.io/Reticulum/manual/understanding.html#the-announce-mechanism-in-detail

samuk commented 3 years ago

Thanks @markqvist what happens when the link table is full?

"Any node that forwards the link request will store a link id in it’s link table , along with the amount of hops the packet had taken when received. " https://markqvist.github.io/Reticulum/manual/understanding.html#link-establishment-in-detail

markqvist commented 3 years ago

Ideally, that should not happen. Just one megabyte of memory can accommodate more than 22.000 active links, so in most cases, this will not be a problem. Inactive links are of course culled from the table automatically.

X-Ryl669 commented 3 years ago

From that table: There is not SF where LoRa can send a 500 byte packet. So Reticulum will have to split a "packet" on multiple LoRa packet. In reality, in my area, only 125kHz uplink channel are allowed, and a maximum duty cycle of 10%, so it limits to SF between 7 to 9, and a theorical bitrate between ~1700 to ~5000bps. Yet, if you want to respect the duty cycle, you can't emit at 5kbps, but only 500bps on average (surely, you can peak to 5kbps, but only if your average emission is only <= 500bps).

Even then a 500 bytes packet must be split to at least 3 packets (since LoRa support a maximum of 242 bytes packet), and more generally, if you want to reach some long range, you'll limit yourself to a SF 9, so with 53 bytes per packet, meaning that a Reticulum packet will use 10 LoRa packet for a single own packet. Ten packets on a 1700bps link mean it's using 2.35s to send a Reticulum packet, and you'll have to wait ~21s before sending another one.

By watching my LoRa's gateway receiving time graph, a 2.3s transmission will saturate and prevent many other services to talk. I'm not even speaking about error code correction (since a 2.3s transmission is more likely to get disturbed in ISM band) that'll also require some additional bandwidth to include.

All in all, I don't think Reticulum fits LoRa network, or only on a lab experiment where only the lowest SF are used.

markqvist commented 3 years ago

You seem to be talking about LoRaWAN, which that table also seem to be describing. As I said before, I believe it's quite pointless to run Reticulum over LoRaWAN.

LoRa != LoRaWAN

LoRa is a PHY modulation scheme. LoRaWAN is a networking stack running on top of the LoRa PHY. Reticulum is another networking stack that can run on top of LoRa, amongst many other things. Maybe you are confusing the terms.

Really, no offense intended here, but I don't really care much whether you "don't think Reticulum fits LoRa network". I have hundreds of LoRa devices deployed in the real world running Reticulum, and they seem to be doing fine, in spite of your thoughts on the matter.

A lot of your statements in the above are false, in relation to LoRa. They are maybe true in regards to LoRaWAN.

Edit: Either way, I think it is pointless for this thread to get filled with discussion on Reticulum, since it would not be of any help to the disaster.radio project before a C++ implementation is ready anyways, and that is still a while in the future. As of writing this, the only functional Reticulum release is the Python reference implementation.