markqvist / RNode_Firmware

RNode is an open, free and flexible digital radio interface with many uses
https://unsigned.io/rnode
GNU General Public License v3.0
145 stars 52 forks source link

RNode CSMA fails to synchronize transmission between two adjacent nodes #72

Closed attermann closed 2 months ago

attermann commented 4 months ago

Describe the Bug

In a simple lora mesh scenario with 3 nodes where one node is acting as a transport node between the other two, a nomadnet client is unable to successfully retrieve a page from a nomadnet server over the transport node.

To Reproduce

In this topology, node A and node C are not within direct range of each other, but both are within direct range of node B which is operating as a transport node.

  1. Clear destinations and launch rnsd on node A (nomadnet client node)
  2. Clear destinations and launch rnsd on node B (transport node)
  3. Clear destinations and launch rnsd on node C (nomadnet server node)
  4. Confirm through rnpath that node A has a route to node C (2 hops via node B)
  5. Confirm through rnprobe that node C responds to probe requests from node A
  6. Launch nomadnet on node A and wait for radio activity to settle
  7. Launch nomadnet on node C and wait for radio activity to settle
  8. Attempt to connect nomadnet client on node A to nomadnet server running on node C
  9. Observe that link is established (most of the time) but page transfer (706 bytes) hangs indefinitely at 50% progress

Expected Behavior

Logs & Screenshots

Node A RNS:

[2024-03-10 14:54:37] [Extra] Filtered packet with hash <f9b94589798d4ccc294c05f8db5250dd4f433ce6a14eee8f79dcbdec4d7fce31>
[2024-03-10 14:54:38] [Extra] Link request proof validated for transport via LocalInterface[61210]
[2024-03-10 14:54:39] [Extra] Valid announce for <2b9129e4a711eec591b20b23af9c0c2e> 2 hops away, received via <86d3792a7dede793babe75de722aa6e8> on RNodeInterface[RNode LoRa Interface] [RSSI -99dBm, SNR 9.75dB]
[2024-03-10 14:54:39] [Debug] Destination <2b9129e4a711eec591b20b23af9c0c2e> is now 2 hops away via <86d3792a7dede793babe75de722aa6e8> on RNodeInterface[RNode LoRa Interface]
[2024-03-10 14:54:40] [Debug] Rebroadcasting announce for <2b9129e4a711eec591b20b23af9c0c2e> with hop count 2
[2024-03-10 14:54:40] [Extra] Added announce to queue (height 1) on RNodeInterface[RNode LoRa Interface] for processing in 24.73s
[2024-03-10 14:54:41] [Extra] Filtered packet with hash <8ebcdb15bf77320c06f09f79a69c61b575139e747cca43a3c3ef6719fb2d837d>
[2024-03-10 14:54:42] [Extra] Filtered packet with hash <f2304cab3f6e6f80768ec157ad74bfd82f8d5718e88ed8bf14db87755362e3e0>
[2024-03-10 14:54:42] [Extra] Filtered packet with hash <6820a9744d04f0c2af3cc18b35b8528e82581184471c52ece9762883ca178902>
[2024-03-10 14:54:46] [Debug] Rebroadcasting announce for <2b9129e4a711eec591b20b23af9c0c2e> with hop count 2
[2024-03-10 14:54:47] [Extra] Completed announce processing for <2b9129e4a711eec591b20b23af9c0c2e>, retry limit reached
[2024-03-10 14:55:56] [Extra] Filtered packet with hash <6cefff781298f657b8f5f8e3c00f759517f46c48aa387382c67beac52c31e289>
[2024-03-10 14:56:11] [Debug] Trying to rediscover path for <2b9129e4a711eec591b20b23af9c0c2e> since an attempted local client link was never established
[2024-03-10 14:56:11] [Extra] Released 1 link
[2024-03-10 14:56:15] [Extra] Valid announce for <2b9129e4a711eec591b20b23af9c0c2e> 2 hops away, received via <86d3792a7dede793babe75de722aa6e8> on RNodeInterface[RNode LoRa Interface] [RSSI -92dBm, SNR 11.75dB]

Node B RNS:

[2024-03-10 14:54:38] [Extra] Link request proof validated for transport via RNodeInterface[RNode LoRa Interface]
[2024-03-10 14:54:42] [Extra] Valid announce for <cbb023fff1a06352d210030eb91412d2> 3 hops away, received via <d2db5a15a5a1a9efd612cbb6b0130bb4> on RNodeInterface[RNode LoRa Interface] [RSSI -98dBm, SNR 11.75dB]
[2024-03-10 14:55:06] [Extra] Valid announce for <2b9129e4a711eec591b20b23af9c0c2e> 3 hops away, received via <d2db5a15a5a1a9efd612cbb6b0130bb4> on RNodeInterface[RNode LoRa Interface] [RSSI -89dBm, SNR 9.5dB]
[2024-03-10 14:56:06] [Debug] Trying to rediscover path for <2b9129e4a711eec591b20b23af9c0c2e> since an attempted link was never established, and destination was previously local to an interface on this instance
[2024-03-10 14:56:06] [Extra] Released 1 link
[2024-03-10 14:56:11] [Debug] Path request for <2b9129e4a711eec591b20b23af9c0c2e> on RNodeInterface[RNode LoRa Interface]
[2024-03-10 14:56:11] [Debug] Answering path request for <2b9129e4a711eec591b20b23af9c0c2e> on RNodeInterface[RNode LoRa Interface], path is known
[2024-03-10 14:56:12] [Debug] Rebroadcasting announce as path response for <2b9129e4a711eec591b20b23af9c0c2e> with hop count 1
[2024-03-10 14:56:13] [Extra] Completed announce processing for <2b9129e4a711eec591b20b23af9c0c2e>, retry limit reached

Node C RNS:

[2024-03-10 20:54:38] [Extra] Link request proof validated for transport via RNodeInterface[RNode LoRa Interface]
[2024-03-10 20:54:39] [Extra] Filtered packet with hash <c48e2071844960e011696a33388c790b52a1290579b1ca53100a3543c280821a>
[2024-03-10 20:54:40] [Extra] Valid announce for <2b9129e4a711eec591b20b23af9c0c2e> 2 hops away, received via <86d3792a7dede793babe75de722aa6e8> on RNodeInterface[RNode LoRa Interface] [RSSI -106dBm, SNR -3.0dB]
[2024-03-10 20:55:56] [Extra] Link request proof validated for transport via RNodeInterface[RNode LoRa Interface]
[2024-03-10 20:56:15] [Extra] Valid announce for <2b9129e4a711eec591b20b23af9c0c2e> 2 hops away, received via <86d3792a7dede793babe75de722aa6e8> on RNodeInterface[RNode LoRa Interface] [RSSI -107dBm, SNR 4.0dB]

Below is an annotated SDR screenshot demonstrating this scenario. Note that in the annotations the term "repeater" and "repeated" is used as a generalization for transport node and packets that are relayed by the transport node.

Screen Shot FAILED MARKUP

System Information

markqvist commented 4 months ago

Thanks for the comprehensive and detailed report.

If easily reproducible, it's suspicious that they are apparently transmitting in an exactly synchronized manner. If P is normalized to a float instead of the integer-based 1-255 scale, the probability of this happening should be (Pa * Pb) / Ds where Ds is the amount of slot synchronization in relation to DCD sampling interval.

Following areas are of potential interest:

markqvist commented 2 months ago

Closing this for now as fixed in RNS 0.7.4 and RNode Firmware 0.7.1. Feel free to add additional comments or resources if problem persists. Thanks again for pointing this out, and the data you collected on it. Was a pretty stubborn one to track down.