markqvist / RNode_Firmware

RNode is an open, free and flexible digital radio interface with many uses
https://unsigned.io/rnode
GNU General Public License v3.0
145 stars 52 forks source link

Possible CSMA failure at very low datarate, possibly Bluetooth-related #75

Closed VladislavF closed 1 month ago

VladislavF commented 3 months ago

Describe the Bug When attempting to send a message through an RNode device (Lilygo T-beam 433MHz SX1278 firmware 1.7) using the Sideband app on android, messages always fail to send going from "Establishing link" to "waiting for path" to "failed". This is evidenced by the fact that my two test devices will begin transmitting simultaneously as seen by the red LED indicator.

To Reproduce Connect two RNode devices with sideband (at least one using the android client). After announcements, attempt to send a message from a device using the android app. The message will fail.

Expected Behavior Sending a message from the android app should complete successfully.

Logs & Screenshots I don't know how to get to log output from Android, I can dump a PC side log.

System Information

Additional context This issue is not present when using the Sideband python/pip package. I can run the same radios with two computers (an Ubuntu machine, and an RPi) running the Sideband client without issue. Or even using nomadnet is not a problem either. This issue is somehow specific to the Sideband android app. Similarly, messages sent from the computer to the mobile app will typically succeed. Messages sent from the app always fail.

I have also tried this with the past 3 versions of the app on two different phones.

VladislavF commented 3 months ago

I should also mention. I point out in the summary that I can tell when the message is going to fail by the nodes' transmit indicator. The app seems to ignore channel state or something as it will begin sending while the device directly next to it is still transmitting. This again does not happen with the pip version (0.8.0) where the two devices will gracefully ping-pong packets until the message is transferred successfully (without talking at the same time).

markqvist commented 3 months ago

Hmm, that's odd. Thanks for the report. Can you specify what LoRa settings you used? I don't need the frequency and transmit power, but just the bandwidth, spreading factor and coding rate.

VladislavF commented 3 months ago

Hmm, that's odd. Thanks for the report. Can you specify what LoRa settings you used? I don't need the frequency and transmit power, but just the bandwidth, spreading factor and coding rate.

Yes, thank you for reminding me. I meant to post this earlier. SF: 12 CR: 8 BW: 62.5kHz

I have not tried this with higher rates yet. Thanks for your good work :)

markqvist commented 3 months ago

Thanks! Since Sideband itself actually has no control over how and when the RNode transmits (this is running in a fast algorithm on the RNode hardware), it looks like this may be more related to the RNode itself, but let's investigate it further.

The settings you posted result in a bitrate of 92 bits per second only, which will give very long frame times. LoRa boards that do not have very good frequency stability (with a TCXO) tend to loose sync over so long frames, so this may actually be what is happening here.

I don't know what boards you are using, but in general, I'd say such a low bitrate requires both ends to have TCXOs for it to work, so it is likely that this is simply a limitation of the hardware you are using.

Speaking from experience, issues like this can be frustrating to diagnose and resolve, since even small changes in environment can affect whether it works or not, leading to inconsistent and confusing results in the test setup. Since these boards a susceptible to temperature fluctuations, you may find it working suddenly, then not, then working again, and it can be very confusing what is going on, if you are not aware that just small changes in temperature can affect the oscillator stability enough to de-sync frame reception on slower bitrates.

As a test, would it be possible for you to try BW=62.5 KHz, SF=8, CR=6 to check whether the packet collisions still occur, of if it resolves it?

VladislavF commented 3 months ago

Okay, I have tried your suggestion and it does resolve the problem, however I do not think that synchronization is the issue. This is for several reasons which I have just now double checked to be sure.

  1. The Meshtastic firmware happily works with these radios down to 31.25kHz with the CF:12 and CR:8.
  2. My radios are 433MHz so oscillator stability is less of a concern
  3. I can confirm packets are getting across properly with the loramon CLI tool.
  4. The desktop application (sbapp) works under this configuration just fine, so does nomadnet. it's only the mobile app that doesn't work. The only difference is with my phone I have to use Bluetooth. Perhaps if this is a firmware issue, it would have to be connected with Bluetooth somehow.

Without looking at the code, I'm only guessing, but it seems like it could be a hard-coded timer or timeout that automatically resets before a message is complete since this mode is not anticipated?

VladislavF commented 3 months ago

for context these are my exact units: https://www.lilygo.cc/products/t-beam-v1-1-esp32-lora-module?variant=43059202654389

markqvist commented 3 months ago

Thanks a lot for the detailed info and ideas about the issue. I will try to recreate the problem with the hardware I have here to see if I can replicate and debug it.

Just to be absolutely sure, you linked to the T-Beams based on the SX1278 chip, this is the ones you have, right?

markqvist commented 3 months ago

Just for info, I'm moving this to the RNode firmware repo. I currently think it's most likely to be firmware related.

VladislavF commented 3 months ago

Very good. Yes it's SX1278 based.

VladislavF commented 3 months ago

I take it there isn't a way to run with bluetooth on a linux machine? I would try to test this to see if it's connected to bluetooth, but the reticulum interface options for LoRA seem to be serial only. I did try serial with android and I actually have the same problem as with bluetooth. I used a usb-C dock to connect the rnode via wire and there still seems to be some sort of CSMA issue. perhaps this is an android problem?

faragher commented 3 months ago

You can connect via Bluetooth and use the port as normal. If /dev/ttyACM0 is USB or Bluetooth doesn't matter to Reticulum, so if you know what the port is named, you can use it as normal.

VladislavF commented 3 months ago

I had no idea it was possible to run a serial connection over bluetooth... You learn something every day! one /dev/rfcomm0 device later I can confirm that this too works properly. Messages flow seamlessly both ways. Again pointing to the issue being somehow connected to either android or the android app.

VladislavF commented 3 months ago

FWIW here's a loramon log from my Ubuntu machine with a remote device trying to send a message. Here is what would otherwise be a success (RPi via BT):

loramon /dev/ttyACM1 -C -W loracapture --freq 433000000 --bw 62500 --sf 12 --cr 8
[2024-03-28 21:15:22] Opening serial port /dev/ttyACM1...
[2024-03-28 21:15:25] RNode connected
[2024-03-28 21:15:25] Firmware version: 1.70
[2024-03-28 21:15:25] Radio reporting frequency is 433.0 MHz
[2024-03-28 21:15:25] Radio reporting bandwidth is 62.5 KHz
[2024-03-28 21:15:25] Radio reporting TX power is 2 dBm
[2024-03-28 21:15:25] Radio reporting spreading factor is 12
[2024-03-28 21:15:25] Radio reporting coding rate is 8
[2024-03-28 21:15:25] RNode in LoRa promiscuous mode and listening
[2024-03-28 21:15:40] [-132 dBm] [SNR 1.0 dB] [84 bytes]    b'\x00\x02\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8b\x00L\xcb\xb8\xc6.\xa4\x06\x1eR\x88\x02\x8bk\xbe^A\x86XA\xf0\xfb9j\x17\x0fc\xc9"\x91\xd4\xd8/\x17\x17\x19\xb7\xf6\xf0\xadC\x81\xb7P\x90\xda\xf1 5\xb8\xe8\x13\xcaq\x1f\x1b\xff_\xf9\xcat8\x93\x0b8'
[2024-03-28 21:16:33] [-132 dBm] [SNR 1.5 dB] [52 bytes]    b'\xf0\x08\x00k\x9ff\x01M\x98S\xfa\xab"\x0f\xbaG\xd0\'a\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8bduW\x11i\xa9h\x89\x0c\x9ap_P9\xf9g'
[2024-03-28 21:16:42] [-132 dBm] [SNR 1.0 dB] [52 bytes]    b'\xe0\x08\x00k\x9ff\x01M\x98S\xfa\xab"\x0f\xbaG\xd0\'a\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8b\xac\x99u\x19I\xf4\xa6s\xe7\x8c\xdb\xd0\xb3\x12\x1e\xef'
[2024-03-28 21:16:51] [-132 dBm] [SNR 1.0 dB] [52 bytes]    b'\xd0\x08\x00k\x9ff\x01M\x98S\xfa\xab"\x0f\xbaG\xd0\'a\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8b\x88\xf1\xf5\xe7\xf0n\xab\x8e\xfe\xd3\xc0\x03\x12L\x80\xd1'
[2024-03-28 21:16:59] [-131 dBm] [SNR 1.0 dB] [52 bytes]    b'\xd0\x08\x00k\x9ff\x01M\x98S\xfa\xab"\x0f\xbaG\xd0\'a\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8b\x89\xe6\x98<\x17\xf9\xa3\x9d\x0c;\x8a6\x81\xceX\xd3'
[2024-03-28 21:17:07] [-132 dBm] [SNR 1.0 dB] [52 bytes]    b'\x90\x08\x00k\x9ff\x01M\x98S\xfa\xab"\x0f\xbaG\xd0\'a\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8b`\x96\xb0wf\xc2\xe69\x03\xa8-\xe5E\x9a{\x8e'

Here's one that would fail (Android via BT)

loramon /dev/ttyACM1 -C -W loracapture --freq 433000000 --bw 62500 --sf 12 --cr 8
[2024-03-28 21:25:19] Opening serial port /dev/ttyACM1...
[2024-03-28 21:25:22] RNode connected
[2024-03-28 21:25:22] Firmware version: 1.70
[2024-03-28 21:25:22] Radio reporting frequency is 433.0 MHz
[2024-03-28 21:25:22] Radio reporting bandwidth is 62.5 KHz
[2024-03-28 21:25:22] Radio reporting TX power is 2 dBm
[2024-03-28 21:25:22] Radio reporting spreading factor is 12
[2024-03-28 21:25:22] Radio reporting coding rate is 8
[2024-03-28 21:25:23] RNode in LoRa promiscuous mode and listening
[2024-03-28 21:25:37] [-131 dBm] [SNR 1.75 dB] [84 bytes]   b'\xa0\x02\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8b\x00\x89\xd1{\xbb\xabq\x96\xe3\rJ@\x13\x7f\xca[\xc2"\xb1\xb9\x8es\x9bS\x83rL\x7f\xcdH\x94Hr\x8dH\xc4\xd2\xady\x8eS\xfcis\xde\xf0\x8bM\xb1\xd5\x88{Lu`9\t\xc2\x19g{\n\x99\x9d\x13'
[2024-03-28 21:25:45] [-132 dBm] [SNR 1.5 dB] [52 bytes]    b'0\x08\x00k\x9ff\x01M\x98S\xfa\xab"\x0f\xbaG\xd0\'a\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8b\xaf\xe7o\xd0\xa1m\x9fn\xdb\xc8>\xdd\x93\xb9Y\xa0'
[2024-03-28 21:26:32] [-131 dBm] [SNR 1.75 dB] [52 bytes]   b'\xc0\x08\x00k\x9ff\x01M\x98S\xfa\xab"\x0f\xbaG\xd0\'a\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8b\xd0\xfbs\x8b\xe7\xe4$SpB\xce0xt\x99\x06'
[2024-03-28 21:26:44] [-131 dBm] [SNR 1.75 dB] [52 bytes]   b'\x80\x08\x00k\x9ff\x01M\x98S\xfa\xab"\x0f\xbaG\xd0\'a\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8b\xbc\x00z\xf0\x07O\xd1\xbe\xc1\x82m\xe5\x933\x861'
[2024-03-28 21:26:52] [-131 dBm] [SNR 1.5 dB] [52 bytes]    b'`\x08\x00k\x9ff\x01M\x98S\xfa\xab"\x0f\xbaG\xd0\'a\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8b\xb4\xa1\xb2\xf7\xc6\xa6\xd4~\x00a\x0b\x84zaJ\xa4'
[2024-03-28 21:27:02] [-132 dBm] [SNR 1.25 dB] [52 bytes]   b'\x80\x08\x00k\x9ff\x01M\x98S\xfa\xab"\x0f\xbaG\xd0\'a\x00\x06\xdef\xbb\x99v\xb8\x8d$fvF\xc4\xf5\xd8\x8b#*\x1f\xfev\xbe{%\xf2\xd7\xe9<\xc7\xf2\xe3\xa3'

Don't know if that's any use, not sure if these messages are encrypted or not. (The radios are connected to dummy loads which is why the levels are low)

EDIT: to include packets sent after link was not returned

VladislavF commented 3 months ago

Sorry to spam the thread but I did note a possibly unintentional but only tangentially related behavior: When changing device configuration parameters such as bandwidth or even changing interfaces from Serial to BT for instance, messages will not pass successfully until the device has been reset or restarted after the change. I have noticed this also happens when using loramon. I have to reset the device before using sideband again. May not be connected to the interface at all, just something hanging perhaps?

I.e. if I change the bandwidth on the app and restart the app, it will not work. I must also reset the device. Same is true for the other node. This may be expected, but has caught me off guard several times now where it takes me 10 minutes to remember I have to reset the device and that's why messages aren't going through.

markqvist commented 3 months ago

Thanks for all the detailed information and logs, very helpful.

not sure if these messages are encrypted or not

Yes, they are encrypted, but just having the packet flow with timestamps is what is helpful in debugging anyways, so thanks for providing it.

When changing device configuration parameters such as bandwidth or even changing interfaces from Serial to BT for instance, messages will not pass successfully until the device has been reset or restarted after the change.

Hmm, that is interesting. It's intended behaviour that the host application / Reticulum stack opening the RNode interface needs to be restarted for new radio parameters to be applied. It's currently not "hot-configurable", and for new hardware configuration to take effect you must:

That being said however, you should be able to do just that, without having to power-cycle or reset the RNode device. The RNode should take the new configuration parameters when re-configured by the host. If this does not happen, there may be a bug that was introduced somewhere along the line, because this is definitely not intended behaviour.

VladislavF commented 3 months ago

I have tried increasing bandwidth to 125, and 250kHz with the same results. However, going up to 500kHz has resulted in successful communication. So perhaps your guess about data rates is correct. Seems somewhere around 0.7kbps things start working correctly.

markqvist commented 3 months ago

Ok, interesting, thanks for adding. I did find a potential issue with the CSMA RNG init on ESP32 and corrected it. It's in the main source repo now. If it is easy for you to build from source, you can pull the latest version from master and run:

make firmware-tbeam upload-tbeam

That will give you the firmware with the fix on the devices, and you could try and see if it makes any difference. If it does, it is indeed a CSMA issue. If not, my bets are on the LoRa modems losing sync due to the lack of TCXO on the boards.

From what I've seen too, the limit also seems to be around the 0.7Kbps you mention, before needing a very stable clock source. At least for longer packets. Smaller data payloads usually seem to work better, which again points to this happening because of clock drift / sync mismatch.

faragher commented 3 months ago

While it works on Meshtastic, and it's still pretty in the ballpark, I'd like to point out official specs require 50ppm accuracy for SF12. At 433 MHz, I'd expect 65k to usually work and 125k to always work (as far as frequency mismatch goes) so long as the crystals are actually 20ppm. Since these appear to use the same LoRa module as other LilyGOs, I expect they should work.

That's for raw LoRa with the SX series chips. If there's an additional Reticulum requirement, it would be worth discovering.

VladislavF commented 3 months ago

I have flashed this firmware and I appear to have the same issue at the lower bandwidths. Again, I would buy the sync issue if meshtastic also had this problem and if the desktop version of sideband had this problem. But neither of these is the case. I have even managed to transfer an image at this configuration (62.5khz) over the period of like 15 minutes between an rpi and ubuntu desktop with sideband

markqvist commented 3 months ago

Thanks a bunch for putting all this effort into it and collecting information. I'll be looking into this and trying to replicate it when I have a chance, but if you figure out anything else in the meantime, please do share it here as well.

markqvist commented 2 months ago

Could you try this again with the latest versions of RNS (0.7.4) and the RNode firmware (1.71)? A range of improvements were made that could affect this problem too.

markqvist commented 1 month ago

Assuming fixed in 1.71, closing. If problem persist, please refer to issue in https://github.com/liberatedsystems/RNode_Firmware_CE.