jgromes / RadioLib

Universal wireless communication library for embedded devices
https://jgromes.github.io/RadioLib/
MIT License

[SX126x] Data buffer invalid sometimes when receiving on SF8 & 125kHz BW #1252

Closed jacobeva closed 3 weeks ago

jacobeva commented 1 month ago

Describe the bug

When the bandwidth is set to 125 kHz on the SX1262 and the SF is 8 (and only 8), the received packet data is sometimes incorrect, even though the CRC check passes.

Debug mode output

```
20:08:31.710 -> [SX1262] Received packet!
20:08:31.710 -> [SX1262] Data:
20:08:31.710 -> [SX1262] RSSI: -53.00 dBm
20:08:31.710 -> [SX1262] SNR: 12.50 dB
20:08:31.710 -> [SX1262] Frequency error: -546.38 Hz
20:08:32.710 -> [SX1262] Received packet!
20:08:32.710 -> [SX1262] Data: Hello World! #12
20:08:32.710 -> [SX1262] RSSI: -33.00 dBm
20:08:32.710 -> [SX1262] SNR: 11.75 dB
20:08:32.710 -> [SX1262] Frequency error: 337.12 Hz
20:08:33.807 -> [SX1262] Received packet!
20:08:33.807 -> [SX1262] Data: Hello World! #13
20:08:33.807 -> [SX1262] RSSI: -31.00 dBm
20:08:33.807 -> [SX1262] SNR: 13.50 dB
20:08:33.807 -> [SX1262] Frequency error: 337.12 Hz
```

You will notice the data field of the first packet is empty. I am using the receive example, which reads the payload as a string, of course. When sending a byte array instead, the output is as follows:

```
20:11:59.045 -> [SX1262] Data: 123455678ABCDEF
20:11:59.045 -> [SX1262] RSSI: -31.00 dBm
20:11:59.045 -> [SX1262] SNR: 13.00 dB
20:11:59.045 -> [SX1262] Frequency error: 329.38 Hz
20:12:00.142 -> [SX1262] Received packet!
20:12:00.142 -> [SX1262] Data: 123455678ABCDEF
20:12:00.142 -> [SX1262] RSSI: -31.00 dBm
20:12:00.142 -> [SX1262] SNR: 11.50 dB
20:12:00.142 -> [SX1262] Frequency error: 329.38 Hz
20:12:00.239 -> [SX1262] Received packet!
20:12:00.239 -> [SX1262] Data: 012342CEB3C
20:12:00.239 -> [SX1262] RSSI: -49.00 dBm
20:12:00.239 -> [SX1262] SNR: 12.50 dB
20:12:00.239 -> [SX1262] Frequency error: -546.38 Hz
20:12:00.398 -> [SX1262] Received packet!
20:12:00.398 -> [SX1262] Data: 01234EDF13C
20:12:00.398 -> [SX1262] RSSI: -49.00 dBm
20:12:00.398 -> [SX1262] SNR: 13.00 dB
20:12:00.398 -> [SX1262] Frequency error: -546.38 Hz
20:12:01.203 -> [SX1262] Received packet!
20:12:01.203 -> [SX1262] Data: 123455678ABCDEF
20:12:01.203 -> [SX1262] RSSI: -32.00 dBm
20:12:01.203 -> [SX1262] SNR: 12.50 dB
20:12:01.203 -> [SX1262] Frequency error: 329.38 Hz
20:12:02.298 -> [SX1262] Received packet!
20:12:02.298 -> [SX1262] Data: 123455678ABCDEF
20:12:02.298 -> [SX1262] RSSI: -32.00 dBm
20:12:02.298 -> [SX1262] SNR: 12.50 dB
20:12:02.298 -> [SX1262] Frequency error: 329.38 Hz
20:12:02.362 -> [SX1262] Received packet!
20:12:02.362 -> [SX1262] Data: 012342CEB3C
20:12:02.362 -> [SX1262] RSSI: -48.00 dBm
20:12:02.362 -> [SX1262] SNR: 12.50 dB
20:12:02.362 -> [SX1262] Frequency error: -530.88 Hz
20:12:03.392 -> [SX1262] Received packet!
20:12:03.392 -> [SX1262] Data: 123455678ABCDEF
20:12:03.392 -> [SX1262] RSSI: -31.00 dBm
20:12:03.392 -> [SX1262] SNR: 13.00 dB
20:12:03.392 -> [SX1262] Frequency error: 329.38 Hz
```

Notice how the values are the same each time when corrupted.

To Reproduce

Sketch that is causing the module fail

Receive:

```c++
// include the library
#include <RadioLib.h>

// SX1262 has the following connections:
// NSS pin:   10
// DIO1 pin:  2
// NRST pin:  3
// BUSY pin:  9
SX1262 radio = new Module(10, 2, 3, 9);

// or using RadioShield
// https://github.com/jgromes/RadioShield
//SX1262 radio = RadioShield.ModuleA;

// or using CubeCell
//SX1262 radio = new Module(RADIOLIB_BUILTIN_MODULE);

void setup() {
  Serial.begin(9600);

  // initialize SX1262 at 868.0 MHz, 125.0 kHz bandwidth, SF8
  Serial.print(F("[SX1262] Initializing ... "));
  int state = radio.begin(868.0, 125.0, 8);
  if (state == RADIOLIB_ERR_NONE) {
    Serial.println(F("success!"));
  } else {
    Serial.print(F("failed, code "));
    Serial.println(state);
    while (true) { delay(10); }
  }

  // set the function that will be called
  // when new packet is received
  radio.setPacketReceivedAction(setFlag);

  // start listening for LoRa packets
  Serial.print(F("[SX1262] Starting to listen ... "));
  state = radio.startReceive();
  if (state == RADIOLIB_ERR_NONE) {
    Serial.println(F("success!"));
  } else {
    Serial.print(F("failed, code "));
    Serial.println(state);
    while (true) { delay(10); }
  }

  // if needed, 'listen' mode can be disabled by calling
  // any of the following methods:
  //
  // radio.standby()
  // radio.sleep()
  // radio.transmit();
  // radio.receive();
  // radio.scanChannel();
}

// flag to indicate that a packet was received
volatile bool receivedFlag = false;

// this function is called when a complete packet
// is received by the module
// IMPORTANT: this function MUST be 'void' type
//            and MUST NOT have any arguments!
#if defined(ESP8266) || defined(ESP32)
  ICACHE_RAM_ATTR
#endif
void setFlag(void) {
  // we got a packet, set the flag
  receivedFlag = true;
}

void loop() {
  // check if the flag is set
  if(receivedFlag) {
    // reset flag
    receivedFlag = false;

    // you can read received data as an Arduino String
    String str;
    int state = radio.readData(str);

    // you can also read received data as byte array
    /*
      byte byteArr[8];
      int numBytes = radio.getPacketLength();
      int state = radio.readData(byteArr, numBytes);
    */

    if (state == RADIOLIB_ERR_NONE) {
      // packet was successfully received
      Serial.println(F("[SX1262] Received packet!"));

      // print data of the packet
      Serial.print(F("[SX1262] Data:\t\t"));
      Serial.println(str);

      // print RSSI (Received Signal Strength Indicator)
      Serial.print(F("[SX1262] RSSI:\t\t"));
      Serial.print(radio.getRSSI());
      Serial.println(F(" dBm"));

      // print SNR (Signal-to-Noise Ratio)
      Serial.print(F("[SX1262] SNR:\t\t"));
      Serial.print(radio.getSNR());
      Serial.println(F(" dB"));

      // print frequency error
      Serial.print(F("[SX1262] Frequency error:\t"));
      Serial.print(radio.getFrequencyError());
      Serial.println(F(" Hz"));

    } else if (state == RADIOLIB_ERR_CRC_MISMATCH) {
      // packet was received, but is malformed
      Serial.println(F("CRC error!"));

    } else {
      // some other error occurred
      Serial.print(F("failed, code "));
      Serial.println(state);
    }
  }
}
```

Send:

```c++
// include the library
#include <RadioLib.h>

// SX1262 has the following connections:
// NSS pin:   10
// DIO1 pin:  2
// NRST pin:  3
// BUSY pin:  9
SX1262 radio = new Module(10, 2, 3, 9);

// or using RadioShield
// https://github.com/jgromes/RadioShield
//SX1262 radio = RadioShield.ModuleA;

// or using CubeCell
//SX1262 radio = new Module(RADIOLIB_BUILTIN_MODULE);

void setup() {
  Serial.begin(9600);

  // initialize SX1262 at 868.0 MHz, 125.0 kHz bandwidth, SF8
  Serial.print(F("[SX1262] Initializing ... "));
  int state = radio.begin(868.0, 125.0, 8);
  if (state == RADIOLIB_ERR_NONE) {
    Serial.println(F("success!"));
  } else {
    Serial.print(F("failed, code "));
    Serial.println(state);
    while (true) { delay(10); }
  }

  // some modules have an external RF switch
  // controlled via two pins (RX enable, TX enable)
  // to enable automatic control of the switch,
  // call the following method
  // RX enable:   4
  // TX enable:   5
  /*
    radio.setRfSwitchPins(4, 5);
  */
}

// counter to keep track of transmitted packets
int count = 0;

void loop() {
  Serial.print(F("[SX1262] Transmitting packet ... "));

  // you can transmit C-string or Arduino string up to
  // 256 characters long
  String str = "Hello World! #" + String(count++);
  int state = radio.transmit(str);

  // you can also transmit byte array up to 256 bytes long
  /*
    byte byteArr[] = {0x01, 0x23, 0x45, 0x56, 0x78, 0xAB, 0xCD, 0xEF};
    int state = radio.transmit(byteArr, 8);
  */

  if (state == RADIOLIB_ERR_NONE) {
    // the packet was successfully transmitted
    Serial.println(F("success!"));

    // print measured data rate
    Serial.print(F("[SX1262] Datarate:\t"));
    Serial.print(radio.getDataRate());
    Serial.println(F(" bps"));

  } else if (state == RADIOLIB_ERR_PACKET_TOO_LONG) {
    // the supplied packet was longer than 256 bytes
    Serial.println(F("too long!"));

  } else if (state == RADIOLIB_ERR_TX_TIMEOUT) {
    // timeout occurred while transmitting packet
    Serial.println(F("timeout!"));

  } else {
    // some other error occurred
    Serial.print(F("failed, code "));
    Serial.println(state);
  }

  // wait for a second before transmitting again
  delay(1000);
}
```

Expected behavior

Obviously these packets shouldn't be corrupted the way they are.

Additional info (please complete):

jgromes commented 1 month ago

The timing information is interesting - it seems that these "packets" always arrive directly after receiving real data. My first guess is that the IRQ gets triggered multiple times for some reason, pulling garbage data from the internal FIFO. Why this does not get caught by the CRC and/or packet length check is also interesting in and of itself.
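The timing observation can be checked mechanically. The following is a minimal host-side sketch, not part of RadioLib: `findEarlyArrivals` and `kLogTimestampsMs` are hypothetical names, the timestamps are copied by hand from the byte-array log above, and the 500 ms threshold is an assumption (half of the sender's nominal 1 s interval).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the indices of receptions that arrived less than minGapMs after
// the previous reception. Timestamps are in milliseconds, ascending.
std::vector<std::size_t> findEarlyArrivals(const std::vector<std::uint32_t>& timestampsMs,
                                           std::uint32_t minGapMs) {
  std::vector<std::size_t> early;
  for (std::size_t i = 1; i < timestampsMs.size(); ++i) {
    // precondition: the log is in chronological order
    assert(timestampsMs[i] >= timestampsMs[i - 1]);
    if (timestampsMs[i] - timestampsMs[i - 1] < minGapMs) {
      early.push_back(i);
    }
  }
  return early;
}

// Reception timestamps from the byte-array log, in ms since 20:11:00.000.
// The first entry (20:11:59.045) is taken from its "Data" line, because the
// matching "Received packet!" line is cut off in the log.
const std::vector<std::uint32_t> kLogTimestampsMs = {
  59045, 60142, 60239, 60398, 61203, 62298, 62362, 63392
};
```

Calling `findEarlyArrivals(kLogTimestampsMs, 500)` flags entries 2, 3 and 6, i.e. the receptions at 20:12:00.239, 20:12:00.398 and 20:12:02.362. Those are the three receptions with corrupted data, each arriving 64 to 159 ms after the previous one.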

> Notice how the values are the same each time when corrupted.

They are similar, but not exactly the same. It is also interesting that the frequency error and RSSI are the same, but SNR is not. Though maybe that's just because we have relatively few data points.

20:12:00.239 -> [SX1262] Data:      012342CEB3C
20:12:00.239 -> [SX1262] RSSI:      -49.00 dBm
20:12:00.239 -> [SX1262] SNR:       12.50 dB
20:12:00.239 -> [SX1262] Frequency error:   -546.38 Hz
20:12:00.398 -> [SX1262] Received packet!
20:12:00.398 -> [SX1262] Data:      01234EDF13C
20:12:00.398 -> [SX1262] RSSI:      -49.00 dBm
20:12:00.398 -> [SX1262] SNR:       13.00 dB
20:12:00.398 -> [SX1262] Frequency error:   -546.38 Hz

jacobeva commented 1 month ago

Ah I see. Are you able to replicate this at all?

I also wonder why there is garbage data in the FIFO buffer? Obviously in the receive example there is no call to transmit, nor am I telling the module to sleep, so shouldn't the FIFO buffer simply have the data of the last packet when it's read in your hypothesis? I could be wildly wrong in my understanding, of course.

jacobeva commented 1 month ago

I should also note that here the RSSI and frequency error are different:

20:12:02.298 -> [SX1262] Received packet!
20:12:02.298 -> [SX1262] Data:      123455678ABCDEF
20:12:02.298 -> [SX1262] RSSI:      -32.00 dBm
20:12:02.298 -> [SX1262] SNR:       12.50 dB
20:12:02.298 -> [SX1262] Frequency error:   329.38 Hz
20:12:02.362 -> [SX1262] Received packet!
20:12:02.362 -> [SX1262] Data:      012342CEB3C
20:12:02.362 -> [SX1262] RSSI:      -48.00 dBm
20:12:02.362 -> [SX1262] SNR:       12.50 dB
20:12:02.362 -> [SX1262] Frequency error:   -530.88 Hz

jgromes commented 1 month ago

> Are you able to replicate this at all?

I will try to and update here. I don't have the exact module you have, so we'll see.

> I also wonder why there is garbage data in the FIFO buffer?

If my hypothesis is correct, then we would be reading FIFO after the packet data have already been read and the FIFO should be "empty". What happens next depends on how this situation is handled by the SX126x internally; for example, the FIFO pointer (determining where the data is read from) could wrap around from address 0 back to 255. It really depends on the implementation ...
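To make the wrap-around idea concrete, here is a toy model of a 256-byte data buffer addressed by an 8-bit offset. It is an illustration only: `ToyDataBuffer` is a hypothetical class, and nothing here is claimed to match what the SX126x does internally.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a 256-byte radio data buffer. NOT the SX126x implementation;
// it only shows how an 8-bit address wraps from 255 back to 0.
class ToyDataBuffer {
public:
  // Stores data starting at offset; addresses wrap modulo 256.
  void write(std::uint8_t offset, const std::vector<std::uint8_t>& data) {
    for (std::size_t i = 0; i < data.size(); ++i) {
      mem_[static_cast<std::uint8_t>(offset + i)] = data[i];
    }
  }

  // Reads len bytes starting at offset; addresses wrap modulo 256.
  // Nothing is ever "consumed", so a second read returns whatever is stored.
  std::vector<std::uint8_t> read(std::uint8_t offset, std::size_t len) const {
    assert(len <= mem_.size());
    std::vector<std::uint8_t> out(len);
    for (std::size_t i = 0; i < len; ++i) {
      out[i] = mem_[static_cast<std::uint8_t>(offset + i)];
    }
    return out;
  }

private:
  std::array<std::uint8_t, 256> mem_{};  // zero-initialized
};
```

In this model, a packet written at offset 252 spills into addresses 0 to 3. A later read that starts from the wrong offset does not fail; it simply returns a mix of stale and unrelated bytes, which is the kind of plausible-looking garbage the hypothesis describes.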

jacobeva commented 1 month ago

Just verified from a Heltec LoRa32 v3 running a different driver, packets transmitted from the SX1262 are in fact transmitted correctly. It's just the reception with this driver that's an issue.

jgromes commented 1 month ago

I tried to replicate this on my setup, which is a RPi with Waveshare SX1262 hat, and after some 8500 packets, I don't see anything strange. The next step I would take in your case would be to check the DIO1 line with an oscilloscope to make sure there is only one edge as the packet is received. But to me it seems like a hardware issue now.
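Without an oscilloscope, a rough software check is to count interrupt edges instead of setting a boolean flag. The sketch below is host-side C++ under stated assumptions: `onPacketIrq` and `takeIrqEdges` are hypothetical names, and `<atomic>` is assumed to be available (it is on ESP32 toolchains; on AVR a `volatile` counter read with interrupts disabled would be needed instead).

```cpp
#include <atomic>
#include <cstdint>

// Number of interrupt edges seen since the main loop last checked.
std::atomic<std::uint32_t> irqEdgeCount{0};

// Hypothetical ISR body: count every edge, so that a double trigger is not
// silently merged into a single "flag is set" event.
void onPacketIrq() {
  irqEdgeCount.fetch_add(1, std::memory_order_relaxed);
}

// Called from the main loop: atomically fetch the count and reset it to 0.
std::uint32_t takeIrqEdges() {
  return irqEdgeCount.exchange(0, std::memory_order_relaxed);
}
```

A return value above 1 means several edges arrived before the loop got around to reading the packet. A value of 1 that shows up only tens of milliseconds after a packet was already handled points to a late extra edge instead. Neither replaces a scope trace, since software cannot see glitches shorter than the interrupt latency.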

jacobeva commented 1 month ago

That's interesting. I'll see what I can do. I should mention that the modules I am testing have worked perfectly fine previously on a driver that I used to use. I will try them again with that old driver to check they're still in good working order.

jacobeva commented 1 month ago

I've just tested these modules with the old driver, and they function totally fine. The packets are printed out the other end exactly as they were entered into the transmitter. I'm going to do some digging around your library and see if I can spot anything which seems suspicious :)

jgromes commented 1 month ago

@jacobeva did you have the time to dig into this? If not I will go ahead and close the issue, since I was not able to reproduce it.

jacobeva commented 1 month ago

Hi, apologies for the delay!

I will have time to look into this next week. I'll take a look through your drivers and have a think about what could potentially be affecting it in my hardware platform as well.

amirna2 commented 3 weeks ago

Hello, I may be seeing a similar issue. I recently updated RadioLib to 7.0.2 from 6.4.2 (yes, a big jump, but the upgrade was straightforward).

My issue is that the first call to readData after the first instance of an RX_DONE returns the correct/expected data. But all subsequent ones return garbage (RX_DONE was triggered and no CRC error). I have my own CRC check for my internal data, so that prevented my device from processing the corrupt data. I reverted back to 6.4.2 (only change was the IrqStatus method) and all is working again with no issues.
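An application-level check of the kind described here can be sketched as follows. The comment does not say which checksum is used, so CRC-16/CCITT-FALSE is an assumption, and `crc16Ccitt`, `frameWithCrc` and `frameIsValid` are hypothetical helper names, not RadioLib functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection.
std::uint16_t crc16Ccitt(const std::uint8_t* data, std::size_t len) {
  std::uint16_t crc = 0xFFFF;
  for (std::size_t i = 0; i < len; ++i) {
    crc = static_cast<std::uint16_t>(crc ^ (data[i] << 8));
    for (int bit = 0; bit < 8; ++bit) {
      if (crc & 0x8000) {
        crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
      } else {
        crc = static_cast<std::uint16_t>(crc << 1);
      }
    }
  }
  return crc;
}

// Sender side: returns the payload with its CRC appended, high byte first.
std::vector<std::uint8_t> frameWithCrc(const std::vector<std::uint8_t>& payload) {
  const std::uint16_t crc = crc16Ccitt(payload.data(), payload.size());
  std::vector<std::uint8_t> frame = payload;
  frame.push_back(static_cast<std::uint8_t>(crc >> 8));
  frame.push_back(static_cast<std::uint8_t>(crc & 0xFF));
  return frame;
}

// Receiver side: true only if the trailing two bytes match the CRC of the
// bytes before them.
bool frameIsValid(const std::vector<std::uint8_t>& frame) {
  if (frame.size() < 2) {
    return false;
  }
  const std::size_t payloadLen = frame.size() - 2;
  const std::uint16_t expected = crc16Ccitt(frame.data(), payloadLen);
  const std::uint16_t received =
      static_cast<std::uint16_t>((frame[payloadLen] << 8) | frame[payloadLen + 1]);
  assert(payloadLen + 2 == frame.size());
  return expected == received;
}
```

Because this check runs on the bytes returned to the application, it also catches corruption that happens after the radio's own CRC has passed, for example when the wrong region of the data buffer is read out.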

I can debug some more if that helps and provide the RadioLib traces.

I am using the following radio parameters:

[I][DeviceBuilder.cpp] Setting LoRa radio params: LoraRadioParams(ss=8, rst=12, di0=13, di1=14, band=915000000, txPower=20, bw=125000000, sf=7, gain=0, privateNetwork=1)

And testing this between 2 Heltec Lora ESP32 v3 boards.

jgromes commented 3 weeks ago

@amirna2 it would be useful to see SPI debug from both versions (as well as the code).

What I find strange and why I'm leaning towards this being a hardware issue is that:

  1. I was not able to replicate the behavior reported by @jacobeva
  2. Reviewing the code and docs there's nothing that could be causing this - also you are using SF7 while the original reported issue was only for SF8
  3. If this was some widespread issue we would have seen a lot more reports, both from internal testing (e.g. in LoRaWAN) as well as from other projects using this library.

jacobeva commented 3 weeks ago

Hmmm, maybe I should try that version and see if my problem still exists. If the only thing that has been touched is the IRQ stuff, then it's almost certainly triggering twice by accident, or something similar.

amirna2 commented 3 weeks ago

@jgromes So I tried to reproduce the issue again (using the same 2 Heltec boards). With both on 7.0.2, I can't reproduce the problem. Over 100 packets were transmitted and received successfully at 30-second intervals before I stopped the test. I also tried one board on 6.4.2 and the other on 7.0.2, and they both seem to work as expected.

So I'm not sure what to make of this. I use a thin wrapper around the RadioLib APIs and also use the interrupt-driven startReceive and startTransmit, so maybe there were some issues in my own code. I'll keep everything on 7.0.2 and see if the problem comes back.

jgromes commented 3 weeks ago

@amirna2 thank you for the test. Seeing how inconsistent this issue is (and for the 3 reasons outlined in my previous post), my conclusion is that this is somehow related to hardware.

Nevertheless, @jacobeva feel free to reopen later if new information becomes available - thanks!