nRF24 / RF24

OSI Layer 2 driver for nRF24L01 on Arduino & Raspberry Pi/Linux Devices
https://nrf24.github.io/RF24
GNU General Public License v2.0
2.23k stars 1.02k forks source link

Receiving data on a closed pipe. #913

Closed msobarzo closed 1 year ago

msobarzo commented 1 year ago

I have a problem where I receive a packet on a closed pipe. Initially I noticed this problem in the context of transmitting a large array but currently I have the same problem even though I'm sending an increasing integer. Right now I'm using a very basic program because I wanted to reproduce the problem and try to identify the cause.

I'm currently using the nRF24L01+PA+LNA connected to 3v3 pin of the ESP32. The ESP32 is being powered from the USB port of my laptop and I have also tested with different modules.

The problem is I see the output in my receiver code, but sometimes I see the the correct value but is being received on pipe 5, this happens randomly. Also the settings on both modules are the same, I checked using the radio.printPrettyDetails();

This is my transmitter code

#include <SPI.h>
#include <Arduino.h>
#include <RF24.h>
// #include <RF24Network.h>
#include <printf.h>

// create an RF24 object
RF24 radio(5, 17); // CE, CSN  

// address through which two modules communicate.
const uint64_t address = 0x7878787878LL;

void setup()
{
  Serial.begin(115200);

  radio.begin();
  if (!radio.isChipConnected())
  {
    Serial.print("Chip not connected.");
  }
  radio.setPALevel(RF24_PA_MIN);
  radio.setChannel(80);
  radio.stopListening();
  radio.openWritingPipe(address);
  radio.printPrettyDetails();
}

int msgNumber = 0;

void loop()
{
  bool ok = radio.write(&msgNumber, sizeof(msgNumber));
  if (ok)
  {
    Serial.println("Message sent");
  }
  else
  {
    Serial.println("Message not sent");
  }
  msgNumber++;
  delay(100);
}

This is my receiver code

#include <Arduino.h>
#include <SPI.h>
#include <RF24.h>

RF24 radio(17, 2, 2000000); // CE, CSN  (For the S3 version)

// address through which two modules communicate.
const uint64_t address = 0x7878787878LL;

RF24 radio(5, 17); // CE, CSN

void setup()
{
  Serial.begin(115200);
  radio.begin();
  if (!radio.begin())
  {
    Serial.println("radio hardware is not responding!!!");
    while (1)
    {
    } // hold in infinite loop
  }
  if (!radio.isChipConnected())
  {
    Serial.print("Chip not connected.");
  }
  radio.setPALevel(RF24_PA_MIN);
  radio.openReadingPipe(1, address);
  radio.startListening();
  radio.setChannel(80);
  radio.printPrettyDetails();
}

uint8_t pipe_rx;
int packet;

void loop()
{
  if (radio.available(&pipe_rx))
  {
    radio.read(&packet, sizeof(packet));

    if (pipe_rx > 1)
    {
      Serial.print("PACKET ON PIPE: ");
      Serial.println(pipe_rx);
      Serial.println(packet);
    }

    if (pipe_rx == 1)
    {
      Serial.print("CORRECT PIPE: ");
      Serial.println(packet);
    }
  }
}

Finally, this is the output of radio.printPrettyDetails() on my receiver code.

SPI Frequency = 10 Mhz Channel = 80 (~ 2480 MHz) Model = nRF24L01+ RF Data Rate = 1 MBPS RF Power Amplifier = PA_MIN RF Low Noise Amplifier = Enabled CRC Length = 16 bits Address Length = 5 bytes Static Payload Length = 32 bytes Auto Retry Delay = 2750 microseconds Auto Retry Attempts = 15 maximum Packets lost on current channel = 0 Retry attempts made for last transmission = 0 Multicast = Disabled Custom ACK Payload = Disabled Dynamic Payloads = Disabled Auto Acknowledgment = Enabled Primary Mode = RX TX address = 0xe7e7e7e7e7 pipe 0 (closed) bound = 0xe7e7e7e7e7 pipe 1 ( open ) bound = 0x7878787878 pipe 2 (closed) bound = 0xc3 pipe 3 (closed) bound = 0xc4 pipe 4 (closed) bound = 0xc5 pipe 5 (closed) bound = 0xc6

TMRh20 commented 1 year ago

I recreated this issue on Arduino Due and Nano using your code.

I believe it is due to the following note from the datasheet:

Note: The 3 bit pipe information in the STATUS register is updated during the IRQ pin high to low transition. The pipe information is unreliable if the STATUS register is read during an IRQ pin high to low transition.

In looking at the datasheet, one might be able to work around this issue by getting the pipe number and checking the corresponding RX_PW_P0, P1 register to see if data is actually available on that pipe. I'm not 100% sure but I think you've found an ugly bug when using polling of available() with no easy fix.

TMRh20 commented 1 year ago

This might fix the problem:

bool RF24::available(uint8_t* pipe_num)
{
    // get implied RX FIFO empty flag from status byte
    uint8_t pipe = (get_status() >> RX_P_NO) & 0x07;
    uint8_t pipe2 = (get_status() >> RX_P_NO) & 0x07;

    if( pipe != pipe2){
      pipe = (get_status() >> RX_P_NO) & 0x07;
    }

    if (pipe > 5)
        return 0;
    // If the caller wants the pipe number, include that
    if (pipe_num)
        *pipe_num = pipe;

    return 1;
}

Essentially, check the pipe twice and if it doesn't match, read it a third time. Are you able to edit RF24.cpp and test out these changes?

What I don't like about this fix is that it may affect performance negatively, but I'm not sure how else to address this issue.

2bndy5 commented 1 year ago

This is a bit perplexing. I have a hunch, but I would like to know what exact radio module you are using (manufacturer, vendor, etc). I suspect a clone isn't adhering to the original datasheet's info about the status byte returned for every SPI transaction.

the suggestion from TMRh20 doesn't actually bypass my suspicion Reading the status byte directly from the radio's `STATUS` register should be ~absolutely~ accurate: ```cpp // using inheritance instead of altering RF24.cpp directly class RF24_Custom : public RF24 { bool available(uint8_t *pipe_num) { uint8_t pipe = (read_register(STATUS) >> RX_P_NO) & 0x07; if (pipe > 5) return 0; // If the caller wants the pipe number, include that if (pipe_num) *pipe_num = pipe; return 1; } }; // then use the derived class instead RF24_Custom radio(5, 17); ```

I would like to know if the behavior changes when the pipe_rx variable is initialized to an invalid pipe number (7 or higher). I don't think this would fix it, but it is better practice in general.

Lastly, there could be a slight corruption in data over the SPI bus. To diagnose this, I would request using a lower SPI speed. I see your code does this for the ESP32-S3... Is this also the case on the device you're seeing the wrong pipe number on?

msobarzo commented 1 year ago

@2bndy5 I checked the module and it says it's the E01-ML01DP5 (when I use the isPvariant I get true)

I matched the SPI speed for both modules now and the problem still persist, I also tried initializing to an invalid pipe number, especifically 7 and as you suspected that didn't work.

Lastly I tried what @TMRh20 wrote and that solved the problem. I received all the data in the correct pipe. I let the program run for a few minutes to be sure and it worked. Right now it worked fine when I use one transmitter I still need to check if it works when I use 2 transmitters.

This might fix the problem:

bool RF24::available(uint8_t* pipe_num)
{
    // get implied RX FIFO empty flag from status byte
    uint8_t pipe = (get_status() >> RX_P_NO) & 0x07;
    uint8_t pipe2 = (get_status() >> RX_P_NO) & 0x07;

    if( pipe != pipe2){
      pipe = (get_status() >> RX_P_NO) & 0x07;
    }

    if (pipe > 5)
        return 0;
    // If the caller wants the pipe number, include that
    if (pipe_num)
        *pipe_num = pipe;

    return 1;
}

Essentially, check the pipe twice and if it doesn't match, read it a third time. Are you able to edit RF24.cpp and test out these changes?

What I don't like about this fix is that it may affect performance negatively, but I'm not sure how else to address this issue.

2bndy5 commented 1 year ago

Good to hear.

As for addressing this problem, I can't think of a better way (using a delayMicroseconds() could be worse). I think the the main reason available() has been problematic is due to the speed-ups I suggested/implemented a few years back (#650 and https://github.com/nRF24/RF24/pull/691/commits/1e9bcb822e2da62462702cc39ea94d21b74c4358).

Since then, I've been more concerned with the same problem when available() is called directly after whatHappened(). The one thing I noticed is that it takes a moderately fast MCU to actually encounter this problem, so maybe we could take that into consideration if/when mainlining the suggested fix.

msobarzo commented 1 year ago

I'm not really sure if this is the place to comment this. But I originally found this problem when I was using the nrf2401 connected to a Raspberry Pi 4 (4GB), I implemented the same fix and it also worked when I used the Raspberry with the pyRF24 wrapper. At least right now it works for 1 transmitters. I'll try again later with 2 transmitters

2bndy5 commented 1 year ago

I think the FIFO_STATUS register could be used to avoid unnecessarily reading the STATUS byte multiple times:

bool RF24::available(uint8_t* pipe_num)
{
    if (!isFifo(false, true)) { // if RX FIFO is not empty
        // If the caller wants the pipe number
        if (pipe_num) {
            // isFifo() updates the internal `status` member
            uint8_t pipe = (status >> RX_P_NO) & 0x07;

            if (pipe != ((get_status() >> RX_P_NO) & 0x07)) {
                pipe = (get_status() >> RX_P_NO) & 0x07;
            }
            *pipe_num = pipe;
        }
        return 1;
    }

    return 0;
}

EDITED: using the FIFO_STATUS register is more ideal than using the RX_DR flag (mainly used for IRQ).

2bndy5 commented 1 year ago

when I used the Raspberry with the pyRF24 wrapper.

I'm assuming you're using the SPIDEV driver and the individual python wrapper included in this repo (not the pyrf24 pypi pkg). Other drivers (pigpio, ~MRAA~ & RPi) seem to be too slow to encounter this.

2bndy5 commented 1 year ago

Remembering a footnote from the datasheet's register map section:

The RX_DR IRQ is asserted by a new packet arrival event. The procedure for handling this interrupt should be: 1) read payload through SPI, 2) clear RX_DR IRQ, 3) read FIFO_STATUS to check if there are more payloads available in RX FIFO, 4) if there are more data in RX FIFO, repeat from step 1.

So, if not using IRQ, then I guess the datasheet implies using the FIFO_STATUS register to check for RX'd payloads. Essentially, this means reverting the idea in #650. Now (after addressing #650) that we are caching the STATUS byte returned on every SPI transaction, we still have 1 less transaction to perform (in comparison to v1.3.9).

I also think using the FIFO_STATUS register may allow reading the STATUS byte only 1 additional time because a few microseconds should have elapsed during the read_register(FIFO_STATUS) (a 2-byte transfer). Simply put: return the second read attempt.

bool RF24::available(uint8_t* pipe_num)
{
    if (!isFifo(false, true)) { // if RX FIFO is not empty
        // If the caller wants the pipe number
        if (pipe_num) {
            *pipe_num = (get_status() >> RX_P_NO) & 0x07;
        }
        return 1;
    }

    return 0;
}
TMRh20 commented 1 year ago

Nice! I think that does correct the problem although in theory I think it would be possible to still get an incorrect reading if the timing of reception were such that the second reading was taken during the IRQ transition of a second packet. This should be a very rare occurrence I think though.

Also, I think the functions should be modified as such to only grab the pipe number if requested, because the pointer is now initialized to a non-null value either way:

bool RF24::available(void)
{
    uint8_t pipe = 0xFF;
    return available(&pipe);
}

/****************************************************************************/

bool RF24::available(uint8_t* pipe_num)
{
    if (!isFifo(false, true)) { // if RX FIFO is not empty
        // If the caller wants the pipe number
        if (*pipe_num != 0xFF) {
            *pipe_num = (get_status() >> RX_P_NO) & 0x07;
        }
        return 1;
    }

    return 0;
}
2bndy5 commented 1 year ago

I think it would be possible to still get an incorrect reading if the timing of reception were such that the second reading was taken during the IRQ transition of a second packet. This should be a very rare occurrence I think though.

I hadn't thought of that. In a perfect world, the status byte inaccuracy would only apply to an empty RX FIFO.

2bndy5 commented 1 year ago

I would prefer to use a macro definition instead of a magic number (for readability)

#define RF24_NO_FETCH_PIPE 0xFF
2bndy5 commented 1 year ago

created a branch and drafted #914 to see compile size implications. I still need to HW test this solution on a setup that can reproduce this issue.

msobarzo commented 1 year ago

when I used the Raspberry with the pyRF24 wrapper.

I'm assuming you're using the SPIDEV driver and the individual python wrapper included in this repo (not the pyrf24 pypi pkg). Other drivers (pigpio, ~MRAA~ & RPi) seem to be too slow to encounter this.

Yes, I'm using the SPIDEV driver and the wrapper in this repo. I also managed to make it work with two transmitters and all the data was received correctly. I don't know if it would be worth to add a condition to check if the returned pipe is a valid open pipe in the MulticeiverDemo example?.


    if (radio.available(&pipe)) {              // is there a payload? get the pipe number that recieved it
      uint8_t bytes = radio.getPayloadSize();  // get the size of the payload
      radio.read(&payload, bytes);             // fetch payload from FIFO
      Serial.print(F("Received "));
      Serial.print(bytes);  // print the size of the payload
      Serial.print(F(" bytes on pipe "));
      Serial.print(pipe);  // print the pipe number
      Serial.print(F(" from node "));
      Serial.print(payload.nodeID);  // print the payload's origin
      Serial.print(F(". PayloadID: "));
      Serial.println(payload.payloadID);  // print the payload's number
    }```
2bndy5 commented 1 year ago

I don't know if it would be worth to add a condition to check if the returned pipe is a valid open pipe in the MulticeiverDemo example?

It shouldn't be necessary if the radio and library are working as they should. The state of the RX FIFO should additionally prevent using the stored pipe value from being used to display an invalid value.

@msobarzo If you get a chance to test the revert-650 branch on your RPi, that would be helpful feedback because it has the proposed solution for this issue.