nRF24 / RF24

OSI Layer 2 driver for nRF24L01 on Arduino & Raspberry Pi/Linux Devices
https://nrf24.github.io/RF24
GNU General Public License v2.0
2.21k stars 1.02k forks source link

Possible incorrect sequence inside stopListening method #253

Closed eos1d3 closed 7 years ago

eos1d3 commented 8 years ago

The use of delays inside the method is to ensure ACK package to be sent without returning to Standby-I mode immediately.

void RF24::stopListening(void) { ce(LOW);

delayMicroseconds(txRxDelay);

if(read_register(FEATURE) & _BV(EN_ACK_PAY)){ delayMicroseconds(txRxDelay); //200 flush_tx(); }

In the above code, the ce(LOW) should be moved after the two delays. This gives enough time to send out the ACK package.

I had tested it with my own implementation for STM8L, The original code may not cause problem because auto-retransmit is helping. But the retry is not necessary if delay is moved to the right place.

Please correct me if I am wrong. Thanks!

TMRh20 commented 8 years ago

Interesting. Do you have any example code you can provide that gives me a better idea of where you are seeing problems, along with your configuration etc? Also, have you tested functionality with and without using ACK payloads?

I spent a lot of time testing and adjusting this initially, with features like ACK payloads etc. so I would like to get more info on what is happening if you are noticing improvements with these kind of changes.

eos1d3 commented 8 years ago

Please wait more time, I will report later as I am doing a stress testing.

But from the program flow, the code above does not make sense. right?

The ce(LOW) is to switch from RX mode to Standby-I. But ACK package or ACK payload package may still be in TX_FIFO and is waiting for transmission. Leaving RX mode immediately may cause ACK package not be sent.

For normal logic, we need first delay to wait for ACK package to be sent, and a second delay for ACK payload to be sent as it takes longer time. After all these are done, then it is time to leave RX mode by lowering CE bit.

So the code is in reverse order.

Currently I am do a ping-pong test with no auto re-transmission at all so any failure of transmission can be found.

From what I found right now, if CE(low) is placed at the first code in stopListening, So many ACK package will not be received. And then I add a few us delay in front of ce(low), then suddenly all ACK package can be received.

You can do the same like tihs:

  1. Create two nRF24L01+ units.
  2. Unit A send a data package to B
  3. Unit B receives the package, then reply by sending another data package to unit A
  4. after unit A receives the package, restart from 1

Then try the order of ce(Low) and delays, you will find the difference.

I forked from maniacbug, but I changed almost everything. You cannot use my version to test. But I can give the codes to you if you want to see.

I do not use any delay in the whole nRFL24L01+ library, And it can transmit million of transactions without any problem. Turning off auto-retransmission is also possible.

And I am confused with your txRxDelay. Why does "16Mhz Arduino" use shorter delay than ARM? Please let me know. Thanks!!

TMRh20 commented 8 years ago

"But from the program flow, the code above does not make sense. right?"

Maybe not? The thing is I added the one delay to prevent ACK payloads from being flushed before they were sent, so I just assumed that the radio would still handle ACKs when not using ACK payloads. In looking through some commits, it looks like that has always been the order of operations, but what your saying does seem to make sense.

I need to test.

"Why does "16Mhz Arduino" use shorter delay than ARM? Please let me know."

8-bit AVR/Arduino is just slow compared to ARM devices, so a certain amount of delay is inherent in every transaction. On ARM devices like Due, timing of delays, digital IO, etc, should be more accurate relative to the code.

On Fri, May 6, 2016 at 2:31 PM, eos1d3 notifications@github.com wrote:

Please wait more time, I will report later as I am doing a stress testing.

But from the program flow, the code above does not make sense. right?

The ce(LOW) is to switch from RX mode to Standby-I. But ACK package or ACK payload package may still be in TX_FIFO and is waiting for transmission. Leaving RX mode immediately may cause ACK package not be sent.

For normal logic, we need first delay to wait for ACK package to be sent, and a second delay for ACK payload to be sent as it takes longer time. After all these are done, then it is time to left RX mode by lowering CE bit.

So the code is in reverse order.

Currently I am do a ping-pong test with no auto re-transmission at all so any failure of transmission can be found.

From what I found right now, if CE(low) is placed at the first code in stopListening, So many ACK package will not be received. And then I add a few us delay in front of ce(low), then suddenly all ACK package can be received.

But you can do the same like tihs:

  1. Create two nRF24L01+ units.
  2. Unit A send a data package to B
  3. Unit B receives the package, then reply by sending another data package to unit A
  4. after unit A receive the package, restart from 1

Then try the order of ce(Low) and delays, you will find the difference.

I forked from maniacbug, but I changed almost everything. You cannot use my version to test. But I can give the codes to you if you want to see.

And I do not use any delay in the whole nRFL24L01+ library, And it can transmit million of transactions without any problem. Turning off auto-retransmission is also possible.

And I am confused with your txRxDelay. Why does "16Mhz Arduino" use shorter delay than ARM? Please let me know. Thanks!!

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/TMRh20/RF24/issues/253#issuecomment-217550778

TMRh20 commented 8 years ago

If you are able, could you test the gettingstarted_call_response sketch or basically, the ACK Payload functionality with your code and see if there is any odd behavior?

On Fri, May 6, 2016 at 4:32 PM, TMRh20Projects . tmrh20@gmail.com wrote:

"But from the program flow, the code above does not make sense. right?"

Maybe not? The thing is I added the one delay to prevent ACK payloads from being flushed before they were sent, so I just assumed that the radio would still handle ACKs when not using ACK payloads. In looking through some commits, it looks like that has always been the order of operations, but what your saying does seem to make sense.

I need to test.

"Why does "16Mhz Arduino" use shorter delay than ARM? Please let me know ."

8-bit AVR/Arduino is just slow compared to ARM devices, so a certain amount of delay is inherent in every transaction. On ARM devices like Due, timing of delays, digital IO, etc, should be more accurate relative to the code.

On Fri, May 6, 2016 at 2:31 PM, eos1d3 notifications@github.com wrote:

Please wait more time, I will report later as I am doing a stress testing.

But from the program flow, the code above does not make sense. right?

The ce(LOW) is to switch from RX mode to Standby-I. But ACK package or ACK payload package may still be in TX_FIFO and is waiting for transmission. Leaving RX mode immediately may cause ACK package not be sent.

For normal logic, we need first delay to wait for ACK package to be sent, and a second delay for ACK payload to be sent as it takes longer time. After all these are done, then it is time to left RX mode by lowering CE bit.

So the code is in reverse order.

Currently I am do a ping-pong test with no auto re-transmission at all so any failure of transmission can be found.

From what I found right now, if CE(low) is placed at the first code in stopListening, So many ACK package will not be received. And then I add a few us delay in front of ce(low), then suddenly all ACK package can be received.

But you can do the same like tihs:

  1. Create two nRF24L01+ units.
  2. Unit A send a data package to B
  3. Unit B receives the package, then reply by sending another data package to unit A
  4. after unit A receive the package, restart from 1

Then try the order of ce(Low) and delays, you will find the difference.

I forked from maniacbug, but I changed almost everything. You cannot use my version to test. But I can give the codes to you if you want to see.

And I do not use any delay in the whole nRFL24L01+ library, And it can transmit million of transactions without any problem. Turning off auto-retransmission is also possible.

And I am confused with your txRxDelay. Why does "16Mhz Arduino" use shorter delay than ARM? Please let me know. Thanks!!

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/TMRh20/RF24/issues/253#issuecomment-217550778

eos1d3 commented 8 years ago

From current test, I find this ce(LOW) will stop ACK package being sent frequently.(Using 16MHz STM8L). By adding some delay before ce(LOW) and after radio.dataReady(), almost all ACK are well received.

if (RF24_HAL::interruptsPending()) {    
    if (radio.dataReady()) {
    ce(LOW) ;
    .....

My version is completely different with no compatibility to run the examples. Just read gettingstarted_call_response. I think the code has some problem too. if(!radio.available()){ is not the correct way to test ACK is received. It shall read Status for TX_DS bit. So that example is not reliable to test this case.

 if ( radio.write(&counter,1) ){                         // Send the counter variable to the other radio 
        if(!radio.available()){                             // If nothing in the buffer, we got an ack but it is blank
            Serial.print(F("Got blank response. round-trip delay: "));
            Serial.print(micros()-time);
            Serial.println(F(" microseconds"));     
        }else{      
            while(radio.available() ){                      // If an ack with payload was received
                radio.read( &gotByte, 1 );                  // Read it, and display the response time
                unsigned long timer = micros();

                Serial.print(F("Got response "));
                Serial.print(gotByte);
                Serial.print(F(" round-trip delay: "));
                Serial.print(timer-time);
                Serial.println(F(" microseconds"));
                counter++;                                  // Increment the counter variable
            }
        }

Up to now, the effect of adding delay BEFORE ce(LOW) shows very positive effect. The effect is almost problem free. And without delay before ce(LOW) when data just arrived, so many ACK are not received.

TMRh20 commented 8 years ago

Well I don't see any initial difference in testing with the placement of ce(LOW) with RPi and Arduino, but the logic seems sound, so please leave this issue open for the time being.

If you started from maniacbugs code, here are a couple things I found:

a: One main thing I modified initially was changing available() to poll the FIFO_STATUS register and RX_EMPTY , which can make things a bit simpler.

b: The delays in this lib are intended to work around minor issues and bugs that will be observed when running a complete range of tests including real-world stress tests & examples on a range of hardware. See past issues for discussion.

c: You might find it helpful to look at the OBSERVE_TX register and the ARC_CNT and PLOS_CNT variables, which display retries and failed payloads, and are really helpful to get an idea of what exactly is going on.

d: More of my ramblings at http://tmrh20.blogspot.ca/2014/03/high-speed-data-transfers-and-wireless.html

eos1d3 commented 8 years ago

Did you turn off auto retransmission with setRetries(0,0) before test? If not, you won't see any difference.

And for maniacbugs code: For item a, do you mean the data sheet says to test Status register and then check again FIFO_STATUS again to see if there are data?

But I check Status register, read the payload and clear the DR flag without using FIFO_STATUS at all. I just don't find problem. So how using FIFO_STATUS is simpler?

I agree using FIFO_STATUS is more reasonable.

And I use extra method to check the IRQ pin status before calling available() as the later involves SPI command while pin check is a lot faster.

PS: Saw you blog address with ".hk". Are you from HK? I ask this because I am HKger. xD

TMRh20 commented 8 years ago

Did you turn off auto retransmission with setRetries(0,0) before test? If not, you won't see any difference.

Yup, I assumed this would make a very small difference, but it sounds like you are expecting me to see something obvious?

Without a code example and detailed explanation of what you are seeing with this library I'm kind of limited.

Maybe you are seeing the transmitter not entering RX mode fast enough? In that case, it was a judgement call to minimize delays in the library as much as possible and rely on ESB or the user to manage the timing. (workaround = delay() after stopListening() )

When designing & improving a general library for public use vs adapting & improving a library for specific purposes there is a very big difference. The delay could be increased, but how many people are going to use ESB with a retry value of 0,0 for example?

So how using FIFO_STATUS is simpler?

The code is almost always simpler if used effectively. See my blog and examples.

Are you from HK? I ask this because I am HKger.

Canada, thats just google geolocation + internationalization

eos1d3 commented 8 years ago

Hello,

Just deleted previous comment as those tests had additional delays caused by OLED display which I was not aware.

Now I am using different nRF24L01+ modules, and also clone of it (SI24R1). And I has some interesting findings. I will update later.

And thank you for pointing out the issue of original available(). I suspected that before but was reluctant to test. Further test shows original code can only detect there is data just once. If there are payload pending in FIFO, it just ignores them. This is the same as what data sheet says. This is a huge problem, thanks!!

TMRh20 commented 8 years ago

No prob, take your time in testing etc, these devices can be a bit difficult sometimes when configuring new hardware, modifying code, testing etc.

eos1d3 commented 8 years ago

Below is the main of the node to start the ping-pong test. I removed some OLED display stuff so that it is easier to read.

I used setReties(1,0) for 250KBPS, and setReties(0,0) for 1M and 2MBPS. (using 0,0 for 250KBPS will cause timeout for any package)

You can see when data is received, there is delay_5us followed immediately. Without the delay, the other side never receives ACK package.

And the delay time is different for different data rate. During testing, the delay of starting node are: @250KBPS, it is 67x5 us @1MBPS, it is 32x5 us @2MBPS, it is 30x5 us

and another node, it is: @250KBPS, it is 87x5 us @1MBPS, it is 43x5 us @2MBPS, it is 36x5 us

And it is interesting that the above sets of delay follow nRF24 modules. If I exchange the modules, I found I need to exchange the delay times too.

Any value smaller that above data will make ping-pong test full of errors. And with the correct delay, I can run them for many hours or even a whole day without causing an error.

And during the test, I accidentally found there is a problem inherited from maniacbug. I will make another issue to discuss it as it is totally not related to this issue.

So, in conclusion, the original sequence must create problem for ACK reception.

void main(void)
{
    radio.setup(NRF24L_DATA_RATE,0, RF24_CRC_8, 5, 3, true, true);
    radio.openWritingPipe(ping_back_pipe, PAYLOAD_SIZE);
    radio.openReadingPipe(2, ping_out_pipe, PAYLOAD_SIZE, true);   

    delay_ms(100);

    radio.setTXMode();
    while (1) {
        // repeat sending  a package until it is acked
        if (radio.write((uint8_t*)txBuffer, 3)) {
            radio.setRXMode();
        }
        else {
            continue;
        }

        // RX: wait until a pakcage is recieved
        while (1) {
            if (RF24_HAL::interruptsPending()) {
                delay_5us(50);              //250K:67, 1M:32, 2M:30
                radio.stopListening();      // with only CE(LOW) inside
                radio.setTXMode();
            }
        }
    }
}

and the other node:

void main(void)
{
    System_Initial();
    const uint8_t ping_out_pipe[] = {2,2,3,4,5};
    const uint8_t ping_back_pipe[] = {1,2,3,4,5};

    radio.setup(NRF24L_DATA_RATE, 0, RF24_CRC_8, 5, 3, true, true);
    radio.openReadingPipe(2, ping_back_pipe, PAYLOAD_SIZE, true);          // can be any pipe
    radio.openWritingPipe(ping_out_pipe, PAYLOAD_SIZE);                  // can be any pipe

    radio.setRXMode();
    while (1)
    {
        if (RF24_HAL::interruptsPending()) {
            // ACK is pending, don't switch to TX mode immediately
            delay_5us(50);              //250K:87, 1M:43, 2M:36
            radio.stopListening();      // with only CE(LOW) inside
            radio.setTXMode();          

            // receive data
            if (radio.dataReady()) {
                if (radio.write((uint8_t*)txBuffer, 3)) 
                    radio.setRXMode();              
                else 
                    radio.setRXMode();              
            }
        }
    }
}
TMRh20 commented 7 years ago

In all of my testing regarding the placement of ce(LOW); in startListening(), the library appears to perform best with the current code.

The library already incorporates device specific delays as well.