setAutoAck() doesn't seem to work as expected

nRF24 / RF24

OSI Layer 2 driver for nRF24L01 on Arduino & Raspberry Pi/Linux Devices

https://nrf24.github.io/RF24

GNU General Public License v2.0


setAutoAck() doesn't seem to work as expected #649

Closed alandsidel closed 3 years ago

alandsidel commented 3 years ago

Having issues with ACK not working correctly? Please see common issues.

I did look here and while the information is interesting, it doesn't seem relevant to me.

Describe the bug: Disabling AutoAck seems to disable all communications. I have a bunch of modules, they are authentic as far as I can tell, and I have successful communication between multiple Arduinos as well as between Arduinos and Raspberry Pis using this library.

I am having some intermittent issues that I am trying to debug. In order to do so I decided to set up another device intended only to listen to the "conversation" between the other two. In all devices I switched the multicast flag in calls to write() to true, and also called setAutoAck() with false on the pipes I was using.

As soon as I do this, no more packets are received by any device. Turning auto ack back on with setAutoAck(x, true) results in data once again being received.

This should be reproducible with example code by simply turning AutoAck() off and setting the multicast flag to true.

Is there some order that needs adhered to regarding this method call vs. e.g. openReadingPipe() or something else? I'm trying to fiddle with the settings to see if that is the case but so far I haven't discovered anything.

2bndy5 commented 3 years ago

I haven't tried to reproduce this yet, but I feel the need for some clarification.

I never liked the naming of the multicast parameter (it requires more familiarity with the RF24Network library than with the nRF24L01 datasheet). Essentially, setting the multicast parameter to true disables automatic acknowledgement (auto-ack for short), but only for the payload passed to the buf parameter (in the same call to write() or writeFast()). Therefore you shouldn't need to use setAutoAck() at all if every payload you write() or writeFast() is marked with the NO_ACK flag (when the multicast parameter is set to true).

setAutoAck() is mostly for RX nodes with the exception that pipe 0 is also used for TX operations (especially concerning ACK packets with or without ACK payloads attached). Use setRetries(<some integer less than 16>, 0) on TX nodes to disable waiting for an auto-ack from RX nodes (or set the multicast to true as described above). See also reUseTX() when payload is not received while auto-ack is enabled (and not using the multicast parameter).

Is there some order that needs adhered to regarding this method call vs. e.g. openReadingPipe() or something else?

Concerning pipe 0: openReadingPipe() should be called before openWritingPipe() as pipe 0 gets appropriated by openWritingPipe(). However, startListening() re-appropriates pipe 0 properly if openReadingPipe() is called before openWritingPipe(). See also #496 for the flaw in this algorithm (tl;dr don't use integer 0 for the first byte written to the address on pipe 0)
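The tl;dr about pipe 0 addresses can be turned into a quick sanity check. This is only a sketch: `pipe0AddressLooksSafe` is a hypothetical helper, not part of the RF24 API, and it assumes the address is passed as a byte array whose element 0 is the first byte written to the radio.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an RF24 function): returns true when the first
// byte of a pipe 0 address is non-zero, following the tl;dr of #496 above.
// Assumption: address[0] is the first byte written to the radio.
constexpr bool pipe0AddressLooksSafe(const uint8_t* address) {
    return address[0] != 0;
}
```

Running a check like this before openReadingPipe(0, ...) would flag an address that could trip the flaw described in #496.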

I decided to set up another device intended only to listen to the "conversation" between the other two.

This is exactly what the multicast parameter is for. Remember that all RX & TX nodes need to have matching configuration concerning data rate, channel, CRC, dynamic payloads, payload length (if not using dynamic payloads), and auto-ack (if not using the multicast parameter). As for the term "conversation", no ACK packets are transmitted when the multicast parameter is set to true. So your third-party listening node will only pick up on normal TX payloads sent between the other nodes (when using the multicast parameter as true). Notice that the success rate of a payload being transmitted with the auto-ack feature disabled (even if temporarily disabled by the multicast parameter) is less guaranteed.
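The matching-configuration checklist above can be written down as a comparison. This is a minimal sketch that mirrors only the list in this comment; `LinkConfig`, `DataRate`, `CrcLength`, and `configsMatch` are illustrative names and do not exist in the RF24 library.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative types only; none of these names come from the RF24 library.
enum class DataRate { k250Kbps, k1Mbps, k2Mbps };
enum class CrcLength { kDisabled, k8Bit, k16Bit };

struct LinkConfig {
    DataRate dataRate;
    uint8_t channel;
    CrcLength crc;
    bool dynamicPayloads;
    uint8_t payloadSize;  // only compared when dynamicPayloads is false
    bool autoAck;
};

// Returns true when two nodes agree on every setting in the checklist.
// usesMulticast means every payload is written with multicast == true,
// in which case the auto-ack setting is left out of the comparison.
inline bool configsMatch(const LinkConfig& a, const LinkConfig& b, bool usesMulticast) {
    if (a.dataRate != b.dataRate || a.channel != b.channel || a.crc != b.crc) return false;
    if (a.dynamicPayloads != b.dynamicPayloads) return false;
    if (!a.dynamicPayloads && a.payloadSize != b.payloadSize) return false;
    if (!usesMulticast && a.autoAck != b.autoAck) return false;
    return true;
}
```

Under this checklist, a listener that differs from the other nodes only in its auto-ack setting counts as a match only when every payload is sent with the multicast parameter set to true.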

Note that write() and writeFast() will always return true when multicast parameter is set to true. This is because the auto re-transmit feature is ignored, and the only thing left to report is if the singular transmission attempt is complete (not describing if the transmission was received). This behavior is not noted in the docs.

alandsidel commented 3 years ago

Thanks for all that information. The fact that I don't need to mess with setAutoAck() is encouraging since that is the setting that seems to be preventing me from debugging further; I can leave that alone and set the "multicast" flag to true as needed, but this brings up an interesting question.

It would be much easier to debug the issues I'm having unrelated to this ticket if I could setup nodes "Alice" and "Bob" as normal -- that is leaving auto-ack enabled, and setting the multicast flag to false in all calls to write*(). In this situation I'd like the third device "Charlie" to, for example, listen on the addresses of both Alice and Bob, with auto-ack disabled.

Barring issues with interference, range, propagation and so on, is it safe to assume that this should work, and that Charlie can be expected to see all the traffic and not interfere with it? Ideally this is what I'd like to do in order to keep the ack system in place for reliable data transfer between A and B.

2bndy5 commented 3 years ago

It would be much easier to debug the issues I'm having unrelated to this ticket if I could setup nodes "Alice" and "Bob" as normal -- that is leaving auto-ack enabled, and setting the multicast flag to false in all calls to write*(). In this situation I'd like the third device "Charlie" to, for example, listen on the addresses of both Alice and Bob, with auto-ack disabled. ... is it safe to assume that this should work

Theoretically, yes (please report your testing results). But it feels like (to me) you'd be abusing the ESB protocol (underlying mechanism in the nRF24L01 firmware). Technically your debugging tactic is the fundamental concept behind "mouse-jacking" or "keystroke-injection" which are both traditionally prevented (or at least attempted to prevent) via data encryption and/or specific data structuring (e.g. Logitech Unifying USB dongle -- which originally used a nRF24 transceiver). I recommend using getARC() to find out how many retry attempts were made on the last transmission. If the ARC > 0, then you need auto-ack for reliability, and it probably means there is too much interference on that channel for auto-ack to be disabled. FYI, BLE & WiFi use "channel hopping" to avoid this problem, but their use of "channel hopping" might also be the source of your interference.
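For context on what getARC() reports: the nRF24L01 datasheet puts the retry counter (ARC_CNT) in the low nibble of the OBSERVE_TX register and the lost-packet counter (PLOS_CNT) in the high nibble. The sketch below decodes a raw register value; `ObserveTx`, `decodeObserveTx`, and `linkNeedsAutoAck` are hypothetical names, and the last one just encodes the rule of thumb from the comment above.

```cpp
#include <cassert>
#include <cstdint>

// OBSERVE_TX layout per the nRF24L01 datasheet:
//   bits 7:4 = PLOS_CNT (lost packets), bits 3:0 = ARC_CNT (retries used).
struct ObserveTx {
    uint8_t lostPackets;
    uint8_t retriesUsed;
};

// Splits a raw OBSERVE_TX value into its two counters.
constexpr ObserveTx decodeObserveTx(uint8_t reg) {
    return ObserveTx{static_cast<uint8_t>(reg >> 4), static_cast<uint8_t>(reg & 0x0F)};
}

// Rule of thumb from the comment above (hypothetical helper): any retry on
// the last transmission suggests the link relies on auto-ack.
constexpr bool linkNeedsAutoAck(uint8_t arc) { return arc > 0; }
```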

all RX & TX nodes need to have matching configuration concerning data rate, channel, CRC, dynamic payloads, payload length (if not using dynamic payloads), and auto-ack (if not using the multicast parameter).

this info has been very essential in my experience. But I haven't tried to do your debugging tactic. My doubts about your debugging tactic rest on the configuration of the auto-ack feature

alandsidel commented 3 years ago

Well I'm not trying to "abuse" anything, I just want to "sniff" the data going between the two devices without having to alter the code on either one of them -- as altering the code defeats the purpose of trying to test the existing code. The issue here is that I want to leave autoAck enabled, as that is how I ultimately want to use the radio -- but If I leave it enabled on the "sniffer", then it will start to send ACKs to packets that the real destination never received. This is fundamentally how e.g. a network sniffer works. You don't go to the source or destination programs and alter them, you just sit between them and watch.

In any case, I had never checked the result of write() before. I just started doing so to see if the data I receive matches what write() says with auto-ACK disabled, and it does not. Every call to write() returns false, regardless of me having set the multicast flag to true or false when calling write(), and I can confirm the data is being received by the other side... hmm.

Does the ACK generated by a radio upon receipt automatically go back to the source address that sent it, or does it go to the address configured on that particular pipe? I have assumed that if Alice is 0x01010101 and Bob is 0x02020202, when Bob sends a packet to Alice, Alice will ACK to 0x02020202 regardless of the address configured on any of her pipes.

2bndy5 commented 3 years ago

Every call to write() returns false, regardless of me having set the multicast flag to true or false when calling write(), and I can confirm the data is being received by the other side... hmm.

This is a commonly reported problem. I see in the OP you read the common issues note about stabilizing the power to the nRF24L01. Often this is a power problem, especially for the nRF24L01+PA+LNA modules, which almost always require a separate power supply to source enough current for TX operations (including ACK packets). It might also help to adjust the wait time for ACK packets via the delay (the first) parameter to setRetries(). This library defaults to 5, which translates to (5 * 250 + 250 =) 1500 microseconds (a value that should be sufficient according to the datasheet). Do your RX nodes ever write() successfully in your code? HINT: Another power saving "hack" is to lower the PA level; the library defaults to 0 dBm while the examples use -12 dBm (-18 dBm is the lowest PA level that the nRF24L01 can go).
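The delay arithmetic quoted above (delay * 250 + 250 microseconds) is easy to tabulate. The following sketch assumes both setRetries() arguments are capped at 15, and the worst-case figure ignores the time each packet spends on air, so treat it as a lower bound. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Caps a setRetries() argument at 15 (both fields are 4 bits wide).
constexpr uint8_t clampTo15(uint8_t v) { return v > 15 ? static_cast<uint8_t>(15) : v; }

// Wait between transmission attempts for a given delay argument, in microseconds.
constexpr uint32_t retryDelayMicros(uint8_t delay) {
    return clampTo15(delay) * 250u + 250u;
}

// Total time spent waiting across all retries (airtime not included).
constexpr uint32_t worstCaseWaitMicros(uint8_t delay, uint8_t count) {
    return retryDelayMicros(delay) * clampTo15(count);
}

static_assert(retryDelayMicros(5) == 1500, "library default delay of 5");
static_assert(worstCaseWaitMicros(15, 15) == 60000, "setRetries(15, 15)");
```

With setRetries(15, 15) the waiting alone adds up to 60 ms per failed write(), which is worth keeping in mind when timing a transmit loop.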

Does the ACK generated by a radio upon receipt automatically go back to the source address that sent it, or does it go to the address configured on that particular pipe?

ACK packets are sent on pipe 0 to the same address that instigated them (despite what pipe received the instigating transmission). ACK packets are mostly handled by the ESB protocol, so it can't be an issue with the library (as long as auto-ack is enabled on RX nodes -- this is the library default). The nRF24L01 datasheet outlines proper behavior in Appendix A (first page for TX operations and second page for RX operations). Notice that openWritingPipe() implements the proper behavior as outlined for TX nodes. openReadingPipe() also implements the proper behavior for RX nodes. For a more illustrated example, see the diagrams in section 7.7 (notice figure 10 does not use auto-ack while figure 12 does use auto-ack). It helped me (a lot) to think of the addresses as routes rather than destinations.

I have assumed that if Alice is 0x01010101 and Bob is 0x02020202, when Bob sends a packet to Alice, Alice will ACK to 0x02020202 regardless of the address configured on any of her pipes.

yes, you assumed correctly

I didn't mean to badmouth your debug tactic. I just meant that you're dancing on the edge of "black hat" territory. Remember the proverb "the road to hell is paved with good intentions." 😄 Notice that section 7.7 never outlines an example of the nRF24L01 TX-ing to multiple RX nodes; rather, it only details RX-ing from multiple TX nodes.

alandsidel commented 3 years ago

Yes, power has been a constant source of trouble with these modules, no doubt. I have tried all manner of different sized capacitors soldered directly to the modules. 100nF, 47uF, 100uF, combinations of those together, and even a 470uF w/ a current limiting resistor (I don't want the inrush current to kill the regulator). The current the regulator can supply is limited though so today I'll be trying to put together a small buck converter to drop the voltage down from the 5v supply since it can source a much higher current. I've tried every PA level as well, and every transmission speed. Nothing seemed to matter here except that I put some value of filter cap > 100nF on the power pins.

There is noise on the power line whenever the SPI clock is running that varies by about 0.5v peak to peak. This has been bothering me since I discovered it but the values are still within tolerance. No value of filter caps seems to work. A discussion on the page you referenced mentions using 1300 or 1500 uF caps -- I don't have any laying around nearly that large to try, maybe about 1000uF max. When I didn't see any difference between 100uF and 470uF I didn't think going even larger was going to be worthwhile.

It might also help to adjust the wait time for ACK packets via the delay (the first) parameter to setRetries()

I wanted to try this as well, thanks for reminding me.

In any case the initial issue that prompted me to open the ticket still remains; After calling setAutoAck(x, false); on pipes 0 and/or 1 (the two I am using) I no longer receive any data. The value of the "multicast" flag in my call to write() doesn't seem to matter.

Oh as for this:

Do your RX nodes ever write() successfully in your code?

There is generally only 1 RX node (ignoring the sniffer) and he does not ever write()

alandsidel commented 3 years ago

Oh I'd like to point out in case you missed it, that write() returns false even when I call it with multicast set to true. This is counter to the library documentation and reduces confidence that it will return the correct value when multicast is set to false. I need to go dig into the library source to see what's actually going on here.

2bndy5 commented 3 years ago

After calling setAutoAck(x, false); on pipes 0 and/or 1 (the two I am using) I no longer receive any data.

To be very clear, are you calling this function in both RX & TX nodes? Remember what I said about the need for matching configuration.

write() returns false even when I call it with multicast set to true

Indeed this is puzzling. I did some digging of my own and found that write() can return false in 1 of 2 conditions:

1. If the TX_DS flag of the STATUS register is not asserted after 95 milliseconds, but it should be asserted if the NO_ACK flag is set according to the TX operational flowchart in the datasheet. If this is what is triggered in your case, you should also be getting an error message saying "RF24 HARDWARE FAIL: Radio not responding, verify pin connections, wiring, etc."
   - The error message mentioned (produced by RF24::errNotify()) will only show on platforms that support printf(). In the OP you named the platforms in question as "Arduino" & "Raspberry Pi" which both support printf(). Although, you may have to add some code (see code snippet in docs about using printDetails()) to enable usage of the printf() function.
2. If the MAX_RT flag of the STATUS register is asserted, but this should never happen if the NO_ACK flag is set according to the TX operational flowchart in the datasheet. Although, I have a feeling this is happening for you, given the behavior you described.
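The two failure conditions just listed can be modeled as a small classifier over the STATUS register. This is a simplified model of the behavior described in this comment, not the library's source; the bit positions (TX_DS = bit 5, MAX_RT = bit 4) come from the nRF24L01 datasheet, while `WriteOutcome`, `classifyWrite`, and `writeWouldReturnTrue` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// STATUS register bit masks per the nRF24L01 datasheet.
constexpr uint8_t kTxDs = 1u << 5;   // TX_DS: transmission completed
constexpr uint8_t kMaxRt = 1u << 4;  // MAX_RT: retry limit reached

enum class WriteOutcome { kSent, kTimedOut, kMaxRetries };

// timedOut: neither flag was asserted within the 95 ms window (condition 1).
// Otherwise MAX_RT decides between failure (condition 2) and success.
constexpr WriteOutcome classifyWrite(uint8_t status, bool timedOut) {
    return timedOut ? WriteOutcome::kTimedOut
         : (status & kMaxRt) ? WriteOutcome::kMaxRetries
         : WriteOutcome::kSent;
}

// Models the boolean that write() would hand back to the caller.
constexpr bool writeWouldReturnTrue(uint8_t status, bool timedOut) {
    return classifyWrite(status, timedOut) == WriteOutcome::kSent;
}
```

Telling the two false cases apart (for example by watching for the hardware-fail message) narrows the problem down to either wiring or missing ACKs.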

The datasheet never mentions having to reassert the EN_DYN_ACK flag in the FEATURE register after using the W_TX_PAYLOAD_NOACK command (done with the multicast parameter)~, but I'm wondering if that is actually the case~. If MAX_RT flag is asserted, it probably means the NO_ACK flag is being ignored on the TX node (which can only happen if the EN_DYN_ACK flag is reset to its default 0 state).

See TX operational flowchart in the datasheet and anywhere it mentions the NO_ACK flag in the datasheet.

~I vaguely remember seeing this behavior in my recent debugging tests for my CircuitPython library. If I can reproduce this using that library, then this behavior may be a lack of documentation in the datasheet (something that is unfortunately not uncommon).~

2bndy5 commented 3 years ago

I can't believe I haven't asked this yet. Are you using the nRF24L01+PA+LNA module (as seen here)? Because those require much more power than the Arduino's & RPi's voltage regulator(s) can output. The unofficial nRF24L01 breakout boards use an AMS1117 regulator that boasts an 800 mA current limit. I documented my test results for this specific type of nRF24L01 module in my CircuitPython library's troubleshooting page.

I don't want the inrush current to kill the regulator

Typically nRF24L01 boards don't have a voltage regulator, unless you bought the module(s) from SparkFun. The inrush current is a valid concern, but I've only ever seen the importance of a current limiting resistor (usually less than 100 ohms) implemented for motor driver circuits. The inrush current is a problem that should really be addressed when using inductors (due to Lenz's Law), but this is getting off topic.

No value of filter caps seems to work

Capacitors will act as "DC blockers" which is what provides the power stability (but only on the VCC line), and the capacitance values depend on the voltage regulator's model that is used on the MCU board (best to look up the applicable datasheet for the MCU's linear regulator). I've read on many MySensor forums a recommended 100 uF as a broad suggestion, and most of which also recommend an additional 0.1 uF for added stability. BTW 0.1 uF is a common requirement among linear regulator datasheets (between V_in and GND). Personally, I only ever used 100 uF capacitors (1 for each transceiver) with great success (I don't own any capacitors less than 1 uF ☹️).

There is noise on the power line whenever the SPI clock is running that varies by about 0.5v peak to peak

This tells me you have an oscilloscope; 😮 I'm VERY jealous. I wouldn't worry too much about this ripple because (as you noted) it is within TTL tolerance.

A discussion on the page you referenced mentions using 1300 or 1500 uF caps

I don't recall referencing a page with such a specific discussion in this thread. I am curious to (re-)read it though. Could this actually be the issues referenced in this repo's COMMON_ISSUES.md?

Avamander commented 3 years ago

There is noise on the power line whenever the SPI clock is running that varies by about 0.5v peak to peak

I am pretty sure this is enough to cause issues. I'd solve it first before doing other things.

alandsidel commented 3 years ago

Sorry for the delay for all involved. I've gone through nearly all of the Nanos I have on hand to try to find the one with the least noisy 3v3 regulator. The regulator on these devices is, according to what I've read, supplied by the USB UART module. They are limited to 30-40mA of power, which is small, but well over twice the 12mA the datasheet for the RF24 lists as a maximum current draw.

All testing in this post was done with a single 10uF cap on the module's VCC.

@2bndy5

To be very clear, are you calling this function in both RX & TX nodes? Remember what I said about the need for matching configuration.

I wasn't, I am now, but it hasn't made a difference. I am continuing to debug. To be honest, if I can't have auto-ack on for some modules and off for others in a real world environment, and have them communicate with each other, I think I'll have to just ditch these and find something else. That's an unacceptable restriction for my use case. I'm proceeding under the impression that this is just a troubleshooting step and once the issues are ironed out, I can start adjusting those settings to ensure that doing so does not cause a communications failure.

Indeed this is puzzling. I did some digging of my own and found that write() can return false in 1 of 2 conditions:

I found the same information, yet, write() is always returning false. Either every module I have is defective, or there's a problem in the library.

Keep in mind that even when write() returns false, the data is successfully transmitted and received -- most of the time.

I can't believe I haven't asked this yet. Are you using the nRF24L01+PA+LNA module

Nope, I would have mentioned that. I'm using the modules with the integrated PCB trace antenna.

The inrush current is a valid concern, but I've only ever seen the importance of a current limiting resistor (usually less than 100 ohms) implemented for motor driver circuits.

I'm sourcing the 3.3v from the Arduino Nano connected to the radio, and it has a maximum current supply capability of around 30-40mA. A big filter capacitor on the 3.3v line could easily draw more than that when it's initially charging. That's what the current limiting resistor is for. It's only in place on the radio module with the 470uF cap, the others are directly connected, as they should charge quickly before the current draw can rise high enough for long enough to damage the Arduino.
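As a rough worked example of the inrush concern: for an ideal, fully discharged capacitor charged through a series resistor, the peak current at switch-on is V / R and the charging time constant is R * C. The sketch below applies that to the 3.3 V rail, the 40 mA limit, and the 470 uF cap mentioned here. The resistor value is not given in this thread, so the minimum resistance is derived, and ESR and any current limiting in the regulator are ignored.

```cpp
#include <cassert>

// Smallest series resistance (ohms) that keeps the switch-on current of an
// uncharged ideal capacitor at or below maxAmps (Ohm's law at t = 0).
constexpr double minSeriesOhms(double supplyVolts, double maxAmps) {
    return supplyVolts / maxAmps;
}

// RC time constant in seconds; the capacitor reaches roughly 63% of the
// supply voltage after one time constant.
constexpr double timeConstantSeconds(double ohms, double farads) {
    return ohms * farads;
}
```

For these numbers, minSeriesOhms(3.3, 0.040) comes out at about 82.5 ohms, and with 470 uF the time constant is just under 39 ms.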

I believe I've already damaged one of the regulators this way, or via insufficient ESD protection when handling the Arduino. One of my modules is much noisier than the others on the 3.3v power.

This tells me you have an oscilloscope; 😮 I'm VERY jealous

I do, an inexpensive USB scope (Link Electronics MSO 19.2) that I've had for several years. It's not perfect, but it works well for hobbyist stuff like this. They are about 250 USD new.

I wouldn't worry too much about this ripple because (as you noted) it is within TTL tolerance.

It is, BUT, it's enough that it actually triggers the scope's logic analyzer at times on other nearby pins, like MISO/MOSI, so I've been trying to determine if that's just the scope being overly sensitive or if it's potentially affecting the radio/Arduino as well.

Could this actually be the issues referenced in this repo's COMMON_ISSUES.md?

Yes. That's all that the common issues page has -- a link to two issues, both of which have discussions surrounding massive caps. 1000uF, 1500uF, 3300uF.

@Avamander

I am pretty sure this is enough to cause issues. I'd solve it first before doing other things.

I've thought so too. I have gone through all of my Arduinos (well, 7 of them) and found 1 that appears to have the SPI circuitry entirely burned out (no clock signal at all), one with a really noisy 3.3v power regulator (possibly damaged), and 5 that seem OK and have normal looking low noise. I've uploaded two images, one with the noisy line, one with the "ok" line; you can see them here. Scale is on the left and is unchanged between the two.

https://imgur.com/a/ky4GRLl

alandsidel commented 3 years ago

Slight update here.. if I call setAutoAck(false); in the ~~client~~ TX side, all of the calls to write(...); suddenly start returning true instead of false, regardless of the value of multicast. The calls return true even when there is no RX side running anywhere to possibly ACK the packets.

2bndy5 commented 3 years ago

if I call setAutoAck(false); in the client TX side, all of the calls to write(...); suddenly start returning true instead of false, regardless of the value of multicast. The calls return true even when there is no RX side running anywhere to possibly ACK the packets.

that is the behavior i would expect when auto-ack is off.

alandsidel commented 3 years ago

that is the behavior i would expect when auto-ack is off.

To be honest, I did not expect the auto-ack setting to have any effect on a TX-only device. My understanding has been that the multicast flag should be what determines if the sender expects an ACK or not, and auto-ack should only have an effect on the behavior of the receiver.

write() still returns false whenever auto-ack is enabled on the sender, no matter if multicast is true or false, and no matter what the settings are on the receiver.

2bndy5 commented 3 years ago

Just to reiterate, multicast is specific to controlling auto-ack on a per-packet basis; setAutoAck() pertains to all packets (when multicast is left at its default false value). Multicast controls the packet's NO_ACK flag in the Packet Control Field.

This confusion is born from the datasheet, and exacerbated by this library's choice of naming the multicast parameter. (I called it ask_no_ack in my CircuitPython library.)
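To make the NO_ACK flag concrete: the datasheet describes the Packet Control Field as 9 bits, transmitted as a 6-bit payload length, a 2-bit packet ID (PID), and the 1-bit NO_ACK flag. The sketch below packs those fields into an integer in transmission order. The radio builds this field itself, so `packControlField` and `noAckRequested` are purely illustrative helpers.

```cpp
#include <cassert>
#include <cstdint>

// Packet Control Field viewed as a 9-bit integer in transmission order:
//   bits 8:3 = payload length, bits 2:1 = PID, bit 0 = NO_ACK.
constexpr uint16_t packControlField(uint8_t payloadLength, uint8_t pid, bool noAck) {
    return static_cast<uint16_t>(((payloadLength & 0x3F) << 3) |
                                 ((pid & 0x03) << 1) |
                                 (noAck ? 1 : 0));
}

// True when the sender asked the receiver not to acknowledge this packet.
constexpr bool noAckRequested(uint16_t controlField) {
    return (controlField & 0x01) != 0;
}
```

Two packets with identical length and PID differ only in bit 0 depending on whether the multicast parameter was honored for that payload.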

I did not expect the auto-ack setting to have any effect on a TX-only device

The virtual masks (that manipulate the IRQ pin) are used to determine the success of the transmission's attempt (TX_DS) vs the success of the transmission's reception ( !MAX_RT ). MAX_RT is only indicated when ARC is set to a number greater than 0 (via setRetries()). setAutoAck() only affects the TX node about pipe 0 as the nRF24L01's ESB protocol automatically switches pipe 0 to RX mode when waiting for an ACK packet (and then returns pipe 0 to TX mode when done listening). Thus, auto-ack needs to be enabled on pipe 0 of TX nodes when auto-ack on RX nodes is enabled for any pipe addressed to the auto-ack enabled TX node. Please remember that pipe 0 on RX nodes needs auto-ack enabled in order to transmit ACK packets.
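The per-pipe auto-ack setting lives in the EN_AA register, one bit per pipe (bit n for pipe n, pipes 0 to 5). The sketch below shows the bit manipulation and the pipe 0 check that matters on a TX node. It is a simplified model under those datasheet assumptions, and `withAutoAck` and `txCanHearAck` are hypothetical names rather than library functions.

```cpp
#include <cassert>
#include <cstdint>

// Returns the EN_AA value after enabling or disabling auto-ack on one pipe.
// Pipe numbers above 5 do not exist, so the register is returned unchanged.
constexpr uint8_t withAutoAck(uint8_t enAa, uint8_t pipe, bool enable) {
    return pipe > 5 ? enAa
         : enable ? static_cast<uint8_t>(enAa | (1u << pipe))
                  : static_cast<uint8_t>(enAa & ~(1u << pipe));
}

// A TX node listens for the ACK on pipe 0, so bit 0 must be set there.
constexpr bool txCanHearAck(uint8_t enAa) { return (enAa & 0x01) != 0; }
```

Starting from the register's reset value of 0x3F (all six pipes enabled), clearing pipe 0 gives 0x3E, at which point the TX node no longer waits for acknowledgements.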

The use of auto-ack (and multicast parameter) is explained in more technical detail on step 5 of appendix A's first page.

alandsidel commented 3 years ago

Ok, got it. My brain is smoking a bit after all that, but I understand now. In any case, I have a perfectly "normal" setup right now with auto-ack enabled on both ends, all matching settings, and the retries set to their maximum (15, 15) -- and still, every call to write() returns false, regardless of the value of the multicast flag. All data is successfully received by the RX side.

This is the main issue I'm having now. I cannot reliably "sniff" the traffic: a second RX node using the same address as the first, but with auto-ack disabled, does not indicate reception of any data while the first RX node continues working as normal. When a packet is missed between the TX and RX nodes, I cannot reliably determine if it was due to a problem with the TX node or a problem with the RX node without being able to do that kind of sniffing -- not unless I want to roll my own ACK logic and disable auto-ack system wide anyway, it seems.

2bndy5 commented 3 years ago

still, every call to write() returns false, regardless of the value of the multicast flag. All data is successfully received by the RX side.

It's gotta be a power problem. RX nodes aren't sending (or can't send) the ACK packet.

roll my own ACK logic and disable auto-ack system wide anyway

The PingPair_Sleepy example should be a starting point, but all the examples have auto-ack enabled, so you'd also have to roll your own auto-retry logic when you disable auto-ack.

I cannot reliably "sniff" the traffic: a second RX node using the same address as the first, but with auto-ack disabled, does not indicate reception of any data while the first RX node continues working as normal.

TBH, I don't think the nRF24L01 was designed to "sniff" traffic between other nodes with a configuration that is in any way different from the sniffer's configuration. I know I said it's "theoretically" possible earlier, but the behavior you described makes me think I was wrong (clearly that is a common theme for me).

I cannot reliably determine if it was due to a problem with the TX node or a problem with the RX node without being able to do that kind of sniffing

Building off of my last paragraph, it would seem that no one can. getARC() (for TX node after transmission) and testRPD() (for RX node during reception) are your best tools to try and debug this. Using a middle man is a rather complex tactic for debugging, and while WiFi or BT might allow sniffers, the nRF24L01 would have you look at the endpoints' traits instead of their middle traffic.

You've come a long way (in terms of understanding the ESB protocol) in this thread, but I feel like the sniffer technique is an attempt to bypass what the nRF24L01's firmware (namely the ESB protocol) was designed to do. I would stick to @Avamander's suggestion, and continue fixing the power problem first. Be aware, the modules might be a counterfeit or incompatible clone. I would appreciate a store link as there are known retailers that sell clones or counterfeits without providing customers any sort of recourse (BangGood and AliBaba are on that list).

2bndy5 commented 3 years ago

@alandsidel I figured out why write() was still able to return false when the multicast parameter was set to true. I had forgotten that the nRF24L01 only allows the NO_ACK flag (& thus the multicast parameter) to be used if the EN_DYN_ACK flag in the FEATURE register is asserted. This library by default doesn't assert this flag; it must be done with enableDynamicAck() prior to using the multicast parameter in write().

This info was hidden away in the docs (use the links that I included).

alandsidel commented 3 years ago

Ok, I've been away for a few days but was able to solder up another test board today and also do some more testing to see if there is in fact a power issue on the RPi. I'm using 3B+'s here which my information says should be able to source up to about 500mA on the 3v3PWR connection, and that is the pin I'm using to power the radio. The radio is plugged into a custom made PCB that serves as both a pi-hat and an arduino shield for the RF24, though not both at the same time obviously.

The power does look "dirty" to me but a little of this may be the ground loop from the probe, and even if not, it does seem to still be within tolerance although only just. This imgur gallery has three pictures. Two are screenshots of the scope interface showing the noise on the 3v3 line when the clock is running, one of these is with a single 470uF cap & current limiting resistor (for inrush) and the other is a 100nF combined with a 47uF and no resistor. They are physically separate RF24 modules with different caps that I've soldered to them. The last image is a photo of how I have the probe leads connected.

I have four different RF24s in use in two "paired" setups with different addresses. Each pair has an RF24 on a PCB with a temperature & humidity sensor & an Arduino Nano which sends reports to a Raspberry Pi 3B+ with its own RF24. During normal operation there are no transmits from the RPi except for the ACKs -- by which I mean, there are no calls to any write function in my code.

Both RPis receive the data from the Arduinos without problems, but almost all of the calls to write() from the Arduinos are returning false. I've only seen four true responses in total over the course of several hundred calls. This is with the HIGH power setting, a 1 Mbps transmission rate, a call to setRetries(15, 15), multicast set to true, and prior to me changing the code to call enableDynamicAck().

In other words, the calls to write still sometimes returned true even when I had not called enableDynamicAck(). That doesn't sound normal based on what you said, but perhaps it is. After calling enableDynamicAck(), the calls to write with multicast set to true do always seem to return true as expected, even when the receiving side is not powered on.

If I switch the calls to write to disable multicast, then the calls to write() start failing as expected. When I turn the receiver back on, those calls to write still continue to return false as before, with perhaps one "true" result in every 50-150 transmissions. However the RPi still receives the data. This is with autoack and dynamicack enabled on both sides; I'm taking pains to ensure the configs are as close to identical as they can be after you brought that up, with the only differences being the addresses assigned to each radio.

2bndy5 commented 3 years ago

enableDynamicAck() is a TX only feature. It only needs to be called in your code if your code uses write() with the multicast parameter set to true. Without calling enableDynamicAck() before calling write(), the multicast parameter cannot temporarily disable auto-ack for that individual payload (meaning the transmission will use whatever the auto-ack feature is set to do).
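The interaction between setAutoAck(), enableDynamicAck(), and the multicast parameter, as described in this thread, boils down to one boolean expression. The sketch below is a simplified model of that description rather than the library's implementation, and `txWaitsForAck` is a hypothetical name.

```cpp
#include <cassert>

// Models whether a TX node waits for an ACK after sending one payload.
//   autoAckPipe0:      auto-ack enabled on pipe 0 of the TX node
//   dynamicAckEnabled: enableDynamicAck() was called (EN_DYN_ACK asserted)
//   multicast:         value passed as the multicast parameter of write()
// The multicast parameter only takes effect when dynamic ACK is enabled.
constexpr bool txWaitsForAck(bool autoAckPipe0, bool dynamicAckEnabled, bool multicast) {
    return autoAckPipe0 && !(dynamicAckEnabled && multicast);
}
```

This lines up with the reports above: with auto-ack on and enableDynamicAck() never called, multicast == true changes nothing, so an unacknowledged write() still returns false; with setAutoAck(false) on the TX side, write() returns true even with no receiver running.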

This still sounds like a power problem. If data is received but not acknowledged (and settings match on both nodes), then it has to be a power problem; the library simply cannot remedy this situation (as long as all settings match on both nodes) because ACK packets are constructed and transmitted automatically (provided enough power) by the nRF24L01 firmware -- the library can only turn the auto-ack feature on or off, it cannot alter anything more fine-grained than that.

The breakout boards made for the PA/LNA modules use an AMS1117 that boasts an 800mA current limit. I suspect that much of that 500mA you quoted is consumed internally for built-in peripherals (though I'd like to see where you got that information -- I'm curious about its validity and/or context). I recently found that my samd-based boards (also quoted a max 500mA from regulator -- according to the Adafruit store page) did not supply enough power to transmit (including ACK packets and normal TX packets -- they could only receive packets), but that was when I was debugging with the PA/LNA modules.

Also try lowering the PA level to LOW (-12 dBm) or MIN (-18 dBm) with setPALevel(). This seems to save enough current for adequate TX operations (based on my results from the aforementioned recent tests with a 500mA limiting regulator).

2bndy5 commented 3 years ago

For what it's worth, I've also seen several issues resolve themselves by taking the breakout board out of the equation.

alandsidel commented 3 years ago

@2bndy5 my 500mA number is from here: https://pinout.xyz/pinout/pin1_3v3_power# which states that the pin can provide up to 500mA. Even if 400 or 450 of that were consumed on the board, that should leave plenty, as the RF24 datasheet lists about 12mA as the maximum draw for the device. I've tried previously with MIN power, with no result.

I understand what you're saying regarding the library and will continue investigating the power situation. I'm not using any breakout boards though, maybe you meant the custom PCB? Eliminating that is to be the next test, maybe it needs a redesign.

2bndy5 commented 3 years ago

Yeah I meant the PCB. Thanks for the link to info origin. Did you read this linked article from the description of the 3v pin? It talks about the 3v rail's stability and mentions the SoC consumes 200mA (it has some pictures too🤓).

alandsidel commented 3 years ago

I removed the PCB and wired the RF24 up directly with the shortest pre-made female to female jumpers I have, which are about 15cm end to end. No difference. The calls to write() always return false, but the data is always received without problem on the Pi.

The link is interesting but again, even if it consumes 400mA, the 3B+ is supposed to be good for 500mA or so, and the RF24's datasheet says it consumes no more than 13mA (I believe it's 12.1 or 12.2). There is more than enough power. What I've seen in my tests has been that power looking dirty -- there is noise on the 3.3v line whenever the SPI clock is running. However that noise should not be enough to interfere with the operation of the RF24.

Further, if I do transmit from the Pi, the Arduino receives the transmissions just fine. There are no transmissions from the Pi during normal operations but I do have a "pairing" button on each of the devices. When pressed on the Pi, it starts broadcasting its own address and channel every 100uS or so. When the pairing button is pressed on the Arduino, it starts searching through all of the channels listening for this beacon until it finds it.

This is a near continuous "beacon" from the Pi that transmits every 100uS for 60 seconds, and it works perfectly. This is proof to me at least that the RF24 on the pi is capable of transmitting without issue.

Switching the radio on the pi from MAX power to LOW and MIN likewise had no impact. The calls to write on the arduino, with autoack enabled, dynack enabled, and multicast false always return false while the Pi is receiving the data and not giving any indication of having missed any data or received any duplicates. The debug output on the arduino serial console I put in when it transmits matches what is being received on the pi.

There must be some other explanation for why write is succeeding while reporting that it failed...

2bndy5 commented 3 years ago

I can tell you're getting frustrated, but there is only one explanation: ACK packets aren't getting through from RPi to Arduino. This answer comes from understanding the meaning of the MAX_RT & TX_DS flags in the STATUS register, as that is the information returned from write().
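To make the STATUS flags concrete, here is a minimal sketch that classifies a raw STATUS byte. The bit positions (TX_DS is bit 5, MAX_RT is bit 4) are from the nRF24L01 datasheet; `TxOutcome`, `classifyStatus` and `writeSucceeded` are hypothetical helpers for illustration and are not the library's actual implementation.

```cpp
#include <cstdint>

// Bit positions in the nRF24L01 STATUS register (per the datasheet).
constexpr std::uint8_t TX_DS_BIT  = 5;  // data sent; with auto-ack on, the ACK arrived
constexpr std::uint8_t MAX_RT_BIT = 4;  // maximum number of retransmits reached

enum class TxOutcome { Acked, RetriesExhausted, Pending };

// Hypothetical helper: interpret a raw STATUS byte after a transmission.
TxOutcome classifyStatus(std::uint8_t status) {
    if (status & (1u << TX_DS_BIT)) {
        return TxOutcome::Acked;
    }
    if (status & (1u << MAX_RT_BIT)) {
        return TxOutcome::RetriesExhausted;
    }
    return TxOutcome::Pending;
}

// Hypothetical helper: how a write()-style call could report its result.
bool writeSucceeded(std::uint8_t status) {
    return classifyStatus(status) == TxOutcome::Acked;
}
```

With auto-ack enabled, a payload that reaches the receiver but whose ACK never makes it back ends in `RetriesExhausted`, so a write()-style call reports false even though the data arrived.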

Try switching out the nRF24L01 module on the RPi with a module that you know can transmit ACK packets (i.e. swap the modules from arduino and RPi). You may be looking at a counterfeit.

Avamander commented 3 years ago

However that noise should not be enough to interfere with the operation of the RF24.

I have tried all manner of different sized capacitors soldered directly to the modules. 100nF, 47uF, 100uF, combinations of those together.

I am curious how that noise has remained. You should have enough capacitance to smooth most noise out assuming the internal resistance of the capacitors isn't massive. Are they of good quality?

alandsidel commented 3 years ago

@2bndy5 yes, I am a little frustrated, trying to not let it out. These modules started off very promising but have been one fight after another to get working properly. To be honest the modules seem to be working fine right now! I send a transmit, and it arrives, pretty much every time. The only issue I'm having at present is that write() is returning false.

As a software developer, I'm often in your shoes so I'm trying to sympathize, but the reality is there could be some sort of bug in the library as well, something wrong with these modules in particular, or some other issue I haven't thought of yet. Given that more than one, let's say, "poorly documented" behavior has been discovered in the library during this adventure, I am not willing to rule out there being more. There can certainly be a number of things to blame here other than simply the receiving side lacking enough power, or clean enough power, to send an ACK.

I would expect that, if there really were just an ACK transmission error, that some or all of the retransmits from the Arduino side would show up at the Pi. Meaning, after I've told it to retry 15 times via setRetries(), I would expect to see a few of those retries hit the Pi if it believes it's seen and ACK'd one of the earlier ones. Do you understand what I'm saying here or should I explain it in more detail?
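One relevant detail from the nRF24L01 datasheet: each Enhanced ShockBurst packet carries a 2-bit packet ID (PID), and the receiver compares the PID and CRC against the previous packet. A retransmission of an already-received packet is acknowledged again, but its payload is discarded rather than handed to the application. The toy model below sketches that filtering; `Packet` and `ToyReceiver` are hypothetical names, and the real chip does this in hardware.

```cpp
#include <cstdint>

// Hypothetical, simplified view of the fields used for duplicate detection.
struct Packet {
    std::uint8_t  pid;  // 2-bit packet ID, incremented for each new payload
    std::uint16_t crc;  // CRC over the packet
};

class ToyReceiver {
public:
    // Returns true when the payload is passed to the application.
    // An ACK is sent either way.
    bool onPacket(const Packet& p) {
        ++acksSent_;
        const bool duplicate =
            hasLast_ && last_.pid == p.pid && last_.crc == p.crc;
        last_ = p;
        hasLast_ = true;
        if (!duplicate) {
            ++delivered_;
        }
        return !duplicate;
    }

    int delivered() const { return delivered_; }
    int acksSent() const { return acksSent_; }

private:
    Packet last_{0, 0};
    bool hasLast_ = false;
    int delivered_ = 0;
    int acksSent_ = 0;
};
```

In this model, one original packet followed by 15 retransmissions yields 16 ACK attempts but only one delivered payload, so the retries would not be visible on the receiving side.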

I've tried 7 different Arduinos and 4 different Pis (2 3B+s and 2 Zero Ws), and have tried all 10 different RF24 modules I have. The RF24s have a variety of filter caps soldered to them, from nothing, to a single 100nF, to 47uF, and combinations up to 470uF. I have two hand-soldered prototype PCBs for connecting the Pi to the Hat, and three or four of the custom made PCBs -- I had 10 made by JLCPCB but haven't bothered soldering the headers to them yet. I've tried using straight wired connections as requested.

The symptom I'm experiencing is the same no matter what changes I make to my software or hardware. You can perhaps understand why, at this point, I'm growing a little more suspicious of it perhaps being a library issue, maybe one more undocumented (or poorly documented) behavior I'm not aware of to do with the order in which I'm calling different methods or something similar.

Upon power up of the Pi I do this:

void resetRadio() {
  radio.stopListening();
  radio.setAddressWidth(4);
  radio.setPALevel(RF_PLVL);
  radio.setChannel(config.channel);
  if (enableDynAck) {
    radio.enableDynamicAck();
  }
  radio.setAutoAck(autoAck);
  radio.setRetries(15, 15);
  radio.setDataRate(RF_RATE);
  radio.openReadingPipe(1, config.localAddress);
  radio.openWritingPipe(config.brainAddress);
  radio.startListening();
  radio.printDetails();
}

It's basically identical on the Arduino with just the variable names changed. Maybe something here is obviously wrong to someone else's eyes.
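For reference, the arithmetic behind setRetries(15, 15) can be sketched as follows. The library documents the first argument as a delay in multiples of 250 µs (0 means 250 µs, 15 means 4000 µs) and the second as the retry count (maximum 15). The function names are hypothetical, and the window is only a rough estimate that ignores on-air time.

```cpp
#include <cstdint>

// Delay between retries in microseconds for a setRetries() delay argument.
constexpr std::uint32_t retryDelayUs(std::uint8_t delay) {
    return (delay + 1u) * 250u;
}

// Rough time spent waiting across all retries, ignoring on-air time.
constexpr std::uint32_t retryWindowUs(std::uint8_t delay, std::uint8_t count) {
    return retryDelayUs(delay) * count;
}
```

For setRetries(15, 15) this gives 4000 µs per retry and about 60000 µs (60 ms) in total before the radio gives up on a payload.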

@Avamander The caps are brand new, purchased in a hobbyist kit, from a reputable seller on US Amazon. I have ceramics from 10pF to 100nF and electrolytics from 100nF to 1000uF. When I test them with the cap testing function on my multimeter they appear to be fine. I just pulled out one of the 25v/10uF electrolytics out of the box, and it tested as 10.8 on my meter with me holding the leads to it with my fingers. That's as much as I can say about their quality, but I haven't had problems with them in any other circuits.

Avamander commented 3 years ago

The caps are brand new, purchased in a hobbyist kit, from a reputable seller on US Amazon. I have ceramics from 10pF to 100nF and electrolytics from 100nF to 1000uF.

That's good.

The modules themselves though, they're from a reputable vendor as well? Clones certainly have some infuriating issues, being unable to use auto-ack has certainly been reported before.

In order to rule out issues with the RPi-Arduino combination, any chance you could borrow another Arduino and run the pingpair-ack example on both?

TMRh20 commented 3 years ago

Just piping in here, haven’t read the whole thread but I’ve found it necessary to modify the radio hardware to get good results with mixed modules and devices.

With regular modules I’ve been attaching an external antenna (a 3-4 inch wire) and have been using external power modules per

https://tmrh20.blogspot.com/2019/?m=1

I use a wide assortment of devices with very high reliability etc

2bndy5 commented 3 years ago

The following shouldn't matter because you're using pipe 1 for RX-ing, but typically you want to open a writing pipe just before it is needed (or before stopListening()). openWritingPipe() appropriates the RX pipe 0 address for ACK packets during TX mode. I've made a note of this lack of documented behavior in #671's suggested solution. Notice also the "What was the original intention" section of that issue talks about how ACK packets get their address (with links to the datasheet).
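The pipe 0 point can be illustrated with a toy model. On the nRF24L01, an ACK comes back addressed to the transmitter's TX address and is received on pipe 0, so RX pipe 0 must hold the same address as TX_ADDR while transmitting. `ToyRadio` below is a hypothetical stand-in that tracks only addresses; it is not the RF24 class and ignores everything else the library does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the address registers only.
struct ToyRadio {
    std::uint64_t txAddr = 0;
    std::uint64_t rxAddr[6] = {0, 0, 0, 0, 0, 0};

    // Opening a writing pipe also takes over RX pipe 0 for incoming ACKs.
    void openWritingPipe(std::uint64_t address) {
        txAddr = address;
        rxAddr[0] = address;
    }

    void openReadingPipe(int pipe, std::uint64_t address) {
        assert(pipe >= 0 && pipe < 6);
        rxAddr[pipe] = address;
    }

    // An ACK can only be heard if pipe 0 still matches the TX address.
    bool canHearAck() const { return rxAddr[0] == txAddr; }
};
```

In this model, reading on pipe 1 (as in the code posted above) leaves pipe 0 alone, so call order does not matter; opening a reading pipe 0 with a different address after openWritingPipe() is the case that would stop ACKs from being heard.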

@alandsidel I have been hunting for library errors and lacking documentation for the last month or so (@Avamander and/or @TMRh20 is probably pretty annoyed with me on this front). If you find something that was missed, I'd be happy to look into it, but so far I can't find anything in the library's source code that could instigate this extremely frustrating behavior. BTW, thank you for showing us your source code.

2bndy5 commented 3 years ago

To expand on @TMRh20's suggestion about antenna modification, see also this instructable. I have not tested this technique myself, but the theory behind it is rather sound.

alandsidel commented 3 years ago

@Avamander

The modules themselves though, they're from a reputable vendor as well?

Makerfire brand, 4.5 stars on Amazon. A pack of 10 was about $12. I don't know if this is a well known brand or not, first I'd heard of it.

In order to rule out issues with the RPi-Arduino combination, any chance you could borrow another Arduino and run the pingpair-ack example on both?

I put two more nanos on a pair of solderless breadboards and connected the radios via jumper wires to breakout PCBs, also bought from makerfire. The breakouts are extremely simple, just a 2x4 female header on a small pcb that has two rows of 4 male pins spaced to plug into a breadboard across the central bridge. I did mess up and download the "dynamic" demo rather than the "ack" demo, but the auto-ack isn't disabled in the code so it should be enabled by default? The demo did not actually check the result of the call to write() so I added a simple check myself to print a failure message, and put a 1s delay at the top of the loop.

Most of the time the write succeeds indicating an ack was sent, but it still fails at times:

11:56:09.849 -> Now sending length 16
11:56:09.849 -> Got response size=16 value=ABCDEFGHIJKLMNOP
11:56:10.944 -> Now sending length 17
11:56:10.979 -> Write failed!
11:56:10.979 -> Got response size=17 value=ABCDEFGHIJKLMNOPQ
11:56:12.107 -> Now sending length 18
11:56:12.107 -> Got response size=18 value=ABCDEFGHIJKLMNOPQR
11:56:13.195 -> Now sending length 19
11:56:13.229 -> Write failed!
11:56:13.229 -> Got response size=19 value=ABCDEFGHIJKLMNOPQRS
11:56:14.343 -> Now sending length 20
11:56:14.396 -> Got response size=20 value=ABCDEFGHIJKLMNOPQRST
11:56:15.460 -> Now sending length 21
11:56:15.494 -> Got response size=21 value=ABCDEFGHIJKLMNOPQRSTU
11:56:16.602 -> Now sending length 22
11:56:16.602 -> Write failed!
11:56:16.602 -> Got response size=22 value=ABCDEFGHIJKLMNOPQRSTUV

This is "better" but still not "good". This indicates to me that my PCBs are at least somewhat to blame, but that can't be the entire story, as there are still write "failures." The behavior is the same as I see in my own code -- the write returns a failure, but the data is still actually sent and received just fine.

As a final test later today I will remove the breakout boards and connect the jumper wires directly to the radio pins.

@TMRh20

With regular modules I’ve been attaching an external antenna (a 3-4 inch wire) and have been using external power modules

I'll look into that if I can't resolve this through other means, thanks.

@2bndy5

The thing you mentioned here with the order and pipe is exactly the sort of thing I'm talking about, thanks for your efforts there. I realize that much of this is documented in the datasheet as well if you go digging through it, but I don't think that's a reason to not put it in the library documentation as well. Similarly the behavior you found regarding write returning false when multicast is true, if dynamic ack is not enabled on the sending node, should also be in the library documentation IMHO.

Avamander commented 3 years ago

This indicates to me that my PCBs are at least somewhat to blame, but that can't be the entire story, as there are still write "failures."

I'd really rather suspect the radios themselves because you've eliminated potential power issues with ceramic+electrolytic capacitors on the nRF modules.

One last possible test you can do is trying different power levels, does it work worse/better with MIN/MAX?

2bndy5 commented 3 years ago

@Avamander I already suggested that, and @alandsidel reported no difference.

@alandsidel #661 I'm all over that. There's no reason that flag shouldn't be asserted from begin() and forgotten about afterward. I've also gone through and clarified docs about write() (and write-related functions) return value, why auto-ack feature should stay enabled (8 paragraphs -- 4 for all pipes and 4 for individual pipes), and much more. I even made new examples ripe with comments that give newcomers a lot to digest. Stay tuned for the next release (or you can use my fork's "formatting-examples" branch)

Never heard of Makerfire either, but it sounds like another Chinese impersonator brand (they often use the word "fire" -- I think it comes from poor brand name translation). Personally, I try to avoid buying 18650 batteries from brands that use the word fire in their name (is that not an obvious red flag to anyone else?).

2bndy5 commented 3 years ago

I think I may have reproduced this behavior inadvertently. I've been testing the new "acknowledgementPayloads.cpp" example on Linux (RPi4), and I get write() returning false when it clearly has sent something. I suspected the radio hardware was malfunctioning, but then I ran my CircuitPython library's ack payload test on the same RPi with the same radio. No more false-positives. I think millis() may be so inaccurate on Linux that 95 ms isn't a good timeout sentinel there. My CircuitPython library does not try to manually detect timeouts; rather, it assumes the SPI bus is stable and the ESB protocol will do its job. I suspect millis() because ACK payloads (8 bytes in my case) take a little longer to transmit, and I have been getting the "RF24 HARDWARE FAIL: Radio not responding, verify pin connections, wiring, etc" error, contrary to @alandsidel's experience (assuming @alandsidel did not comment out #define FAILURE_HANDLING before installing the library).
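As an illustration of the kind of software timeout being discussed, here is a minimal sketch of a polling loop with a deadline measured on `std::chrono::steady_clock`, which the C++ standard guarantees is monotonic. `waitUntil` is a hypothetical helper, not the library's code, and whether a monotonic clock would resolve the behavior seen here is an untested assumption.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: poll `done` until it returns true or until `timeout`
// has elapsed on a monotonic clock. Returns false on timeout.
bool waitUntil(const std::function<bool()>& done,
               std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!done()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // gave up before the condition became true
        }
    }
    return true;
}
```

A caller could pass a predicate that checks the radio's status flags and a 95 ms timeout. Because the deadline is computed once from a clock that cannot go backwards, a jump in wall-clock time would not cut the wait short.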

TMRh20 commented 3 years ago

I’m pretty sure there are still issues with the Linux millis() implementation

2bndy5 commented 3 years ago

Everything I've read about measuring milliseconds (and more so for microseconds) in Linux warns about the inaccuracy of the implementations. This seems to have something to do with different oscillators for different CPUs combined with the time cost of computing the measurements... I'm spinning up my RPi2 to see if the problem is less prevalent than with the RPi4.

wiringPi official reference

time.h reference (man page) <- read the "Note for SMP systems" section

TMRh20 commented 3 years ago

The problem seems to be not so much with accuracy but in the values “jumping” around. Not sure if a coding issue or wtf.

2bndy5 commented 3 years ago

Closing this since it's likely that the problem was caused by a Chinese clone.

2bndy5 commented 3 years ago

@alandsidel I recently happened across this article about a missing capacitor on cheap clones. I was wondering if it would be useful (or even applicable) to your case

alandsidel commented 3 years ago

@alandsidel I recently happened across this article about a missing capacitor on cheap clones. I was wondering if it would be useful (or even applicable) to your case

Thanks Brendan. That cap is indeed missing from the board on all the modules I have, go figure. I don't have any 1pF (or any value) SMT caps laying around to test with, but I do have some 10pF ceramics I could try next time I'm messing with these modules.

For now, thanks to how prolific cheaply made boards for these are, as well as the difficulty in determining if the IC itself is counterfeit or not, I decided to redesign the project around an ESP32-WROOM rather than Arduino + nRF24. The ESP32 modules don't seem to be suffering from the same issues with counterfeiting.