Use Linux kernel's character device API to implement IRQ capability

2bndy5 commented 8 months ago

This guide does mention how IRQ could be done using some poll() function (which is unknown to me), but it uses a blocking example.

The main obstacle (for me) is using a separate processor thread to poll the GPIO for IRQ edge detection.

Originally posted by @2bndy5 in https://github.com/nRF24/RF24/issues/942#issuecomment-1950890771

Useful resources

TMRh20 commented 8 months ago

FYI just tested the interrupt example and pigpio doesn't seem to work on the RPi5 yet.

+---------------------------------------------------------+
|Sorry, this system does not appear to be a raspberry pi. |
|aborting.                                                |
+---------------------------------------------------------+

Now you have me playing with pthreads again... this is going to take some time. I have a simple example triggering an IRQ using pins 22 or others that are toggled manually, but can't get it to trigger on the radio IRQ pin for some reason. Wondering if it needs to be set to an input or something first? The guide doesn't seem to do that...

Also wondering if I should continue, or if this is something you wanted to take on?

2bndy5 commented 8 months ago

I could take a stab at it, but I'll need you to verify RPi5 compatibility.

2bndy5 commented 8 months ago

Still researching pthread... Looks like the function passed to pthread_create() only executes once in a separate thread -- using default scheduling and thread attrs, I think -- then the thread terminates upon exit of the passed function.

Clearly, I've never done multi-threaded processing in C++. Some of my input here might be obvious to those with experience using pthread.

2bndy5 commented 8 months ago

Ok, I'm booted into Ubuntu and inspecting /usr/include/linux/gpio.h. And I just found out that there is a v2 API in gpio.h. Some of the structs used in #942 are actually deprecated already 😡. For example: gpiohandle_request is defined as

/**
 * struct gpiohandle_request - Information about a GPIO handle request
 * @lineoffsets: an array of desired lines, specified by offset index for the
 * associated GPIO device
 * @flags: desired flags for the desired GPIO lines, such as
 * %GPIOHANDLE_REQUEST_OUTPUT, %GPIOHANDLE_REQUEST_ACTIVE_LOW etc, added
 * together. Note that even if multiple lines are requested, the same flags
 * must be applicable to all of them, if you want lines with individual
 * flags set, request them one by one. It is possible to select
 * a batch of input or output lines, but they must all have the same
 * characteristics, i.e. all inputs or all outputs, all active low etc
 * @default_values: if the %GPIOHANDLE_REQUEST_OUTPUT is set for a requested
 * line, this specifies the default output value, should be 0 (low) or
 * 1 (high), anything else than 0 or 1 will be interpreted as 1 (high)
 * @consumer_label: a desired consumer label for the selected GPIO line(s)
 * such as "my-bitbanged-relay"
 * @lines: number of lines requested in this request, i.e. the number of
 * valid fields in the above arrays, set to 1 to request a single line
 * @fd: if successful this field will contain a valid anonymous file handle
 * after a %GPIO_GET_LINEHANDLE_IOCTL operation, zero or negative value
 * means error
 *
 * Note: This struct is part of ABI v1 and is deprecated.
 * Use &struct gpio_v2_line_request instead.
 */
struct gpiohandle_request {
    __u32 lineoffsets[GPIOHANDLES_MAX];
    __u32 flags;
    __u8 default_values[GPIOHANDLES_MAX];
    char consumer_label[GPIO_MAX_NAME_SIZE];
    __u32 lines;
    int fd;
};

and gpio_v2_line_request is defined as

/**
 * struct gpio_v2_line_request - Information about a request for GPIO lines
 * @offsets: an array of desired lines, specified by offset index for the
 * associated GPIO chip
 * @consumer: a desired consumer label for the selected GPIO lines such as
 * "my-bitbanged-relay"
 * @config: requested configuration for the lines.
 * @num_lines: number of lines requested in this request, i.e. the number
 * of valid fields in the %GPIO_V2_LINES_MAX sized arrays, set to 1 to
 * request a single line
 * @event_buffer_size: a suggested minimum number of line events that the
 * kernel should buffer.  This is only relevant if edge detection is
 * enabled in the configuration. Note that this is only a suggested value
 * and the kernel may allocate a larger buffer or cap the size of the
 * buffer. If this field is zero then the buffer size defaults to a minimum
 * of @num_lines * 16.
 * @padding: reserved for future use and must be zero filled
 * @fd: if successful this field will contain a valid anonymous file handle
 * after a %GPIO_GET_LINE_IOCTL operation, zero or negative value means
 * error
 */
struct gpio_v2_line_request {
    __u32 offsets[GPIO_V2_LINES_MAX];
    char consumer[GPIO_MAX_NAME_SIZE];
    struct gpio_v2_line_config config;
    __u32 num_lines;
    __u32 event_buffer_size;
    /* Pad to fill implicit padding and reserve space for future use. */
    __u32 padding[5];
    __s32 fd;
};

TMRh20 commented 8 months ago

LOL.

Well I don't know about you, but I'm taking a break.

2bndy5 commented 8 months ago

Yeah, I have plans for later today. But I'll keep at it throughout the coming weeks. I think I'll finish the clang-format updates first.

2bndy5 commented 8 months ago

I've also been looking at the libgpiod code, and they cache everything, possibly because nVidia Jetson/TX2/etc (& probably some RPi clones) can have multiple character devices (dev/gpiochipX) for GPIO ports.

2bndy5 commented 8 months ago

Apparently, MRAA can detect RPi5 hardware, but I don't know if that means MRAA will completely work on RPi5. They have been supporting character device API for some time...

Pigpio (https://github.com/joan2937/pigpio/issues/586) and BCM2835 (google groups discussion) libs are still assessing how to implement compatibility. I doubt BCM2835 will go on supporting the new hardware paradigm (which ultimately requires interfacing with PCI express). Pigpio might move forward since it already uses the Linux kernel for some stuff.

Remember, WiringPi is dead and littleWire has been broken since before I joined nRF24 org. It is starting to look like our SPIDEV driver might be the only way to support Linux in the future. I'm not saddened by this from a maintenance point of view, but user projects will suffer this new RPi hardware paradigm.

2bndy5 commented 8 months ago

libgpiod's asynch_watch_line_value.c example seems like a big clue about using pthread with char-dev API.

TMRh20 commented 8 months ago

Yeah, I had a feeling the GPIO changes would be slower than previous implementations, but never did any performance testing. Now that we have some working code for v2, caching everything shouldn't be a big challenge.

2bndy5 commented 8 months ago

I took a break. From a design perspective, I think the gpio caches and interrupt implementation should be separate to reduce the created thread's resources.

As far as caching, I think we can just have 1 gpio_line_request and use the gpio_line_request.attrs to configure multiple offsets as input or output. I'm also considering 1 cached gpio_line_request for input lines and another for output lines. Not sure about GPIO destructor though since the cached FDs would have to be closed upon before app exit.

I might be entirely overthinking again.

TMRh20 commented 8 months ago

Hey if it works it works. We should be able to get the same or better performance out of it than prior versions I would hope.

I agree with keeping the GPIO and IRQ stuff separate. The caching I don't know about, I'd have to look at GPIO code more in depth to form a relevant opinion.

2bndy5 commented 8 months ago

So I think I'm caching the FDs properly (in char-dev-irq branch).

I have a gpio_cache struct closing the FDs in it's destructor, so there shouldn't be any need to modify RF24's destructor.
I'm only using 1 cached gpio_v2_line_request obj for all pins used by RF24; uses request.config.attrs[offset].attr.flags instead of request.config.flags.
I also cache 1 gpio_v2_line_values obj for all pins' input/output ops.

It compiles and executes without errors, but running the scanner example shows no signal gets detected. 😞

On a side note, I think our old sysfs approach does not properly free the pins upon app exit. I have to reboot my RPi to allow the char-dev API access to the GPIO22.

TMRh20 commented 8 months ago

Nice work!

Mostly tests fine for me, but I am seeing differences in the scanner example on RPi5, RPi4 and RPi3. The RPi2 seems to not care.

Teh RPI4 comes up with long lines like this sometimes tho: 666666655555555555554444444444433333333333333333333333333333333333333333333333333333333333333333333333333334444444444445556666

Also, on a side note, since the FDs are cached and remain open, the driver now gives a nice error report if you try to run two instances of RF24 at the same time, using the same pins. Sweet deal, this is something I did regularly when running through tests etc. so its a nice behavior to have.

TMRh20 commented 8 months ago

Hmm, taking a look at RF24.cpp, I found something unusual that I haven't thought about in a while, but see the lines here

Essentially, on faster devices, I put in a delay of 5us when toggling the CS pin. If I modify the code to the following, it seems to work much better on RPi with the scanner example.

#if !defined(RF24_LINUX)
    digitalWrite(csn_pin, mode);
#else
    static_cast<void>(mode); // ignore -Wunused-parameter
#endif // !defined(RF24_LINUX)
  delayMicroseconds(csDelay);  // Delay for all devices
}

Maybe we could do something like the following for the CS pin?:

#if (!defined(F_CPU) || F_CPU > 20000000) && defined FAILURE_HANDLING 
    delayMicroseconds(csDelay);
#endif

The whole point of the delay was double: a: Ensure the pin gets toggled for at least 5us b: Slow down the polling via SPI, leaving the radio to process radio data instead of being hammered by SPI requests

The higher layers seem to perform better too with this change.

2bndy5 commented 8 months ago

I was going to play with char-dev debouncing too. I think with caching, we've hit the too fast problem.

2bndy5 commented 8 months ago

its a nice behavior to have

I also set the consumer string in the request obj. So, now if you run gpioinfo gpiochip0 (a tool provided by libgpiod) while a RF24 app is running, you'll see which pins are consumed by the "RF24 lib".

TMRh20 commented 8 months ago

I think with caching, we've hit the too fast problem.

Yup, there needs to be some sort of GPIO/CSN delay for Linux now along with the faster MCUs. Will you include this in your branch then?

2bndy5 commented 8 months ago

Yeah, coding the debouncing now. There's a limit of 10 attrs that we can configure for each request.config, so I'm going with 3: 1 attr to declare which pins are inputs, 1 attr to declare which pins are outputs, and 1 attr to declare the debouncing period of 5 microseconds on each output pin.

2bndy5 commented 8 months ago

I think only 1 type of attr (input flag, output flag, debounce period) can be associated with a pin. Meaning, setting an attr for output and another attr for debouncing on 1 pin doesn't take.

/**
 * struct gpio_v2_line_config - Configuration for GPIO lines
 * @flags: flags for the GPIO lines, with values from &enum
 * gpio_v2_line_flag, such as %GPIO_V2_LINE_FLAG_ACTIVE_LOW,
 * %GPIO_V2_LINE_FLAG_OUTPUT etc, added together.  This is the default for
 * all requested lines but may be overridden for particular lines using
 * @attrs.
 * @num_attrs: the number of attributes in @attrs
 * @padding: reserved for future use and must be zero filled
 * @attrs: the configuration attributes associated with the requested
 * lines.  Any attribute should only be associated with a particular line
 * once.  If an attribute is associated with a line multiple times then the
 * first occurrence (i.e. lowest index) has precedence.
 */
struct gpio_v2_line_config {
    __aligned_u64 flags;
    __u32 num_attrs;
    /* Pad to fill implicit padding and reserve space for future use. */
    __u32 padding[5];
    struct gpio_v2_line_config_attribute attrs[GPIO_V2_LINE_NUM_ATTRS_MAX];
};

My initial (local) test has not shown any difference. I pushed what I have in case you can test with it. I also tried using 1 attr to define both output flag and debouncing period (for all output pins), alas I still don't see a difference.

I might have to switch to a std::map of port -> request key/value pairs for each pin used. Alternatively, I guess we could try altering RF24.cpp, but I was hoping to avoid more ifdef soup.

TMRh20 commented 8 months ago

Yeah, I just tested it and I don't think debounce introduces a delay of 5us after toggling, it would be to prevent togging more often than 5us, so I don't think it will work in this application. We might need an actual delay. Unfortunately I've been thinking about adding another, separate define to specifically make more functions interrupt safe, as right now all you can do is comment out FAILURE_HANDLING. It might be best left as is though lol.

2bndy5 commented 8 months ago

My std::map attempt yielded same result. Its a pain when Linux kernel doc strings are so terse and online tutorials are scarce or outdated.

I think the debounce period is more for input pins. https://www.kernel.org/doc/html/v4.17/driver-api/gpio/driver.html#gpios-with-debounce-support

I added the delayMicroseconds() call to RF24::ce(). 😞 https://github.com/nRF24/RF24/commit/18045c7d4e111de1c6bc6d977e6ab3a07fa93d3c We should probably have a RF24_SPIDEV defined so we only delay 5us on CE toggle when using the SPIDEV driver. Other Linux drivers are slow enough to not have this problem.

2bndy5 commented 8 months ago

Seems to be working now. I'm going to take a break before tackling the IRQ implementation...

TMRh20 commented 8 months ago

I'm still getting the same erroneous results with the scanner examples with your current code.

The CE pin is only toggled on transmit, and delays are added in the lib already if needed. re: startWrite() Adding a delay to CS will affect reception as well as how often available() can be called etc.

2bndy5 commented 8 months ago

But the gpio stuff in SPIDEV isn't managing the CS. The asserted CE used for entering RX is how I perceived the problem with the scanner example. Works fine on my RPi3.

2bndy5 commented 8 months ago

ok, I'm getting periodic failures in gettingstarted on RPi3 followed by very high transmission times. I'm not getting any errors on my RPi4 with gettingstarted. Is this similar to what you're seeing?

2bndy5 commented 8 months ago

I applied your idea locally

diff --git a/RF24.cpp b/RF24.cpp
index d4a7b33..7d7836e 100644
--- a/RF24.cpp
+++ b/RF24.cpp
@@ -92,10 +92,11 @@ void RF24::csn(bool mode)

 #if !defined(RF24_LINUX)
     digitalWrite(csn_pin, mode);
-    delayMicroseconds(csDelay);
+    //delayMicroseconds(csDelay);
 #else
     static_cast<void>(mode); // ignore -Wunused-parameter
 #endif // !defined(RF24_LINUX)
+    delayMicroseconds(csDelay);
 }

 /****************************************************************************/
@@ -108,7 +109,7 @@ void RF24::ce(bool level)
 #endif
         digitalWrite(ce_pin, level);
 #ifdef RF24_LINUX
-        delayMicroseconds(5);
+        //delayMicroseconds(5);
 #endif
 #ifndef RF24_LINUX
     }

It yielded the same result but much slower scanner sweeps. The gettingstarted transmission times were considerably slower but ~more~ just as reliable on my RPi3

Transmission successful! Time to transmit = 1042 us. Sent: 0
Transmission successful! Time to transmit = 1047 us. Sent: 0.01
Transmission successful! Time to transmit = 1066 us. Sent: 0.02
Transmission successful! Time to transmit = 1047 us. Sent: 0.03
Transmission successful! Time to transmit = 1027 us. Sent: 0.04
Transmission successful! Time to transmit = 1031 us. Sent: 0.05
Transmission successful! Time to transmit = 1039 us. Sent: 0.06
Transmission successful! Time to transmit = 2752 us. Sent: 0.07
Transmission successful! Time to transmit = 2658 us. Sent: 0.08
Transmission successful! Time to transmit = 2721 us. Sent: 0.09
Transmission successful! Time to transmit = 1024 us. Sent: 0.1
Transmission failed or timed out
Transmission successful! Time to transmit = 4387 us. Sent: 0.11
Transmission failed or timed out
Transmission successful! Time to transmit = 2675 us. Sent: 0.12
Transmission successful! Time to transmit = 1039 us. Sent: 0.13
Transmission successful! Time to transmit = 9715 us. Sent: 0.14
Transmission successful! Time to transmit = 999 us. Sent: 0.15
Transmission successful! Time to transmit = 1038 us. Sent: 0.16
Transmission failed or timed out
Transmission failed or timed out
Transmission successful! Time to transmit = 1039 us. Sent: 0.17
Transmission successful! Time to transmit = 1023 us. Sent: 0.18
Transmission failed or timed out
Transmission failed or timed out
6 failures detected. Leaving TX role.

TMRh20 commented 8 months ago

You are correct I figure, forgot to add the CE delay to RF24.cpp when testing. Now the scanner examples seem to work fine!

My gettingstarted examples seem to work fine with or without the CE delay.

It seems I was getting better high-speed transfers at the RF24Gateway layer with the CS delay. Testing with data over SSH and IPerf3. Now I don't know what the answer is. :p

2bndy5 commented 8 months ago

I'm at a loss as well. This is why I can't have nice things.

TMRh20 commented 8 months ago

I think the CE delay makes more logical sense, so lets go with that for now. We can always adjust later if needed.

2bndy5 commented 8 months ago

Further tests without CE delay seem to be fine in gettingstarted. The scanner example on my RPi4 and RPi3 still require at least a 1us delay. I think the scanner example alone just needs an added delay instead of every call to ce(), but raising the wait time to 130 does not have the same effect as having the delay in ce(). 🤷🏼

TMRh20 commented 8 months ago

I found only one issue so far after a bunch of testing on your char-dev-irq branch: a: If the radio experiences an error or the radio is simply removed from the Raspberry Pi, it will crash if using RF24Mesh or RF24Gateway. b: This is due to radio.begin() being called a second time and the subsequent call to pinMode(ce_pin,OUTPUT)

I think either we could add something in RF24.cpp if(!pinOpened){ pinMode(ce_pin,OUTPUT); } or else do something similar in GPIO.cpp, but you may have better ideas on how to handle this.

2bndy5 commented 8 months ago

hmm. I thought I had some code in GPIO::open() that would check if the pin specified is already in cache. perhaps that isn't working? https://github.com/nRF24/RF24/blob/4f434635f36da244f7448827bd0b3677f0c0de91/utility/SPIDEV/gpio.cpp#L95-L100 https://github.com/nRF24/RF24/blob/4f434635f36da244f7448827bd0b3677f0c0de91/utility/SPIDEV/gpio.cpp#L29-L39

2bndy5 commented 8 months ago

__u32 i = 0

maybe use int 1 = 0;? Because offset is signed int (negative means not in cached).

I think the max lines per request obj is 64, so if we need to use uint32, then the sentinel -1 could instead be anything > 64.

TMRh20 commented 8 months ago

Changing to int just gives a warning when compiling. How about

    int offset = gpio_cache.getPortOffset(port);
    if (offset < 0) { // pin not in use; add it to cached request
        offset = request.num_lines;
        request.offsets[offset] = port;
        request.num_lines += 1;
    }else{
      return;    
    }

2bndy5 commented 8 months ago

IDK, my gut says no because it just ignores the problem and lets begin() carry on without proper cache validation. I originally added that code because: What if the user ended up changing the pin's direction (for whatever reason)?

TMRh20 commented 8 months ago

Changing the CE pins direction while the RF24 driver is running? lol You might be overthinking again, but I'll leave this one with you to figure out a proper solution.

2bndy5 commented 8 months ago

You might be overthinking again

Oh, I definitely am. I wouldn't have even worried about it if the code was namespaced. I'm also hoping to code it in a way that can be abstracted as a submodule (separate lib). I would love to be able to have this much Linux kernel wrappers (+ my work on wrapping I2C) for other Arduino libs.

All my work on that branch is in my Ubuntu partition (easier to look at the gpio.h and libgpiod code that way). I'll take a look at it tomorrow. Thanks for the steps to reproduce!

2bndy5 commented 8 months ago

Ok I looked into it, and it looks like the request.fd can't be re-opened once it is already opened.

this patch worked

```diff diff --git a/examples_linux/gettingstarted.cpp b/examples_linux/gettingstarted.cpp index 78e244a..a1d1187 100644 --- a/examples_linux/gettingstarted.cpp +++ b/examples_linux/gettingstarted.cpp @@ -57,6 +57,7 @@ int main(int argc, char** argv) cout << "radio hardware is not responding!!" << endl; return 0; // quit now } + radio.begin(); // to use different addresses on a pair of radios, we need a variable to // uniquely identify which address this radio will use to transmit diff --git a/utility/SPIDEV/gpio.cpp b/utility/SPIDEV/gpio.cpp index aab3dbc..340ce1a 100644 --- a/utility/SPIDEV/gpio.cpp +++ b/utility/SPIDEV/gpio.cpp @@ -107,12 +107,15 @@ void GPIO::open(rf24_gpio_pin_t port, int DDR) attr_input.mask |= (1LL << offset); } - int ret = ioctl(gpio_cache.fd, GPIO_V2_GET_LINE_IOCTL, &request); - if (ret == -1) { - std::string msg = "[GPIO::open] Can't get line handle from IOCTL; "; - msg += strerror(errno); - throw GPIOException(msg); - return; + int ret; + if (request.fd <= 0) { + ret = ioctl(gpio_cache.fd, GPIO_V2_GET_LINE_IOCTL, &request); + if (ret == -1) { + std::string msg = "[GPIO::open] Can't get line handle from IOCTL; "; + msg += strerror(errno); + throw GPIOException(msg); + return; + } } ret = ioctl(request.fd, GPIO_V2_LINE_SET_CONFIG_IOCTL, &request.config); if (ret == -1) { ```

PS - I just found out I can code in a repo on my RPi from my everyday PC using VSCode's Remote SSH feature! 😲 I actually pushed the fix from my RPi using VSCode remotely.

2bndy5 commented 8 months ago

Now I'm having trouble init-ing another RF24 object. Kinda need to support multiple radio objects if there's more than 1 SPI bus (with multiple CS options for each).

TMRh20 commented 8 months ago

Hmm that might be difficult if keeping the FDs open etc. Might need to make that optional

2bndy5 commented 8 months ago

I can pinMode(x, OUTPUT) multiple pins, but for some reason I can't run begin() on multiple RF24 objects. The problem seems to be in GPIO::write()

offset[0] = 22
initing second radio
offset[1] = 21
terminate called after throwing an instance of 'GPIOException'
  what():  [GPIO::write] Can't set line value from IOCTL; Operation not permitted
Aborted

TMRh20 commented 8 months ago

Hmm, I can't seem to get it working on any pin except 22 even with 1 radio, every other pin I try results in [GPIO::write] Can't set line value from IOCTL; Operation not permitted

2bndy5 commented 8 months ago

I think I need to switch to using a separate request object for each pin. This would involve std::map...

2bndy5 commented 8 months ago

Using std::map to cache request objects seems to be slower. But at least I can init a second, different pin and another different RF24 object (in the same program). Although scanner example still requires delayMicrosecond(1) in ce().

2bndy5 commented 8 months ago

I figured out that we can use the same request object to configure different pins. All we have to cache is the request.fd after configuring with ioctl(). There's a bit more "code golfing" in f184a2fd9c851bb70fde4cb4c3858b0bf2e420a8 and b13d48024b87c43029d248270318f4f9fbd1f56e, but I like the results (on my RPi3).

TMRh20 commented 8 months ago

Tested the latest on my RPi5, 4 & 3. No problems! I haven't actually looked at the code yet, but it works.

2bndy5 commented 8 months ago

I see the wiringPi mirror on github just releases v3.0 with RPi5 support. Looking at the code, they're still using deprecated sysfs calls though, not the recommended char-dev API. I'm guessing to support RPi5, they went with a table to map BCM pin numbers to the newer sysfs pin numbers (which are inexplicably high like 4xx).

But how they do interrupts is interesting because it involves both poll() and pthread (with callbacks to cached function pointers provided by user code).

TMRh20 commented 8 months ago

Good deal, wiringPi is pretty handy, so I'm glad they are still supporting it. Just tested with our RF24 libs on RPi5 and latest wiringPi and it works well too!

I'm guessing we will need to do something pretty similar for interrupts in SPIDEV.

2bndy5 commented 8 months ago

Yep. Clearly, I'm still researching how to use poll() together with pthread... I plan on tackling that very soon.

nRF24 / RF24

Use Linux kernel's character device API to implement IRQ capability #943

Useful resources