joan2937 / lg

Linux C libraries and Python modules for manipulating GPIO
The Unlicense

Working around missing edge events #18

Open matthijskooijman opened 1 year ago

matthijskooijman commented 1 year ago

I've got a button attached to a GPIO pin on an Orange Pi PC (Allwinner H3). I'm reading alerts from the pins using gpiozero with the lgpio backend, with debouncing enabled in lgpio. What I'm seeing is that sometimes, when pressing the button briefly, I get the falling edge (the button is active-low) but not the rising edge. With gpiozero's hold_repeat feature, this means the button will generate button presses continuously...

I confirmed (with some printing inside gpiozero) that in this case lgpio is not reporting the trailing rising edge. I suspected that the debouncing code in lgpio might be suppressing this edge (because it is considered a bounce), but looking at the code, it seems this case is explicitly handled by emitting the edge anyway once the level has been stable for the debounce time.
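The debounce behaviour described here (report a level only once it has been stable for the debounce time, including the final level after a bounce) can be illustrated with a minimal sketch. This is not lgpio's actual code; the function name `debounce` and its parameters are hypothetical, and the sample timestamps are taken from the bounce visible in the gpiomon log in this post.

```python
def debounce(events, debounce_s, initial_level):
    """Filter (timestamp, level) edges, reporting a level only after it was stable for debounce_s.

    Assumption: the final event stays stable, since no later edge contradicts it.
    """
    reported = initial_level
    output = []
    for i, (t, level) in enumerate(events):
        is_last = i == len(events) - 1
        # A level is stable if the next edge arrives no sooner than debounce_s later.
        stable = is_last or events[i + 1][0] - t >= debounce_s
        if stable and level != reported:
            output.append((t + debounce_s, level))
            reported = level
    return output


# 0 = falling edge (pressed), 1 = rising edge (released)
raw = [
    (18242.575939655, 0),
    (18242.633683114, 1),  # bounce starts
    (18242.633828864, 0),
    (18242.633980780, 1),  # bounce ends with the line high
    (18244.282714114, 0),
]
levels = [level for _, level in debounce(raw, 0.001, initial_level=1)]
print(levels)
```

With a 1 ms debounce time, the two short-lived levels inside the bounce are dropped, but the final rising edge is still reported because the line then stays high.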

I then tried reproducing this with gpiomon, which should use the same /dev/gpiochip0 interface and directly shows events emitted by the kernel, and saw at least one instance where the trailing event was not emitted by the kernel:

pi@orangepipc:/usr/lib/python3/dist-packages/gpiozero$ sudo gpiomon 0 201
event:  RISING EDGE offset: 201 timestamp: [   18236.142590527]
event: FALLING EDGE offset: 201 timestamp: [   18236.902041694]
event:  RISING EDGE offset: 201 timestamp: [   18236.958867611]
event: FALLING EDGE offset: 201 timestamp: [   18237.374040611]
event:  RISING EDGE offset: 201 timestamp: [   18237.417619736]
event:  RISING EDGE offset: 201 timestamp: [   18237.906252111]
event: FALLING EDGE offset: 201 timestamp: [   18239.228219987]
event:  RISING EDGE offset: 201 timestamp: [   18239.367462945]
event: FALLING EDGE offset: 201 timestamp: [   18239.856171404]
event:  RISING EDGE offset: 201 timestamp: [   18239.867241946]
event: FALLING EDGE offset: 201 timestamp: [   18240.377663613]
event:  RISING EDGE offset: 201 timestamp: [   18240.589072571]
event: FALLING EDGE offset: 201 timestamp: [   18241.446177321]
event:  RISING EDGE offset: 201 timestamp: [   18241.526287071]
event: FALLING EDGE offset: 201 timestamp: [   18241.921043155]
event:  RISING EDGE offset: 201 timestamp: [   18241.961417655]
event: FALLING EDGE offset: 201 timestamp: [   18242.575939655]
event:  RISING EDGE offset: 201 timestamp: [   18242.633683114]
event: FALLING EDGE offset: 201 timestamp: [   18242.633828864]
event:  RISING EDGE offset: 201 timestamp: [   18242.633980780]
event: FALLING EDGE offset: 201 timestamp: [   18244.282714114]
event:  RISING EDGE offset: 201 timestamp: [   18244.325010073]
event: FALLING EDGE offset: 201 timestamp: [   18244.942461906]
^Cpi@orangepipc:/usr/lib/python3/dist-packages/gpiozero$ sudo gpioget 0 201
1

This shows a number of button presses. At the end, the button is released, but the last edge is FALLING. Checking the value of the GPIO pin afterwards (without touching the button again) shows that the value is indeed 1 (i.e. button released). Also note that there are two consecutive RISING edges in the output (at 18237.417619736 and 18237.906252111).
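Spotting such gaps by eye is error-prone, so a small script can scan a gpiomon log for consecutive edges in the same direction, which imply a missed edge in between. This is only a sketch: the regular expression assumes the exact output format shown above, and `find_repeated_edges` is a hypothetical helper name.

```python
import re

# Matches lines in the gpiomon format shown above, e.g.
# "event:  RISING EDGE offset: 201 timestamp: [   18236.142590527]"
EVENT_RE = re.compile(
    r"event:\s+(RISING|FALLING) EDGE offset:\s*(\d+) timestamp:\s*\[\s*([\d.]+)\]"
)


def find_repeated_edges(lines):
    """Return (timestamp, edge) for each event that repeats the previous edge direction."""
    repeated = []
    previous = {}  # offset -> last edge direction seen on that line
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue  # skip prompts and other non-event lines
        edge, offset, timestamp = match.group(1), int(match.group(2)), match.group(3)
        if previous.get(offset) == edge:
            repeated.append((timestamp, edge))
        previous[offset] = edge
    return repeated


log = """\
event: FALLING EDGE offset: 201 timestamp: [   18237.374040611]
event:  RISING EDGE offset: 201 timestamp: [   18237.417619736]
event:  RISING EDGE offset: 201 timestamp: [   18237.906252111]
event: FALLING EDGE offset: 201 timestamp: [   18239.228219987]
""".splitlines()

print(find_repeated_edges(log))
```

Note that this only catches a single dropped edge between two reported ones. A dropped complete cycle (one falling plus one rising edge) leaves the sequence alternating and cannot be detected from the log alone.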

This suggests that the kernel is indeed not emitting all events. I could not find proper documentation about the gpiochip interface to figure out if the kernel gives any guarantees about this, but I suspect not.

Maybe lgpio could and should handle this case by periodically re-checking the value of the pin and emitting extra events if it turns out the kernel has omitted one? I'm not sure how often or when to check, though; I guess that depends on the guarantees that the kernel does give. Maybe a single check a "short while" after each edge, or even directly after each edge, is sufficient if the cause of the dropped edges is that only one edge/ISR can be queued at a time. I'm not sure if this is indeed the best place to fix this, but it would probably be convenient for lgpio users if lgpio handled this...
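The re-checking idea can be sketched independently of lgpio. The class below is a hypothetical illustration, not part of lgpio or gpiozero: it remembers the last level reported to the user and, when a poll of the pin disagrees, emits a synthetic edge. The pin is simulated with a dict so the example runs without hardware; on a real system `read_level` would presumably wrap something like `lgpio.gpio_read`.

```python
class EdgeReconciler:
    """Tracks the last reported level and synthesizes an edge when a poll disagrees."""

    def __init__(self, read_level, on_edge, initial_level):
        self.read_level = read_level  # callable returning the current level (0 or 1)
        self.on_edge = on_edge        # callable(level, synthetic), invoked per reported edge
        self.level = initial_level    # last level reported to the user

    def kernel_edge(self, level):
        """Call for every edge event delivered by the kernel."""
        self.level = level
        self.on_edge(level, False)

    def poll(self):
        """Call periodically or shortly after an edge; reports a lost edge if found."""
        actual = self.read_level()
        if actual != self.level:
            self.level = actual
            self.on_edge(actual, True)


# Simulated pin (assumption: 1 = released, 0 = pressed, as in the post above).
pin = {"level": 1}
reported = []
reconciler = EdgeReconciler(
    read_level=lambda: pin["level"],
    on_edge=lambda level, synthetic: reported.append((level, synthetic)),
    initial_level=1,
)

pin["level"] = 0
reconciler.kernel_edge(0)  # falling edge arrives normally
pin["level"] = 1           # button released, but the rising edge is lost
reconciler.poll()          # the re-check notices the mismatch
reconciler.poll()          # a second poll finds nothing new
print(reported)
```

A real implementation would also have to deal with the race between a poll and an edge event that is still queued in the kernel, which this sketch ignores.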

joan2937 commented 1 year ago

I will have a look. Not sure when but hopefully within a week.

The one thing I don't want is problems in this area. They are so tricky to debug and generally even trickier to fix and prone to introduce new errors. The debounce logic, watchdog logic, and the attempt to maintain time order between disparate GPIO triggers is interwoven.

The gpiochip interface certainly drops edges and complete cycles if the edges are too close together. It's slightly sad that pigpio sampling performs better at high interrupt rates.

matthijskooijman commented 1 year ago

The one thing I don't want is problems in this area. They are so tricky to debug and generally even trickier to fix and prone to introduce new errors. The debounce logic, watchdog logic, and the attempt to maintain time order between disparate GPIO triggers is interwoven.

Yeah, agreed, this stuff is complex and hard...

FWIW, I just noticed I did not have lgpio debouncing enabled at all (because it was not working for me, see #19), but had my own running on top (but that only dropped activating edges, so certainly did not cause this issue).

DanielDecker commented 3 months ago

Using lgpio version 0.2.2 without debounce (=0), I get the same result as in the first post. But with debounce set to 1 (=1µs), there are no missing edges. Tested on a Raspberry Pi 3 running Raspberry Pi OS (Debian 12 Bookworm), kernel 6.6, with the following Python code:

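import time

import lgpio as lg

# open default gpio chip
h = lg.gpiochip_open(0)
# claim gpio10 for alerts on both edges, with the internal pull-up enabled
lg.gpio_claim_alert(h, 10, lg.BOTH_EDGES, lg.SET_PULL_UP)
# set debounce for gpio10 to 1µs
lg.gpio_set_debounce_micros(h, 10, 1)

# define callback function
def clb(chip, gpio, level, tick):
    print(chip, gpio, level, tick)

# hook up callback function with gpio10
cb1 = lg.callback(h, 10, lg.BOTH_EDGES, clb)

# keep the script alive so callbacks can fire (not needed in an interactive session)
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    cb1.cancel()
    lg.gpiochip_close(h)

warthog618 commented 2 months ago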

The missing events would be an overflow of the kernel event buffer - so a burst of events is being captured by the kernel faster than they can be processed by userspace.

It occurs to me that the problem was raised when lgpio was using uAPI v1, and that could be contributory.

With uAPI v1 the most recent event would be discarded when the buffer overflowed. This is a problem as the final event after an overflow may not reflect the state of the line at the end of the burst (it may still be correct if an even number of events from the burst is discarded).

This was fixed in uAPI v2 where an overflow results in the oldest event being discarded. This is far kinder to userspace debounce algorithms. Having said that, uAPI v2 also supports debounce itself so there is no longer any need to perform debounce in userspace.
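The difference between the two overflow policies can be shown with a toy model. This is a simplified simulation of the behaviour described above, not kernel code: the buffer is never drained during the burst, and the function name and parameters are made up for illustration.

```python
from collections import deque


def final_level_seen(events, capacity, drop_oldest):
    """Push a burst into a bounded buffer; return the level of the last event userspace reads."""
    buffer = deque()
    for level in events:
        if len(buffer) < capacity:
            buffer.append(level)
        elif drop_oldest:
            buffer.popleft()      # v2-style: discard the oldest event, keep the newest
            buffer.append(level)
        # else v1-style: the newest event is discarded
    return buffer[-1]


# Burst of alternating edges (0 = falling, 1 = rising); the line ends low.
burst = [0, 1, 0, 1, 0]

print(final_level_seen(burst, capacity=4, drop_oldest=False))  # v1-style policy
print(final_level_seen(burst, capacity=4, drop_oldest=True))   # v2-style policy
```

With a capacity of 4, the v1-style policy discards the fifth event, so the last event userspace sees is a rising edge even though the line ended low. The v2-style policy always retains the newest event, so the last event read matches the final state of the line.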

You would need to repeat the problem in a test case to be sure of the root cause and the correctness of any fix. Got any tests?

The gpiochip interface certainly drops edges and complete cycles if the edges are too close together. It's slightly sad that pigpio sampling performs better at high interrupt rates.

If pigpio sampling is polling the hardware directly, and that is the impression I get from "sampling", then that is akin to saying it is slightly sad that the sun will come up tomorrow.

It is not surprising that accessing hardware directly is faster than anything that can be managed across the kernel/userspace interface due to the unavoidable context switching. But then messing with hardware directly isn't very portable, so swings and roundabouts.

With uAPI v2, you can influence the size of the buffer in the kernel to try to prevent buffer overflow and so the loss of events. Having said that, if the long-term interrupt rate is higher than userspace can process, then no amount of buffering will help. The same applies if interrupts arrive faster than the kernel interrupt handler can capture them. If you need that level of performance you should go in-kernel rather than userspace.
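The point about buffering can be illustrated with a small queue model: a larger buffer absorbs a short burst, but cannot help when events arrive faster than they are processed over the long term. This is a deliberately crude sketch with made-up numbers; `dropped_events` and the tick-based model are assumptions for illustration only.

```python
def dropped_events(arrivals_per_tick, capacity, processed_per_tick):
    """Count events lost when a bounded buffer is drained at a fixed rate each tick."""
    queued = 0
    dropped = 0
    for arrivals in arrivals_per_tick:
        queued += arrivals
        if queued > capacity:
            dropped += queued - capacity  # overflow: the excess events are lost
            queued = capacity
        queued = max(0, queued - processed_per_tick)
    return dropped


burst = [20] + [0] * 9   # 20 events at once, then quiet
sustained = [3] * 5000   # 3 events per tick while only 2 can be processed

print(dropped_events(burst, capacity=16, processed_per_tick=2))
print(dropped_events(burst, capacity=32, processed_per_tick=2))
print(dropped_events(sustained, capacity=16, processed_per_tick=2))
print(dropped_events(sustained, capacity=1024, processed_per_tick=2))
```

For the burst, doubling the capacity removes the loss entirely. For the sustained load, the larger buffer only delays the first dropped event; once it fills, events are lost at the same rate as before.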