raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11.08k stars 4.96k forks source link

lost gpio interrupts #760

Closed chaosjug closed 9 years ago

chaosjug commented 9 years ago

If gpio interrupts on different gpios occur on almost the same time, sometime some of them seem to get lost.

This is what I do to test this issue.

Create 100 Interrupts with pigpio and this code (wave.c):

#include <stdio.h>
#include <pigpio.h>

#define PULSE 200
#define GPIO 12

int main()
{
    int i, wave_id, len;
    gpioPulse_t pulse[PULSE];
    uint32_t g_mask;

    g_mask |= (1<<GPIO);

    if (gpioInitialise()<0) return 1;

    for (i=0; i<PULSE; i += 2) {
        pulse[i].gpioOn  = g_mask;
        pulse[i].gpioOff = 0;
        pulse[i].usDelay = 1000;
        pulse[i+1].gpioOn  = 0;
        pulse[i+1].gpioOff = g_mask;
        pulse[i+1].usDelay = 1000;
    }
    gpioSetMode(GPIO, PI_OUTPUT);
    gpioWaveClear();
    gpioWaveAddGeneric(PULSE, pulse);
    wave_id = gpioWaveCreate();
    gpioWaveTxSend(wave_id, PI_WAVE_MODE_ONE_SHOT);

    len = gpioWaveGetMicros();
    gpioDelay(PULSE*1000);

    gpioTerminate();
    return 0;
}

Connect the gpio to two other gpios (in my case connect 12 to 4 and 22. Export gpios in sysfs and enable interrupt:

echo 4 > /sys/class/gpio/export
echo rising > /sys/class/gpio/gpio4/edge
echo 22 > /sys/class/gpio/export
echo rising > /sys/class/gpio/gpio22/edge

count interrupts with this code (cnt_irqs.c):

#include <stdio.h>
#include <poll.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>

#define GPIO_FN_MAXLEN  32
#define POLL_TIMEOUT    5000
#define RDBUF_LEN   5

int main(int argc, char **argv) {
    char fn[GPIO_FN_MAXLEN];
    int fd,ret,cnt;
    struct pollfd pfd;
    char rdbuf[RDBUF_LEN];
    struct timespec old_t, new_t;

    memset(rdbuf, 0x00, RDBUF_LEN);
    if(argc!=2) {
        printf("Usage: %s <GPIO>\nGPIO must be exported to sysfs and have enabled edge detection\n", argv[0]);
        return 1;
    }
    snprintf(fn, GPIO_FN_MAXLEN-1, "/sys/class/gpio/gpio%s/value", argv[1]);
    fd=open(fn, O_RDONLY);
    if(fd<0) {
        perror(fn);
        return 2;
    }
    pfd.fd=fd;
    pfd.events=POLLPRI;

    ret=read(fd, rdbuf, RDBUF_LEN-1);
    if(ret<0) {
        perror("read()");
        return 4;
    }
    printf("value is: %s\n", rdbuf);

    clock_gettime(CLOCK_REALTIME, &old_t);
    cnt = 0;
    while(1) {
        memset(rdbuf, 0x00, RDBUF_LEN);
        lseek(fd, 0, SEEK_SET);
        ret=poll(&pfd, 1, POLL_TIMEOUT);
        if(ret<0) {
            perror("poll()");
            close(fd);
            return 3;
        }
        if(ret==0) {
            printf("timeout\n");
            break;
        }
        ret=read(fd, rdbuf, RDBUF_LEN-1);
        if(ret<0) {
            perror("read()");
            return 4;
        }
        cnt++;
    }
    close(fd);
    printf("%d interupts\n", cnt);
    return 0;
}

What I do now is open 3 shells on the pi. In short order I start ./cnt_irqs 4, ./cnt_irqs 22 and sudo ./wave I would now expect 100 interupts as a result on both cnt_irqs, but I usually get something between 95 and 99. If I start only one cnt_irqs then I get the expected 100 interrupts.

when I look at bcm2708_gpio_interrupt in bcm2708_gpio.c,

static irqreturn_t bcm2708_gpio_interrupt(int irq, void *dev_id)
{
    unsigned long edsr;
    unsigned bank;
    int i;
    unsigned gpio;
    for (bank = 0; bank <= 1; bank++) {
        edsr = readl(__io_address(GPIO_BASE) + GPIOEDS(bank));
        for_each_set_bit(i, &edsr, 32) {
            gpio = i + bank * 32;
            generic_handle_irq(gpio_to_irq(gpio));
        }
    writel(0xffffffff, __io_address(GPIO_BASE) + GPIOEDS(bank));
    }
    return IRQ_HANDLED;
}

The writel(0xffffffff, __io_address(GPIO_BASE) + GPIOEDS(bank)); seems to be the problem, as it resets all bit. So if an interrupt occurs during the for_each_bit loop, it will not get handled.

If I add writel(1<<i,__io_address(GPIO_BASE) + GPIOEDS(bank)); in the loop instead, it seems to work for me.

I got the idea from here where similar issues were discussed.

popcornmix commented 9 years ago

I agree clearing the explicit bit is correct. I'll make the change.

notro commented 9 years ago

Just for reference, the upstream code is sligtly different when it comes to when that bit is cleared. Probably it increases the edge-interrupt detection speed. At least when it's just two very close interrupts. Like detection of both edges in a narrow pulse. I have no idea of fast it needs to go to loose an edge though, with only bit clearing on the end. Depends of course on how long the interrupt handler takes to run. Just speculation here, thought I'd mention it.

        for_each_set_bit(offset, &events, 32) {
                gpio = (32 * bank) + offset;
                type = pc->irq_type[gpio];

                /* ack edge triggered IRQs immediately */
                if (!(type & IRQ_TYPE_LEVEL_MASK))
                        bcm2835_gpio_set_bit(pc, GPEDS0, gpio);

                generic_handle_irq(irq_linear_revmap(pc->irq_domain, gpio));

                /* ack level triggered IRQ after handling them */
                if (type & IRQ_TYPE_LEVEL_MASK)
                        bcm2835_gpio_set_bit(pc, GPEDS0, gpio);
        }

Ref: http://lxr.free-electrons.com/ident?i=bcm2835_gpio_irq_handler

popcornmix commented 9 years ago

Interesting. I can see the logic that if handling the interrupt is slow, you can catch a subsequent interrupt a bit quicker. I don't have a good way of testing this, so I'll go for the simple change for now, and if someone has a test setup (like a pulse generator connected to gpio) we can try implementing the upstream way.

popcornmix commented 9 years ago

@chaosjug Can you (from a latest 3.12 rpi-update kernel) update to this: https://dl.dropboxusercontent.com/u/3669512/temp/kernel_gpio_clear.img

(rename to kernel.img) and confirm it works. If it does I'll push change to 3.12 kernel.

chaosjug commented 9 years ago

As my first testing method was not very robust, I tried another approach. I'm now generating two interrupts 10us appart with this code:

#include <stdio.h>
#include <pigpio.h>

#define PULSE 3
#define GPIO1 5
#define GPIO2 6

int main()
{
        int i, wave_id;
        gpioPulse_t pulse1[PULSE], pulse2[PULSE];
        uint32_t g_mask1, g_mask2;

        g_mask1 |= (1<<GPIO1);
        g_mask2 |= (1<<GPIO2);

        if (gpioInitialise()<0) return 1;

        pulse1[0].gpioOn  = 0;
        pulse1[0].gpioOff = g_mask1;
        pulse1[0].usDelay = 10;
        pulse1[1].gpioOn  = g_mask1;
        pulse1[1].gpioOff = 0;
        pulse1[1].usDelay = 1000;
        pulse1[2].gpioOn  = 0;
        pulse1[2].gpioOff = g_mask1;
        pulse1[2].usDelay = 1000;

        pulse2[0].gpioOn  = g_mask2;
        pulse2[0].gpioOff = 0;
        pulse2[0].usDelay = 10;
        pulse2[1].gpioOn  = g_mask2;
        pulse2[1].gpioOff = 0;
        pulse2[1].usDelay = 1000;
        pulse2[2].gpioOn  = 0;
        pulse2[2].gpioOff = g_mask2;
        pulse2[2].usDelay = 1000;

        gpioSetMode(GPIO1, PI_OUTPUT);
        gpioSetMode(GPIO2, PI_OUTPUT);
        gpioWaveClear();
        gpioWaveAddGeneric(PULSE, pulse1);
        gpioWaveAddGeneric(PULSE+1, pulse2);
        wave_id = gpioWaveCreate();
        gpioWaveTxSend(wave_id, PI_WAVE_MODE_ONE_SHOT);
        gpioDelay(PULSE*1000);

        gpioTerminate();
        return 0;
}

I also added new debug output to the isr:

static irqreturn_t bcm2708_gpio_interrupt(int irq, void *dev_id)
{
    unsigned long edsr, edsr1;
    unsigned bank;
    int i;
    unsigned gpio;
    for (bank = 0; bank <= 1; bank++) {
        edsr = readl(__io_address(GPIO_BASE) + GPIOEDS(bank));
        for_each_set_bit(i, &edsr, 32) {
            gpio = i + bank * 32;
            generic_handle_irq(gpio_to_irq(gpio));
        }
        edsr1 = readl(__io_address(GPIO_BASE) + GPIOEDS(bank));
        writel(0xffffffff, __io_address(GPIO_BASE) + GPIOEDS(bank));
        printk(KERN_INFO DRIVER_NAME ": bcm2708_gpio_interrupt %lx %lx\n", edsr, edsr1);
    }
    return IRQ_HANDLED;
}

connecting ths two output GPIOs (now 5 and 6) to two other GPIOs and enabling the interrupts leads to this output:

Jan 10 11:49:49 raspberrypi kernel: [  211.535282] bcm2708_gpio: bcm2708_gpio_interrupt 10 400000
Jan 10 11:49:49 raspberrypi kernel: [  211.535318] bcm2708_gpio: bcm2708_gpio_interrupt 0 0

So the first time, the register is read, only the first interrupt is there. The second time the register is read, only the second interupt is there. This suggests to me, that we actually do not need to clear the bit at all (at least for edge interrupts)! This might be different for level interrupts but I can not activate them from sysfs, so I could not test this.

I also tested the latest rpi-update kernel and the one @popcornmix provided. With the original kernel I loose the second interrupt most of the time, with the other kernel I get both. So this is definitively an improvement.

popcornmix commented 9 years ago

Looks like bcm2708_gpio_irq_unmask clears the interrupt and I believe that is called every interrupt, so clearing in bcm2708_gpio_interrupt is redundant. @chaosjug can try try removing the writel to GPIOEDS in bcm2708_gpio_interrupt and check it works okay?

pelwell commented 9 years ago

I'm wondering if the unmask call is necessary in the interrupt handler. The lirc-rpi module seems to work without it.

popcornmix commented 9 years ago

Are you suggesting not assigning:

    .irq_unmask = bcm2708_gpio_irq_unmask,
    .irq_mask = bcm2708_gpio_irq_mask,
pelwell commented 9 years ago

No, that wasn't what I meant. The lirc-rpi interrupt handler used to explicitly call the irq_unmask method, and that seems to not be necessary. But if the interrupt framework is also calling unmask then it is probably there for a reason.

notro commented 9 years ago

I tried to look at the interrupt code in the kernel, but couldn't determine if irq_mask/unmask is called during interrupt processing. @chaosjug Can you add printk's to those functions to determine whether they are called or not?

The reason I ask, is that pinctrl-bcm2835 protects irq_enable/disable with spinlocks, whereas bcm2708_gpio don't. Just to rule out a possible race condition.

Spinlock on uniprocessors with preemption results in: spin_lock -> preempt_disable spin_unlock -> preempt_enable

Ref: Linux Kernel Architecture ch 5.2.2 Spinlocks

chaosjug commented 9 years ago

I tried removing the writel and it still works, but I only tested with a rising edge interrupt. I guess this will be different for high/low level interrupts. In that case the handler has to make sure that the level is back to the not active state before clearing the bit, otherwise we would just get the next interrupt. So the distinction in the upstream version mentioned by notro makes sense to me.

chaosjug commented 9 years ago

I added printk to irq_mask/unmask and kept the printk in gpio_interrupt from above and removed the writel in gpio_interrupt. Then I generated two interrupts 10us apart and I get this result:

Jan 10 22:15:51 raspberrypi kernel: [   99.632394] bcm2708_gpio: bcm2708_gpio_irq_mask
Jan 10 22:15:51 raspberrypi kernel: [   99.632462] bcm2708_gpio: bcm2708_gpio_irq_unmask
Jan 10 22:15:51 raspberrypi kernel: [   99.632481] bcm2708_gpio: bcm2708_gpio_interrupt 10 400000
Jan 10 22:15:51 raspberrypi kernel: [   99.632493] bcm2708_gpio: bcm2708_gpio_interrupt 0 0
Jan 10 22:15:51 raspberrypi kernel: [   99.632547] bcm2708_gpio: bcm2708_gpio_irq_mask
Jan 10 22:15:51 raspberrypi kernel: [   99.632567] bcm2708_gpio: bcm2708_gpio_irq_unmask
Jan 10 22:15:51 raspberrypi kernel: [   99.632581] bcm2708_gpio: bcm2708_gpio_interrupt 400000 0
Jan 10 22:15:51 raspberrypi kernel: [   99.632593] bcm2708_gpio: bcm2708_gpio_interrupt 0 0
notro commented 9 years ago

@chaosjug thanks. So racing is possible, but I guess not very likely for normal use cases.

popcornmix commented 9 years ago

So @pelwell, @notro, remove the writel to GPIOEDS in bcm2708_gpio_interrupt?

notro commented 9 years ago

I looked up the BCM2835 ARM Peripherals datasheet:

GPIO Event Detect Status Registers (GPEDSn)

The event detect status registers are used to record level and edge events on the
GPIO pins. The relevant bit in the event detect status registers is set whenever:

1) an edge is detected that matches the type of edge programmed in the rising/falling 
   edge detect enable registers
2) a level is detected that matches the type of level
   programmed in the high/low level detect enable registers. The bit is cleared by
   writing a “1” to the relevant bit.

From this it seems that level interrupts need clearing, but edge interrupts need not. Anyone have access to the more verbatim edition of this text?

popcornmix commented 9 years ago

I'm sure that an edge interrupt does need clearing. It will continue to read as set until you explicitly write to it, and I believe the interrupt will continue to assert whenever a bit is set in this register.

The original spec has the same wording, but the formatting is:

The event detect status registers are used to record level and edge events on
the GPIO pins. The relevant bit in the event detect status registers is set
whenever: 1) an edge is detected that matches the type of edge programmed
in the rising/falling edge detect enable registers, or 2) a level is detected that
matches the type of level programmed in the high/low level detect enable
registers. The bit is cleared by writing a “1” to the relevant bit.

The interrupt controller can be programmed to interrupt the processor when
any of the status bits are set. The GPIO peripheral has three dedicated interrupt
lines. Each GPIO bank can generate an independent interrupt.  The third line
generates a single interrup whenever any bit is set.

I'm sure the line "The bit is cleared by writing a “1” to the relevant bit." applies to both 1) and 2).

notro commented 9 years ago

That makes sense, it was my bad formatting. So writing to the register is required. But why does the code work when this writel is removed? Is it the unmasking that clears the event status bit?

popcornmix commented 9 years ago

Is it the unmasking that clears the event status bit?

Yes: https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/bcm2708_gpio.c#L242 and we see from https://github.com/raspberrypi/linux/issues/760#issuecomment-69472495 that unmask is called every interrupt. So we are clearing EDS twice. Seems one of the clears should be removed.

notro commented 9 years ago

I missed that line. IMO clearing should happen in the interrupt routine since it is the consumer of the event. For comparison, pinctrl-bcm2835 doesn't clear event status in irq_enable. Maybe the writer of bcm2708_gpio put in the extra clearing as a "better safe than sorry" measure or to patch a hardware quirck, or maybe a quick and dirty solution to a customer problem.

But if there's no mention of this in the errata, I guess removing it from irq_unmask should be fine.

popcornmix commented 9 years ago

Seems that line has been present in unmask function since interrupt support was added (by community member Mrkva) https://github.com/raspberrypi/linux/commit/2f3523e91e1cdbfbb5871f0943333620b139e032

I'm guessing it was accidentally left in when developing and was (mostly) harmless so has just stayed.

I agree that removing that line does sound right.

notro commented 9 years ago

I see there where spinlock use in the skeleton code he built on, but he left it out.

pelwell commented 9 years ago

I've got a patch that prevents ...irq_mask and ...irq_unmask from being called for each interrupt. The code is based on that in pinctrl-bcm2835.c, and works by requesting that handle_simple_irq is used instead of handle_level_irq. The patch also removes the write to the EDS register in ...irq_unmask.

You can find the patch here: http://pastebin.com/DS5Redjs

notro commented 9 years ago

One more difference: pinctrl-bcm2835 clears the event status bits (and detection flags) in probe, but bcm2708_gpio doesn't. I don't know if this is significant now that unmask doesn't do it anymore. POR values are probably zero for these registers. Ref: http://lxr.free-electrons.com/source/drivers/pinctrl/pinctrl-bcm2835.c#L999

pelwell commented 9 years ago

Updated patch with clearing in probe: http://pastebin.com/QdzksaSr

If there are no objections I'll get this pushed.

notro commented 9 years ago

Have you tested the change with both level and edge interrupts?

pelwell commented 9 years ago

Edge only, so far. I think that testing level sensitive interrupts is going to need a test driver because they have to be cleared at source.

notro commented 9 years ago

Yes, you're right. I didn't think about that. When moving to 3.18 and Device Tree, this driver probably (hopefully) won't see much use. Now that we use pinctrl-bcm2835.

pelwell commented 9 years ago

I can confirm that level-sensitive interrupts (high and low) also work. And pulling out the jumper connecting the pins stops them working, as expected.

pelwell commented 9 years ago

@chaosjug Can I close this now?

chaosjug commented 9 years ago

I tested the commit on my system and it works here as well. I'll close the issue. Thanks a lot to all of you.