raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11.03k stars 4.96k forks source link

spidev transfer timeout on Pi 5 with transfers > 64. #6020

Open makermelissa opened 6 months ago

makermelissa commented 6 months ago

Describe the bug

I am writing some code in c and am having spidev timeouts on the Pi 5. The same code runs fine on the Pi 4. The first spidev transfer is fine, but subsequent transfers result in timeout errors (despite it successfully transferring), which produces a much slower transfer.

Steps to reproduce the behaviour

  1. Install Adafruit Snake Eyes Bonnet
  2. Install Raspberry Pi OS 64-bit bookworm Desktop on the Pi 5
  3. Update the system
  4. Follow directions in https://github.com/makermelissa/Pi_Eyes/blob/wayland-update/README.md to install dependencies
  5. Build following same instructions

Device (s)

Raspberry Pi 5

System

Raspberry Pi reference 2023-12-05 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 70cd6f2a1e34d07f5cba7047aea5b92457372e05, stage4

2024/01/24 12:16:01 Copyright (c) 2012 Broadcom version 1aeae380 (release) (embedded)

Linux raspberrypi 6.1.0-rpi8-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.73-1+rpt1 (2024-01-25) aarch64 GNU/Linux

Logs

These message repeat in dmesg: [ 1829.452018] spidev spi0.0: DMA transaction timed out [ 1829.452018] spidev spi1.2: DMA transaction timed out [ 1829.452022] spidev spi0.0: SPI transfer failed: -110 [ 1829.452024] spidev spi1.2: SPI transfer failed: -110 [ 1829.452035] spi_master spi0: failed to transfer one message from queue [ 1829.452035] spi_master spi1: failed to transfer one message from queue [ 1829.452037] spi_master spi1: noqueue transfer failed [ 1829.452038] spi_master spi0: noqueue transfer failed

Additional context

Not 100% sure my code is not the issue, but kind of odd it works fine on Pi 4, but not Pi 5.

makermelissa commented 6 months ago

This may be a duplicate of https://github.com/raspberrypi/linux/issues/5858.

makermelissa commented 6 months ago

Setting transfers to 64 bytes or less fixes the timeout issue, but still quite slow.

makermelissa commented 6 months ago

Perhaps https://github.com/raspberrypi/linux/issues/5696 has popped back up or maybe it's not part of the regular release cycle yet.

pelwell commented 6 months ago

I think this is caused by a bug in the DMA driver that causes the interrupt-enable bit to be cleared. It is caused by a lack of atomicity, and triggered by having transfers submitted on multiple channels simultaneously.

6044 should be a fix. By the time you read this (a.k.a. any time after 10:40 GMT on the 15th) you'll be able to install a 6.6 kernel with the fix by running sudo rpi-update pulls/6044.

pelwell commented 6 months ago

And yes, I think this is effectively a dup of #5858/#5865.

makermelissa commented 6 months ago

Thanks. Yes, the fix worked.

makermelissa commented 6 months ago

Ok, after further testing, I think it's not quite working correctly. I am trying to use both spidev0.0 and spidev1.2 in my code. With the update, the clock signal from spidev1.2 doesn't seem to be working anymore. Like there's tons of activity on MOSI, but SCLK is barely pulsing.

pelwell commented 6 months ago

What clock rates are you using?

makermelissa commented 6 months ago

80 MHz. I could try slower if you think that would help, but the same code runs fine on a Pi 4 at that clock rate.

pelwell commented 6 months ago

And in which directions? Try and help me reproduce your setup.

makermelissa commented 6 months ago

Pretty much all output. I'm writing to 2 ST7789 displays.

pelwell commented 6 months ago

A few things:

  1. The SPI controllers are clocked off clk_sys (200MHz), using an even divisor. This means that your request for 80MHz actually translates to 50MHz. You would have had more flexibility on Pi 4 where the clock is an even divisor of the VPU/core clock, but there the core clock is prone to changing with the ARM clock.
  2. With some combinations of transmits and receives I get RX FIFO overflows:
    $ dmesg -l err
    [12461.687616] dma dma2chan3: dma2chan3 is non-idle!
    [12461.889989] spidev spi1.2: DMA transaction timed out
    [12461.889993] spidev spi1.2: SPI transfer failed: -110
    [12461.890003] spi_master spi1: failed to transfer one message from queue
    [12461.890006] spi_master spi1: noqueue transfer failed

    What kernel messages do you get?

makermelissa commented 6 months ago
  1. Hmm, ok. I tried 50MHz and it didn't change anything.
  2. No error messages at all. It just doesn't work.
pelwell commented 6 months ago

Hmm, ok. I tried 50MHz and it didn't change anything.

Yes, because you've been running it at 50MHz all along. In fact, looking at logic analyser trace it's only driving SCLK half the time, so there's an effective rate of 25MHz.

No error messages at all. It just doesn't work.

At this point it might be best if you put together a cut-down test program that demonstrates the issue.

makermelissa commented 6 months ago

Yeah, probably.

makermelissa commented 6 months ago

Ok, here's some pared down code below(from like 1000 lines).

You can save as screentest.c and compile with cc -o screentest screentest.c -lgpiod -Wall -Ofast

You will need to enable both spi ports in /boot/firmware/config.txt with:

dtparam=spi=on
dtparam=spi1=on
dtoverlay=spi1-3cs

Here's the code

#include <gpiod.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <math.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>
#include <bcm_host.h>

// CONFIGURATION AND GLOBAL STUFF ------------------------------------------

#define DC_PIN    5             // These pins connect
#define RESET_PIN 6             // to BOTH screens
#define DCMASK    (1 << DC_PIN) // GPIO pin bitmasks of reset
#define CONSUMER  "fbx2"        // For use with gpiod

#define SCREEN_IPS       2 // Other types WILL NOT WORK!

uint8_t screenType = SCREEN_IPS;

static const uint8_t initIPS[] = {
  // IPS initialization
  0x01, 0x80,       150,      // Soft reset, no args, 150 ms delay
  0x11, 0x80,       255,      // Out of sleep, no args, 500 ms delay
  0x3A, 0x81, 0x55,  10,      // COLMOD, 1 arg, 10ms delay
  0x36,    1, 0x00,           // MADCTL, 1 arg (RGB), no delay,
  0x26,    1, 0x02,           // GAMSET, 1 arg (curve 2 (G1.8)), no delay
  0xBA,    1, 0x04,           // DGMEN, 1 arg (enable gamma), no delay,
  0x21, 0x80,        10,      // INVON, no args, 10 ms delay
  0x13, 0x80,        10,      // NORON, no args, 10 ms delay
  0x29, 0x80,       255,      // DISPON, no args, 500 ms delay
  0x00 },
winIPS[] = {
  0x2A, 4, 0, 0, 0, 239,       // CASET (column set) xstart, xend (MSB first)
  0x2B, 4, 0, 0, 0, 239,       // RASET (row set) ystart, yend (MSB first)
  0x2C,                        // RAMWR (RAM write)
  0x00 };                      // EOD

static const struct {
    const int      width;   // Width in pixels
    const int      height;  // Height in pixels
    const int      bitrate; // Default stable SPI bitrate
    const uint8_t *init;    // Pointer to initialization command list
    const uint8_t *win;     // Pointer to window command list
} screen = { 240, 240, 80000000, initIPS , winIPS };

// Per-eye structure
static struct {
    int        fd;                // SPI file descriptor
    uint16_t  *buf[2];            // Double-buffered eye data 16 BPP
    struct spi_ioc_transfer xfer; // ioctl() transfer struct
} eye[2];

static uint8_t           bufIdx = 0;    // Double-buffering index
static int               bufsiz = 4096; // SPI block xfer size (4K default)
static struct spi_ioc_transfer xfer = {
  .rx_buf        = 0, // ioctl() transfer structure for issuing
  .delay_usecs   = 0, // commands (not pixel data) to both screens.
  .bits_per_word = 8,
  .pad           = 0,
  .tx_nbits      = 0,
  .rx_nbits      = 0,
  .cs_change     = 0 };

struct gpiod_line *reset_line;
struct gpiod_line *dc_line;

// UTILITY FUNCTIONS -------------------------------------------------------

#define COMMAND 0 // Values for last argument
#define DATA    1 // to dcX2() function below

// Issue data or command to both SPI displays:
static void dcX2(uint8_t x, uint8_t dc) {
    gpiod_line_set_value(dc_line, dc ? 1 : 0);
    xfer.tx_buf = (uintptr_t)&x; // Uses global xfer struct,
    xfer.len    = 1;            // as most elements don't change
    (void)ioctl(eye[0].fd, SPI_IOC_MESSAGE(1), &xfer);
    (void)ioctl(eye[1].fd, SPI_IOC_MESSAGE(1), &xfer);
}

// Issue a list of commands (and arguments, delays) to both displays:
static void commandList(const uint8_t *ptr) { // pass in -> command list
    int i, j, ms;
    for(i=0; (j=ptr[i++]);) {                // 0 = EOD
        dcX2(j, COMMAND);                // First byte = command
        j  = ptr[i++];                   // Delay flag | num args
        ms = j & 0x80;                   // Mask delay flag
        j &= ~0x80;                      // Mask arg count
        while(j--) dcX2(ptr[i++], DATA); // Issue args (data)
        if(ms) {                         // Delay flag set?
            ms = ptr[i++];           // Next byte = milliseconds
            if(ms == 255) ms = 500;  // If 255, make it 500
            usleep(ms * 1000);
        }
    }
}

// Crude error handler (prints message, exits program with status code)
static int err(int code, char *string) {
    (void)puts(string);
    exit(code);
}

// INIT AND MAIN LOOP ------------------------------------------------------

int main(int argc, char *argv[]) {

    int     bitrate   = 0, // If 0, use default
            i, j,
            ret;
    struct gpiod_chip *chip;

    if(!bitrate) bitrate = screen.bitrate;

    // Get SPI buffer size from sysfs.  Default is 4K.
    FILE *fp;
    if((fp = fopen("/sys/module/spidev/parameters/bufsiz", "r"))) {
        if(fscanf(fp, "%d", &i) == 1) bufsiz = i;
        fclose(fp);
    }

    // GPIO AND SCREEN INIT --------------------------------------------

    bool is_pi5 = false;
    if((fp = fopen("/dev/gpiochip4", "r"))) {   // Pi 5 uses gpiochip4, the older ones use gpiochip0
        is_pi5 = true;
        fclose(fp);
    }
    char *chipname = "gpiochip0";
    if (is_pi5) {
        chipname = "gpiochip4";
    }

    chip = gpiod_chip_open_by_name(chipname);
    if (!chip) {
        err(1, "Open chip failed\n");
    }

    dc_line = gpiod_chip_get_line(chip, DC_PIN);
    if (!dc_line) {
        gpiod_chip_close(chip);
        err(2, "Get dc line failed\n");
    }

    ret = gpiod_line_request_output(dc_line, CONSUMER, 0);
    if (ret < 0) {
        gpiod_line_release(dc_line);
        gpiod_chip_close(chip);
        err(3, "Setting dc line as output failed\n");
    }

    reset_line = gpiod_chip_get_line(chip, RESET_PIN);
    if (!reset_line) {
        gpiod_line_release(dc_line);
        gpiod_chip_close(chip);
        err(4, "Get reset line failed\n");
    }

    ret = gpiod_line_request_output(reset_line, CONSUMER, 0);
    if (ret < 0) {
        gpiod_line_release(reset_line);
        gpiod_line_release(dc_line);
        gpiod_chip_close(chip);
        err(5, "Setting reset line as output failed\n");
    }

    if((eye[0].fd = open("/dev/spidev0.0", O_WRONLY|O_NONBLOCK)) < 0)
        err(6, "Can't open spidev0.0, is SPI enabled?");
    if((eye[1].fd = open("/dev/spidev1.2", O_WRONLY|O_NONBLOCK)) < 0)
        err(7, "Can't open spidev1.2, is spi1-3cs overlay enabled?");

    xfer.speed_hz = bitrate;
    uint8_t  mode = SPI_MODE_0;
    for(i=0; i<2; i++) {
        ioctl(eye[i].fd, SPI_IOC_WR_MODE, &mode);
        ioctl(eye[i].fd, SPI_IOC_WR_MAX_SPEED_HZ, bitrate);
        memcpy(&eye[i].xfer, &xfer, sizeof(xfer));
        for(j=0; j<2; j++) {
            if(NULL == (eye[i].buf[j] = (uint16_t *)malloc(
              screen.width *
              screen.height * sizeof(uint16_t)))) {
                err(5, "Eye buffer malloc failed");
            }
        }
    }

    // INITIALIZE SPI SCREENS ------------------------------------------
    gpiod_line_set_value(reset_line, 1); usleep(5); // Reset high,
    gpiod_line_set_value(reset_line, 0); usleep(5); // low,
    gpiod_line_set_value(reset_line, 1); usleep(5); // high

    commandList(screen.init); // Send init commands

    // SPI Display Test -----------------------------------------------
    // Draw red, green, blue to both displays in a loop, waiting 250ms in between

    uint16_t colors[] = {
        __builtin_bswap16(0xF800),
        __builtin_bswap16(0x07F0),
        __builtin_bswap16(0x001F)
    };

    for(;;) {
        for(j=0; j<3; j++) {
            for(i=0; i<2; i++) {
                uint16_t *buffer = eye[i].buf[bufIdx];
                for (int y = 0; y < screen.height; y++) {
                    for (int x = 0; x < screen.width; x++) {
                        buffer[x + y * screen.width] = colors[j];
                    }
                }

                commandList(screen.win);  // Set window
                gpiod_line_set_value(dc_line, 1);     // Data
                uint32_t bytesThisPass, bytesToGo, screenBytes =
                screen.width * screen.height * 2;

                eye[i].xfer.tx_buf = (uintptr_t)eye[i].buf[bufIdx];
                bytesToGo = screenBytes;
                do {
                    bytesThisPass = bytesToGo;
                    if(bytesThisPass > bufsiz) bytesThisPass = bufsiz;
                    eye[i].xfer.len = bytesThisPass;
                    ioctl(eye[i].fd, SPI_IOC_MESSAGE(1), &eye[i].xfer);
                    eye[i].xfer.tx_buf += bytesThisPass;
                    bytesToGo          -= bytesThisPass;
                } while(bytesToGo > 0);
            }
            usleep(250 * 1000);
        }
    }

    gpiod_line_release(reset_line);
    gpiod_line_release(dc_line);
    gpiod_chip_close(chip);

    return 0;
}
pelwell commented 6 months ago

Thanks for that, Melissa. Running it on Pi 4 and Pi 5 the only differences I see are in the timing - Pi 4 runs the bus at 62.5MHz, Pi 5 at 50MHz - and that Pi 5 holds MOSI at the previous value between bytes while Pi 4 returns it to low. Other than that, the traces look the same, with SPI0.0 getting about 112KB of data first, followed by SPI1.2.

Here's Pi 4: screentest_pi4

And Pi 5: screentest_pi5

So it sounds like my Pi 5 is not showing the same behaviour as yours. Yes there is no hardware attached, but it's all transmit-only so it shouldn't matter unless some external load/pull is interfering.

makermelissa commented 6 months ago

Ok, I'll take another look. Starting from a fresh SD might fix it. Also, I could try a second Pi 5 I have to see if it's just one of my Pis misbehaving.

pelwell commented 6 months ago

In case you find that things are still failing, another option is to run the SPI interfaces without DMA. If you replace the standard Pi 5 dtb file (in /boot/firmware) with this one (https://drive.google.com/file/d/1NGJsJoeFf_K8mVi0lNjO6FIPEOYj9lvf/view?usp=drive_link) you'll gain a new dtparam - nospidma, which does what you would expect.