SPI speed ~2x slower than it should be on RPi 4

gtrainavicius commented 4 years ago

The SPI transfer speed used on RPi4 is 2 times slower (50kHz) than requested, for example here: https://github.com/raspberrypi/linux/blob/rpi-4.19.y/sound/soc/bcm/pisound.c#L329 (100kHz)

On RPi4, requesting 100000 Hz produces a 50kHz communication speed. When the requested speed is changed to 200000 Hz, RPi4 first starts at 80kHz, then at some point switches to 100kHz, which means RPi4 is capable of running at exactly 100kHz.

On RPi3B+, requesting 100000 yields 62.5kHz speed. Requesting 200000 gives 125kHz. I assume RPi3B+ does not have in-between speeds like RPi4.

The question (and bug) is: why does RPi4 pick 50kHz when 100kHz is requested, even though it is demonstrably capable of 80kHz and 100kHz speeds?

To reproduce

Without any HAT connected, you may run spidev_test to test drive /dev/spidev0.0. Either an oscilloscope or a logic analyzer can be connected to the SCK line to measure the frequency, or it can be very roughly inferred from the reported data rate and the duration of execution. Here are the test results of sending 80000 bytes in 16 byte transfers using 100kHz and 200kHz requested speeds on RPi4 and 3B+. They contain the SPI logic analyzer timed output, as well as the spidev_test invocation, results and additional notes at the very bottom.

spi_test_results.zip

100kHz RPi4

```
pi@raspberrypi:~/spidev_test $ time { ./spidev_test -S 16 -I 5000 -D /dev/spidev0.0 -s 100000; }
spi mode: 0x0
bits per word: 8
max speed: 100000 Hz (100 KHz)
rate: tx 41.1kbps, rx 41.1kbps
rate: tx 41.8kbps, rx 41.8kbps
rate: tx 41.8kbps, rx 41.8kbps
total: tx 78.1KB, rx 78.1KB

real 0m18.392s
user 0m0.056s
sys 0m0.286s

Actual transfer rate: 4351.44 B/s
Theoretical 50kHz rate: 6250 B/s
Theoretical 100kHz rate: 12500 B/s
Logic analyzer measured SCK rate: 40kHz
```

100kHz RPi3B+

```
pi@raspberrypi:~/spidev_test $ time { ./spidev_test -S 16 -I 5000 -D /dev/spidev0.0 -s 100000; }
spi mode: 0x0
bits per word: 8
max speed: 100000 Hz (100 KHz)
rate: tx 54.4kbps, rx 54.4kbps
rate: tx 64.8kbps, rx 64.8kbps
total: tx 78.1KB, rx 78.1KB

real 0m11.838s
user 0m0.050s
sys 0m0.214s

Actual transfer rate: 6763.58 B/s
Theoretical 50kHz rate: 6250 B/s
Theoretical 100kHz rate: 12500 B/s
Logic analyzer measured SCK rate: 62.5kHz
```

200kHz RPi4

```
pi@raspberrypi:~/spidev_test $ time { ./spidev_test -S 16 -I 5000 -D /dev/spidev0.0 -s 200000; }
spi mode: 0x0
bits per word: 8
max speed: 200000 Hz (200 KHz)
rate: tx 69.1kbps, rx 69.1kbps
total: tx 78.1KB, rx 78.1KB

real 0m9.369s
user 0m0.040s
sys 0m0.312s

Actual transfer rate: 8544.44 B/s
Theoretical 50kHz rate: 6250 B/s
Theoretical 100kHz rate: 12500 B/s
Logic analyzer measured SCK rate: 80kHz
```

200kHz RPi3B+

```
pi@raspberrypi:~/spidev_test $ time { ./spidev_test -S 16 -I 5000 -D /dev/spidev0.0 -s 200000; }
spi mode: 0x0
bits per word: 8
max speed: 200000 Hz (200 KHz)
rate: tx 125.1kbps, rx 125.1kbps
total: tx 78.1KB, rx 78.1KB

real 0m6.120s
user 0m0.045s
sys 0m0.232s

Actual transfer rate: 13086.32 B/s
Theoretical 50kHz rate: 6250 B/s
Theoretical 100kHz rate: 12500 B/s
Logic analyzer measured SCK rate: 125kHz
```

As can be seen from the results of the same command being executed on RPi3B+ and RPi4, the tests finish significantly faster on RPi3B+.
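The "Theoretical" figures in the results above, and the rough inference of SCK rate from run time, come down to simple arithmetic. The following is a minimal Python sketch of it, assuming 8 bits per word; the function names are illustrative and are not part of spidev_test. The implied rate is only a lower bound, presumably because gaps between transfers are counted in the elapsed time.

```python
def theoretical_bytes_per_sec(sck_hz: float, bits_per_word: int = 8) -> float:
    """Upper bound on throughput if SCK ran continuously at sck_hz."""
    return sck_hz / bits_per_word


def implied_sck_hz(total_bytes: int, elapsed_s: float, bits_per_word: int = 8) -> float:
    """Lower bound on the SCK rate implied by moving total_bytes in elapsed_s."""
    return total_bytes * bits_per_word / elapsed_s


# Matches the "Theoretical 100kHz rate" line in the results.
print(theoretical_bytes_per_sec(100_000))  # 12500.0

# 80000 bytes in 18.392 s, as in the RPi4 run at a requested 100kHz.
# The result is a rough lower bound, below the analyzer reading.
print(implied_sck_hz(80_000, 18.392))
```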

Expected behaviour

100kHz gets picked when requesting 100kHz speed, or at least something closer like 80kHz.

System

RPi 3B+ and RPi 4 using latest Raspbian Lite running:

Linux raspberrypi 4.19.88-v7+ #1284 SMP Wed Dec 11 13:46:41 GMT 2019 armv7l GNU/Linux

popcornmix commented 4 years ago

The SPI clock is a divided down version of the core clock. The core clock reduces when the ARM is not busy.

If you want to force the full speed of SPI you need to prevent the core clock from clocking down. There are a few ways of doing this. In config.txt:

force_turbo=1

or

core_freq_min=500
core_freq=500

Or from the arm:

sudo sh -c "echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"

gtrainavicius commented 4 years ago

Is 50 kHz the best it can do when slowed down? In all cases when running the test, the system was otherwise idle - if requesting 200kHz, the system managed to run SPI at 80kHz, this seems like a bug - it should be able to get something closer to 100kHz when being asked for 100kHz...

popcornmix commented 4 years ago

The core clock changes outside of the spi driver's knowledge. Therefore it assumes the highest frequency and will run slower when the core runs slower.

Typically core_freq=500 core_freq_min=200. So when idle you will get 2/5 of the SPI frequency you requested.

pelwell commented 4 years ago

The SPI interfaces (and I2C and UART1) share the same clock as the VPU cores; if the VPU clock frequency changes, so does the SPI clock. The Linux SPI driver is unaware of these clock changes, so to avoid a bus speed which is too high the clock divisor is calculated for the turbo speed, but when running at the normal speed the divisor is too high with the result that the bus clock is too slow. Locking the VPU/core clock to a fixed value allows the divisor to be correctly calculated without affecting the ARM clock speeds.
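The effect described above can be reproduced with a little arithmetic. This is a simplified Python model, not the driver's actual code: it assumes the divisor is a plain ceiling of core clock over requested speed (the real driver may round differently), and it uses the core_freq=500 and core_freq_min=200 values quoted earlier in the thread. The function names are hypothetical.

```python
def spi_divisor(requested_hz: int, assumed_core_hz: int) -> int:
    """Smallest integer divisor that keeps SCK at or below requested_hz."""
    return -(-assumed_core_hz // requested_hz)  # ceiling division


def actual_sck_hz(requested_hz: int, assumed_core_hz: int, real_core_hz: int) -> float:
    """SCK on the wire when the divisor was computed for assumed_core_hz
    but the core is really running at real_core_hz."""
    return real_core_hz / spi_divisor(requested_hz, assumed_core_hz)


TURBO_HZ = 500_000_000  # core_freq=500, as quoted in the thread
IDLE_HZ = 200_000_000   # core_freq_min=200, as quoted in the thread

for requested in (100_000, 200_000):
    print(requested, actual_sck_hz(requested, TURBO_HZ, IDLE_HZ))
# 100000 40000.0
# 200000 80000.0
```

Under these assumptions the model gives 40kHz and 80kHz, which agrees with the logic analyzer readings in the RPi4 results at the top of the thread.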

edo1 commented 4 years ago

What are the default frequencies for RPi 3B+ and RPi 4? I use gpu_freq=250; is that enough to have a stable SPI frequency on RPi 3B+? Or do I have to specify core_freq_min as well? What about RPi 4?

Does force_turbo=1 affect system stability/overheating?

popcornmix commented 4 years ago

Pi 4 uses 200MHz for the minimum core frequency, and all older Pis use 250MHz.

gpu_freq=250 will have stable frequency on Pi0-3.

> Does force_turbo=1 affect system stability/overheating?

You'll have marginally higher temperatures due to the higher clock when idle, and no change when arm is busy.

gtrainavicius commented 4 years ago

I see, I assumed the communication or some other drivers would be handling clock changes not to impact the communication too much.

Is there something that can be added to a device tree overlay EEPROM to ensure a minimum clock speed, so we can make sure that the SPI communication is within minimum and maximum allowable range?

pelwell commented 4 years ago

See https://github.com/raspberrypi/firmware/issues/1308.

marckleinebudde commented 4 years ago

FYI: there are two functions that might be of interest here. You can register a notifier for clock and/or CPU frequency changes: https://elixir.bootlin.com/linux/v5.5.7/source/drivers/cpufreq/cpufreq.c#L341 and https://elixir.bootlin.com/linux/v5.5.7/source/drivers/clk/clk.c#L4136

lurch commented 4 years ago

@marckleinebudde I suspect those are for monitoring CPU-side clock changes, whereas the issue here is GPU-side clock changes (since the SPI, I2C and UART1 peripherals are running off the VPU clock rather than the CPU clock, as @pelwell describes above).

marckleinebudde commented 4 years ago

@lurch I'm not familiar with the raspi clock tree, but if the GPU-side clock is not mapped to the linux clock framework, this doesn't work.

pelwell commented 4 years ago

The raspberrypi-clk driver sends all requests to the VPU/GPU, so it's well aware of what is changing.

David00 commented 3 years ago

I am also seeing about half the SPI sampling rate on my Pi 4 when compared against my Pi 3B+.

I have tried to implement the suggestions made by @popcornmix on my Pi 4 running Raspberry Pi OS (kernel 5.4.83-v7l+), and they don't seem to have any impact on the sampling rate.

The Pi docs suggest that setting cpu_freq is not supported on the Pi 4, but they don't mention much about force_turbo=1 on the Pi 4 specifically. I tried both, and neither improved my sampling rate.

Any other ideas?

lurch commented 3 years ago

> Pi 4 running Raspberry Pi OS (kernel 5.4.83-v7l+)

Raspberry Pi OS is now using a 5.10 kernel... http://downloads.raspberrypi.org/raspios_armhf/release_notes.txt (I've got no idea if this fixes things, but perhaps it's something you'd like to test?)

David00 commented 3 years ago

I upgraded to 5.10.17-v7l+ and my SPI rates are the same. Thanks for trying, though!

David00 commented 3 years ago

FYI, I continued to do some more testing on my Pi 4B. Both kernel version 5.4.x and 5.10.x exhibited the same issue with SPI data rates. I went back to using kernel version 4.19 and the issue is gone. For the record, using force_turbo or core_freq and core_freq_min did not work for me.

For future readers, assuming this issue goes unresolved...

UPDATE: (June, 2022) This may no longer be a suitable workaround if you have a newer Pi with a newer bootloader. I've had trouble getting the newer bootloaders to boot 4.x kernels, but I can't find any documentation about bootloader/kernel support.

The command (on a Raspberry Pi running Raspberry Pi OS) to install a specific kernel is:

sudo rpi-update <hash>

... where <hash> is the commit hash from the following GitHub repository that correlates to the specific kernel version you want to install:

https://github.com/Hexxeh/rpi-firmware

So, to install v4.19.118, the command is: sudo rpi-update e1050e94821a70b2e4c72b318d6c6c968552e9a2

Simply press y at the prompt, and then reboot your Pi.

UKHKPaul commented 2 years ago

I have been experimenting with SPI driving NeoPixels from an RPi, mostly using a Pi Zero, and didn't really see this issue. But when I tried on a Pi 4 I had the same issue with the pulses being too slow.

Same cause: the CPU clock was slowing down (it has a much wider range on the Pi 4 than on the Pi Zero). In my case, as it's my own SPI driver, I did a simple fix of checking the CPU speed before starting the write, and used that speed to recalculate the effective speed to drive the SPI, i.e. at full speed I use a 1:1 ratio, but as the clock speed drops I use a nominally higher speed.

In practice I also found that it helped to check the speed at the end, and if it had changed then I just did a retry.

In my case I drive 20+ NeoPixels with no observable issues.

Note: I also found that a pull-down resistor on the SPI pin from the Pi helped with stray cases.
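The workaround described in this comment (scale the requested speed by the current CPU clock, then retry if the clock changed during the write) might look roughly like the following. This is a sketch under stated assumptions, not the commenter's driver: `compensated_speed`, `write_with_retry` and the `send` callback are hypothetical names, and linear scaling against the CPU clock is only an approximation, since per the earlier comments the SPI clock follows the core clock rather than the ARM clock.

```python
from typing import Callable

# sysfs file reporting the current CPU frequency in kHz (used later in this thread)
CUR_FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"


def read_cpu_khz(path: str = CUR_FREQ_PATH) -> int:
    """Read the current CPU frequency in kHz from sysfs."""
    with open(path) as f:
        return int(f.read().strip())


def compensated_speed(target_hz: int, cpu_khz: int, cpu_max_khz: int) -> int:
    """Request a nominally higher speed when the CPU clock is below its maximum."""
    return target_hz * cpu_max_khz // cpu_khz


def write_with_retry(
    send: Callable[[int], None],
    target_hz: int,
    cpu_max_khz: int,
    read_khz: Callable[[], int] = read_cpu_khz,
    max_tries: int = 3,
) -> bool:
    """Call send(speed_hz) until the CPU clock is the same before and after."""
    for _ in range(max_tries):
        before = read_khz()
        send(compensated_speed(target_hz, before, cpu_max_khz))
        if read_khz() == before:
            return True  # clock did not change during the write
    return False


# Hypothetical use with the Python spidev package:
#   def send(hz):
#       spi.max_speed_hz = hz
#       spi.writebytes(data)
#   write_with_retry(send, 1_000_000, cpu_max_khz=1_500_000)
```

Passing the frequency reader in as a parameter keeps the retry logic testable away from the hardware.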

aharish879 commented 2 years ago

> The SPI clock is a divided down version of the core clock. The core clock reduces when arm is not busy.
>
> If you want to force the full speed of SPI you need to prevent the core clock from clocking down. There are a few ways of doing this: In config.txt
>
> force_turbo=1
>
> or
>
> core_freq_min=500
> core_freq=500
>
> Or from the arm:
>
> sudo sh -c "echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"

Hi, we are trying to send 68-byte chunks continuously from user space to kernel space (spidev.c) on a Raspberry Pi 4B board. The time interval between two chunks is approximately 50us, and we want to reduce this turnaround time so that we can achieve higher data rates. We have tried the SPI full-speed optimization methods above, but we are still not observing lower turnaround times. Could you please suggest any optimizations? Thank you in advance.

5ft24dave commented 2 years ago

Has this been addressed in the recent 5.15.44 kernel?

aharish879 commented 2 years ago

Hi, thank you for the reply. I have used kernel version 5.10.x.

janvanhulzen commented 2 years ago

I have used this test on my Raspberry Pi 4 (4Gb) with OS (Debian version: 11 (bullseye)) and it seems to work as intended. I have tested spidev0.0 at 100kHz and 1 MHz.

[Images: SPI_SLCK_1MHz, SPI_SLCK_50kHz, SPI_SLCK_100kHz]

David00 commented 2 years ago

@janvanhulzen, thanks for sharing the pics with everyone. (Additional context about your setup, and specific kernel version would be helpful)

I just ran a test on the latest Raspberry Pi OS Lite build (Sept 6 '22), kernel 5.15.61-v7l+, using spidev_test.c, on a Pi 4.

Without modifying the clock settings, the SPI clock is all over the place. I checked the CPU frequency immediately before running each spidev test at 1MHz, and I can clearly see a correlation between CPU clock speed and SPI clock speed.

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq ; sudo ./test -I 10000 -D /dev/spidev0.0 -s 1000000 -I 10

(where test is my output binary after compiling spidev_test.c)

The results of the SPI clock speed, as measured by my scope on the SCLK pin, at various CPU clocks are as follows:

CPU: 600 MHz
SPI: 355 kHz

CPU: 700 MHz
SPI: 404 kHz

CPU: 900 MHz
SPI: 404 kHz

CPU: 1.8 GHz
SPI: 888 kHz

You can see that the final result where the CPU clock is at its peak is the only result remotely near the requested 1 MHz SPI clock.

So, I tested with force_turbo=1 as @popcornmix recommended a while ago, and this gave me consistent results of a 1.8 GHz CPU clock and an 888 kHz SPI clock. I also measured the total power consumption of the Pi, and it increased by about 100mW on average after adding force_turbo=1.

The C driver for the SPI interface demonstrates, at least, that kernel 5.15.61-v7l+ is fine.

In my application that uses the Python spidev library, my sample rates are still about half as fast compared to the sample rate on a v4 kernel, but the SPI clock as measured on the wire has no issue, so there is a problem higher up in the stack. At least now I'm pretty confident that it is not due to the kernel.

gtrainavicius commented 10 months ago

I notice that on Pi 5 the SPI speed seems to be constant, regardless of CPU scaling and current CPU frequency, while Pi 4 and earlier models keep changing the speeds.

Is it possible for a kernel module to know that it is running on a Pi 5? (Or have a device tree overlay dedicated for Pi 5 and the rest of the Pis which would pass a param to the kernel module? (in particular, I'd like to pass the SPI baud rate to use for different models))

rdpoor commented 8 months ago

FWIW, I'm observing what I presume is the same issue on an RPi4 running Debian 1:6.1.63-1+rpt1 (2023-11-24).

From a python script, using the spidev package, I can start the SPI SCK at 50 MHz, but about six seconds later, it drops down to 20MHz. Needless to say, I didn't expect this behavior.

Here's the CPU info in case it is useful to anyone:

$ lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A72
    Model:               3
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r0p3
    CPU(s) scaling MHz:  40%
    CPU max MHz:         1500.0000
    CPU min MHz:         600.0000
    BogoMIPS:            108.00
    Flags:               fp asimd evtstrm crc32 cpuid
Caches (sum of all):
  L1d:                   128 KiB (4 instances)
  L1i:                   192 KiB (4 instances)
  L2:                    1 MiB (1 instance)
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Vulnerable
  Srbds:                 Not affected
  Tsx async abort:       Not affected

popcornmix commented 8 months ago

@rdpoor the solution is any one of the following.

Switch to the performance governor:

echo performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Boost the minimum core frequency. Add to config.txt:

core_freq_min=500

Force turbo. Add to config.txt:

force_turbo=1
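To check which of the config.txt options listed above are actually present on a given system, something like the following Python sketch could be used. It is illustrative only: the function names are made up, the parser ignores `[section]` filters (so a setting under a filter for a different model would still be reported), and the file location, often /boot/config.txt, is an assumption that varies between OS releases.

```python
def parse_config(text: str) -> dict:
    """Collect key=value settings, skipping comments, blanks and [section] lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def active_workarounds(settings: dict) -> list:
    """Report which of the options suggested in this thread are set."""
    found = []
    if settings.get("force_turbo") == "1":
        found.append("force_turbo=1")
    if "core_freq_min" in settings:
        found.append("core_freq_min=" + settings["core_freq_min"])
    return found


# Example input only; on a real system read the text from config.txt instead.
sample = """
# example only
[pi4]
core_freq_min=500
"""
print(active_workarounds(parse_config(sample)))  # ['core_freq_min=500']
```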

raspberrypi / linux

SPI speed ~2x slower than it should be on RPi 4 #3381
