raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11.17k stars 5.01k forks source link

raspberrypi-clk: Failed to get / change pllb frequency: -12 #3479

Open olifre opened 4 years ago

olifre commented 4 years ago

Describe the bug After anything between 1-2 weeks of runtime, kernel log is flooded with messages such as:

[566789.421766] raspberrypi-clk raspberrypi-clk: Failed to change pllb frequency: -12
[566789.421784] raspberrypi-clk raspberrypi-clk: Failed to get pllb frequency: -12
[566790.128946] raspberrypi-clk raspberrypi-clk: Failed to change pllb frequency: -12
[566790.128962] raspberrypi-clk raspberrypi-clk: Failed to get pllb frequency: -12
[566790.340932] raspberrypi-clk raspberrypi-clk: Failed to change pllb frequency: -12
[566790.340948] raspberrypi-clk raspberrypi-clk: Failed to get pllb frequency: -12
[566791.499069] raspberrypi-clk raspberrypi-clk: Failed to change pllb frequency: -12
[566791.499086] raspberrypi-clk raspberrypi-clk: Failed to get pllb frequency: -12

The system still runs normally, though.

To reproduce Let the system run for 1-2 weeks. No overclocking, no cooling issues, official power supply, no extra loads attached, no preceding kernel messages.

System Copy and paste the results of the raspinfo command in to this section. Alternatively, copy and paste a pastebin link, or add answers to the following questions:

Logs See report above.

Additional context Sadly, no real trigger leading up to the issue. Reboot solves it.

olifre commented 4 years ago

Trying the commands:

# vcgencmd measure_clock pllb
frequency(100)=3000000000
# vcgencmd measure_clock arm
frequency(48)=750199232

seems to work fine in that situation.

olifre commented 4 years ago

I grepped through the logs and found the following, potentially related messages:

[567278.869559] xhci_hcd 0000:01:00.0: Ring expansion failed
[567278.869581] hub 1-1.2.4:1.0: activate --> -12
[567294.010078] xhci_hcd 0000:01:00.0: Ring expansion failed
[567294.010100] hub 1-1.2.4:1.0: activate --> -12
[567309.150218] xhci_hcd 0000:01:00.0: Ring expansion failed
[567309.150331] hub 1-1.2.4:1.0: activate --> -12

So I add my usb device tree here in addition in case it helps:

# lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
            |__ Port 2: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 480M
            |__ Port 4: Dev 5, If 0, Class=Hub, Driver=hub/4p, 480M

The hub is a self-powered USB 2 hub with a USB 3 HDD attached to it (I plan to replace the hub with a USB 3 capable model in the future).

The HDD access is still fine, though.

olifre commented 4 years ago

After reading through #3040 which might be related(?), I have now added:

coherent_pool=4M

to my /boot/cmdline and rebooted. Other suggestions welcome. Will report back latest in ~2 weeks if the problem did not reappear, and earlier if it does.

lategoodbye commented 4 years ago

Hi @olifre,

the -12 means ENOMEM and this is a typical cases when you running out of memory. I assume there is at least one memory leak. In case you are able to rebuild your kernel, try to enable CONFIG_DEBUG_KMEMLEAK in the kernel config under kernel hacking -> memory debugging. After that you should be able to see the leaks in /sys/kernel/debug/kmemleak.

olifre commented 4 years ago

HI @lategoodbye ,

thanks! I wondered whether this is a leak or "just" fragmentation within the coherent memory region. I'm a bit astonished this would only hit me, though, since there is no special hardware involved.

While I regularly compile my own kernel on Desktops, I've not done so on Raspbian / Debian yet. I'm already stuck at the point where to get the upstream kernel config and Raspbian patches (if these exist) to build "the same" kernel with just a single flag changed — the docs at https://www.raspberrypi.org/documentation/linux/kernel/building.md only mention *_defconfig which will likely yield a different configuration (i.e. I would test with something different than what I use now).

So if somebody can help me out here, I can do it, but I would presume the issue is reproducible with any USB 2 hub + USB 3 mass storage device connected to a USB 3 port of the RPi 4.

popcornmix commented 4 years ago

The docs you linked to are how the standard raspbian kernel is built. bcm2711_defconfig is the config used on a pi 4.

olifre commented 4 years ago

@popcornmix Thanks, compiling now. Hope it'll be straightforward to go back to the upstream built kernel (i.e."just upgrade").

Will report back once I found something.

I wonder if:

CONFIG_DEBUG_KMEMLEAK=Y
CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=Y

has any performance implications (if not, it might be helpful to have that by default in case these issues appear regularly).

olifre commented 4 years ago

Modified kernel is running, at least there is no immediate leak visible (even when triggering an active scan). Guess I'll have to wait a few days. Or maybe it is "just" fragmentation after all, and compaction not working well enough?

olifre commented 4 years ago

I have by now executed all the workloads I execute on the device, including SMART tests of the attached HDD, running hd-idle, running fstrim on the SD card, running a full borg backup from the SD card to the HDD, producing some CPU load and IO load etc.

No leak seen yet. Any other ideas, or should we just wait for ~a week to see if anything changes?

olifre commented 4 years ago

@lategoodbye , it's 5 days now and kmemleak still shows up empty. So I think we can already exclude a leak, right?

Since I increased to coherent_pool=4M, if it is fragmentation of the coherent region or something similar, the issue might not show up at all anymore, or take months to appear.

Any other ideas on how to debug? If not, I would revert to a stock kernel (the kmemleak overhead is noticeable) keeping coherent_pool=4M and switch to my new USB 3 hub. Both these changes may hide the issue, so if there's anything else you would like to see, let me know.

olifre commented 4 years ago

@lategoodbye , the issue is now back after 23 days of uptime, so the increase to coherent_pool=4M seems to have delayed it.

However:

sudo cat /sys/kernel/debug/kmemleak

gives me only empty output.

Any ideas? Can I reboot the system and revert to a "stock" kernel, or do you have anything you'd like to get extracted?

olifre commented 4 years ago

Since there were still no leaks and no further ideas, I've now:

I'll report back if it happens again, or in a month if it does not. Any other ideas welcome.

dustContributor commented 4 years ago

I had the same issue. Logs being filled with the exact same messages. Filled my SD with logs.

Raspberry Pi 4 B 4GB
---
Raspberry Pi reference 2020-02-05
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, e577677b623b577f2a0ec7cfaffc3c27da005da3, stage2
---
Jan 30 2020 13:50:45 
Copyright (c) 2012 Broadcom
version 8f792e011c8a0ea3c5d47e7cc10172cc10b93c09 (clean) (release) (start)
---
Linux rpi4 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux

In case it helps:

I have a WiFi dongle in a USB 2 port, and an external hard drive in a USB 3 port. The only change I've made to the RPI itself is that I've disabled the built in WiFi via dtoverlay=disable-wifi

Everything else is stock. Raspbian Lite 2020-02-05 iso. Non official 4a PSU.

I'm trying the coherent_pool flag with 8M as @olifre suggested, I'll report back if it happens again.

lmondoux commented 4 years ago

I'm also getting these errors all of a sudden, will report back if I find anything out.

user747 commented 4 years ago

seeing this on a Raspberry Pi 4 Model B Rev 1.1 with a modified kernel

raspberrypi-clk: Failed to get pllb frequency: -12
raspberrypi-clk: Failed to change pllb frequency: -12

xhci_hcd 0000:01:00.0: Ring expansion failed

Linux test 5.4.38 #0 SMP Tue May 5 07:14:40 2020 aarch64 GNU/Linux

eeprom update maybe worth trying? i dont have any usb devices plugged in

rpi-eeprom-update 
BCM2711 detected
*** UPDATE REQUIRED ***
BOOTLOADER: update required
CURRENT: Tue Sep 10 10:41:50 UTC 2019 (1568112110)
 LATEST: Thu Mar 19 14:27:25 UTC 2020 (1584628045)
 FW DIR: /lib/firmware/raspberrypi/bootloader/beta
VL805: update required
CURRENT: 000137ab
 LATEST: 000137ad
olifre commented 4 years ago

I'm currently running with coherent_pool=8M and am up to 45 days of uptime without encountering this issue again.

Current info (apart from the high coherent_pool size):

Interestingly, with this hub / firmware / Kernel, I regularly get USB disconnects and reconnects when the drive goes to standby and comes back. Maybe this helps with the issue?

user747 commented 4 years ago

i rebuilt to 5.4.39 with CONFIG_DEBUG_KMEMLEAK=Y also updated eeprom to 000137ad

oot@test:~# cat /boot/bcm2711-rpi-4-b.dtb | grep pool coherent_pool=1M 8250.nr_uarts=1 cma=64M

24 hours in haven't seen this error yet

olifre commented 4 years ago

I'm currently running with coherent_pool=8M and am up to 45 days of uptime without encountering this issue again.

It came back now after 48 days. Since I often reboot more often for updates anyways, coherent_pool=8M mostly hides the issue for me by sufficiently delaying it...

user747 commented 4 years ago

and its back

Mon May 11 03:04:48 2020 kern.err kernel: [75908.914506] xhci_hcd 0000:01:00.0: Ring expansion failed Mon May 11 03:04:48 2020 kern.err kernel: [75908.919835] hub 1-1:1.0: activate --> -12

root@test:~# cat /sys/kernel/debug/kmemleak root@test:~#

nothing there but i have a feeling i'm not using these tool correctly

root@test:~# mount | grep debug nodev on /sys/kernel/debug type debugfs (rw,relatime)

root@test:~# echo scan > /sys/kernel/debug/kmemleak ash: write error: Operation not permitted

olifre commented 4 years ago

5.4.51-v7l+ has now entered stable. I wonder if the issue will go away with that, updated just now.

user747 commented 4 years ago

still present when using 5.4.52 aarch64 GNU/Linux (patched kernel with mptcp on OpenWRT) edit: my firmware is out of date. going to update and retry.

user747 commented 4 years ago

this might be related https://github.com/raspberrypi/rpi-eeprom/commit/d7b1a49a8356b09384817d362cbfb572e50770a8

## 2020-07-31 Standardize USB port power control accross board revisions - BETA
   * Turn off USB power power for 1-second regardless of boot-mode. This appears
     to resolve an issue on R1.3 and older board revisions where some USB
     devices would fail upon reboot. On R1.4 USB port power is turned off
     automatically by the PMIC so this is just held in reset for longer. For
     earlier board revisions the USB port power is explicitly turned off via
     XHCI.
     This can be overriden via USB_MSD_PWR_OFF_TIME in the EEPROM config.
   * Update the to latest Broadcom memsys FW - no significant functional change.
user747 commented 4 years ago

logs still being flooded on 5.4.69 using this firmware

BCM2711 detected Dedicated VL805 EEPROM detected BOOTLOADER: up-to-date CURRENT: Thu Sep 3 12:11:43 UTC 2020 (1599135103) LATEST: Thu Sep 3 12:11:43 UTC 2020 (1599135103) FW DIR: /lib/firmware/raspberrypi/bootloader/stable VL805: up-to-date CURRENT: 000138a1 LATEST: 000138a1

Tue Oct 20 23:08:54 2020 kern.err kernel: [675644.083552] xhci_hcd 0000:01:00.0: Ring expansion failed Tue Oct 20 23:08:54 2020 kern.err kernel: [675644.088965] hub 1-1:1.0: activate --> -12

borpin commented 3 years ago

I am seeing this too.

877dev commented 3 years ago

Same here!

pi@raspberrypi:/etc $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
pi@raspberrypi:/etc $ uname -a
Linux raspberrypi 5.4.51-v7l+ #1333 SMP Mon Aug 10 16:51:40 BST 2020 armv7l GNU/Linux
pi@raspberrypi:/etc $
ayasystems commented 2 years ago

I have the same problem

pi@domotica:~ $ cat /etc/os-release PRETTY_NAME="Raspbian GNU/Linux 10 (buster)" NAME="Raspbian GNU/Linux" VERSION_ID="10" VERSION="10 (buster)" VERSION_CODENAME=buster ID=raspbian ID_LIKE=debian HOME_URL="http://www.raspbian.org/" SUPPORT_URL="http://www.raspbian.org/RaspbianForums" BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs" pi@domotica:~ $ uname -a Linux domotica 5.15.68-v7l+ #1587 SMP Tue Sep 20 11:17:01 BST 2022 armv7l GNU/Linux pi@domotica:~ $

popcornmix commented 2 years ago

You seem to be running buster (which is deprecated) with a latest (rpi-update?) kernel. You would be advised to update to bullseye.

What does vcgencmd version report?