Open olifre opened 4 years ago
Trying the commands:
# vcgencmd measure_clock pllb
frequency(100)=3000000000
# vcgencmd measure_clock arm
frequency(48)=750199232
seems to work fine in that situation.
I grepped through the logs and found the following, potentially related messages:
[567278.869559] xhci_hcd 0000:01:00.0: Ring expansion failed
[567278.869581] hub 1-1.2.4:1.0: activate --> -12
[567294.010078] xhci_hcd 0000:01:00.0: Ring expansion failed
[567294.010100] hub 1-1.2.4:1.0: activate --> -12
[567309.150218] xhci_hcd 0000:01:00.0: Ring expansion failed
[567309.150331] hub 1-1.2.4:1.0: activate --> -12
So I add my usb device tree here in addition in case it helps:
# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 2: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 480M
|__ Port 4: Dev 5, If 0, Class=Hub, Driver=hub/4p, 480M
The hub is a self-powered USB 2 hub with a USB 3 HDD attached to it (I plan to replace the hub with a USB 3 capable model in the future).
The HDD access is still fine, though.
After reading through #3040 which might be related(?), I have now added:
coherent_pool=4M
to my /boot/cmdline
and rebooted. Other suggestions welcome. Will report back latest in ~2 weeks if the problem did not reappear, and earlier if it does.
Hi @olifre,
the -12 means ENOMEM and this is a typical cases when you running out of memory. I assume there is at least one memory leak. In case you are able to rebuild your kernel, try to enable CONFIG_DEBUG_KMEMLEAK in the kernel config under kernel hacking -> memory debugging. After that you should be able to see the leaks in /sys/kernel/debug/kmemleak.
HI @lategoodbye ,
thanks! I wondered whether this is a leak or "just" fragmentation within the coherent memory region. I'm a bit astonished this would only hit me, though, since there is no special hardware involved.
While I regularly compile my own kernel on Desktops, I've not done so on Raspbian / Debian yet. I'm already stuck at the point where to get the upstream kernel config and Raspbian patches (if these exist) to build "the same" kernel with just a single flag changed — the docs at https://www.raspberrypi.org/documentation/linux/kernel/building.md only mention *_defconfig
which will likely yield a different configuration (i.e. I would test with something different than what I use now).
So if somebody can help me out here, I can do it, but I would presume the issue is reproducible with any USB 2 hub + USB 3 mass storage device connected to a USB 3 port of the RPi 4.
The docs you linked to are how the standard raspbian kernel is built. bcm2711_defconfig is the config used on a pi 4.
@popcornmix Thanks, compiling now. Hope it'll be straightforward to go back to the upstream built kernel (i.e."just upgrade").
Will report back once I found something.
I wonder if:
CONFIG_DEBUG_KMEMLEAK=Y
CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=Y
has any performance implications (if not, it might be helpful to have that by default in case these issues appear regularly).
Modified kernel is running, at least there is no immediate leak visible (even when triggering an active scan
). Guess I'll have to wait a few days.
Or maybe it is "just" fragmentation after all, and compaction not working well enough?
I have by now executed all the workloads I execute on the device, including SMART tests of the attached HDD, running hd-idle
, running fstrim
on the SD card, running a full borg backup from the SD card to the HDD, producing some CPU load and IO load etc.
No leak seen yet. Any other ideas, or should we just wait for ~a week to see if anything changes?
@lategoodbye ,
it's 5 days now and kmemleak
still shows up empty. So I think we can already exclude a leak, right?
Since I increased to coherent_pool=4M
, if it is fragmentation of the coherent region or something similar, the issue might not show up at all anymore, or take months to appear.
Any other ideas on how to debug? If not, I would revert to a stock kernel (the kmemleak
overhead is noticeable) keeping coherent_pool=4M
and switch to my new USB 3 hub. Both these changes may hide the issue, so if there's anything else you would like to see, let me know.
@lategoodbye , the issue is now back after 23 days of uptime, so the increase to coherent_pool=4M
seems to have delayed it.
However:
sudo cat /sys/kernel/debug/kmemleak
gives me only empty output.
Any ideas? Can I reboot the system and revert to a "stock" kernel, or do you have anything you'd like to get extracted?
Since there were still no leaks and no further ideas, I've now:
coherent_pool
to 8M
. I'll report back if it happens again, or in a month if it does not. Any other ideas welcome.
I had the same issue. Logs being filled with the exact same messages. Filled my SD with logs.
Raspberry Pi 4 B 4GB
---
Raspberry Pi reference 2020-02-05
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, e577677b623b577f2a0ec7cfaffc3c27da005da3, stage2
---
Jan 30 2020 13:50:45
Copyright (c) 2012 Broadcom
version 8f792e011c8a0ea3c5d47e7cc10172cc10b93c09 (clean) (release) (start)
---
Linux rpi4 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux
In case it helps:
I have a WiFi dongle in a USB 2 port, and an external hard drive in a USB 3 port. The only change I've made to the RPI itself is that I've disabled the built in WiFi via dtoverlay=disable-wifi
Everything else is stock. Raspbian Lite 2020-02-05 iso. Non official 4a PSU.
I'm trying the coherent_pool
flag with 8M
as @olifre suggested, I'll report back if it happens again.
I'm also getting these errors all of a sudden, will report back if I find anything out.
seeing this on a Raspberry Pi 4 Model B Rev 1.1 with a modified kernel
raspberrypi-clk: Failed to get pllb frequency: -12
raspberrypi-clk: Failed to change pllb frequency: -12
xhci_hcd 0000:01:00.0: Ring expansion failed
Linux test 5.4.38 #0 SMP Tue May 5 07:14:40 2020 aarch64 GNU/Linux
eeprom update maybe worth trying? i dont have any usb devices plugged in
rpi-eeprom-update
BCM2711 detected
*** UPDATE REQUIRED ***
BOOTLOADER: update required
CURRENT: Tue Sep 10 10:41:50 UTC 2019 (1568112110)
LATEST: Thu Mar 19 14:27:25 UTC 2020 (1584628045)
FW DIR: /lib/firmware/raspberrypi/bootloader/beta
VL805: update required
CURRENT: 000137ab
LATEST: 000137ad
I'm currently running with coherent_pool=8M
and am up to 45 days of uptime without encountering this issue again.
Current info (apart from the high coherent_pool
size):
4.19.97-v7l+
(vanilla Raspbian again)000137ad
EEPROM version, when opening the thread, I was on an older version (the release notes do not seem to suggest significant USB changes there, though). Interestingly, with this hub / firmware / Kernel, I regularly get USB disconnects and reconnects when the drive goes to standby and comes back. Maybe this helps with the issue?
i rebuilt to 5.4.39 with CONFIG_DEBUG_KMEMLEAK=Y also updated eeprom to 000137ad
oot@test:~# cat /boot/bcm2711-rpi-4-b.dtb | grep pool coherent_pool=1M 8250.nr_uarts=1 cma=64M
24 hours in haven't seen this error yet
I'm currently running with
coherent_pool=8M
and am up to 45 days of uptime without encountering this issue again.
It came back now after 48 days. Since I often reboot more often for updates anyways, coherent_pool=8M
mostly hides the issue for me by sufficiently delaying it...
and its back
Mon May 11 03:04:48 2020 kern.err kernel: [75908.914506] xhci_hcd 0000:01:00.0: Ring expansion failed Mon May 11 03:04:48 2020 kern.err kernel: [75908.919835] hub 1-1:1.0: activate --> -12
root@test:~# cat /sys/kernel/debug/kmemleak
root@test:~#
nothing there but i have a feeling i'm not using these tool correctly
root@test:~# mount | grep debug
nodev on /sys/kernel/debug type debugfs (rw,relatime)
root@test:~# echo scan > /sys/kernel/debug/kmemleak
ash: write error: Operation not permitted
5.4.51-v7l+
has now entered stable. I wonder if the issue will go away with that, updated just now.
still present when using 5.4.52 aarch64 GNU/Linux (patched kernel with mptcp on OpenWRT) edit: my firmware is out of date. going to update and retry.
this might be related https://github.com/raspberrypi/rpi-eeprom/commit/d7b1a49a8356b09384817d362cbfb572e50770a8
## 2020-07-31 Standardize USB port power control accross board revisions - BETA
* Turn off USB power power for 1-second regardless of boot-mode. This appears
to resolve an issue on R1.3 and older board revisions where some USB
devices would fail upon reboot. On R1.4 USB port power is turned off
automatically by the PMIC so this is just held in reset for longer. For
earlier board revisions the USB port power is explicitly turned off via
XHCI.
This can be overriden via USB_MSD_PWR_OFF_TIME in the EEPROM config.
* Update the to latest Broadcom memsys FW - no significant functional change.
logs still being flooded on 5.4.69 using this firmware
BCM2711 detected Dedicated VL805 EEPROM detected BOOTLOADER: up-to-date CURRENT: Thu Sep 3 12:11:43 UTC 2020 (1599135103) LATEST: Thu Sep 3 12:11:43 UTC 2020 (1599135103) FW DIR: /lib/firmware/raspberrypi/bootloader/stable VL805: up-to-date CURRENT: 000138a1 LATEST: 000138a1
Tue Oct 20 23:08:54 2020 kern.err kernel: [675644.083552] xhci_hcd 0000:01:00.0: Ring expansion failed Tue Oct 20 23:08:54 2020 kern.err kernel: [675644.088965] hub 1-1:1.0: activate --> -12
I am seeing this too.
Same here!
pi@raspberrypi:/etc $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
pi@raspberrypi:/etc $ uname -a
Linux raspberrypi 5.4.51-v7l+ #1333 SMP Mon Aug 10 16:51:40 BST 2020 armv7l GNU/Linux
pi@raspberrypi:/etc $
I have the same problem
pi@domotica:~ $ cat /etc/os-release PRETTY_NAME="Raspbian GNU/Linux 10 (buster)" NAME="Raspbian GNU/Linux" VERSION_ID="10" VERSION="10 (buster)" VERSION_CODENAME=buster ID=raspbian ID_LIKE=debian HOME_URL="http://www.raspbian.org/" SUPPORT_URL="http://www.raspbian.org/RaspbianForums" BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs" pi@domotica:~ $ uname -a Linux domotica 5.15.68-v7l+ #1587 SMP Tue Sep 20 11:17:01 BST 2022 armv7l GNU/Linux pi@domotica:~ $
You seem to be running buster (which is deprecated) with a latest (rpi-update?) kernel. You would be advised to update to bullseye.
What does vcgencmd version
report?
Describe the bug After anything between 1-2 weeks of runtime, kernel log is flooded with messages such as:
The system still runs normally, though.
To reproduce Let the system run for 1-2 weeks. No overclocking, no cooling issues, official power supply, no extra loads attached, no preceding kernel messages.
System Copy and paste the results of the raspinfo command in to this section. Alternatively, copy and paste a pastebin link, or add answers to the following questions:
cat /etc/rpi-issue
)?vcgencmd version
)?uname -a
)?Logs See report above.
Additional context Sadly, no real trigger leading up to the issue. Reboot solves it.