Make sure to disable systemd-pstore.service if you want all logs from pstore. Other ramoops collection software may work better (I haven't tested that), but with systemd-pstore, at least, I always lost some or all of the ramoops logs. Getting the logs yourself from /sys/fs/pstore/ is easy, but note that you then have to archive them yourself.
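Since the records have to be archived by hand once systemd-pstore is out of the way, here is a minimal bash sketch of one way to do it. It copies every record from the pstore directory into a timestamped sub-directory and then deletes the originals, because pstore erases the backing record when a file under /sys/fs/pstore/ is deleted. The function name archive_pstore and the destination directory are hypothetical, not part of any Raspberry Pi or systemd tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive pstore records, then free them in the backend.
#
# archive_pstore SRC DEST
#   SRC  - pstore mount point (normally /sys/fs/pstore)
#   DEST - directory under which a timestamped sub-directory is created
# Prints the archive directory, or nothing if SRC held no records.
archive_pstore() {
    local src=$1 dest=$2 f target
    local -a files=()
    for f in "$src"/*; do
        [[ -f $f ]] && files+=("$f")
    done
    # Nothing to archive: leave DEST untouched.
    (( ${#files[@]} == 0 )) && return 0

    target="$dest/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$target" || return 1
    # Copy first; only delete the originals once the copy has succeeded.
    cp -p -- "${files[@]}" "$target"/ || return 1
    rm -f -- "${files[@]}"
    printf '%s\n' "$target"
}
```

Usage, as root: source the file and run archive_pstore /sys/fs/pstore /var/log/pstore-archive, where the second path is only an example location.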
I just tested your command sequence on my Pi 4B with the armv8 kernel 5.15.84-v8+ (Raspberry Pi OS 64-bit), and it worked fine apart from the usual flipped bits in the log, which can make relevant parts of it misleading.
To get reliable, uncorrupted logs you may have to enable ECC for ramoops. The current dtoverlay doesn't support that; I plan to send a patch once local testing is complete.
What's the output on your system for
dmesg|grep ramoops
I also copied the setup instructions from https://github.com/raspberrypi/linux/issues/5063#issuecomment-1167966601
sudo grep "" /sys/module/ramoops/parameters/*
/sys/module/ramoops/parameters/console_size:16384
/sys/module/ramoops/parameters/dump_oops:-1
/sys/module/ramoops/parameters/ecc:0
/sys/module/ramoops/parameters/ftrace_size:0
/sys/module/ramoops/parameters/max_reason:2
/sys/module/ramoops/parameters/mem_address:184549376
/sys/module/ramoops/parameters/mem_size:65536
/sys/module/ramoops/parameters/mem_type:0
/sys/module/ramoops/parameters/pmsg_size:0
/sys/module/ramoops/parameters/record_size:16384
and
dmesg|grep ramoops
[ 0.042319] printk: console [ramoops-1] enabled
[ 0.042335] pstore: Registered ramoops as persistent store backend
[ 0.042347] ramoops: using 0x10000@0xb000000, ecc: 0
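The parameter dump and the dmesg line describe the same region in different bases: mem_address 184549376 is 0xb000000 and mem_size 65536 is 0x10000. The bash sketch below, a hypothetical helper named ramoops_layout, prints the parameters in the same form as the boot message and estimates how many panic/oops records fit. It assumes the zone split used by the upstream driver (fs/pstore/ram.c), where the console, ftrace and pmsg zones are carved out first and the remainder is divided into record_size slots; ECC overhead is ignored.

```shell
#!/usr/bin/env bash
# ramoops_layout DIR
#   DIR - directory holding the ramoops module parameters
#         (normally /sys/module/ramoops/parameters, readable by root only)
ramoops_layout() {
    local dir=$1 addr size console ftrace pmsg record ecc dump
    read -r addr    < "$dir/mem_address"
    read -r size    < "$dir/mem_size"
    read -r console < "$dir/console_size"
    read -r ftrace  < "$dir/ftrace_size"
    read -r pmsg    < "$dir/pmsg_size"
    read -r record  < "$dir/record_size"
    read -r ecc     < "$dir/ecc"

    # Same format as the "ramoops: using ..." boot message.
    printf 'using 0x%x@0x%x, ecc: %d\n' "$size" "$addr" "$ecc"

    # Space left for dmesg (panic/oops) records once the other zones
    # are carved out; ECC overhead is not accounted for here.
    dump=$(( size - console - ftrace - pmsg ))
    if (( record > 0 )); then
        printf 'dmesg records: %d x %d bytes\n' $(( dump / record )) "$record"
    else
        printf 'dmesg records: 0 (record_size is 0)\n'
    fi
}
```

With the values above, it prints "using 0x10000@0xb000000, ecc: 0" followed by "dmesg records: 3 x 16384 bytes".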
I also tried the config.txt line
dtoverlay=ramoops,console-size=0x4000,ecc=1,dump_oops=1
to get ecc and dump_oops working, but that didn't do anything. (This is on the 6.1.9-v8+ kernel.)
mount | grep pstore
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
The output of sudo ls /sys/fs/pstore/ is empty.
sudo modprobe configs
zcat /proc/config.gz | grep PSTORE
# CONFIG_EFI_VARS_PSTORE is not set
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFAULT_KMSG_BYTES=10240
CONFIG_PSTORE_DEFLATE_COMPRESS=y
# CONFIG_PSTORE_LZO_COMPRESS is not set
# CONFIG_PSTORE_LZ4_COMPRESS is not set
# CONFIG_PSTORE_LZ4HC_COMPRESS is not set
# CONFIG_PSTORE_842_COMPRESS is not set
# CONFIG_PSTORE_ZSTD_COMPRESS is not set
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_DEFLATE_COMPRESS_DEFAULT=y
CONFIG_PSTORE_COMPRESS_DEFAULT="deflate"
CONFIG_PSTORE_CONSOLE=y
# CONFIG_PSTORE_PMSG is not set
# CONFIG_PSTORE_FTRACE is not set
CONFIG_PSTORE_RAM=y
# CONFIG_PSTORE_BLK is not set
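The same check can be run on other kernels by filtering the config for the options ramoops depends on. This bash sketch assumes that CONFIG_PSTORE, CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE (needed for console-ramoops records) are the relevant minimum; compression and the other backends are left out, and check_pstore_config is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_pstore_config
# Reads a kernel config on stdin and reports whether each option is built
# in (=y) or modular (=m). Returns non-zero if any of them is missing.
check_pstore_config() {
    local config opt val missing=0
    config=$(cat)
    for opt in CONFIG_PSTORE CONFIG_PSTORE_RAM CONFIG_PSTORE_CONSOLE; do
        # The trailing "=" keeps CONFIG_PSTORE from matching CONFIG_PSTORE_RAM.
        val=$(grep -E "^${opt}=" <<<"$config" | cut -d= -f2)
        case $val in
            y|m) printf '%s=%s ok\n' "$opt" "$val" ;;
            *)   printf '%s missing\n' "$opt"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

Usage: after sudo modprobe configs, run zcat /proc/config.gz | check_pstore_config.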
This also didn't help:
echo Y > /sys/module/printk/parameters/always_kmsg_dump
echo Y > /sys/module/kernel/parameters/crash_kexec_post_notifiers
Nor did setting kernel.printk = 7 3 4 1 3 in /etc/sysctl.d/98-rpi.conf.
...
With the current kernel and device tree, the ecc parameter is not functional: it works neither in cmdline.txt nor in config.txt. That said, ramoops itself should still work.
I haven't tested kernel 6.1 yet, so there may be problems I'm not yet aware of. I'll test after the weekend.
Even with kernel 6.1.9-v8+ from "rpi-update next", ramoops works for me. The only obvious remaining differences from your setup (there may be others) are that I'm running Raspberry Pi OS 64-bit and you're running Arch 32-bit. I have not tested whether that dtoverlay even works with 32-bit kernels.
It might also make sense to check which SPI bootloader you're using with rpi-eeprom-update. Please note that after a power-off, /sys/fs/pstore/ will be empty. Modifying the ecc settings manually might also interfere with ramoops.
I'm not running 32-bit. (The OP is; I'm running in 64-bit mode.)
uname -m
aarch64
I'm not sure what rpi-eeprom-update is supposed to show, but here is its output:
sudo rpi-eeprom-update -a
BOOTLOADER: up to date
CURRENT: Wed 11 Jan 2023 05:40:52 PM UTC (1673458852)
LATEST: Wed 04 Jan 2023 10:27:49 AM UTC (1672828069)
RELEASE: beta (/lib/firmware/raspberrypi/bootloader/beta)
Use raspi-config to change the release.
VL805_FW: Dedicated VL805 EEPROM
VL805: up to date
CURRENT: 000138c0
LATEST: 000138c0
How are you getting the kernel messages?
These are the current module settings:
sudo grep "" /sys/module/ramoops/parameters/*
/sys/module/ramoops/parameters/console_size:16384
/sys/module/ramoops/parameters/dump_oops:-1
/sys/module/ramoops/parameters/ecc:0
/sys/module/ramoops/parameters/ftrace_size:0
/sys/module/ramoops/parameters/max_reason:2
/sys/module/ramoops/parameters/mem_address:184549376
/sys/module/ramoops/parameters/mem_size:65536
/sys/module/ramoops/parameters/mem_type:0
/sys/module/ramoops/parameters/pmsg_size:0
/sys/module/ramoops/parameters/record_size:16384
I do a soft reboot with sudo reboot, and I'm not seeing anything via sudo ls /sys/fs/pstore/.
@satmandu have you made sure that systemd-pstore never runs? It is enabled by default.
After a normal reboot (no crash), I have these files:
root@pi-test-1:~# ls -l /sys/fs/pstore/
total 0
-r--r--r-- 1 root root 16372 Aug 7 15:25 console-ramoops-0
The only line for ramoops in my config.txt is: dtoverlay=ramoops,console-size=0x4000
That works well for me.
@satmandu have you made sure that systemd-pstore never runs? It is enabled by default.
Thanks! That was my issue.
Just to reiterate the steps I needed to get this working with the Raspberry Pi 6.1.x kernel on both Raspberry Pi OS 64-bit and Ubuntu:
systemctl disable systemd-pstore
# Ensure the correct kernel.printk is set in /etc/sysctl.d/98-rpi.conf
cat /etc/sysctl.d/98-rpi.conf
kernel.printk = 7 3 4 1 3
Set config.txt to contain dtoverlay=ramoops,console-size=0x4000
One could also optionally add the following to /etc/rc.local:
echo Y > /sys/module/printk/parameters/always_kmsg_dump
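The two settings that mattered here (the overlay line and systemd-pstore being disabled) can be checked with a small script. This bash sketch is hypothetical: has_ramoops_overlay looks for an uncommented dtoverlay=ramoops or dtoverlay=ramoops-pi4 line, with or without parameters, and systemd_pstore_enabled wraps systemctl is-enabled. The config.txt location varies between OS releases, so it is passed in.

```shell
#!/usr/bin/env bash
# has_ramoops_overlay FILE
# Succeeds if FILE contains an active (uncommented) ramoops overlay line,
# e.g. "dtoverlay=ramoops" or "dtoverlay=ramoops,console-size=0x4000".
has_ramoops_overlay() {
    grep -Eq '^[[:space:]]*dtoverlay=ramoops(-pi4)?(,[^[:space:]]*)?[[:space:]]*$' "$1"
}

# systemd_pstore_enabled
# Succeeds only if systemd-pstore.service is enabled, i.e. it will run at
# boot and move records out of /sys/fs/pstore before you can read them.
systemd_pstore_enabled() {
    [[ $(systemctl is-enabled systemd-pstore.service 2>/dev/null) == enabled ]]
}
```

Usage: has_ramoops_overlay /boot/config.txt || echo "overlay missing", and systemd_pstore_enabled && echo "systemd-pstore is still enabled".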
(Maybe at some point you could ask the Raspberry Pi folks to adjust the default kernel.printk setting? And maybe we could also try to get systemd's pstore daemon fixed?)
Thanks so much for helping with this. @graysky2 I hope you can get it working on the 32-bit kernel...
Back then, I adjusted the kernel.printk setting because I read about it somewhere on the internet. However, most kernel messages (until systemd takes over) are available even with the default kernel.printk settings. I'll experiment some more, because I'm going to use ramoops in production in my small fleet (~600 devices) soon.
most kernel messages (until systemd takes over) are available
The kernel command line (cmdline.txt) parameter ignore_loglevel prints all kernel messages to the console, even after systemd has started.
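cmdline.txt has to stay a single line, so appending a parameter such as ignore_loglevel is easy to get wrong in an editor. A minimal bash sketch, assuming the file holds exactly one line (anything after the first line is dropped); add_cmdline_param is a hypothetical helper that only appends the word if it is not already present.

```shell
#!/usr/bin/env bash
# add_cmdline_param FILE PARAM
# Appends PARAM to the one-line kernel command line in FILE, unless it is
# already present as a whole word. Refuses to touch an empty/missing file.
add_cmdline_param() {
    local file=$1 param=$2 line word
    local -a words
    # read still fills $line when the file lacks a trailing newline.
    IFS= read -r line < "$file"
    read -ra words <<<"$line"
    (( ${#words[@]} > 0 )) || return 1
    for word in "${words[@]}"; do
        [[ $word == "$param" ]] && return 0   # already there
    done
    # Rewrite as a single line: existing words, single-spaced, plus PARAM.
    printf '%s\n' "${words[*]} $param" > "$file"
}
```

Usage, as root: add_cmdline_param /path/to/cmdline.txt ignore_loglevel, then reboot.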
Hello, I am now having trouble with this on a Raspberry Pi 4B:
Linux mhs-por-dev-unset 6.6.20+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux
I have disabled systemd-pstore.
config.txt modifications:
enable_uart=1
dtoverlay=ramoops,console-size=0x4000
cmdline.txt:
console=serial0,115200 console=tty1 root=PARTUUID=bd05bf36-02 rootfstype=ext4 fsck.repair=yes rootwait
If I do
sysctl kernel.panic=10
echo c | sudo tee /proc/sysrq-trigger
I do see a panic on the UART, but there is also an error (ENOSPC?), and the pstore is empty after the device reboots.
[ 63.708118] sysrq: Trigger a crash
[ 63.711618] Kernel panic - not syncing: sysrq triggered crash
[ 63.717463] CPU: 3 PID: 866 Comm: tee Tainted: G C 6.6.20+rpt-rpi-v8 #1 Debian 1:6.6.20-1+rpt1
[ 63.727726] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
[ 63.733658] Call trace:
[ 63.736147] dump_backtrace+0xa0/0x100
[ 63.739975] show_stack+0x20/0x38
[ 63.743352] dump_stack_lvl+0x48/0x60
[ 63.747084] dump_stack+0x18/0x28
[ 63.750460] panic+0x328/0x390
[ 63.753577] sysrq_handle_crash+0x24/0x30
[ 63.757663] __handle_sysrq+0xb8/0x1e8
[ 63.761482] write_sysrq_trigger+0x7c/0xb0
[ 63.765656] proc_reg_write+0xa4/0x100
[ 63.769480] vfs_write+0xcc/0x310
[ 63.772855] ksys_write+0x78/0x118
[ 63.776317] __arm64_sys_write+0x24/0x38
[ 63.780304] invoke_syscall+0x50/0x128
[ 63.784105] el0_svc_common.constprop.0+0x48/0xf0
[ 63.788875] do_el0_svc+0x24/0x38
[ 63.792234] el0_svc+0x40/0xe8
[ 63.795330] el0t_64_sync_handler+0x100/0x130
[ 63.799747] el0t_64_sync+0x190/0x198
[ 63.803458] SMP: stopping secondary CPUs
[ 63.807435] Kernel Offset: 0x1812a00000 from 0xffffffc080000000
[ 63.813438] PHYS_OFFSET: 0x0
[ 63.816354] CPU features: 0x0,80000201,3c020000,0000421b
[ 63.821740] Memory Limit: none
[ 63.827967] pstore: backend (ramoops) writing error (-28)
[ 63.833446] Rebooting in 10 seconds..
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
If I instead do a normal sudo shutdown -r now, I do see the console log saved in pstore: console-ramoops-0
pi@mhs-por-dev-unset:~ $ dmesg | grep pstore\\\|oops
[ 0.000000] OF: reserved mem: 0x000000000b000000..0x000000000b00ffff (64 KiB) map non-reusable ramoops@b000000
[ 0.038929] pstore: Using crash dump compression: deflate
[ 0.038953] printk: console [ramoops-1] enabled
[ 0.039434] pstore: Registered ramoops as persistent store backend
[ 0.039452] ramoops: using 0x10000@0xb000000, ecc: 0
pi@mhs-por-dev-unset:~ $ sudo grep -r . /sys/module/pstore/parameters/
/sys/module/pstore/parameters/update_ms:-1
/sys/module/pstore/parameters/kmsg_bytes:10240
/sys/module/pstore/parameters/backend:ramoops
/sys/module/pstore/parameters/compress:deflate
pi@mhs-por-dev-unset:~ $ sudo grep -r . /sys/module/ramoops/parameters/
/sys/module/ramoops/parameters/mem_address:184549376
/sys/module/ramoops/parameters/dump_oops:-1
/sys/module/ramoops/parameters/ecc:0
/sys/module/ramoops/parameters/max_reason:2
/sys/module/ramoops/parameters/record_size:16384
/sys/module/ramoops/parameters/pmsg_size:0
/sys/module/ramoops/parameters/mem_type:0
/sys/module/ramoops/parameters/mem_size:65536
/sys/module/ramoops/parameters/console_size:16384
/sys/module/ramoops/parameters/ftrace_size:0
Stranger still, I did see a crash dump the first time I enabled this. At that point I had not disabled systemd-pstore; I had only added dtoverlay=ramoops, and I believe everything else was the same. I did see the ENOSPC error on the UART that time, but I did not take any more notes (because it worked).
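When comparing a crash reboot with a clean reboot, it helps to count the surviving records by type: with ramoops, panic and oops dumps appear as dmesg-ramoops-N files, while console-ramoops-0 holds the console log of the previous boot. A hypothetical bash helper for that:

```shell
#!/usr/bin/env bash
# pstore_summary DIR
# Counts the records in a pstore directory (normally /sys/fs/pstore) by type.
pstore_summary() {
    local dir=$1 f base dmesg=0 console=0 other=0
    for f in "$dir"/*; do
        [[ -f $f ]] || continue
        base=${f##*/}
        case $base in
            dmesg-*)   dmesg=$(( dmesg + 1 )) ;;     # oops/panic dumps
            console-*) console=$(( console + 1 )) ;; # previous boot's console
            *)         other=$(( other + 1 )) ;;     # ftrace, pmsg, ...
        esac
    done
    printf 'dmesg=%d console=%d other=%d\n' "$dmesg" "$console" "$other"
}
```

After the sysrq crash test, dmesg=0 console=0 corresponds to the failure described above, while the clean reboot gives console=1.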
Hi, I'm also having trouble with ramoops, but only on the Raspberry Pi 4B.
config.txt
enable_uart=1
dtoverlay=ramoops-pi4
I confirmed that ramoops is enabled:
root@pi4btw:/sys/fs/pstore# dmesg | grep ramoops
[ 0.047397] pstore: Registered ramoops as persistent store backend
[ 0.047424] ramoops: using 0x10000@0xb000000, ecc: 0
I crashed each Pi with these commands:
echo 10 > /proc/sys/kernel/panic
echo c > /proc/sysrq-trigger
I did the same testing with a Pi 3B (with dtoverlay=ramoops), a Pi 4B and a Pi 5, and only the Pi 4B fails to write to /sys/fs/pstore/. On the other hand, all the Pis succeed in writing non-panic logs to /sys/fs/pstore/console-ramoops-0 when dtoverlay=ramoops,console-size=0x4000 is set. Is this a hardware issue on the Pi 4? I find it odd that only the Pi 4 fails with the same software.
I tried this in the latest release of bookworm (kernel 6.6.47+rpt-rpi-v8) and bullseye (kernel 6.1.21-v8+).
By the way, the Pi 5 seems to fail to automatically load ramoops-pi4.dtbo when config.txt is set to dtoverlay=ramoops.
Are you sure it never works? I sometimes get output, but on the flip side I find reboot logs equally unreliable.
With an instrumented kernel, this is the output after a few reboots:
[ 0.038638] ramoops: found existing invalid buffer, size 0, start 2097152 (43474244)
[ 0.038677] ramoops: no valid data in buffer (sig = 0x43470204)
[ 0.038707] ramoops: found existing empty buffer (43474244)
[ 0.038732] ramoops: no valid data in buffer (sig = 0x43464244)
[ 0.038675] ramoops: no valid data in buffer (sig = 0x41454264)
[ 0.038707] ramoops: no valid data in buffer (sig = 0x53470200)
[ 0.038737] ramoops: no valid data in buffer (sig = 0xc3474244)
[ 0.038762] ramoops: no valid data in buffer (sig = 0x53464244)
and after a forced panic:
[ 0.038640] ramoops: no valid data in buffer (sig = 0x41474244)
[ 0.038672] ramoops: no valid data in buffer (sig = 0x53470200)
[ 0.038702] ramoops: found existing invalid buffer, size 1073741824, start 0 (43474244)
[ 0.038733] ramoops: no valid data in buffer (sig = 0x53064244)
And for comparison, this is after a Pi 5 panic:
[ 0.013438] ramoops: found existing buffer, size 8602, start 8602 (43474244)
[ 0.013466] ramoops: found existing empty buffer (43474244)
[ 0.013471] ramoops: found existing empty buffer (43474244)
[ 0.013477] ramoops: found existing empty buffer (43474244)
The hex numbers in parentheses are the signatures, which should be 43474244 (DBGC). You'll see that many of the entries have one or more bit errors, which suggests that the RAM content is just not being maintained.
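The signature check is easy to reproduce by hand: 0x43474244 is the ASCII bytes D, B, G, C stored little-endian, and XOR-ing a logged signature against it exposes the flipped bits. A minimal bash sketch (sig_bit_errors is a hypothetical name):

```shell
#!/usr/bin/env bash
# Expected ramoops buffer signature ("DBGC" as a little-endian 32-bit word).
RAMOOPS_SIG=0x43474244

# sig_bit_errors SIG
# Prints how many bits of SIG differ from the expected signature.
# Accepts both "0x41474244" and the bare "41474244" form used in the log.
sig_bit_errors() {
    local sig=$1 diff count=0
    [[ $sig == 0[xX]* ]] || sig="0x$sig"
    diff=$(( sig ^ RAMOOPS_SIG ))
    # Count the set bits of the XOR, one bit per iteration.
    while (( diff != 0 )); do
        count=$(( count + (diff & 1) ))
        diff=$(( diff >> 1 ))
    done
    echo "$count"
}
```

For the signatures quoted above this gives between one and four flipped bits per entry; for example, 0x41474244 differs in one bit and 0x53470200 in four. The sig = values come from the instrumented kernel build described in this thread.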
The thinking here is that the period of SDRAM controller calibration may stall refresh cycles for too long, but we're not sure why Pi 5 doesn't seem to suffer in the same way.
By the way, the Pi 5 seems to fail to automatically load ramoops-pi4.dtbo when config.txt is set to dtoverlay=ramoops
Indeed - that will be fixed.
And now it is - see 9557336c4fc4ac4606ac9e78c239aa689c26e870.
Are you sure it never works? I sometimes get output, but on the flip side I find reboot logs equally unreliable.
Yes, at least from my testing with Pi 4s, regular reboots seem to leave the logs in place quite consistently, while panic logs are never there.
How can I see ramoops logs like these?
[ 0.038640] ramoops: no valid data in buffer (sig = 0x41474244)
[ 0.038672] ramoops: no valid data in buffer (sig = 0x53470200)
[ 0.038702] ramoops: found existing invalid buffer, size 1073741824, start 0 (43474244)
[ 0.038733] ramoops: no valid data in buffer (sig = 0x53064244)
I'm on a Raspberry Pi 4 Model B Rev 1.5, if that matters in any way.
Maybe I haven't rebooted the device enough times for the reboot logs to become unreliable.
Running sudo rpi-update pulls/6391 will install a trial kernel with the added ramoops instrumentation, which will give us a better picture of whether RAM contents are decaying.
More test results:
Ignoring the results from the first rev 1.2 (of dubious provenance - it had a gold sticker on), everything prior to the rev 1.5 seems to work with ramoops. Presumably something significant changed in the transition to the dual Dialog PMICs.
If that's a Pi from my drawer, the gold sticker on the string tag means "Golden Version", so it should be good.
I try to stay away from your drawers
Update: I'm seeing the SDRAM power rail drop on a crash reboot, but not on a normal reboot. This is because the panic bypasses the normal kernel shutdown, including returning the SD card voltage to 3.3V and power-cycling the card. This can interfere with normal booting, so the firmware plays it safe and forces a global reset. We don't understand why the RAM contents survive on the older revisions but not on the 1.5, but the 1.5 does use a different PMIC.
Fortunately, 4Bs since the 1.3 have the ability to power-cycle just the SD card, meaning that on those boards the global reset can be replaced by a card power-cycle and a watchdog reset (this is what the kernel would normally do). I'll put together a test EEPROM image with that functionality and see if ramoops then works.
By the way, if you enable too much ECC (128 bytes or more), at least some versions of the kernel will hang on boot when parsing the crash log. ecc=64 or less is safe and helpful, though.
Attached (rpi-eeprom-recovery.zip) is a trial build of a Pi 4 EEPROM image that uses a soft reboot to recover from incorrect SD card voltage on boards with an SD power switch. This should make ramoops completely reliable on your rev 1.5 Pi 4.
Extract the contents onto a blank SD card, and boot with it inserted.
Will this also work on rev 1.3 and rev 1.4? Would love to test.
Yes - the image should work on all Pi 4s, it's just that the problem only seemed to affect (or is much worse on) rev 1.5s.
By the way, I've not noticed any additional unreliability being introduced by systemd.
Test scenario 1: Pi 4B rev 1.4. Raspberry Pi OS Bullseye, kernel 5.15.84-v8+. No auto-reboot on panic (i.e. no panic= kernel parameter). Active watchdog. Crashing the system with echo c >/proc/sysrq-trigger
Test scenario 2: Pi 4B rev 1.4. Raspberry Pi OS Bullseye, kernel 5.15.84-v8+. Active auto-reboot on panic (i.e. panic=5 kernel parameter). Active watchdog, but not triggering due to earlier automatic reboot by the kernel. Crashing the system with echo c >/proc/sysrq-trigger
That EEPROM version works really well in both scenarios on a Pi which had a pretty unreliable to non-working ramoops before. Thank you!
However, shorting GLOBAL_EN to GND for a really short time (a reset button in some of my installations) will still lose the pstore contents which used to mostly survive (lots of bit errors, but still readable) in the past. That may be dependent on other factors (power supply, time elapsed before RAM training, SD card speed, moon phase...) and is not something I expect to be solved now.
That EEPROM version works really well in both scenarios
Cool, the patch has been merged and will be in future EEPROM releases.
However, shorting GLOBAL_EN to GND for a really short time (a reset button in some of my installations) will still lose the pstore contents which used to mostly survive
I'm curious about the highlighted phrase (my emphasis) - are you referring to older board revisions, or are you suggesting that a software change has in some way made the SDRAM contents more volatile?
Cool, the patch has been merged and will be in future EEPROM releases.
Awesome, thank you!
However, shorting GLOBAL_EN to GND for a really short time (a reset button in some of my installations) will still lose the pstore contents which used to mostly survive
I'm curious about the highlighted phrase (my emphasis) - are you referring to older board revisions, or are you suggesting that a software change has in some way made the SDRAM contents more volatile?
It used to work fine (well, with ECC enabled to correct bit flips) on this specific Pi (my bread-and-butter lab Pi), but after some time it stopped working. I didn't try the reset button test on any of the other 700-odd Pi 4Bs in my fleet. I tried restoring the EEPROM and the raspberrypi-kernel/raspberrypi-firmware packages to the exact combinations that had been working previously, but never got it to work again. So yes, something seems to have made the RAM contents more volatile, but I'm at a loss to guess what it could be. Thinking again, the only variables I didn't control for were the physical SD card itself (slower/faster time from power-on to readable contents, possibly delaying RAM init a bit more) and arm_boost=1, which may or may not have been active when it worked.
By the way, if you enable too much ECC (128 bytes or more), at least some versions of the kernel will hang on boot when parsing the crash log. ecc=64 or less is safe and helpful, though.
Sorry to go off on a slight tangent, but is there any documentation anywhere on this ecc kernel command line flag?
(I have ECC RAM on some of my x86_64 hardware, but was unaware that there was a user-configurable ECC option for any of the Raspberry Pi boards.)
I can't account for an apparent increased volatility, other than to speculate about the effects of higher temperature, etc.
I'm going to close this now. The content will still be visible (and you'll still be able to discuss ecc etc.), but given the age of the Issue I think it would be better to open a new one if problems reappear.
[I] was unaware that there was a user-configurable ECC option for any of the Raspberry Pi boards
This will be software ECC, generated by the kernel to warn about and/or protect against bit errors across the reboot.
Indeed, and it only covers the pstore RAM region used to persist dmesg and oops/panic logs across reboots/resets. That ecc option also cannot be set on the kernel command line; you have to set it in the ramoops dtoverlay.
Ah, ok, now I see what you're talking about.
It looks like one can use kernel parameters depending upon which kernels are being used?
https://www.kernel.org/doc/Documentation/admin-guide/ramoops.rst suggests that mem_address should be used for the software ECC region size, and that ecc has turned into a Boolean...
(But of course I imagine that you have more leeway if you're setting this using the DT overlay.)
https://www.kernel.org/doc/Documentation/admin-guide/ramoops.rst suggests that mem_address should be used for the software ECC region size, and that ecc has turned into a Boolean...
No, and no. ecc has not been a bool since 2012.
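Since ecc is a byte count rather than a switch, a value read back from /sys/module/ramoops/parameters/ecc can be sanity-checked against the limits reported earlier in this thread (64 or less worked, 128 or more hung some kernels at boot). The sketch assumes the upstream module-parameter rule that ecc=1 is shorthand for 16 bytes; ecc_check is a hypothetical name, and values between 65 and 127 are reported as untested rather than guessed.

```shell
#!/usr/bin/env bash
# ecc_check VALUE
# Classifies a ramoops ecc value (bytes of ECC per buffer, 0 = off).
# Returns non-zero only for sizes reported to hang some kernels at boot.
ecc_check() {
    local value=$1 bytes
    if (( value == 0 )); then
        echo "ecc disabled"
        return 0
    fi
    # Module-parameter convention: ecc=1 means 16 bytes of ECC.
    if (( value == 1 )); then bytes=16; else bytes=$value; fi
    if (( bytes <= 64 )); then
        echo "ecc ${bytes} bytes: ok"
    elif (( bytes < 128 )); then
        echo "ecc ${bytes} bytes: untested"
    else
        echo "ecc ${bytes} bytes: risky, some kernels hung at boot"
        return 1
    fi
}
```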
Attached (rpi-eeprom-recovery.zip) is a trial build of a Pi 4 EEPROM image that uses a soft reboot to recover from incorrect SD card voltage on boards with an SD power switch. This should make ramoops completely reliable on your rev 1.5 Pi 4.
Extract the contents onto a blank SD card, and boot with it inserted.
Awesome! This fixed ramoops on my Pi 4 rev 1.5s as well! Thanks!
Describe the bug
No data are written to /sys/fs/pstore/ when a test kernel panic is triggered.
Steps to reproduce the behaviour
Add dtoverlay=ramoops to /boot/config.txt and reboot, then run:
echo 10 > /proc/sys/kernel/panic
echo c > /proc/sysrq-trigger
The kernel panics, but upon rebooting no data have been written to /sys/fs/pstore/.
Device (s)
Raspberry Pi 4 Mod. B
System
5.15.84-1-rpi-ARCH #1 SMP Mon Dec 19 13:37:50 MST 2022 armv7l GNU/Linux
Logs
No response
Additional context