raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

Unable to get ramoops working #5298

Closed graysky2 closed 1 month ago

graysky2 commented 1 year ago

Describe the bug

No data are written to /sys/fs/pstore/ when a test kernel panic is triggered.

Steps to reproduce the behaviour

  1. Add dtoverlay=ramoops to /boot/config.txt and reboot
  2. echo 10 > /proc/sys/kernel/panic
  3. echo c > /proc/sysrq-trigger

The kernel panics but no data are written to /sys/fs/pstore/, upon rebooting:

# tree /var/lib/systemd/pstore/ /sys/fs/pstore
/var/lib/systemd/pstore/
/sys/fs/pstore

0 directories, 0 files

Device(s)

Raspberry Pi 4 Mod. B

System

Logs

No response

Additional context

# cat /boot/config.txt

display_auto_detect=1
dtoverlay=ramoops
arm_boost=1

% zgrep PSTORE /proc/config.gz
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFAULT_KMSG_BYTES=10240
CONFIG_PSTORE_DEFLATE_COMPRESS=y
# CONFIG_PSTORE_LZO_COMPRESS is not set
# CONFIG_PSTORE_LZ4_COMPRESS is not set
# CONFIG_PSTORE_LZ4HC_COMPRESS is not set
# CONFIG_PSTORE_842_COMPRESS is not set
# CONFIG_PSTORE_ZSTD_COMPRESS is not set
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_DEFLATE_COMPRESS_DEFAULT=y
CONFIG_PSTORE_COMPRESS_DEFAULT="deflate"
CONFIG_PSTORE_CONSOLE=y
# CONFIG_PSTORE_PMSG is not set
CONFIG_PSTORE_RAM=y
# CONFIG_PSTORE_BLK is not set

% dmesg | grep -E 'pstore|ramoops'
[  +0.000371] pstore: Registered ramoops as persistent store backend
[  +0.000022] ramoops: using 0x10000@0xb000000, ecc: 0
[  +0.000528] pstore: Using crash dump compression: deflate
[  +0.003103] systemd[1]: Starting Load Kernel Module efi_pstore...
[  +0.001190] systemd[1]: modprobe@efi_pstore.service: Deactivated successfully.
[  +0.000439] systemd[1]: Finished Load Kernel Module efi_pstore.
[  +0.000178] systemd[1]: Platform Persistent Storage Archival was skipped because of an unmet condition check (ConditionDirectoryNotEmpty=/sys/fs/pstore).

# grep "" /sys/module/ramoops/parameters/*
/sys/module/ramoops/parameters/console_size:0
/sys/module/ramoops/parameters/dump_oops:-1
/sys/module/ramoops/parameters/ecc:0
/sys/module/ramoops/parameters/ftrace_size:0
/sys/module/ramoops/parameters/max_reason:2
/sys/module/ramoops/parameters/mem_address:184549376
/sys/module/ramoops/parameters/mem_size:65536
/sys/module/ramoops/parameters/mem_type:0
/sys/module/ramoops/parameters/pmsg_size:0
/sys/module/ramoops/parameters/record_size:16384

hailfinger commented 1 year ago

Make sure to disable systemd-pstore.service if you want all logs from pstore. Other ramoops collection software may work better (I haven't tested that), but at least with systemd-pstore I always lost parts of or all ramoops logs. Getting the logs yourself from /sys/fs/pstore/ is easy, but please note that you have to archive them yourself.
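Archiving the records by hand could look roughly like the sketch below. It is only an illustration: archive_pstore is a made-up helper name, and the source and destination directories are passed as arguments (both are assumptions, not paths prescribed in this thread) so the function can be tried on ordinary directories before pointing it at /sys/fs/pstore.

```shell
# Copy every record from a pstore-style directory into a timestamped
# archive directory and print the directory that was created.
# archive_pstore is a hypothetical helper, not part of any package.
archive_pstore() {
    src=$1
    dest_root=$2
    # Nothing to archive if the source directory is empty or missing.
    [ -n "$(ls -A "$src" 2>/dev/null)" ] || return 0
    dest="$dest_root/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    # -p keeps the timestamps of the original records.
    cp -p "$src"/* "$dest"/ || return 1
    echo "$dest"
}
```

Run as root, a call such as archive_pstore /sys/fs/pstore /var/tmp/pstore-archive would copy whatever records survived the reboot; the destination path here is just an example. The sketch deliberately leaves the originals in place.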

Just tested your command sequence on my Pi 4B with armv8 kernel 5.15.84-v8+ (Raspberry Pi OS 64-bit) and it worked fine, except for the usual flipped bits in the log, which may make relevant parts of the log misleading.

hailfinger commented 1 year ago

To get reliable non-corrupt logs you may have to enable ECC for ramoops. The current dtoverlay doesn't support that. I plan to send a patch once local testing is complete.

hailfinger commented 1 year ago

What's the output on your system for

dmesg|grep ramoops

satmandu commented 1 year ago

Also copied setup instructions from https://github.com/raspberrypi/linux/issues/5063#issuecomment-1167966601

sudo grep "" /sys/module/ramoops/parameters/*
/sys/module/ramoops/parameters/console_size:16384
/sys/module/ramoops/parameters/dump_oops:-1
/sys/module/ramoops/parameters/ecc:0
/sys/module/ramoops/parameters/ftrace_size:0
/sys/module/ramoops/parameters/max_reason:2
/sys/module/ramoops/parameters/mem_address:184549376
/sys/module/ramoops/parameters/mem_size:65536
/sys/module/ramoops/parameters/mem_type:0
/sys/module/ramoops/parameters/pmsg_size:0
/sys/module/ramoops/parameters/record_size:16384

and

 dmesg|grep ramoops
[    0.042319] printk: console [ramoops-1] enabled
[    0.042335] pstore: Registered ramoops as persistent store backend
[    0.042347] ramoops: using 0x10000@0xb000000, ecc: 0
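The decimal mem_size and mem_address values in the module parameters describe the same region that dmesg reports in hexadecimal. A small sketch of the conversion, using a made-up helper name (ramoops_region) purely for illustration:

```shell
# Print a ramoops region in the size@address form that dmesg uses.
# ramoops_region is a hypothetical helper name, not an existing tool.
ramoops_region() {
    mem_size=$1
    mem_address=$2
    printf '0x%x@0x%x\n' "$mem_size" "$mem_address"
}

# Values copied from /sys/module/ramoops/parameters/ above.
ramoops_region 65536 184549376   # prints 0x10000@0xb000000
```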

Also tried the config.txt line dtoverlay=ramoops,console-size=0x4000,ecc=1,dump_oops=1 to get ecc and dump_oops working, but that didn't do anything...

(This is on the 6.1.9-v8+ kernel.)

mount | grep pstore
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)

The output of sudo ls /sys/fs/pstore/ is empty.

sudo modprobe configs
zcat /proc/config.gz | grep PSTORE
# CONFIG_EFI_VARS_PSTORE is not set
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFAULT_KMSG_BYTES=10240
CONFIG_PSTORE_DEFLATE_COMPRESS=y
# CONFIG_PSTORE_LZO_COMPRESS is not set
# CONFIG_PSTORE_LZ4_COMPRESS is not set
# CONFIG_PSTORE_LZ4HC_COMPRESS is not set
# CONFIG_PSTORE_842_COMPRESS is not set
# CONFIG_PSTORE_ZSTD_COMPRESS is not set
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_DEFLATE_COMPRESS_DEFAULT=y
CONFIG_PSTORE_COMPRESS_DEFAULT="deflate"
CONFIG_PSTORE_CONSOLE=y
# CONFIG_PSTORE_PMSG is not set
# CONFIG_PSTORE_FTRACE is not set
CONFIG_PSTORE_RAM=y
# CONFIG_PSTORE_BLK is not set

This also didn't help:

echo Y > /sys/module/printk/parameters/always_kmsg_dump
echo Y > /sys/module/kernel/parameters/crash_kexec_post_notifiers

Nor did setting kernel.printk = 7 3 4 1 3 in /etc/sysctl.d/98-rpi.conf ...

hailfinger commented 1 year ago

With the current kernel and device-tree, the ecc parameter is not functional. It will work neither in cmdline.txt nor in config.txt. That said, ramoops should still work.

I haven't tested kernel 6.1 yet, so there may be problems I'm not yet aware of. Will test after the weekend.

hailfinger commented 1 year ago

Even with kernel 6.1.9-v8+ from "rpi-update next" ramoops works for me. The only obvious (there may be others) remaining differences with your setup are: I'm running Raspberry Pi OS 64-bit, you're running Arch 32-bit. I have not tested if that dtoverlay even works with 32-bit kernels.

It might make sense to also check with rpi-eeprom-update which SPI bootloader you're using. Please note that after a poweroff, /sys/fs/pstore/ will be empty. Modifying ecc settings manually might also interfere with ramoops.

satmandu commented 1 year ago

Not running in 32-bit. (OP is... I'm running in 64-bit mode...)

uname -m
aarch64

Not sure what rpi-eeprom-update is supposed to show?

sudo rpi-eeprom-update -a
BOOTLOADER: up to date
   CURRENT: Wed 11 Jan 2023 05:40:52 PM UTC (1673458852)
    LATEST: Wed 04 Jan 2023 10:27:49 AM UTC (1672828069)
   RELEASE: beta (/lib/firmware/raspberrypi/bootloader/beta)
            Use raspi-config to change the release.

  VL805_FW: Dedicated VL805 EEPROM
     VL805: up to date
   CURRENT: 000138c0
    LATEST: 000138c0

How are you getting the kernel messages?

These are the current module settings:

sudo grep "" /sys/module/ramoops/parameters/*
/sys/module/ramoops/parameters/console_size:16384
/sys/module/ramoops/parameters/dump_oops:-1
/sys/module/ramoops/parameters/ecc:0
/sys/module/ramoops/parameters/ftrace_size:0
/sys/module/ramoops/parameters/max_reason:2
/sys/module/ramoops/parameters/mem_address:184549376
/sys/module/ramoops/parameters/mem_size:65536
/sys/module/ramoops/parameters/mem_type:0
/sys/module/ramoops/parameters/pmsg_size:0
/sys/module/ramoops/parameters/record_size:16384

I do a soft reboot with sudo reboot and I'm not seeing anything via sudo ls /sys/fs/pstore/.

hailfinger commented 1 year ago

@satmandu have you made sure that systemd-pstore never runs? It is enabled by default.

hailfinger commented 1 year ago

After a normal reboot (no crash), I have these files:

root@pi-test-1:~# ls -l /sys/fs/pstore/
total 0
-r--r--r-- 1 root root 16372 Aug 7 15:25 console-ramoops-0

The only line for ramoops in my config.txt is: dtoverlay=ramoops,console-size=0x4000

That works well for me.

satmandu commented 1 year ago

@satmandu have you made sure that systemd-pstore never runs? It is enabled by default.

Thanks! That was my issue.

satmandu commented 1 year ago

Just to reiterate the steps I needed to get this working with the rpi 6.1.x kernel in both rpi-os 64-bit and ubuntu:

systemctl disable systemd-pstore
# Ensure correct kernel.printk set in  /etc/sysctl.d/98-rpi.conf
cat /etc/sysctl.d/98-rpi.conf
kernel.printk = 7 3 4 1 3

set config.txt to have dtoverlay=ramoops,console-size=0x4000

One could also optionally add the following to /etc/rc.local:

echo Y > /sys/module/printk/parameters/always_kmsg_dump
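To double-check the config.txt part of these steps, something like the following sketch can print the overlay line that is actually active. ramoops_overlay_line is a hypothetical helper name, and the file path is passed in because the location of config.txt differs between distributions.

```shell
# Print the last uncommented ramoops overlay line from a config.txt-style
# file, or nothing if there is none.
# ramoops_overlay_line is a hypothetical helper name.
ramoops_overlay_line() {
    grep -E '^dtoverlay=ramoops' "$1" | tail -n 1
}
```

With the setup described above, ramoops_overlay_line /boot/config.txt would be expected to print dtoverlay=ramoops,console-size=0x4000.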

(Maybe at some point, you might be willing to ask the RPI folks to adjust the default kernel.printk setting? And maybe also we could try to get systemd's pstore daemon fixed too?)

Thanks so much for helping with this. @graysky2 I hope you can get it working on the 32-bit kernel...

hailfinger commented 1 year ago

Back then, I adjusted the kernel.printk setting because I read it somewhere on the internet. However, most kernel messages (until systemd takes over) are available even with the default kernel.printk settings. Will experiment some more because I'm going to use ramoops in my little (~600 devices) fleet in production soon.

pelwell commented 1 year ago

most kernel messages (until systemd takes over) are available

The kernel command line (cmdline.txt) parameter ignore_loglevel prints all kernel messages to the console, even after systemd has started.
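Because cmdline.txt is a single line of space-separated parameters, adding ignore_loglevel means appending to that line rather than adding a new one. A minimal sketch, assuming a hypothetical helper called add_cmdline_param and a file path supplied by the caller (back up the real file first):

```shell
# Append a parameter to a single-line cmdline.txt-style file unless it is
# already present. add_cmdline_param is a hypothetical helper name.
add_cmdline_param() {
    file=$1
    param=$2
    # Only the first line is meaningful in cmdline.txt.
    line=$(head -n 1 "$file")
    case " $line " in
        *" $param "*) return 0 ;;   # already there, leave the file alone
    esac
    printf '%s %s\n' "$line" "$param" > "$file"
}
```

For example, add_cmdline_param /boot/cmdline.txt ignore_loglevel, where the path is an assumption and may differ on your system.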

EvanTheB commented 6 months ago

Hello, I am having trouble with this now on a Raspberry Pi 4B:

Linux mhs-por-dev-unset 6.6.20+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux

I have disabled systemd-pstore.

config.txt modifications:

enable_uart=1
dtoverlay=ramoops,console-size=0x4000

cmdline.txt:

console=serial0,115200 console=tty1 root=PARTUUID=bd05bf36-02 rootfstype=ext4 fsck.repair=yes rootwait

If I do

sysctl kernel.panic=10
echo c | sudo tee /proc/sysrq-trigger

I do see a panic on the uart, but there is also an error (ENOSPC ?) and the pstore is empty after the device reboots.

[   63.708118] sysrq: Trigger a crash
[   63.711618] Kernel panic - not syncing: sysrq triggered crash
[   63.717463] CPU: 3 PID: 866 Comm: tee Tainted: G         C         6.6.20+rpt-rpi-v8 #1  Debian 1:6.6.20-1+rpt1
[   63.727726] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
[   63.733658] Call trace:
[   63.736147]  dump_backtrace+0xa0/0x100
[   63.739975]  show_stack+0x20/0x38
[   63.743352]  dump_stack_lvl+0x48/0x60
[   63.747084]  dump_stack+0x18/0x28
[   63.750460]  panic+0x328/0x390
[   63.753577]  sysrq_handle_crash+0x24/0x30
[   63.757663]  __handle_sysrq+0xb8/0x1e8
[   63.761482]  write_sysrq_trigger+0x7c/0xb0
[   63.765656]  proc_reg_write+0xa4/0x100
[   63.769480]  vfs_write+0xcc/0x310
[   63.772855]  ksys_write+0x78/0x118
[   63.776317]  __arm64_sys_write+0x24/0x38
[   63.780304]  invoke_syscall+0x50/0x128
[   63.784105]  el0_svc_common.constprop.0+0x48/0xf0
[   63.788875]  do_el0_svc+0x24/0x38
[   63.792234]  el0_svc+0x40/0xe8
[   63.795330]  el0t_64_sync_handler+0x100/0x130
[   63.799747]  el0t_64_sync+0x190/0x198
[   63.803458] SMP: stopping secondary CPUs
[   63.807435] Kernel Offset: 0x1812a00000 from 0xffffffc080000000
[   63.813438] PHYS_OFFSET: 0x0
[   63.816354] CPU features: 0x0,80000201,3c020000,0000421b
[   63.821740] Memory Limit: none
[   63.827967] pstore: backend (ramoops) writing error (-28)
[   63.833446] Rebooting in 10 seconds..
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]

If instead I do a normal sudo shutdown -r now I do see the console save in pstore: console-ramoops-0

pi@mhs-por-dev-unset:~ $ dmesg | grep pstore\\\|oops
[    0.000000] OF: reserved mem: 0x000000000b000000..0x000000000b00ffff (64 KiB) map non-reusable ramoops@b000000
[    0.038929] pstore: Using crash dump compression: deflate
[    0.038953] printk: console [ramoops-1] enabled
[    0.039434] pstore: Registered ramoops as persistent store backend
[    0.039452] ramoops: using 0x10000@0xb000000, ecc: 0
pi@mhs-por-dev-unset:~ $ sudo grep -r . /sys/module/pstore/parameters/
/sys/module/pstore/parameters/update_ms:-1
/sys/module/pstore/parameters/kmsg_bytes:10240
/sys/module/pstore/parameters/backend:ramoops
/sys/module/pstore/parameters/compress:deflate
pi@mhs-por-dev-unset:~ $ sudo grep -r . /sys/module/ramoops/parameters/
/sys/module/ramoops/parameters/mem_address:184549376
/sys/module/ramoops/parameters/dump_oops:-1
/sys/module/ramoops/parameters/ecc:0
/sys/module/ramoops/parameters/max_reason:2
/sys/module/ramoops/parameters/record_size:16384
/sys/module/ramoops/parameters/pmsg_size:0
/sys/module/ramoops/parameters/mem_type:0
/sys/module/ramoops/parameters/mem_size:65536
/sys/module/ramoops/parameters/console_size:16384
/sys/module/ramoops/parameters/ftrace_size:0

More strangely, I did see a crash dump the first time I enabled this. At that point I had not disabled systemd-pstore and had only added dtoverlay=ramoops; I believe everything else was the same. I did see the ENOSPC error on the uart that time, but I did not take any more notes (because it worked).

c1neo commented 1 month ago

Hi, I'm also having trouble with ramoops but only on Raspberry Pi 4B.

config.txt

enable_uart=1
dtoverlay=ramoops-pi4

confirmed that ramoops is enabled:

root@pi4btw:/sys/fs/pstore# dmesg | grep ramoops
[    0.047397] pstore: Registered ramoops as persistent store backend
[    0.047424] ramoops: using 0x10000@0xb000000, ecc: 0

crashed each Pi with these commands:

echo 10 > /proc/sys/kernel/panic
echo c > /proc/sysrq-trigger

I ran the same test on a Pi 3B (with dtoverlay=ramoops), a Pi 4B and a Pi 5, and only the Pi 4B fails to write to /sys/fs/pstore/. On the other hand, all Pis succeed in writing non-panic logs to /sys/fs/pstore/console-ramoops-0 when dtoverlay=ramoops,console-size=0x4000 is set. Is this a hardware issue on the Pi 4? I find it weird that only the Pi 4 fails with the same software.

I tried this in the latest release of bookworm (kernel 6.6.47+rpt-rpi-v8) and bullseye (kernel 6.1.21-v8+).

By the way, Pi 5 seems to fail to automatically load ramoops-pi4.dtbo when config.txt is set to dtoverlay=ramoops.

EDIT: finished writing comment, accidentally pressed comment button before finish writing

pelwell commented 1 month ago

Are you sure it never works? I sometimes get output, but on the flip side I find reboot logs equally unreliable.

With an instrumented kernel, this is the output after a few reboots:

[    0.038638] ramoops: found existing invalid buffer, size 0, start 2097152 (43474244)
[    0.038677] ramoops: no valid data in buffer (sig = 0x43470204)
[    0.038707] ramoops: found existing empty buffer (43474244)
[    0.038732] ramoops: no valid data in buffer (sig = 0x43464244)
[    0.038675] ramoops: no valid data in buffer (sig = 0x41454264)
[    0.038707] ramoops: no valid data in buffer (sig = 0x53470200)
[    0.038737] ramoops: no valid data in buffer (sig = 0xc3474244)
[    0.038762] ramoops: no valid data in buffer (sig = 0x53464244)

and after a forced panic:

[    0.038640] ramoops: no valid data in buffer (sig = 0x41474244)
[    0.038672] ramoops: no valid data in buffer (sig = 0x53470200)
[    0.038702] ramoops: found existing invalid buffer, size 1073741824, start 0 (43474244)
[    0.038733] ramoops: no valid data in buffer (sig = 0x53064244)
[

And for comparison, this is after a Pi 5 panic:

[    0.013438] ramoops: found existing buffer, size 8602, start 8602 (43474244)
[    0.013466] ramoops: found existing empty buffer (43474244)
[    0.013471] ramoops: found existing empty buffer (43474244)
[    0.013477] ramoops: found existing empty buffer (43474244)

The hex numbers in parentheses are the signatures, which should be 43474244 (DBGC). You'll see that many of the entries have one or more bit errors, which suggests that the RAM content is just not being maintained.
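The number of bit errors in a reported signature can be counted by XORing it with the expected value 0x43474244 and counting the set bits. A small sketch, with sig_bit_errors as a made-up helper name:

```shell
# Count how many bits of a reported ramoops signature differ from the
# expected signature 0x43474244 (DBGC).
# sig_bit_errors is a hypothetical helper name.
sig_bit_errors() {
    diff=$(( $1 ^ 0x43474244 ))
    count=0
    # Add up the set bits of the XOR, one bit at a time.
    while [ "$diff" -ne 0 ]; do
        count=$(( count + (diff & 1) ))
        diff=$(( diff >> 1 ))
    done
    echo "$count"
}
```

For the log above, sig_bit_errors 0x43470204 prints 2 and sig_bit_errors 0xc3474244 prints 1, while an intact signature gives 0.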

The thinking here is that the period of SDRAM controller calibration may stall refresh cycles for too long, but we're not sure why Pi 5 doesn't seem to suffer in the same way.

pelwell commented 1 month ago

By the way, Pi 5 seems to fail to automatically load ramoops-pi4.dtbo when config.txt is set to dtoverlay=ramoops

Indeed - that will be fixed.

pelwell commented 1 month ago

And now it is - see 9557336c4fc4ac4606ac9e78c239aa689c26e870.

c1neo commented 1 month ago

Are you sure it never works? I sometimes get output, but on the flip side I find reboot logs equally unreliable.

Yes, at least in my testing with Pi 4s, regular reboots seem to leave the logs in place quite consistently, while panic logs are never there.

May I know how I can see the logs for ramoops like these?

[    0.038640] ramoops: no valid data in buffer (sig = 0x41474244)
[    0.038672] ramoops: no valid data in buffer (sig = 0x53470200)
[    0.038702] ramoops: found existing invalid buffer, size 1073741824, start 0 (43474244)
[    0.038733] ramoops: no valid data in buffer (sig = 0x53064244)

EDIT: added quotes

c1neo commented 1 month ago

I'm on Raspberry Pi 4 Model B Rev 1.5 if that matters in any way.

c1neo commented 1 month ago

Maybe I have not rebooted the device enough times to see the reboot logs become unreliable.

pelwell commented 1 month ago

Running sudo rpi-update pulls/6391 will install a trial kernel with the added ramoops instrumentation, which will give us a better picture of whether RAM contents are decaying.

pelwell commented 1 month ago

More test results:

pelwell commented 1 month ago

Ignoring the results from the first rev 1.2 (of dubious provenance - it had a gold sticker on), everything prior to the rev 1.5 seems to work with ramoops. Presumably something significant changed in the transition to the dual Dialog PMICs.

JamesH65 commented 1 month ago

If that's a Pi from my drawer, the gold sticker on string tag means "Golden Version", so should be good.

pelwell commented 1 month ago

I try to stay away from your drawers

pelwell commented 1 month ago

Update: I'm seeing the SDRAM power rail drop on a crash reboot, but not on a normal reboot. This is because the panic bypasses the normal kernel shutdown, including returning the SD card voltage to 3.3V and power-cycling it. This can interfere with normal booting, so the firmware plays it safe and forces a global reset. We don't understand why the RAM contents survive on the older revisions but not the 1.5, but the 1.5 does use a different PMIC.

Fortunately, 4Bs since the 1.3 have the ability to power-cycle just the SD card, meaning that on those boards the global reset can be replaced by a card power-cycle and a watchdog reset (this is what the kernel would normally do). I'll put together a test EEPROM image with that functionality and see if ramoops then works.

hailfinger commented 1 month ago

Make sure to disable systemd-pstore.service if you want all logs from pstore. Other ramoops collection software may work better (I haven't tested that), but at least with systemd-pstore I always lost parts of or all ramoops logs. Getting the logs yourself from /sys/fs/pstore/ is easy.

hailfinger commented 1 month ago

By the way, if you enable too much ECC (128 bytes or more), at least some versions of the kernel will hang on boot when parsing the crash log. ecc=64 or less is safe and helpful, though.

pelwell commented 1 month ago

rpi-eeprom-recovery.zip

Attached is a trial build of a Pi 4 EEPROM image that uses a soft reboot to recover from incorrect SD card voltage on boards with an SD power switch. This should make ramoops completely reliable on your rev 1.5 Pi 4.

Extract the contents onto a blank SD card, and boot with it inserted.

hailfinger commented 1 month ago

rpi-eeprom-recovery.zip

Attached is a trial build of a Pi 4 EEPROM image that uses a soft reboot to recover from incorrect SD card voltage on boards with an SD power switch. This should make ramoops completely reliable on your rev 1.5 Pi 4.

Extract the contents onto a blank SD card, and boot with it inserted.

Will this also work on rev 1.3 and rev 1.4? Would love to test.

pelwell commented 1 month ago

Yes - the image should work on all Pi 4s, it's just that the problem only seemed to affect (or is much worse on) rev 1.5s.

pelwell commented 1 month ago

By the way, I've not noticed any additional unreliability being introduced by systemd.

hailfinger commented 1 month ago

Test scenario 1: Pi 4B rev 1.4. Raspberry Pi OS Bullseye, kernel 5.15.84-v8+. No auto-reboot on panic (i.e. no panic= kernel parameter). Active watchdog. Crashing the system with echo c >/proc/sysrq-trigger

Test scenario 2: Pi 4B rev 1.4. Raspberry Pi OS Bullseye, kernel 5.15.84-v8+. Active auto-reboot on panic (i.e. panic=5 kernel parameter). Active watchdog, but not triggering due to earlier automatic reboot by the kernel. Crashing the system with echo c >/proc/sysrq-trigger

That EEPROM version works really well in both scenarios on a Pi which had a pretty unreliable to non-working ramoops before. Thank you!

However, shorting GLOBAL_EN to GND for a really short time (a reset button in some of my installations) will still lose the pstore contents which used to mostly survive (lots of bit errors, but still readable) in the past. That may be dependent on other factors (power supply, time elapsed before RAM training, SD card speed, moon phase...) and is not something I expect to be solved now.

pelwell commented 1 month ago

That EEPROM version works really well in both scenarios

Cool - the patch has been merged, and will be in future EEPROM releases.

However, shorting GLOBAL_EN to GND for a really short time (a reset button in some of my installations) will still lose the pstore contents which used to mostly survive

I'm curious about the highlighted phrase (my emphasis) - are you referring to older board revisions, or are you suggesting that a software change has in some way made the SDRAM contents more volatile?

hailfinger commented 1 month ago

Cool - the patch has been merged, and will be in future EEPROM releases.

Awesome, thank you!

However, shorting GLOBAL_EN to GND for a really short time (a reset button in some of my installations) will still lose the pstore contents which used to mostly survive

I'm curious about the highlighted phrase (my emphasis) - are you referring to older board revisions, or are you suggesting that a software change has in some way made the SDRAM contents more volatile?

It used to work fine (well, with ECC enabled to correct bit flips) on this specific Pi (my bread-and-butter lab Pi), but after some time it stopped working. I didn't try the reset button test on any of the other 700-odd Pi 4Bs in my fleet. I tried restoring the EEPROM and the raspberrypi-kernel/raspberrypi-firmware packages to the exact combinations which had been working previously, but never got it to work again. So yes, something seems to have made RAM contents more volatile, but I'm at a loss as to what it could be. Thinking again, the only variables I didn't control for were the physical SD card itself (slower/faster time from power-on to readable contents, possibly causing RAM init to be delayed a bit more) and arm_boost=1, which may or may not have been active when it worked.

satmandu commented 1 month ago

By the way, if you enable too much ECC (128 bytes or more), at least some versions of the kernel will hang on boot when parsing the crash log. ecc=64 or less is safe and helpful, though.

Sorry to go on a slight tangent, but is there any documentation anywhere on this ecc kernel command line flag?

(I have ECC ram on some of my x86_64 hardware, but was unaware that there was a user-configurable ECC option for any of the Raspberry Pi boards.)

pelwell commented 1 month ago

I can't account for an apparent increased volatility, other than to speculate about the effects of higher temperature, etc.

I'm going to close this now. The content will still be visible (and you'll still be able to discuss ecc etc.), but given the age of the Issue I think it would be better to open a new one if problems reappear.

pelwell commented 1 month ago

[I] was unaware that there was a user-configurable ECC option for any of the Raspberry Pi boards

This will be software ECC, generated by the kernel to warn about and/or protect against bit errors across the reboot.

hailfinger commented 1 month ago

[I] was unaware that there was a user-configurable ECC option for any of the Raspberry Pi boards

This will be software ECC, generated by the kernel to warn about and/or protect against bit errors across the reboot.

Indeed, and it only covers the pstore RAM region to persist dmesg and oops/panic logs across reboots/resets. That ecc option also can not be set on the kernel command line, you have to set it in the ramoops dtoverlay.

satmandu commented 1 month ago

[I] was unaware that there was a user-configurable ECC option for any of the Raspberry Pi boards

This will be software ECC, generated by the kernel to warn about and/or protect against bit errors across the reboot.

Indeed, and it only covers the pstore RAM region to persist dmesg and oops/panic logs across reboots/resets. That ecc option also can not be set on the kernel command line, you have to set it in the ramoops dtoverlay.

Ah, ok, now I see what you're talking about.

It looks like one can use kernel parameters depending upon which kernels are being used?

https://www.kernel.org/doc/Documentation/admin-guide/ramoops.rst suggests that mem_address should be used for the software ECC region size, and that ecc has turned into a Boolean...

(But of course I imagine that you have more leeway if you're setting this using the DT overlay.)

hailfinger commented 1 month ago

Indeed, and it only covers the pstore RAM region to persist dmesg and oops/panic logs across reboots/resets. That ecc option also can not be set on the kernel command line, you have to set it in the ramoops dtoverlay.

https://www.kernel.org/doc/Documentation/admin-guide/ramoops.rst suggests that mem_address should be used for the software ECC region size, and that ecc has turned into a Boolean...

No, and no. ecc has not been a bool since 2012.

c1neo commented 1 month ago

rpi-eeprom-recovery.zip

Attached is a trial build of a Pi 4 EEPROM image that uses a soft reboot to recover from incorrect SD card voltage on boards with an SD power switch. This should make ramoops completely reliable on your rev 1.5 Pi 4.

Extract the contents onto a blank SD card, and boot with it inserted.

awesome! this fixed ramoops on my Pi 4 rev 1.5s as well! thanks!