Which medium are these Pis booting from?
@pelwell Micro SD card.
What sort of micro sd card?
It's a SanDisk Extreme Pro 128GB on all three of the devices. For what it's worth, I also have a bunch of external HDDs attached to each Pi, but those have their own power supply and are connected via a powered hub.
I also use an initramfs with dropbear for rootfs decryption according to https://github.com/ViRb3/pi-encrypted-boot-ssh.
This setup hasn't ever caused issues until yesterday.
Is your debug output above from a serial port? Or rather, do you have a Pi 5-compatible serial cable you can use to capture debug output? If so, try with ignore_loglevel in cmdline.txt (keeping it to a single line) - there should be some messages about the SD card, but whatever you get, post it all.
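For illustration, a sketch of what cmdline.txt might look like with ignore_loglevel appended - the other parameters shown here are placeholders for whatever is already in the file:

```
console=serial0,115200 console=tty1 root=PARTUUID=xxxxxxxx-02 rootfstype=ext4 fsck.repair=yes rootwait ignore_loglevel
```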
These logs were taken from the HDMI output, I just wrote down the important lines by hand.
I do actually have a debug probe, I hooked it up for the past 2 days, but unfortunately even after 30+ restarts at various times of the day and intervals, I could not reproduce the issue.
I thought maybe it was a fluke, so I restarted my other two Raspberry Pis to test, and surprisingly both of them failed to boot exactly like before. Sadly, those two devices are remote, so I can't attach a probe to them.
I would like to mention, when this issue occurred before, I could enter the initramfs/dropbear environment, but after decrypting the rootfs, the mounting timeout would happen.
In my last attempt on the remote Raspberry Pis, I ran apt update && apt upgrade && rpi-eeprom-update -a, and when I rebooted, the Raspberry Pi never came online - the green light was on, but even initramfs/dropbear was not reachable. A force power cycle fixed it for now.
Sorry I can't provide any concrete details, but since the day the Raspberry Pi 5 was released, I have been running these 3 in the exact same setup with no issues. So while I can't tell what's wrong, something definitely seems off. Thanks!
Is it possible that you need to regenerate the initramfs following the kernel update?
It auto-regenerates during every update, and I explicitly looked for those lines to make sure nothing went wrong, as that was my first suspect as well. I also regenerated it manually, just to double-check.
I would also like to add, I have updated all 3 devices in this exact way at least 10 times since the Raspberry Pi 5 was released and never experienced an issue.
Here's an ls of the initrd files:
# ls -alt /boot/initrd*
-rw------- 1 root root 13890585 Sep 13 18:36 /boot/initrd.img-6.6.31+rpt-rpi-2712
-rw------- 1 root root 13888365 Sep 13 18:35 /boot/initrd.img-6.6.31+rpt-rpi-v8
-rw------- 1 root root 14076413 Sep 13 18:35 /boot/initrd.img-6.6.47+rpt-rpi-2712
-rw------- 1 root root 14070518 Sep 13 18:35 /boot/initrd.img-6.6.47+rpt-rpi-v8
Actually, one more thing I forgot to mention: I am running the kernel in 4KB addressing mode instead of 16KB, for compatibility with my external disks, which are formatted with BTRFS and 4KB sectors.
One of my remote Raspberry Pi 5s just hung up out of the blue. It did not reboot, the green light was still on, but it was completely unresponsive - could not SSH, no disk activity, etc. After force restarting it, I checked the logs, and there was absolutely nothing since it crashed ~8 hours ago. Last logs were just some routine networking related cron jobs I have running.
I have now downgraded all my devices to the previous kernel, 6.6.31, will report back how it goes. For reference, this is how I did it:
set -e
cd /boot/firmware
# Back up the current kernel and initramfs images
mv kernel8.img kernel8.img.bak
mv kernel_2712.img kernel_2712.img.bak
mv initramfs_2712 initramfs_2712.bak
mv initramfs8 initramfs8.bak
# Put the 6.6.31 kernel and initramfs from /boot in their place
cp -a ../vmlinuz-6.6.31+rpt-rpi-2712 kernel_2712.img
cp -a ../vmlinuz-6.6.31+rpt-rpi-v8 kernel8.img
cp -a ../initrd.img-6.6.31+rpt-rpi-2712 initramfs_2712
cp -a ../initrd.img-6.6.31+rpt-rpi-v8 initramfs8
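To undo the downgrade later - a sketch, assuming the .bak files created above are still in place and nothing else has touched them:

```
cd /boot/firmware
# Restore the backed-up kernel and initramfs images over the 6.6.31 copies
mv -f kernel8.img.bak kernel8.img
mv -f kernel_2712.img.bak kernel_2712.img
mv -f initramfs_2712.bak initramfs_2712
mv -f initramfs8.bak initramfs8
```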
Reference of kernel versions: https://archive.raspberrypi.com/debian/pool/main/l/linux/
Since downgrading, I have restarted the servers multiple times and had no issues. No random crashes either. I am therefore very confident that this was due to the kernel upgrade. Sadly, I'm not sure how I can help more.
The major change to the SD card handling has been the addition of support for command queueing. I'm going to list a number of kernel builds in chronological order. For each build, run sudo rpi-update <hash>, where <hash> is the 7-digit hexadecimal number on the line. I suggest you start at the top and work your way down until you start to see the problem again.
The builds:
f17defc - "kernel: Bump to 6.6.22" // before command queueing was enabled
f637152 - "kernel: drivers: mmc: add SD support for Command Queueing"
8ec753c - "kernel: SD command queueing take 2"
b0bbbab - "kernel: SD fixes round 3"
I think I have the same issue on my RPi 5 on 6.6.47. OS and all data are on a SanDisk Extreme 256 GB SD card. Sometimes it hangs at boot, but for me it's less than 50% of the time.
Additionally, sometimes during normal system runtime the whole system suddenly freezes. Because I have a system monitor open at all times, I always see the disk I/O of my SD card completely stop when that happens. From one moment to the next there is not a single byte of disk I/O to or from the SD card any longer, and from there on the system completely freezes and needs a hard power cycle. For me this happens much more frequently when there is a lot of I/O going on, such as pulling large container images. For example, there is one particular image that's almost 2 GB which I tried to pull dozens of times last week, and each time the system eventually froze. It can also happen during normal desktop use, e.g. while browsing with Firefox or using vscode, though less frequently.
I can reasonably rule out a faulty SD card because I bought a new SanDisk Extreme 128 GB card, cloned the OS to it (yes, I did properly downsize partitions) and had the exact same issue with that.
Just now I downgraded to 6.6.22 using rpi-update f17defc and was able to pull that large image for the first time without a freeze. So I have no long-term experience yet, but it looks very promising. Thank you so much for the research. I almost thought my Pi was faulty until I found this issue.
Thanks for chiming in @fshimizu! Happy to hear this is not an isolated case. I have been meaning to troubleshoot more, but sadly have a busy week ahead. I hope I can help more after that.
Can I ask if you have any external disks connected? When you mention that I/O causes the crash, do you mean load on the SD card or any external disk?
@ViRb3 Happy to try and clarify further. No external disks. This Pi is only for experimenting and a bit of desktop use. So there are peripherals and an audio device connected via USB, but the only storage is the SD card. So, all I/O is on the SD card.
> you mention that I/O causes the crash
Let me be careful here: I can't really confirm that I/O directly causes the crash. It's just an observation I made that for me it seems to happen much more often during times of heavy I/O. But not exclusively.
I tried to get more info, e.g. by running dmesg right after the freeze happens, but the command simply hangs forever - that is, if input still works at all. After a power cycle there seems to be nothing helpful in the logs. I guess if any errors are logged, they can't be written. Unfortunately I only just started using Pis, so I don't have a serial cable or anything for debugging. But I'll try to provide any info that could help.
Thanks for the update. @pelwell posted above some commit hashes of various major changes in the kernel. Can you follow the instructions and test which one does and doesn't crash? I'm away from my Pis, so I don't dare to touch them as I can't do anything if they hang.
This is certainly not exhaustive, but I installed the kernels listed and ran the following test procedure, which had previously frozen my system on 6.6.47 reliably.
podman rmi <BIG_IMAGE>
podman pull <BIG_IMAGE>
These are the results.
Test # | Commit | Comment | uname -a | Test result
---|---|---|---|---
1 | f17defc | "kernel: Bump to 6.6.22" // before command queueing was enabled | Linux <HOSTNAME> 6.6.22-v8-16k+ #1745 SMP PREEMPT Tue Mar 19 17:57:20 GMT 2024 aarch64 GNU/Linux | seems stable
2 | f637152 | "kernel: drivers: mmc: add SD support for Command Queueing" | Linux <HOSTNAME> 6.6.22-v8-16k+ #1746 SMP PREEMPT Wed Mar 20 17:21:57 GMT 2024 aarch64 GNU/Linux | seems stable
3 | 8ec753c | "kernel: SD command queueing take 2" | Linux <HOSTNAME> 6.6.22-v8-16k+ #1747 SMP PREEMPT Wed Mar 27 17:06:56 GMT 2024 aarch64 GNU/Linux | seems stable
4 | b0bbbab | "kernel: SD fixes round 3" | Linux <HOSTNAME> 6.6.25-v8-16k+ #1752 SMP PREEMPT Wed Apr 10 14:30:55 BST 2024 aarch64 GNU/Linux | seems stable
5 | ? | My previous kernel from the distribution | Linux <HOSTNAME> 6.6.47+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.47-1+rpt1 (2024-09-02) aarch64 GNU/Linux | Froze at step 1.
6 | f17defc | "kernel: Bump to 6.6.22" // before command queueing was enabled | Linux <HOSTNAME> 6.6.22-v8-16k+ #1745 SMP PREEMPT Tue Mar 19 17:57:20 GMT 2024 aarch64 GNU/Linux | seems stable
Test 5 was a control with the regular kernel installed by the distro, which had been freezing for me. On the first try I didn't even get to the main test of pulling the image; the system already froze while just starting some other, smaller containers.
Test 6 was another control to verify that it would work again with the older kernel.
On the version with the crash/freeze, can you add dtparam=sd_cqe=off to /boot/firmware/config.txt and retest?
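For illustration, a sketch of the change - appended to the existing /boot/firmware/config.txt, leaving the other entries untouched:

```
dtparam=sd_cqe=off
```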
@P33M thanks a lot for that suggestion. You might be onto something here. I've done the same test as before, with and without the dtparam=sd_cqe=off parameter, i.e. two rounds each of pulling a large container image, starting a container from it and deleting it again.
All tests were done with the kernel from the distribution where I initially noticed the issue: Linux <HOSTNAME> 6.6.47+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.47-1+rpt1 (2024-09-02) aarch64 GNU/Linux
Test # | with or without dtparam=sd_cqe=off? | Result
---|---|---
1 | without | froze during image pull, 1st round
2 | with | seems stable
3 | without | froze during container creation, 1st round
4 | with | seems stable
I will keep using the problematic 6.6.47 kernel with the dtparam=sd_cqe=off option for now, to test this a bit more long-term.
To hopefully rule out some other factors, here are some details on @ViRb3's setup and my settings in comparison.
> I also have a bunch of external HDDs attached
I have no storage other than the SD card.
> I also use an initramfs with dropbear for rootfs decryption
I use neither storage encryption nor dropbear.
> I am running the kernel in 4KB addressing mode instead of 16KB
I haven't changed that, so I use whatever is the default in a fresh installation.
For each of the cards where boot fails without sd_cqe=off, can you run and post the output of the following?
find /sys/bus/mmc/devices/mmc0\:*/ -type f -not -path "*block*" -print -exec cat {} \;
Here you go: device-info.txt
At the moment I only have this one SD card. The other one I had cloned from it previously is already overwritten.
I've managed to hang a card I previously thought was reliable - a Sandisk Extreme Pro 64GB with a date of 09/2021. Running fstrim / sync / fio in parallel managed to wedge the card in about 20 minutes.
I can reproduce this on our own cards - it smells a lot like a deadlock, as there are no IO or CQE engine errors but eventually all tasks that were accessing the mmc device report scheduler timeouts.
I was operating under the assumption that async and sync mmc_blk commands had sufficient guards to prevent doing sync while async requests were pending, but perhaps not.
I have a working theory - it's sort of a consequence of limiting posted writes to 1, and a poor implementation of CQHCI's halt mechanism.
When halted, the CQE can stop after it finishes its current task descriptor which defines a contiguous data block transfer. Such a descriptor may be in a chain of descriptors for a large block write associated with a tag (this is key - the pending write needs to be >128K in length). Issuing an SD cache flush command halts the CQE, and the Linux CQHCI driver relies on more requests being submitted to restart the CQE.
Edit: the mmc_blk driver does attempt to avoid issuing a flush while IO is ongoing, by waiting for CQE to be idle, but it does this in process context. There's the possibility of preemption and something restarting CQE by submitting a (write) request.
It follows that if the block layer queue is stalled, either because it is jammed full of writes (that can't get posted due to the 1x write limit), or the OS needs to serialise a critical write and issued the flush along with the write, then activity stops.
There's also another potential hole where halting the CQE prevents automatic card status polling from occurring, so if a critical read task was submitted and a flush occurred before the card reported ready, you also could get wedged.
I can now get CQE to halt cleanly after finishing all its current tasks, but things still fall over because of the MMC framework's multiple interlocking mechanisms for declaring "I'm exclusively using the host".
Now at least I get timeouts attributed to the MMC layer, but forward progress for IO still stops.
Hi, I was already reporting problems with CQE and the SanDisk Extreme Pro 128GB and SanDisk Extreme 64GB during testing. I could not find the cause at the time, but maybe now I have found something:
Using a swap partition (instead of the default swap file) on the SD card together with CQE results in failed boots in 8/10 cases. The fstab entry looks like:
PARTUUID=5e07cda8-03 none swap sw,discard,pri=10 0 0
Without the swap partition, the system boots with CQE in 10/10 cases, but sometimes individual files are read incorrectly. I mean, there is no error reading the file, but the checksums do not match. Disabling CQE cures all the errors.
With the new kernel (6.6.51-1+rpt3, 2024-10-08), which changes the default CQE behaviour to off, I sometimes observe CQE recovery in dmesg. Maybe the developers enabled some kind of CQE debugging? This is without the swap partition, while checksumming the files on the SD card. I believe that CQE recovery still reads the files just fine (no checksum errors), but it is slightly slower.
It looks like this:
[Oct12 23:12] mmc0: cqhci: timeout for tag 30, qcnt 1
[ +0.000006] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
[ +0.000002] mmc0: cqhci: Caps: 0x000030c8 | Version: 0x00000510
[ +0.000003] mmc0: cqhci: Config: 0x00000101 | Control: 0x00000000
[ +0.000002] mmc0: cqhci: Int stat: 0x00000000 | Int enab: 0x00000016
[ +0.000002] mmc0: cqhci: Int sig: 0x00000016 | Int Coal: 0x00000000
[ +0.000002] mmc0: cqhci: TDL base: 0x05a48000 | TDL up32: 0x00000001
[ +0.000003] mmc0: cqhci: Doorbell: 0x40000000 | TCN: 0x00000000
[ +0.000002] mmc0: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x40000000
[ +0.000002] mmc0: cqhci: Task clr: 0x00000000 | SSC1: 0x00004000
[ +0.000002] mmc0: cqhci: SSC2: 0x0000aaaa | DCMD rsp: 0x00000000
[ +0.000001] mmc0: cqhci: RED mask: 0xfdf9a080 | TERRI: 0x00000000
[ +0.000002] mmc0: cqhci: Resp idx: 0x0000000d | Resp arg: 0x00000000
[ +0.000006] mmc0: running CQE recovery
[Oct12 23:13] mmc0: cqhci: timeout for tag 2, qcnt 1
[ +0.000007] mmc0: cqhci: ============ CQHCI REGISTER DUMP ===========
[ +0.000002] mmc0: cqhci: Caps: 0x000030c8 | Version: 0x00000510
[ +0.000002] mmc0: cqhci: Config: 0x00000101 | Control: 0x00000000
[ +0.000003] mmc0: cqhci: Int stat: 0x00000000 | Int enab: 0x00000016
[ +0.000002] mmc0: cqhci: Int sig: 0x00000016 | Int Coal: 0x00000000
[ +0.000002] mmc0: cqhci: TDL base: 0x05a48000 | TDL up32: 0x00000001
[ +0.000003] mmc0: cqhci: Doorbell: 0x00000004 | TCN: 0x00000000
[ +0.000002] mmc0: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x00000004
[ +0.000002] mmc0: cqhci: Task clr: 0x00000000 | SSC1: 0x00004000
[ +0.000002] mmc0: cqhci: SSC2: 0x0000aaaa | DCMD rsp: 0x00000000
[ +0.000002] mmc0: cqhci: RED mask: 0xfdf9a080 | TERRI: 0x00000000
[ +0.000002] mmc0: cqhci: Resp idx: 0x0000000d | Resp arg: 0x00000000
[ +0.000006] mmc0: running CQE recovery
Does it mean that my SD card is faulty with CQE enabled?
I have just received the new official Raspberry Pi A2 SD card. I'll try that one as well.
The posted write count underflows which stalls IO submission forevermore. I have a working theory as to why - although some members of the mmc_queue struct are accessed under spinlock, other members aren't. If there's a read-modify-write on a member in the same cacheline as a critical variable, then this tramples the data that was "protected". The clue was that the hang typically coincided with an FIO process spawning, and never during one of the IO sequences. I've added lots of gratuitous padding to these struct members and so far no hang...
It still hangs, and the above is incorrect.
In about 30 minutes, once the auto-build completes, please try testing with sudo rpi-update pulls/6418, re-enabling sd_cqe in config.txt via dtparam=sd_cqe=on, and rebooting. If you were previously getting cqhci timeout errors with register dumps, that would indicate an incompatible card and you will still need to leave sd_cqe off.
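A sketch of that sequence, using the build and parameter named above:

```
sudo rpi-update pulls/6418
# then in /boot/firmware/config.txt set:
#   dtparam=sd_cqe=on
sudo reboot
```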
Thanks a lot for your efforts, @P33M. I'll test when I get some time. One question ahead of time:
> If you were previously getting cqhci timeout errors with register dumps, this would indicate an incompatible card
Is there an easy way to check that without a serial cable or other debug setup while testing? I will do the same tests as I did previously in any case. Just wondering if I can get more info out of it than simply "hangs" or "probably doesn't hang".
After a round of my usual amateurish tests with the test build and with my old kernel from the distro, I'm cautiously optimistic. dtparam=sd_cqe=on was set at all times.
I'll keep using the pulls/6418 build with dtparam=sd_cqe=on from now on, and if it hangs I'll report back.
In other news: My SD card now gives off a distinct glow in the dark. :-)
I have some good news! I'm still testing the stock 6.6.51-1+rpt3 (2024-10-08) kernel; I haven't had time to test pulls/6418 yet.
I have a total of 4 different A2 SD cards, including the new official Raspberry Pi A2 card.
The recovery messages in dmesg are not very frequent. I'm checksumming 5GB of files on the SD card 100 times and I get on average only about 20 recovery messages, so roughly 1 in 5 runs.
On the other hand, the swap partition creates problems almost every time, in approximately 4/5 of the reboots. Now I think it is related to the discard option: the swap partition without discard works fine. It looks like cards 2 and 3 are very sensitive to fstrim.
Fstrim alone works as expected, but fstrim immediately followed by a card read has a fatal effect. (This is probably what happens when a swap partition with discard is activated during boot.)
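For reference, the same swap entry without the discard option - a sketch based on the fstab line quoted earlier:

```
PARTUUID=5e07cda8-03 none swap sw,pri=10 0 0
```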
$ sudo fstrim -va;sudo dd if=/dev/mmcblk0 bs=1M of=/dev/null status=progress count=4096
/boot/firmware: 352.8 MiB (369897472 bytes) trimmed on /dev/mmcblk0p1
/: 85.7 MiB (89829376 bytes) trimmed on /dev/mmcblk0p2
bash: /usr/bin/sudo: Input/output error
I'll try to make it more reproducible.
fstrim works for me on a Sandisk 64GB Extreme Pro A2. Lots of these cards should get trim/discard demoted to an erase operation, as there is a pre-existing quirk for broken behaviour. Please provide the register information for cards 2 and 3 as per the command earlier in this issue, and the output of dmesg | grep mmc0 - there should be a "quirks" line.
If it helps, here is the quirks line of my Sandisk Extreme (non-Pro) A2 256 for which I provided the register info earlier.
[ 0.670245] mmcblk0: mmc0:aaaa SN256 238 GiB (quirks 0x00004000)
Edit: This is on pulls/6418 with dtparam=sd_cqe=on.
Well, I don't think this is just a trim problem. Actually it is a CQE+trim problem. With CQE disabled, trim works just fine on cards 2 and 3.
Looks like the trim leaves CQE in a bad state, which corrupts the first read just after?
Quirks:
The following test fails on cards 2 and 3 with CQE enabled, but passes with CQE disabled:
$ sudo fstrim -a;ls
ls: error while loading shared libraries: etjmp: cannot open shared object file: No such file or directory
sometimes there is another error like ls: error while loading shared libraries: chk: cannot open shared object file: No such file or directory
CQE enabled: failure rate 10/10 on cards 2 and 3.
CQE disabled: success rate 10/10 on cards 2 and 3.
Card 1 passes the test with CQE enabled. With card 4 I can't enable CQE.
SD card info: 1:
/sys/bus/mmc/devices/mmc0:544c/fwrev
/sys/bus/mmc/devices/mmc0:544c/uevent
DRIVER=mmcblk
MMC_TYPE=SD
MMC_NAME=USD00
MODALIAS=mmc:block
/sys/bus/mmc/devices/mmc0:544c/cid
ad4c5355534430302170084b6aa18800
/sys/bus/mmc/devices/mmc0:544c/rca
0x544c
/sys/bus/mmc/devices/mmc0:544c/ext_power
01
/sys/bus/mmc/devices/mmc0:544c/csd
400e0032db790001d77b7f800a400000
/sys/bus/mmc/devices/mmc0:544c/manfid
0x0000ad
/sys/bus/mmc/devices/mmc0:544c/power/runtime_active_time
6263664
/sys/bus/mmc/devices/mmc0:544c/power/runtime_status
active
/sys/bus/mmc/devices/mmc0:544c/power/autosuspend_delay_ms
3000
/sys/bus/mmc/devices/mmc0:544c/power/runtime_suspended_time
0
/sys/bus/mmc/devices/mmc0:544c/power/control
on
/sys/bus/mmc/devices/mmc0:544c/ocr
0x00300000
/sys/bus/mmc/devices/mmc0:544c/preferred_erase_size
8388608
/sys/bus/mmc/devices/mmc0:544c/ext_perf
18
/sys/bus/mmc/devices/mmc0:544c/type
SD
/sys/bus/mmc/devices/mmc0:544c/hwrev
0x2
/sys/bus/mmc/devices/mmc0:544c/date
08/2024
/sys/bus/mmc/devices/mmc0:544c/dsr
0x404
/sys/bus/mmc/devices/mmc0:544c/erase_size
512
/sys/bus/mmc/devices/mmc0:544c/removable
removable
/sys/bus/mmc/devices/mmc0:544c/oemid
0x4c53
/sys/bus/mmc/devices/mmc0:544c/serial
0x70084b6a
/sys/bus/mmc/devices/mmc0:544c/ssr
00000000080000000400a00100fd3a1e000800000002fc0000000000000000000000000000000000000000000000000000000000000000000000000000000000
/sys/bus/mmc/devices/mmc0:544c/scr
0245848700000000
/sys/bus/mmc/devices/mmc0:544c/name
USD00
2:
/sys/bus/mmc/devices/mmc0:aaaa/fwrev
0x0
/sys/bus/mmc/devices/mmc0:aaaa/uevent
DRIVER=mmcblk
MMC_TYPE=SD
MMC_NAME=SF128
MODALIAS=mmc:block
/sys/bus/mmc/devices/mmc0:aaaa/cid
035344534631323880281ba4f0014a00
/sys/bus/mmc/devices/mmc0:aaaa/rca
0xaaaa
/sys/bus/mmc/devices/mmc0:aaaa/ext_power
07
/sys/bus/mmc/devices/mmc0:aaaa/csd
400e0032db790003b8ab7f800a404000
/sys/bus/mmc/devices/mmc0:aaaa/manfid
0x000003
/sys/bus/mmc/devices/mmc0:aaaa/power/runtime_active_time
626219
/sys/bus/mmc/devices/mmc0:aaaa/power/runtime_status
active
/sys/bus/mmc/devices/mmc0:aaaa/power/autosuspend_delay_ms
3000
/sys/bus/mmc/devices/mmc0:aaaa/power/runtime_suspended_time
0
/sys/bus/mmc/devices/mmc0:aaaa/power/control
on
/sys/bus/mmc/devices/mmc0:aaaa/ocr
0x00300000
/sys/bus/mmc/devices/mmc0:aaaa/preferred_erase_size
4194304
/sys/bus/mmc/devices/mmc0:aaaa/ext_perf
18
/sys/bus/mmc/devices/mmc0:aaaa/type
SD
/sys/bus/mmc/devices/mmc0:aaaa/hwrev
0x8
/sys/bus/mmc/devices/mmc0:aaaa/date
10/2020
/sys/bus/mmc/devices/mmc0:aaaa/dsr
0x404
/sys/bus/mmc/devices/mmc0:aaaa/erase_size
512
/sys/bus/mmc/devices/mmc0:aaaa/removable
removable
/sys/bus/mmc/devices/mmc0:aaaa/oemid
0x5344
/sys/bus/mmc/devices/mmc0:aaaa/serial
0x281ba4f0
/sys/bus/mmc/devices/mmc0:aaaa/ssr
0000000008000000040090000f05391e000800000002fc0003000000000000000000000000000000000000000000000000000000000000000000000000000000
/sys/bus/mmc/devices/mmc0:aaaa/scr
0245848700000000
/sys/bus/mmc/devices/mmc0:aaaa/name
SF128
3:
/sys/bus/mmc/devices/mmc0:aaaa/fwrev
0x0
/sys/bus/mmc/devices/mmc0:aaaa/uevent
DRIVER=mmcblk
MMC_TYPE=SD
MMC_NAME=SN64G
MODALIAS=mmc:block
/sys/bus/mmc/devices/mmc0:aaaa/cid
035344534e36344780abf1d45c013100
/sys/bus/mmc/devices/mmc0:aaaa/rca
0xaaaa
/sys/bus/mmc/devices/mmc0:aaaa/ext_power
07
/sys/bus/mmc/devices/mmc0:aaaa/csd
400e0032db790001dbd37f800a404000
/sys/bus/mmc/devices/mmc0:aaaa/manfid
0x000003
/sys/bus/mmc/devices/mmc0:aaaa/power/runtime_active_time
3154928
/sys/bus/mmc/devices/mmc0:aaaa/power/runtime_status
active
/sys/bus/mmc/devices/mmc0:aaaa/power/autosuspend_delay_ms
3000
/sys/bus/mmc/devices/mmc0:aaaa/power/runtime_suspended_time
0
/sys/bus/mmc/devices/mmc0:aaaa/power/control
on
/sys/bus/mmc/devices/mmc0:aaaa/ocr
0x00300000
/sys/bus/mmc/devices/mmc0:aaaa/preferred_erase_size
4194304
/sys/bus/mmc/devices/mmc0:aaaa/ext_perf
18
/sys/bus/mmc/devices/mmc0:aaaa/type
SD
/sys/bus/mmc/devices/mmc0:aaaa/hwrev
0x8
/sys/bus/mmc/devices/mmc0:aaaa/date
01/2019
/sys/bus/mmc/devices/mmc0:aaaa/dsr
0x404
/sys/bus/mmc/devices/mmc0:aaaa/erase_size
512
/sys/bus/mmc/devices/mmc0:aaaa/removable
removable
/sys/bus/mmc/devices/mmc0:aaaa/oemid
0x5344
/sys/bus/mmc/devices/mmc0:aaaa/serial
0xabf1d45c
/sys/bus/mmc/devices/mmc0:aaaa/ssr
0000000008000000040090000f05391e000800000002fc0003000000000000000000000000000000000000000000000000000000000000000000000000000000
/sys/bus/mmc/devices/mmc0:aaaa/scr
0245848700000000
/sys/bus/mmc/devices/mmc0:aaaa/name
SN64G
4:
/sys/bus/mmc/devices/mmc0:59b4/fwrev
0x0
/sys/bus/mmc/devices/mmc0:59b4/uevent
DRIVER=mmcblk
MMC_TYPE=SD
MMC_NAME=LX128
MODALIAS=mmc:block
/sys/bus/mmc/devices/mmc0:59b4/cid
9c534f4c5831323810298208ca013a00
/sys/bus/mmc/devices/mmc0:59b4/rca
0x59b4
/sys/bus/mmc/devices/mmc0:59b4/ext_power
00
/sys/bus/mmc/devices/mmc0:59b4/csd
400e0032db790003b14f7f800a400000
/sys/bus/mmc/devices/mmc0:59b4/manfid
0x00009c
/sys/bus/mmc/devices/mmc0:59b4/power/runtime_active_time
2027526
/sys/bus/mmc/devices/mmc0:59b4/power/runtime_status
active
/sys/bus/mmc/devices/mmc0:59b4/power/autosuspend_delay_ms
3000
/sys/bus/mmc/devices/mmc0:59b4/power/runtime_suspended_time
0
/sys/bus/mmc/devices/mmc0:59b4/power/control
on
/sys/bus/mmc/devices/mmc0:59b4/ocr
0x00300000
/sys/bus/mmc/devices/mmc0:59b4/preferred_erase_size
8388608
/sys/bus/mmc/devices/mmc0:59b4/ext_perf
08
/sys/bus/mmc/devices/mmc0:59b4/type
SD
/sys/bus/mmc/devices/mmc0:59b4/hwrev
0x1
/sys/bus/mmc/devices/mmc0:59b4/date
10/2019
/sys/bus/mmc/devices/mmc0:59b4/dsr
0x404
/sys/bus/mmc/devices/mmc0:59b4/erase_size
512
/sys/bus/mmc/devices/mmc0:59b4/removable
removable
/sys/bus/mmc/devices/mmc0:59b4/oemid
0x534f
/sys/bus/mmc/devices/mmc0:59b4/serial
0x298208ca
/sys/bus/mmc/devices/mmc0:59b4/ssr
00000000080000000400a00100fd3a1e000800000002fc0000000000000000000000000000000000000000000000000000000000000000000000000000000000
/sys/bus/mmc/devices/mmc0:59b4/scr
0245848700000000
/sys/bus/mmc/devices/mmc0:59b4/name
LX128
The mechanism by which fstrim issues discard ops has the same potential failure mode as fsync in the apt kernel (a hang with no obvious IO error). When trimming a large filesystem with many free extents, hundreds of trim commands of approximately the preferred erase size are issued back-to-back. Does the rpi-update version fix this?
No, unfortunately pulls/6418 does not fix the problem. I found out that the checksum of /usr/bin/ls actually changes. You can try any other command instead of ls; it does not matter much.
sudo fstrim -a;ls
$ b3sum /usr/bin/ls
a31d8bd1e4b68d94d38b3a8b73c3167753c0bf2084035fd7722008623086b38b /usr/bin/ls
$ b3sum /usr/bin/ls
049086e725258375365d68b2f1fde51acfc5f9b2b89504305bb372edf61931ad /usr/bin/ls
Actually, just the first 64kB of ls are different and they contain garbage. However, the very first 1.5kB are exactly the same. It is strange.
Cards 2&3 are Sandisk models that fall within the range of dates for cards with known-bad flush behaviour. So I would expect them to be unreliable anyway.
However, I have a card that is newer than the newest card with known-bad flush behaviour (a Sandisk Extreme Pro A2 64GB) which does not have the cache flush issue, but does get hosed if fstrim is invoked.
md5sum sd.test.file && sudo fstrim -v -a && md5sum sd.test.file
7bf1afb705be6f885946144deb881bef sd.test.file
/media/pi/bootfs: 407.7 MiB (427452416 bytes) trimmed on /dev/mmcblk0p1
/media/pi/rootfs: 41.8 GiB (44896120832 bytes) trimmed on /dev/mmcblk0p2
/boot/firmware: 360.2 MiB (377747456 bytes) trimmed on /dev/nvme0n1p1
/: 443.2 GiB (475885969408 bytes) trimmed on /dev/nvme0n1p2
7bf1afb705be6f885946144deb881bef sd.test.file
sd.test.file
/media/pi/bootfs: 407.4 MiB (427190272 bytes) tri[ 144.220728] EXT4-fs error (device mmcblk0p2): ext4_validate_block_bitmap:421: comm fstrim: bg 1: bad block bitmap checksum
mmed on /dev/mmcblk0p1
[ 144.245357] EXT4-fs (mmcblk0p2): Remounting filesystem read-only
/media/pi/rootfs: 0 B (0 bytes) trimmed on /dev/mmcblk0p2
/boot/firmware: 360.2 MiB (377747456 bytes) trimmed on /dev/nvme0n1p1
/: 0 B (0 bytes) trimmed on /dev/nvme0n1p2
md5sum: sd.test.file: Input/output error
In the garbled line where the kernel splat was printed to the console, I did echo 3 | sudo tee /proc/sys/vm/drop_caches - so the first read was probably ext4 metadata.
This is a separate failure mode from the OP, so please create a new issue.
Actually this is CQE related, but due to a clumsy interaction between the discard quirk and the spec.
CMD38 is supported in CQ mode, but only with a Discard op. The quirk (applied to all Sandisk cards) demotes Discard to Erase, which presumably results in undefined behaviour. If I check for the correct option and disable CQ otherwise, the Sandisk card no longer returns bogus data.
Pull request updated; the autobuilds will complete in about 45 minutes. Re-run rpi-update (delete /boot/firmware/.firmware_revision first).
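A sketch of that re-run; pulls/6418 is the build referenced earlier in the thread, so substitute whichever build is current:

```
# Force rpi-update to re-fetch the updated pull build
sudo rm /boot/firmware/.firmware_revision
sudo rpi-update pulls/6418
```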
I've tried version 15645c8e..., which gives kernel 6.6.56-v8-16k+ #1805 SMP PREEMPT Tue Oct 15 20:06:35 BST 2024, but the problem persists.
The current state of the art is sudo rpi-update pulls/6419.
That fixes all errors on cards 2 and 3. I tested all the cases. Good job, thanks!
@P33M thanks so much for your work! When can we expect this to arrive in a regular apt upgrade?
> CMD38 is supported in CQ mode, but only with a Discard op. The quirk (applied to all Sandisk cards) demotes Discard to Erase, which presumably results in undefined behaviour. If I check for the correct option and disable CQ otherwise, the Sandisk card no longer returns bogus data.
Oh, I've realized only now that the CQ on cards 2 and 3 is actually disabled now. The performance without CQ is lower than with CQ. For example card 2 gives 3800 with CQ and only 2000 without CQ as measured by pibenchmarks. Isn't there a way to keep CQ on and instead sanitize the trim call?
> Oh, I've realized only now that the CQ on cards 2 and 3 is actually disabled now. The performance without CQ is lower than with CQ. For example card 2 gives 3800 with CQ and only 2000 without CQ as measured by pibenchmarks. Isn't there a way to keep CQ on and instead sanitize the trim call?
You're conflating two things - issuing commands to cards in the wrong state (a software bug), and cards that do not respond correctly to commands in CQ mode even if the state is correct (buggy cards). I can fix the first, but I can't fix the second. So those cards don't get CQ enabled.
Would it be possible to flush CQ after trim for those cards? Or issue a fake read of 64kB after trim? (maybe 128kB to be sure).
I know these cards have at least one more issue - the one resulting in CQE recovery. But it seems that CQE recovery returns correct data, and it is much less frequent than the read-after-trim issue, which hits every time.
Describe the bug
I ran apt update && apt upgrade today. Since then, approximately every other reboot fails (50% chance) with the following error:
I was unfortunate enough to update 3 of my Raspberry Pi 5s, and every single one experiences this issue.
Steps to reproduce the behaviour
apt update && apt upgrade
Device(s)
Raspberry Pi 5
System
Logs
No response
Additional context
No response