dm-crypt corruption issues (?)

flokli commented 1 month ago

In the last few days I've been running into a bunch of btrfs corruption issues on my Macbook M2 Air. I initially suspected a single fluke, but it got worse.

Yesterday I entirely re-created the filesystem (luks with --allow-discards), then mkfs.btrfs with default params, and again got btrfs errors.

It seems I can rule out the internal SSD, as the same issue also happens on a (somewhat reliable and fast) external SSD (formatted with LUKS and btrfs).

Opening filesystem to check...
Checking filesystem on /dev/mapper/usb
UUID: a4d7d051-44ee-4512-bcfd-3b634526b02a
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking csums against data
mirror 1 bytenr 23302864896 csum 0xc296d77c expected csum 0xe5e91fb3
ERROR: errors found in csum tree
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 42043219968 bytes used, error(s) found
total csum bytes: 40380008
total tree bytes: 685703168
total fs tree bytes: 570392576
total extent tree bytes: 63324160
btree space waste bytes: 119804959
file data blocks allocated: 41357516800
 referenced 41357484032

This was after copying my /nix/store from the host to /mnt, and unmounting.

dmesg of the host:

[    6.192083] BTRFS: device label root devid 1 transid 41 /dev/disk/by-label/root scanned by mount (530)
[    6.192241] BTRFS info (device dm-0): first mount of filesystem 5eaac3f0-833c-4f0f-b6f5-df3eb94e4327
[    6.192248] BTRFS info (device dm-0): using crc32c (crc32c-generic) checksum algorithm
[    6.192252] BTRFS info (device dm-0): forcing free space tree for sector size 4096 with page size 16384
[    6.192254] BTRFS info (device dm-0): using free-space-tree
[    6.192254] BTRFS warning (device dm-0): read-write for sector size 4096 with page size 16384 is experimental
[    6.200521] BTRFS info (device dm-0): checking UUID tree
[    6.740028] systemd-journald[807]: Creating journal file /var/log/journal/bbe02739e577495c999bfebef448138d/system.journal on a btrfs file system, and copy-on-write is enabled. This is likely to slow down journal access substantially, please consider turning off the copy-on-write file attribute on the journal directory, using chattr +C.
[    6.817857] BTRFS info: devid 1 device path /dev/disk/by-label/root changed to /dev/dm-0 scanned by (udev-worker) (936)
[ 7896.890101] BTRFS: device fsid a4d7d051-44ee-4512-bcfd-3b634526b02a devid 1 transid 6 /dev/mapper/usb scanned by mount (17224)
[ 7896.890774] BTRFS info (device dm-2): first mount of filesystem a4d7d051-44ee-4512-bcfd-3b634526b02a
[ 7896.890802] BTRFS info (device dm-2): using crc32c (crc32c-generic) checksum algorithm
[ 7896.890811] BTRFS info (device dm-2): forcing free space tree for sector size 4096 with page size 16384
[ 7896.890816] BTRFS info (device dm-2): using free-space-tree
[ 7896.890819] BTRFS warning (device dm-2): read-write for sector size 4096 with page size 16384 is experimental
[ 7896.892854] BTRFS info (device dm-2): checking UUID tree
[ 7896.893323] BTRFS info (device dm-2): last unmount of filesystem a4d7d051-44ee-4512-bcfd-3b634526b02a
[ 7902.339423] BTRFS: device fsid a4d7d051-44ee-4512-bcfd-3b634526b02a devid 1 transid 8 /dev/mapper/usb scanned by mount (17294)
[ 7902.340267] BTRFS info (device dm-2): first mount of filesystem a4d7d051-44ee-4512-bcfd-3b634526b02a
[ 7902.340294] BTRFS info (device dm-2): using crc32c (crc32c-generic) checksum algorithm
[ 7902.340303] BTRFS info (device dm-2): forcing free space tree for sector size 4096 with page size 16384
[ 7902.340308] BTRFS info (device dm-2): using free-space-tree
[ 7902.340312] BTRFS warning (device dm-2): read-write for sector size 4096 with page size 16384 is experimental
[ 8089.623727] BTRFS warning (device dm-0): csum failed root 5 ino 978709 off 1253376 csum 0x81dd87df expected csum 0xe77556aa mirror 1
[ 8089.623738] BTRFS error (device dm-0): bdev /dev/dm-0 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 8320.442776] BTRFS warning (device dm-0): csum failed root 5 ino 807754 off 49152 csum 0x579bee4a expected csum 0x8a36f543 mirror 1
[ 8320.442794] BTRFS error (device dm-0): bdev /dev/dm-0 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 8320.450734] BTRFS warning (device dm-0): csum failed root 5 ino 807754 off 94208 csum 0x857c0a3b expected csum 0xc3bf7a9c mirror 1
[ 8320.450739] BTRFS error (device dm-0): bdev /dev/dm-0 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[ 8320.462502] BTRFS warning (device dm-0): csum failed root 5 ino 807754 off 49152 csum 0x579bee4a expected csum 0x8a36f543 mirror 1
[ 8320.462507] BTRFS error (device dm-0): bdev /dev/dm-0 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[ 8320.470859] BTRFS warning (device dm-0): csum failed root 5 ino 807754 off 49152 csum 0x579bee4a expected csum 0x8a36f543 mirror 1
[ 8320.470866] BTRFS error (device dm-0): bdev /dev/dm-0 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[ 8336.179155] BTRFS warning (device dm-0): checksum verify failed on logical 1039695872 mirror 1 wanted 0x9e547725 found 0x87cd70c6 level 0
[ 8336.179673] BTRFS info (device dm-0): read error corrected: ino 0 off 1039695872 (dev /dev/dm-0 sector 2047040)
[ 8651.859183] BTRFS info (device dm-2): last unmount of filesystem a4d7d051-44ee-4512-bcfd-3b634526b02a
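
For anyone triaging similar logs, here is a minimal Python sketch (not part of the original report) that groups the `csum failed` warnings by device, inode, offset and found checksum. The function name `summarize_csum_failures` and the regex are illustrative assumptions derived only from the line format shown above; other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Assumed line format, taken from the dmesg excerpt above:
#   BTRFS warning (device dm-0): csum failed root 5 ino 807754 off 49152
#   csum 0x579bee4a expected csum 0x8a36f543 mirror 1
CSUM_FAILED = re.compile(
    r"BTRFS warning \(device (?P<dev>[^)]+)\): csum failed root (?P<root>\d+) "
    r"ino (?P<ino>\d+) off (?P<off>\d+) csum (?P<got>0x[0-9a-f]+) "
    r"expected csum (?P<want>0x[0-9a-f]+)"
)


def summarize_csum_failures(dmesg_text):
    """Count csum failures per (device, inode, offset, found checksum)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = CSUM_FAILED.search(line)
        if m:
            key = (m["dev"], int(m["ino"]), int(m["off"]), m["got"])
            counts[key] += 1
    return counts


# The five warning lines from the log above, copied verbatim.
SAMPLE = """\
[ 8089.623727] BTRFS warning (device dm-0): csum failed root 5 ino 978709 off 1253376 csum 0x81dd87df expected csum 0xe77556aa mirror 1
[ 8320.442776] BTRFS warning (device dm-0): csum failed root 5 ino 807754 off 49152 csum 0x579bee4a expected csum 0x8a36f543 mirror 1
[ 8320.450734] BTRFS warning (device dm-0): csum failed root 5 ino 807754 off 94208 csum 0x857c0a3b expected csum 0xc3bf7a9c mirror 1
[ 8320.462502] BTRFS warning (device dm-0): csum failed root 5 ino 807754 off 49152 csum 0x579bee4a expected csum 0x8a36f543 mirror 1
[ 8320.470859] BTRFS warning (device dm-0): csum failed root 5 ino 807754 off 49152 csum 0x579bee4a expected csum 0x8a36f543 mirror 1
"""

if __name__ == "__main__":
    summary = summarize_csum_failures(SAMPLE)
    for (dev, ino, off, got), n in sorted(summary.items()):
        print(f"{dev} ino={ino} off={off} found={got} seen {n}x")
```

On the excerpt above, inode 807754 at offset 49152 shows up three times with the same found checksum, which would be consistent with the bad data being stable across re-reads rather than changing on every read. That is only a reading of the log, not a diagnosis.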

tpwrules commented 1 month ago

Should we roll back the kernel? If you have a relatively easy and safe way to replicate, can you try a few past kernel versions?

flokli commented 1 month ago

I'm currently trying ZFS with its own crypto layer, so if it's really dm-crypt (only) I shouldn't be affected anymore.

If that's stable, and everything is set up again, I can do the smoketest with the external drive on various kernel versions and see if there's a pattern.

vs49688 commented 1 month ago

Might be coincidental, but I hit some bad ext4 corruption yesterday on my M1, also using dm-crypt.

It was rebuilding the kernel+mesa, and the compile started failing with gcc complaining one of the kernel .c files was filled with binary content.

I noticed this in dmesg:

May 16 02:08:23 ZAIR kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #11943410: comm nix-daemon: iget: bad extra_isize 762 (inode size 256)

Rebooted to this, and ran a repair. I don't have the full fsck log, but it was large.

May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:46 UTC 2024] Passphrase for /dev/disk/by-uuid/2186c706-f18d-4be1-b1e6-cdfb0260843d:
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:48 UTC 2024] Verifying passphrase for /dev/disk/by-uuid/2186c706-f18d-4be1-b1e6-cdfb0260843d... - success
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:48 UTC 2024] starting device mapper and LVM...
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:48 UTC 2024] 2 logical volume(s) in volume group "vg" now active
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:48 UTC 2024] checking /dev/disk/by-uuid/07e2eada-bf28-4f4e-b0f0-1fbc05953b2a...
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:48 UTC 2024] fsck (busybox 1.36.1)
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:48 UTC 2024] [fsck.ext4 (1) -- /mnt-root/] fsck.ext4 -a /dev/disk/by-uuid/07e2eada-bf28-4f4e-b0f0-1fbc05953b2a
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:48 UTC 2024] root contains a file system with errors, check forced.
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] root: Inode 9608098 seems to contain garbage.
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] root: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] (i.e., without -a or -p options)
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] /dev/disk/by-uuid/07e2eada-bf28-4f4e-b0f0-1fbc05953b2a has unrepaired errors, please fix them manually.
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] An error occurred in stage 1 of the boot process, which must mount the
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] root filesystem on `/mnt-root' and then start stage 2.  Press one
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] of the following keys:
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] i) to launch an interactive shell
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] f) to start an interactive shell having pid 1 (needed if you want to
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] start stage 2's init manually)
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] r) to reboot immediately
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:26:51 UTC 2024] *) to ignore the error and continue
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:27:10 UTC 2024] Starting interactive shell...
May 16 02:28:42 ZAIR stage-1-init: [Wed May 15 16:28:41 UTC 2024] mounting /dev/disk/by-uuid/07e2eada-bf28-4f4e-b0f0-1fbc05953b2a on /...

fx-chun commented 1 month ago

I've had some recent corruption issues as well rendering my primary partition unbootable; I don't have a log to provide, but I'm using an ext4 partition on LUKS. I'm not sure what it could be exactly.

flokli commented 1 month ago

> I'm currently trying ZFS with its own crypto layer, so if it's really dm-crypt (only) I shouldn't be affected anymore.
>
> If that's stable, and everything is set up again, I can do the smoketest with the external drive on various kernel versions and see if there's a pattern.

I tried reproducing the issue from there, by copying my /nix/store to the external drive with a btrfs inside a luks volume.

I could not immediately reproduce it anymore, though that's a kernel with many more options enabled, essentially a distro kernel built with the asahi kernel sources (https://github.com/yu-re-ka/nixos-m1/tree/minimize-patches).

devusb commented 1 month ago

I could reproduce this using ext4 + LUKS and btrfs + LUKS -- I didn't try for long, but it seemed like btrfs without LUKS was not exhibiting this issue (as observed by multiple scrubs without checksum errors).

Wonder if this also happens on Fedora -- spent a bunch of time trying to find any mention of it but no luck -- I had managed to convince myself this was a hardware issue on my side until now :)

mixi commented 1 month ago

I can reproduce it with a vanilla linux v6.8.9 on an M1 Pro Macbook (j316). That opens up the possibility to bisect it.

The reproducer I am using is ~~tio's~~ fio's examples/basic-verify.fio on a freshly created dm-crypt volume, which seems to trigger the bug reliably.

flokli commented 1 month ago

Can you post a bit more details on how to reproduce? I don't know tio and a quick search didn't turn up anything helpful in particular.

mixi commented 1 month ago

Sorry, that was because I typoed the name. The tool is called fio: axboe/fio.

It worked for me both ways: with filename=/dev/mapper/... added at the end of examples/basic-verify.fio (plus loops=10, to get to roughly 10GB so that it reproduces reliably) for testing on the block device, and with that line replaced by size=10G for testing on a mounted filesystem.
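
To illustrate the general write-then-verify idea behind such a test, here is a minimal Python sketch. It is not fio and not the job file mentioned above: the names `write_and_record` and `verify`, the 4 KiB block size, and the use of CRC32 are all assumptions made for illustration. It uses plain buffered, single-threaded I/O, so it is not known to trigger this bug; the fio job remains the reproducer.

```python
import os
import zlib

BLOCK_SIZE = 4096  # assumption: 4 KiB blocks, chosen only for illustration


def write_and_record(path, num_blocks):
    """Write num_blocks of random data to path; return each block's CRC32."""
    checksums = []
    with open(path, "wb") as f:
        for _ in range(num_blocks):
            block = os.urandom(BLOCK_SIZE)
            checksums.append(zlib.crc32(block))
            f.write(block)
        # Push the data out of Python's and the kernel's write buffers.
        f.flush()
        os.fsync(f.fileno())
    return checksums


def verify(path, checksums):
    """Re-read path block by block; return indices whose CRC32 no longer matches."""
    bad = []
    with open(path, "rb") as f:
        for index, expected in enumerate(checksums):
            block = f.read(BLOCK_SIZE)
            if zlib.crc32(block) != expected:
                bad.append(index)
    return bad


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "verify-test.bin")  # hypothetical scratch file
        sums = write_and_record(target, 256)
        print("mismatching blocks:", verify(target, sums))
```

Pointing `target` at a file on the filesystem under test corresponds roughly to the mounted-filesystem variant. Reads here may be served from the page cache, so a clean result says little about what is actually on disk.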

In the meantime my bisect also pointed me to 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD") as the commit responsible.

mixi commented 1 month ago

I double checked with the proper asahi kernel. It is fixed for me with the following commits reverted:

aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch") (bisect with the new reproducer), and (~~edit: the commit does not need to be reverted~~ edit 2: the commit needs to be reverted)
2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD") (for context reasons)

knurd commented 1 month ago

Do you want me to forward this report upstream? If yes, two short questions:

Did you do the bisection with a vanilla kernel? Is vanilla 6.9 still showing the same problem? And does a revert help there, too (I assume all of that is the case, but sometimes it's better to be sure)

[side note: I'm the Linux kernel's regression tracker; somebody pointed me here; normally I do not comment on downstream bug trackers, but I make an exception due to the data corruption aspect]

knurd commented 1 month ago

ahh, I see, somebody reported it upstream already: https://lore.kernel.org/all/D1B7GPIR9K1E.5JFV37G0YTIF@shadowice.org/ great, thx!

mixi commented 1 month ago

That was me reporting it, but thanks for the offer.

tpwrules commented 1 month ago

Thanks all for the debugging efforts. I plan to do a NixOS Apple Silicon release with a revert patch within 24-48 hours, assuming the Asahi Linux kernel branch is not updated.

jannau commented 1 month ago

> I double checked with the proper asahi kernel. It is fixed for me with the following commits reverted:
>
> * aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch") (for the context), and
> * 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD") (as found by `git bisect`)

Hej @mixi, both commits need to be reverted on top of asahi-6.8.9-5 / v6.8? I'm a little confused since the linux-arm-kernel mail only mentions 2632e2521769 which reverts and builds cleanly.

mixi commented 1 month ago

You are right to be confused. Reverting 2632e2521769 alone is enough, and that is also the commit bisect pointed me to yesterday.

Apparently I reverted one commit too many by accident and guessed I did it for context reasons when writing the comment afterwards.

jannau commented 1 month ago

@tpwrules asahi-6.8.9-6 containing only the revert pushed to AsahiLinux/linux

tpwrules commented 1 month ago

Latest release contains the revert. @flokli please close the issue if you are satisfied with that fix.

mixi commented 1 month ago

Bad news: aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch") also needs to be reverted. See https://lore.kernel.org/all/Zkw9kK0sXIgfqd01@shadowice/ for details, and a new reproducer that found the commit (the old one reproducibly sees the commit as good).

@jannau:

> Apparently I reverted one commit too many by accident and guessed I did it for context reasons when writing the comment afterwards.

Correction: Apparently I reverted the right commit for the wrong reasons back then.

ardbiesheuvel commented 1 month ago

Please try this fix, and report on the thread whether or not it works for you: https://lore.kernel.org/all/20240522091335.335346-2-ardb+git@google.com

flokli commented 1 month ago

Just to make sure, is this a fix to be applied on top of any reverts (and if so, which), or an attempt to fix without reverting anything else?

ardbiesheuvel commented 1 month ago

The latter.

flokli commented 1 month ago

With asahi-6.8.9-7 (essentially reverting the other revert(s) and applying that patch) I don't seem to be running into these issues anymore. A lot of other folks on the ML thread reported the same, and the patch has already been applied to arm64 (for-next/core).

I guess what's left here is bumping linux-asahi in here again, then this can be closed.

flokli commented 1 month ago

PR up at https://github.com/tpwrules/nixos-apple-silicon/pull/202

larstiq commented 1 month ago

Thanks @flokli ! I used to reliably get Firefox to crash by running nix-store --verify --check-contents in the background. With #202 that's no longer happening.

flokli commented 1 month ago

https://github.com/tpwrules/nixos-apple-silicon/pull/202 has been merged (bumping the kernel to asahi-6.8.9-7, including a cherrypick), and a new release of nixos-apple-silicon has been created, so we can close the issue here.

On the upstream kernel side, I however noticed the fix only landed in the master branch so far - meaning other aarch64 machines running the mainline kernel might still run into this corruption.

@knurd is there anything else left to be done so this gets cherrypicked to linux-6.9.y, so it'll land in v6.9.2?

knurd commented 1 month ago

> @knurd is there anything else left to be done so this gets cherrypicked to linux-6.9.y, so it'll land in v6.9.2?

That's likely too late, as 6.9.2 is in its -rc phase already – and usually Greg does not add any patches at that point aiui. You could ask though. But it likely should go into 6.9.3 due to the "CC: stable..." tag in the commit.

flokli commented 1 month ago

(trying here, as I don't have that ML subscribed): Hey @gregkh, any chance "arm64/fpsimd: Avoid erroneous elide of user state reload" could still end up in 6.9.2, due to its data corruption nature?

gregkh commented 1 month ago

Please send stable requests to stable@vger.kernel.org, we can't take stuff from random github repos for obvious reasons.

flokli commented 1 month ago

The commit I linked had a Cc: stable in the message. That's sufficient?

knurd commented 1 month ago

> The commit I linked had a Cc: stable in the message. That's sufficient?

Up to Greg, but I'd say it's in everyone's best interest if you write a quick mail to the list (like with most Linux kernel lists, you don't have to be subscribed!) with Greg CCed (side note: you might ask for the patch to be included in 6.8.y, too) – that, among other things, is also important for the paper trail in case the question "who asked for this to be included" comes up later.

flokli commented 1 month ago

@knurd Sent out an email to stable@, both you and greg are in CC.

tpwrules / nixos-apple-silicon

dm-crypt corruption issues (?) #200