Open tribbloid opened 2 weeks ago
Kernel version please?
I tried 6.10-rc, and it correctly detected and fixed the error:
[adam@btrfs-vm ~]$ sudo btrfs scrub start -fB /mnt/btrfs/
Starting scrub on devid 1
scrub done for b069e93c-fa69-4b46-ac41-27025aafe0eb
Scrub started: Thu Jun 13 13:13:31 2024
Status: finished
Duration: 0:00:00
Total to scrub: 128.34MiB
Rate: 128.34MiB/s
Error summary: verify=1
Corrected: 1
Uncorrectable: 0
Unverified: 0
WARNING: errors detected during scrubbing, 1 corrected
[adam@btrfs-vm ~]$ sudo btrfs scrub start -fB /mnt/btrfs/
Starting scrub on devid 1
scrub done for b069e93c-fa69-4b46-ac41-27025aafe0eb
Scrub started: Thu Jun 13 13:13:34 2024
Status: finished
Duration: 0:00:00
Total to scrub: 128.34MiB
Rate: 128.34MiB/s
Error summary: no errors found
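For anyone who wants to compare two scrub passes mechanically instead of by eye, here is a minimal Python sketch. It assumes the output format shown above (an "Error summary:" line followed by "Corrected:", "Uncorrectable:" and "Unverified:" counters). The helper names `parse_scrub_summary` and `repair_stuck` are made up for illustration and are not part of btrfs-progs.

```python
import re


def parse_scrub_summary(output: str) -> dict:
    """Extract the error counters from `btrfs scrub start -B` output text."""
    summary = {"errors": {}, "corrected": 0, "uncorrectable": 0, "unverified": 0}
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Error summary:"):
            rest = line.split(":", 1)[1].strip()
            if rest != "no errors found":
                # e.g. "verify=1": collect every key=value pair on the line
                for key, value in re.findall(r"(\w+)=(\d+)", rest):
                    summary["errors"][key] = int(value)
            continue
        m = re.match(r"(Corrected|Uncorrectable|Unverified):\s+(\d+)$", line)
        if m:
            summary[m.group(1).lower()] = int(m.group(2))
    return summary


def repair_stuck(first: dict, second: dict) -> bool:
    """True if pass 1 claimed corrections but pass 2 still reports errors."""
    return first["corrected"] > 0 and bool(second["errors"])


# The two passes from the transcript above (summary lines only).
first_run = """Error summary: verify=1
Corrected: 1
Uncorrectable: 0
Unverified: 0"""
second_run = "Error summary: no errors found"

a = parse_scrub_summary(first_run)
b = parse_scrub_summary(second_run)
print("repair stuck:", repair_stuck(a, b))
```

On the 6.10-rc transcript this reports that the repair is not stuck, since the second pass is clean. Feeding it two passes that both list the same errors would flag the behaviour described in the original report.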
Fedora 40 should be using 6.8.x; let me double-check.
Found it: the kernel is 6.8.5-301.fc40.x86_64
Also found a new recurring problem:
liveuser@localhost-live:~$ sudo btrfs check --repair /dev/disk/by-label/Home
enabling repair mode
WARNING:
Do not use --repair unless you are advised to do so by a developer
or an experienced user, and then only after having accepted that no
fsck can successfully repair all types of filesystem corruption. E.g.
some software or hardware bugs can fatally damage a volume.
The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/disk/by-label/Home
UUID: a5c81116-78a5-4edb-b57c-b08e90e1391b
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
checksum verify failed on 58720256 wanted 0x8b416f75 found 0xf31ed09a
No device size related problem found
[3/7] checking free space cache
cache and super generation don't match, space cache will be invalidated
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 421089386496 bytes used, no error found
total csum bytes: 19137532
total tree bytes: 2635005952
total fs tree bytes: 2351742976
total extent tree bytes: 256081920
btree space waste bytes: 576841045
file data blocks allocated: 486477520896
referenced 426042204160
liveuser@localhost-live:~$ sudo btrfs check --repair /dev/disk/by-label/Home
enabling repair mode
WARNING:
Do not use --repair unless you are advised to do so by a developer
or an experienced user, and then only after having accepted that no
fsck can successfully repair all types of filesystem corruption. E.g.
some software or hardware bugs can fatally damage a volume.
The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/disk/by-label/Home
UUID: a5c81116-78a5-4edb-b57c-b08e90e1391b
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
checksum verify failed on 58720256 wanted 0x3caae502 found 0x99f99bda
No device size related problem found
[3/7] checking free space cache
cache and super generation don't match, space cache will be invalidated
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 421089386496 bytes used, no error found
total csum bytes: 19137532
total tree bytes: 2635005952
total fs tree bytes: 2351742976
total extent tree bytes: 256081920
btree space waste bytes: 576841045
file data blocks allocated: 486477520896
referenced 426042204160
The message "cache and super generation don't match, space cache will be invalidated" is totally useless.
The invalidation that "space cache will be invalidated" refers to is typically done by the kernel during the next mount. check --repair does not have to do anything special in that case, other than check the metadata of the storage where the cache is located, which is done like any other nodatacow file in later stages.
There's no need to check the cache contents, since the kernel will wipe them out on the next mount anyway.
The message wording could be clarified.
Your original report is about scrub not fixing the corruption, so why involve btrfs check?
Anyway, btrfs-progs won't repair csum errors.
Just in case, would you mind running a memtest? Something weird is happening.
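One concrete oddity in the two btrfs check runs above: the same tree block (58720256) fails its checksum both times, but the wanted and found values differ between the runs. A small Python sketch for spotting this across logs follows. It assumes only the "checksum verify failed on ... wanted ... found ..." line format shown in the transcripts, and the function names are hypothetical. Differing values are only a hint, not proof of bad memory, since a --repair run may itself have rewritten the block in between.

```python
import re

# Matches the btrfs check line format seen in the transcripts above.
CSUM_RE = re.compile(
    r"checksum verify failed on (\d+) wanted (0x[0-9a-fA-F]+) found (0x[0-9a-fA-F]+)"
)


def csum_failures(output: str) -> dict:
    """Map each failing bytenr to its (wanted, found) checksum pair."""
    return {
        int(bytenr): (int(wanted, 16), int(found, 16))
        for bytenr, wanted, found in CSUM_RE.findall(output)
    }


def unstable_blocks(run1: dict, run2: dict) -> list:
    """Blocks that failed in both runs but with a different wanted/found pair."""
    return sorted(b for b in run1.keys() & run2.keys() if run1[b] != run2[b])


run1 = csum_failures(
    "checksum verify failed on 58720256 wanted 0x8b416f75 found 0xf31ed09a"
)
run2 = csum_failures(
    "checksum verify failed on 58720256 wanted 0x3caae502 found 0x99f99bda"
)
print(unstable_blocks(run1, run2))  # [58720256]
```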
@adam900710 ah, sorry, that should have gone in a separate issue. Now that @Zygo has explained it, I need to verify it after a reboot.
Nope, the kernel won't really address it at mount.
The cache can only be rebuilt if some write operation is done to the offending block group.
It's better to just wipe the cache and switch to the v2 cache, which is safer and faster (that's why it's the default mkfs option now).
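A rough sketch of what "wipe the cache and go v2" could look like, written as a Python dry run that only prints the commands rather than executing them. It assumes `btrfs check --clear-space-cache v1` and the `space_cache=v2` mount option are the intended tools. The device path is the one from the transcripts above; the mount point `/mnt/home` and the helper name are hypothetical.

```python
import shlex


def space_cache_v2_commands(device: str, mountpoint: str) -> list:
    """Return argv lists (dry run only) for dropping the v1 cache and mounting with v2."""
    return [
        # btrfs check must run on an unmounted filesystem
        ["umount", mountpoint],
        # remove the stale v1 free space cache
        ["btrfs", "check", "--clear-space-cache", "v1", device],
        # mounting with space_cache=v2 enables the v2 cache (free space tree)
        ["mount", "-o", "space_cache=v2", device, mountpoint],
    ]


# /mnt/home is a placeholder mount point, not taken from the report.
cmds = space_cache_v2_commands("/dev/disk/by-label/Home", "/mnt/home")
for argv in cmds:
    print(shlex.join(argv))
```

Nothing here touches the disk; review the printed commands and run them by hand (as root) only if they match your setup.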
I'm more interested in why the csum mismatch happened for the tree block and why scrub doesn't repair it.
With 6.8.x this may be a kernel bug, but since 6.8.x went EOL 3 weeks ago, I strongly recommend moving to 6.9.5 or newer, which fixes the kernel bug that can cause a race.
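To check whether a given machine already meets that recommendation, a minimal Python sketch is below. The 6.9.5 threshold comes from the comment above; the parsing helper is hypothetical and only looks at the leading numeric part of the release string, so distro suffixes such as `-301.fc40.x86_64` are ignored.

```python
import re

RECOMMENDED = (6, 9, 5)  # threshold taken from the recommendation above


def kernel_tuple(release: str) -> tuple:
    """Parse the leading 'X.Y[.Z]' of a kernel release string into integers."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. "6.10-rc") is treated as 0.
    return tuple(int(g) if g else 0 for g in m.groups())


def meets_recommendation(release: str) -> bool:
    """True if the release is at least the recommended version."""
    return kernel_tuple(release) >= RECOMMENDED


# The Fedora 40 kernel reported earlier in this thread.
print(meets_recommendation("6.8.5-301.fc40.x86_64"))  # False
```

For the running kernel, `platform.release()` from the standard library returns the same kind of string as `uname -r`.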
sample output:
So the 4 errors were listed as "corrected", but running the same command again yields the same 4 errors.
dmesg indicates that these errors are on the same cluster:
It appears that btrfs scrub doesn't do anything in this case.