Open tribbloid opened 5 months ago
No csum repair support, so it will do nothing.
I also tried it with the latest v6.9 progs, and it correctly reports the mismatch as an error, with or without --repair:
[adam@btrfs-vm ~]$ btrfs check --check-data-csum --repair --force /dev/test/scratch1
enabling repair mode
Opening filesystem to check...
Checking filesystem on /dev/test/scratch1
UUID: 64e210b4-34f1-4b47-98cf-52ce991841e2
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
No device size related problem found
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking csums against data
mirror 1 bytenr 298848256 csum 0x13fec125 expected csum 0x98757625
ERROR: errors found in csum tree
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 134512640 bytes used, error(s) found
total csum bytes: 131072
total tree bytes: 294912
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 154164
file data blocks allocated: 134217728
referenced 134217728
[adam@btrfs-vm ~]$ echo $?
1
And non-repair mode:
[adam@btrfs-vm ~]$ btrfs check --check-data-csum /dev/test/scratch1
Opening filesystem to check...
Checking filesystem on /dev/test/scratch1
UUID: 64e210b4-34f1-4b47-98cf-52ce991841e2
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking csums against data
mirror 1 bytenr 298848256 csum 0x13fec125 expected csum 0x98757625
ERROR: errors found in csum tree
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 134512640 bytes used, error(s) found
total csum bytes: 131072
total tree bytes: 294912
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 154139
file data blocks allocated: 134217728
referenced 134217728
[adam@btrfs-vm ~]$ echo $?
1
So I believe this is already fixed.
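The mismatch lines in the output above have a regular shape, so they can be pulled out of a saved log for later comparison. This is a minimal sketch, assuming the line format stays exactly as printed by v6.9 above. `list_csum_mismatches` is a hypothetical helper name, not part of btrfs-progs.

```shell
# Read btrfs check output on stdin and print "bytenr found_csum expected_csum"
# for every data csum mismatch line.
# Assumed line format (copied from the output above):
#   mirror 1 bytenr 298848256 csum 0x13fec125 expected csum 0x98757625
list_csum_mismatches() {
    awk '$1 == "mirror" && $3 == "bytenr" && $5 == "csum" && $7 == "expected" {
        print $4, $6, $9
    }'
}
```

For example, `btrfs check --check-data-csum /dev/test/scratch1 2>&1 | list_csum_mismatches` would print one line per mismatch. The exit status of that pipeline is awk's rather than btrfs check's, so capture `$?` from a separate run if you rely on it.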
Hmm... let me try 6.9 later. At the moment it is not shipped in any distro, so it may have compound symptoms.
You can try the statically built btrfs-progs available from github: https://github.com/kdave/btrfs-progs/releases/download/v6.9/btrfs.static
also no repair was attempted despite --repair option is used
There is an underlying documentation / user expectation issue here, as check should never be used as described. Scrub is the appropriate tool for verifying csums and repairing failures at the device level.
Check and scrub have different goals with mutually exclusive assumptions. Check assumes that if a csum mismatch occurs, the data is correct and the csum is wrong, i.e. the csum failure is due to a kernel bug putting the wrong csum on the block, or a DRAM fault corrupting the data before the csum is calculated, or some other error which occurs above the device level.
Scrub assumes the opposite, that if a csum mismatch occurs, the csum is correct, and the data is wrong, i.e. the error occurs at or below the device level.
Scrub will read other mirror copies of the data and repair the bad copy if there's a recoverable good copy, or do no further harm if it is not possible to perform a correct repair. Check will try to incorporate the bad data into the filesystem, which will conceal errors at best, and catastrophically damage the filesystem at worst. In some cases this is desirable, as there are consistency checks within btrfs check that can repair old and well-understood kernel bugs, but most of the time, importing garbage metadata from a device doesn't end well.
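The scrub-side assumption (the csum is right, the data is wrong) can be illustrated with a toy model. This is only a sketch and not how btrfs is implemented: plain files stand in for mirror copies, POSIX `cksum` stands in for the btrfs csum, and `crc_of` and `pick_good_copy` are hypothetical helper names.

```shell
# Toy model: print the CRC that POSIX cksum computes for a file.
crc_of() {
    cksum < "$1" | awk '{ print $1 }'
}

# Trust the stored checksum and look for a copy whose data matches it.
# Prints the first matching copy; returns 1 if no copy matches, in which
# case the caller should report the error and change nothing.
pick_good_copy() {
    expected=$1
    shift
    for copy in "$@"; do
        if [ "$(crc_of "$copy")" = "$expected" ]; then
            printf '%s\n' "$copy"
            return 0
        fi
    done
    return 1
}
```

A repair step would then overwrite the bad copies with the one printed here. The check-side assumption is the reverse: keep the data as found and treat the stored checksum as the thing that is wrong.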
Generally, if something goes wrong, the first step is to run scrub, and if that doesn't resolve the issue, escalate to other recovery methods in order of increasing risk of data loss. check --repair is somewhere in the middle of that list of methods.
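The "scrub first" step can be sketched as a small wrapper. It assumes the filesystem is mounted (scrub works on a mounted filesystem, while check expects an unmounted one), and it relies on `btrfs scrub start` returning a non-zero exit status when the scrub cannot run or finds uncorrectable errors. The function name `scrub_first` and the mount point passed to it are placeholders.

```shell
# Run a foreground scrub on a mounted btrfs filesystem and report the result.
#   -B  do not background; wait until the scrub finishes
#   -d  print separate statistics for each device
scrub_first() {
    mnt=$1
    if btrfs scrub start -B -d "$mnt"; then
        echo "scrub finished without uncorrectable errors"
    else
        echo "scrub reported a problem; consider further recovery steps" >&2
        return 1
    fi
}
```

Called as `scrub_first /mnt/scratch` (a hypothetical mount point), it returns 0 only when the scrub succeeded, so later and riskier steps can be gated on its result.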
There is a potential enhancement here, where check could get an option to strictly reject all blocks that fail the device-level consistency checks (csum failure, tree block address, parent transid, etc), try to do an in-place repair from a mirror, and if repair is not possible, abort its operation to avoid further damage (continuing is not possible until check learns how to reconstruct interior nodes of the metadata tree). That would allow check to be used safely on a filesystem that has had corruption at the device level, because it would have a built-in pre-scrub function filtering out bad data.
Encountered on v6.2 (Debian 12) and v6.7 (Fedora 40)
sample output:
"no error found" contradicts "checksum verify failed on 58720256 wanted 0x8f087114 found 0xb87d3f03".
Also, no repair was attempted even though the --repair option was used.