Open tribbloid opened 1 year ago
--init-csum-tree is for the data csum tree. It does not affect metadata pages in any way.
btrfs check always verifies metadata page csums, but it also always ignores metadata page csum failures. That's not what is stopping the check in this case.
checksum verify failed on 24708644864 wanted 0x00000000 found 0xb6bde3e4
bad tree block 24708644864, bytenr mismatch, want=24708644864, have=0
These two lines indicate something overwrote parts of the metadata tree with zeros. Depending on how extensive the damage is, recovery might not be possible.
I see, if I could rephrase:
checksum verify failed is a warning
bad tree block is an error
want=24708644864 is computed from the current metadata page, have=0 is the old CSum being overwritten
Maybe this should be part of a bigger story to make the logging info more specific. Let me upgrade to a later version (shipped with Debian 12.1) and try again.
want= is computed from current metadata page, have=0 is the old CSum being overwritten
It is a different error, unrelated to csums. There are multiple checks on each metadata page:
Of these, only a csum failure can be ignored, provided that the surviving metadata can be corrected some other way.
The other checks detect that the metadata page was never written to the device, or was written and then subsequently lost, or the metadata page is from some other filesystem that was since overwritten by mkfs. In all of these cases, the metadata page cannot be used.
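As a rough illustration of this kind of per-page checking, the Python sketch below parses a tree block header and reports which checks fail. It assumes the standard on-disk header layout and the default crc32c checksum type. check_page and its parameters are hypothetical names for this example, not btrfs-progs functions, and the fsid comparison is an assumption about how a page left over from another filesystem would be detected.

```python
import struct

# Assumed on-disk tree block header (little-endian, packed, 101 bytes):
# csum[32], fsid[16], bytenr u64, flags u64, chunk_tree_uuid[16],
# generation u64, owner u64, nritems u32, level u8
HEADER = struct.Struct("<32s16sQQ16sQQIB")
CSUM_SIZE = 32


def _make_crc32c_table():
    # Reflected Castagnoli polynomial, as used by CRC-32C.
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c(data):
    """Plain table-driven CRC-32C (slow, for illustration only)."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def check_page(page, expected_bytenr, expected_fsid, expected_generation):
    """Return the names of the header checks that fail for one metadata page."""
    (csum, fsid, bytenr, _flags, _chunk_uuid,
     generation, _owner, _nritems, _level) = HEADER.unpack_from(page)

    problems = []
    # The csum covers everything after the csum field itself.
    stored = int.from_bytes(csum[:4], "little")
    if stored != crc32c(page[CSUM_SIZE:]):
        problems.append("csum")      # the only failure that may be ignorable
    if bytenr != expected_bytenr:
        problems.append("bytenr")    # page is not the one expected at this address
    if fsid != expected_fsid:
        problems.append("fsid")      # page belongs to some other filesystem
    if generation != expected_generation:
        problems.append("transid")   # page is an older or newer version
    return problems
```

A page that fails only the "csum" check may still hold usable metadata. Any of the other failures means the page is not the one the parent node points to.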
want=24708644864 is the logical address of the expected metadata page, have=0 is the data from the header written on the page. bad tree block indicates the expected page was never written at that location (if the page was previously written at that location, it would be a parent transid verify failed error instead), or was overwritten by something else writing to the filesystem.
have=0 and the 0x00000000 in the checksum verify line together indicate the metadata page is likely entirely overwritten with zeros. In that case there is no metadata to recover in the page. If the page is critical to the filesystem (like an important root node) then check cannot recover.
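The same reasoning can be applied mechanically to a saved log. The sketch below is only an illustration: the regular expressions match the two message formats quoted in this thread (other btrfs-progs versions may word them differently), and likely_zeroed_blocks is a hypothetical helper name. It flags a block only when both the stored csum and the header bytenr read back as zero.

```python
import re

# Patterns for the two messages quoted in this thread (format may vary by version).
CSUM_RE = re.compile(
    r"checksum verify failed on (\d+) wanted (0x[0-9a-fA-F]+) found (0x[0-9a-fA-F]+)"
)
BYTENR_RE = re.compile(
    r"bad tree block (\d+), bytenr mismatch, want=(\d+), have=(\d+)"
)


def likely_zeroed_blocks(log_text):
    """Return logical addresses whose page looks entirely overwritten with zeros."""
    stored_csum_zero = set()
    header_bytenr_zero = set()

    for m in CSUM_RE.finditer(log_text):
        # "wanted" is the csum stored in the page header.
        if int(m.group(2), 16) == 0:
            stored_csum_zero.add(int(m.group(1)))

    for m in BYTENR_RE.finditer(log_text):
        # "have" is the bytenr read from the page header.
        if int(m.group(3)) == 0:
            header_bytenr_zero.add(int(m.group(2)))

    # Both header fields reading as zero suggests a zeroed page.
    return sorted(stored_csum_zero & header_bytenr_zero)


log = (
    "checksum verify failed on 24708644864 wanted 0x00000000 found 0xb6bde3e4\n"
    "bad tree block 24708644864, bytenr mismatch, want=24708644864, have=0\n"
)
print(likely_zeroed_blocks(log))  # [24708644864]
```

This only narrows down which pages look zeroed. It says nothing about whether the page is critical to the filesystem or whether another copy survives.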
Typically this arises due to a device firmware bug, where a device reported the write was successfully completed, but the device did not in fact complete the write, or a failure occurred in the device some time later and the data was lost.
I see. OK, it looks like this should be closed and merged into the ticket for more explicit logging.
For the record, there is no firmware bug, but the btrfs partition was shared by Windows (through https://www.bing.com/search?pglt=2083&q=win+btrfs&cvid=688f81ca210d48198534e058738262d4&aqs=edge..69i57j69i64.1673j0j1&FORM=ANAB01&PC=U531) and Linux. On Windows, the tree balancing was set to be performed weekly.
This was a bad idea: the Windows driver is less mature than the Linux one. The balancing is now disabled.
Full log:
As its name suggests, --init-csum-tree is used to rebuild the checksum tree from raw data without verifying the existing checksums (which will be discarded and rebuilt). Instead, it complained about checksum verify failed, which makes this option close to useless.