kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

Feature Request: mini-scrub mirrored non-checksummed data (i.e. NoCOW) #134

Open jamespharvey20 opened 6 years ago

jamespharvey20 commented 6 years ago

On a 3-device btrfs RAID 1, I have 2 files that are marked NoCOW but are still lzo compressed, and reading them crashes the system. They're journald files, so nothing that should crash the system. This has been discussed on the mailing list under the title '"decompress failed" in 1-2 files always causes kernel oops, check/scrub pass'. Chris Murphy remembered a discussion in the archives saying that compression can be forced on NoCOW files under certain circumstances. Bug reported here: https://bugzilla.kernel.org/show_bug.cgi?id=199707

Being marked NoCOW, they don't have checksums, so btrfs scrub doesn't look at them.

In my situation, one of the mirrored copies is valid, and the other is invalid.

Even once the kernel crash is fixed, it would be a really important data integrity feature for btrfs-progs (I'm thinking a new feature on scrub) to look, when running on a mirrored volume, at any files without a checksum and compare the mirrored copies. If they're different, obviously nothing can be automatically corrected without a checksum to verify against. But it's important to let the user know there's a problem. They can check which version appears valid, restore that file from backups, or at least know something is wrong.
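
As a rough illustration of the comparison being requested, here is a minimal sketch. It is not how btrfs scrub is implemented and does not touch any btrfs interface: the function name `compare_mirrors` and the 4096-byte block size are assumptions made for the example, and the two mirror copies are simply modeled as in-memory byte strings.

```python
from typing import List

# Assumed comparison granularity for this sketch; not taken from btrfs.
BLOCK_SIZE = 4096


def compare_mirrors(copy_a: bytes, copy_b: bytes,
                    block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the byte offsets of blocks that differ between two copies.

    Hypothetical helper: without a checksum there is no way to tell which
    copy is the good one, so the function only reports where they disagree.
    """
    mismatches = []
    length = max(len(copy_a), len(copy_b))
    for offset in range(0, length, block_size):
        block_a = copy_a[offset:offset + block_size]
        block_b = copy_b[offset:offset + block_size]
        if block_a != block_b:
            mismatches.append(offset)
    return mismatches


if __name__ == "__main__":
    good = bytes(3 * BLOCK_SIZE)          # three zero-filled blocks
    bad = bytearray(good)
    bad[5000] ^= 0xFF                     # corrupt one byte in the second block
    for offset in compare_mirrors(good, bytes(bad)):
        print(f"mirror mismatch in block at offset {offset}")
```

The point of the sketch is that a mismatch can only be reported, not repaired: the tool would tell the user which ranges disagree and leave the decision about which copy to trust to them.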

adam900710 commented 6 years ago

Pull request https://github.com/kdave/btrfs-progs/pull/135 should address the problem, although by detecting such extents rather than by fixing or scrubbing them.

That pull request will report such extents as errors, and on the kernel side the fix will at least prevent such problems by never compressing an extent if NODATASUM is set.

jamespharvey20 commented 6 years ago

I agree with #135.

Wanted to clarify if you think #135 makes the feature request unnecessary.

I still think there's good reason for it. Although the kernel patches and #135 will prevent NODATASUM/NODATACOW data from being compressed, there can of course still be uncompressed data without checksums whose mirrored copies can diverge when one copy gets corrupted. If the good copy is read, the user never gets alerted that their data is no longer actually mirrored, unless the good copy gets corrupted too. Then (or if the bad copy happens to be read first) the user silently gets bad data.

If the mirrored copies are compared, user will know something's wrong and be able to deal with it, granted not automatically, when it's still fixable.

Forza-tng commented 1 year ago

This is similar to a request I made a while back: https://github.com/kdave/btrfs-progs/issues/482