philiprhoades opened this issue 2 days ago
btrfs check only verifies metadata. If you want to verify data (against its checksums), you need to run btrfs check --check-data-csum.
This is explicitly shown in the output:
[5/7] checking only csums items (without verifying data)
This situation is a data csum mismatch. At least dmesg shows where you can find the inode and the offset:
Nov 21 17:44:03 liph kernel: BTRFS warning (device sdd3): csum failed root 256 ino 148956179 off 0
So it is inode number 148956179 in subvolume 256, at file offset 0.
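To get from that dmesg line to an actual file name, the fields can be extracted and passed to btrfs inspect-internal inode-resolve, which maps an inode number to its path(s) inside the subvolume that contains the given path. Below is a minimal Python sketch that assumes the message format shown above. Newer kernels append extra csum/mirror details, and the unanchored regex tolerates them. The path /path/to/subvol-256 is a hypothetical placeholder for wherever subvolume 256 is mounted. The script only prints the command; it does not run it.

```python
import re
from typing import List, NamedTuple, Optional

# Matches kernel lines such as:
#   BTRFS warning (device sdd3): csum failed root 256 ino 148956179 off 0
# Wording can vary between kernel versions, so this pattern is an assumption.
CSUM_RE = re.compile(
    r"BTRFS \w+ \(device (?P<dev>[^)]+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+)"
)


class CsumFailure(NamedTuple):
    device: str
    root: int    # subvolume (tree) ID
    ino: int     # inode number inside that subvolume
    offset: int  # byte offset inside the file


def parse_csum_failure(line: str) -> Optional[CsumFailure]:
    """Return the fields of a 'csum failed' message, or None for other lines."""
    m = CSUM_RE.search(line)
    if m is None:
        return None
    return CsumFailure(m["dev"], int(m["root"]), int(m["ino"]), int(m["off"]))


def inode_resolve_cmd(failure: CsumFailure, subvol_path: str) -> List[str]:
    """Build the btrfs-progs command mapping the inode number back to a path.

    subvol_path must be a path inside subvolume `failure.root` (hypothetical here).
    """
    return ["btrfs", "inspect-internal", "inode-resolve", str(failure.ino), subvol_path]


if __name__ == "__main__":
    line = ("Nov 21 17:44:03 liph kernel: BTRFS warning (device sdd3): "
            "csum failed root 256 ino 148956179 off 0")
    failure = parse_csum_failure(line)
    print(failure)
    print(" ".join(inode_resolve_cmd(failure, "/path/to/subvol-256")))
```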
There are some known situations that can lead to a data csum mismatch. The most common one is incorrect usage of direct IO, where the program modifies the direct IO buffer before the IO has finished, i.e. a bug in the user-space program. I'd be very interested to see what the file is. If it really is data corruption from the lower-level hardware or something similar, you can remove the file and re-run scrub to verify whether any other corruption is involved.
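Here is a simplified model of why a buffer modified during direct IO shows up as a csum mismatch. The checksum is taken from the user buffer when the write is submitted, but the device reads that buffer later. This illustration assumes a single 4 KiB block, uses zlib.crc32 as a stand-in (btrfs defaults to crc32c), and does no real IO.

```python
import zlib

BLOCK_SIZE = 4096  # assumption: one 4 KiB data block


def submit_write(buf: bytearray, app_modifies_buffer: bool):
    """Model a direct IO write: checksum at submit time, device reads the buffer later."""
    # The checksum is computed over the user buffer when the write is submitted.
    csum = zlib.crc32(bytes(buf))  # stand-in for btrfs' default crc32c
    if app_modifies_buffer:
        # The user-space bug: the buffer is reused/changed before the IO completes.
        buf[0] ^= 0xFF
    # The device transfers whatever the buffer holds at that later moment.
    on_disk = bytes(buf)
    return on_disk, csum


def read_ok(on_disk: bytes, stored_csum: int) -> bool:
    """What a later read or scrub does: recompute the checksum and compare."""
    return zlib.crc32(on_disk) == stored_csum


if __name__ == "__main__":
    print(read_ok(*submit_write(bytearray(BLOCK_SIZE), False)))  # True
    print(read_ok(*submit_write(bytearray(BLOCK_SIZE), True)))   # False: csum mismatch
```

The data on disk is "valid" from the program's point of view, but it no longer matches the checksum btrfs stored, so every later read fails with EIO.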
@adam900710,
> This is explicitly shown in the output:
Ah! Yes, thanks.
OK, that all makes sense. It is a bit annoying though: it is a relatively new 8TB Seagate drive... but I suppose it could be a software thing like you say.
I decided to get started on the "install the email server on my workstation" remedy and rsynced everything that needed copying to the appropriate directories, and I saw this sort of thing:
rsync: [sender] read errors mapping "/mntd3/home/phr/Maildir/.0_naf_linked_in/cur/1730460943.M909119P257528.prix,S=120368,W=122027:2,": Input/output error (5)
rsync: [sender] read errors mapping "/mntd3/home/phr/Maildir/.0_naf_linked_in/cur/1730460943.M909119P257528.prix,S=120368,W=122027:2,": Input/output error (5)
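rsync already names the files it could not read. A small script can collect those names into a deduplicated list of files to deal with before touching the drive any further. This sketch assumes the message wording shown above, which may differ between rsync versions.

```python
import re
from typing import Iterable, List

# Wording taken from the rsync output above; other rsync versions may differ.
READ_ERROR_RE = re.compile(
    r'rsync: \[sender\] read errors mapping "(?P<path>.+)": Input/output error'
)


def unreadable_files(log_lines: Iterable[str]) -> List[str]:
    """Return each path rsync reported as unreadable, once, in first-seen order."""
    paths = []
    for line in log_lines:
        m = READ_ERROR_RE.search(line)
        if m:
            paths.append(m["path"])
    return list(dict.fromkeys(paths))  # drop repeated reports, keep order
```

It accepts any iterable of lines. For example, you could pass an open file handle for a saved rsync log, so the same file reported twice (as above) appears only once in the result.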
From a quick look at those Maildir files, nothing looks mission critical, just annoying as I said.
Hmm... so you think removing and scrubbing is better than trying rescue or restore?
Thanks for responding!
I believe you can delete the affected files, scrub to make sure there are no more errors, then rsync those deleted files back (if needed).
You also mentioned that you "migrated" from ext4. Do you mean you used btrfs-convert instead of creating a new btrfs and copying things onto it?
If so, it may be a recently exposed bug in btrfs-convert, which causes incorrect handling of unwritten ext4 extents. In that case, I'd recommend starting from scratch instead.
And what is the mail server? If it's open-source, I'd like to do a quick check to rule out direct IO problems.
> I believe you can delete the affected files, scrub to make sure there are no more errors, then rsync those deleted files back (if needed).
Do you mean move the files first? How can I rsync them after deleting them and scrubbing?
> You also mentioned that you "migrated" from ext4. Do you mean you used btrfs-convert instead of creating a new btrfs and copying things onto it?
Sorry, no: I created a new btrfs from scratch and rsynced from backup, so the fs should have been clean.
> And what is the mail server? If it's open-source, I'd like to do a quick check to rule out direct IO problems.
IndiMail (a fork of Qmail). I am talking to the author later tonight. I have already asked him that question on Telegram, but I will check when we have a voice chat and report back here.
Thanks again!
> Do you mean move the files first? How can I rsync them after deleting them and scrubbing?
I mean: delete the affected files (found via the root ID and inode number), then rsync the whole directory. Rsync will detect the changed files (including the deleted ones) and copy the good versions from the remote side.
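The suggested sequence can be written out as a dry-run plan. This sketch rests on a few assumptions. The file paths, mount point and rsync source/destination are hypothetical. It also assumes the good copies live on the rsync source side. It only builds and prints the commands.

```python
import shlex
from typing import List


def repair_plan(bad_files: List[str], mountpoint: str,
                good_src: str, dest_dir: str) -> List[List[str]]:
    """Return the commands for: delete bad files, scrub, re-sync from a good copy."""
    # 1. Remove each file whose data failed its checksum ("--" guards odd file names).
    plan = [["rm", "--", path] for path in bad_files]
    # 2. Scrub in the foreground (-B) to confirm no other csum errors remain.
    plan.append(["btrfs", "scrub", "start", "-B", mountpoint])
    # 3. Re-sync; trailing slashes copy the *contents* of good_src into dest_dir,
    #    so files missing from dest_dir (the deleted ones) are copied back.
    plan.append(["rsync", "-a", good_src.rstrip("/") + "/", dest_dir.rstrip("/") + "/"])
    return plan


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    for cmd in repair_plan(["/mnt/data/Maildir/cur/badfile"], "/mnt/data",
                           "backuphost:/srv/Maildir", "/mnt/data/Maildir"):
        print(shlex.join(cmd))  # dry run: print, don't execute
```

Running scrub with -B keeps it in the foreground, so the rsync step only starts after scrub has finished and reported its statistics.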
And I did a quick search through the indimail repo group for O_DIRECT usage (that should catch it for any C program), but didn't find any hits.
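A search like that can be scripted. This sketch walks a source tree and reports every line that uses the O_DIRECT flag. The word-boundary regex keeps O_DIRECTORY from being counted as a hit. The set of file suffixes is an assumption. A hit is only a starting point for review, and finding no hits does not rule out direct IO performed inside a library.

```python
import os
import re
from typing import List, Tuple

# \b on both sides so O_DIRECTORY (a different open() flag) is not reported.
O_DIRECT_RE = re.compile(r"\bO_DIRECT\b")
SOURCE_SUFFIXES = (".c", ".h", ".cc", ".cpp")  # assumption: C/C++ sources only


def find_o_direct(root: str) -> List[Tuple[str, int]]:
    """Return (path, line number) for each source line under root using O_DIRECT."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if not name.endswith(SOURCE_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if O_DIRECT_RE.search(line):
                        hits.append((path, lineno))
    return hits
```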
> I mean: delete the affected files (found via the root ID and inode number), then rsync the whole directory. Rsync will detect the changed files (including the deleted ones) and copy the good versions from the remote side.
Ah, wrong way round: the files I want to copy are on the problem (server) drive, NOT on the destination (workstation) drive. I am worried that any NEW emails (ones that haven't previously been backed up to the workstation) won't be recoverable.
Good news about IndiMail! I will not make any changes to the problem drive, and after I get going again I will see what rescue / restore can do for it.
Thanks.
People,
I have been using Linux for decades but have only started switching my machines over to btrfs (from ext4) recently.
Yesterday I came across my first btrfs problem - I was looking at mail files in my "likely spam" folder on my Qmail (Indimail) server under ~/Maildir. I wanted to check out the mail Subjects and did something like:
I then did:
and so I thought the problem might be with the SATA controller on the server, but then, when I started doing other things, I started seeing these errors in /var/log/messages:
So now I am confused: the check says the FS is OK, but there is obviously a problem. Because this is my first exposure to a btrfs problem, I am not sure whether I should try to sort it out on the drive or move the mail-serving stuff over to my workstation temporarily and look at the drive problem later.
If someone could suggest the most sensible thing to do, that would be great; otherwise I might go the workstation route.
Either way, it looks like I have a btrfs problem that needs fixing - which will be an interesting exercise!
Thanks!
Phil.
PS I am using my personal GMail account because my own mail server is down!