kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

HDD read write failure on single array #777

Open lrcressy opened 2 months ago

lrcressy commented 2 months ago

When a disk fails, it would be nice to be able to force the removal of a device from an array using the single profile when the copy fails because of the drive failure. I just bought a 20 TB WD drive, which failed after 4 days of use. Because of the hardware failure, I had to recreate the entire filesystem.

Forza-tng commented 2 months ago

@lrcressy it's very unfortunate that this happened to you. When it comes to the single profile, there are several things that make what you want very difficult. If you were using the single profile for metadata, then the likelihood of surviving a disk failure is near zero, because some metadata will almost certainly be on the broken device. If you have the raid1 profile for metadata, the metadata will survive one failed disk, but any data on that disk will not be readable. You could leave the filesystem in this state until a replacement is available. Any files (could be a lot, or most) with missing data will have to be recreated from backups, but the filesystem should be recoverable.
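To make the reasoning above concrete, here is a minimal Python sketch, not anything taken from btrfs-progs. The table lists how many whole-device failures each block group profile tolerates; the function name `outcome_after_failures` and its result strings are made up for illustration. It also simplifies things: with single metadata, a filesystem could in principle survive if the failed device happened to hold no metadata, which is why the wording is "likely".

```python
# Whole-device failures tolerated by each btrfs block group profile.
# DUP keeps both copies on the same device, so it does not help when
# that device is lost.
TOLERATED_DEVICE_FAILURES = {
    "single": 0,
    "dup": 0,
    "raid0": 0,
    "raid1": 1,
    "raid1c3": 2,
    "raid1c4": 3,
    "raid10": 1,
    "raid5": 1,
    "raid6": 2,
}


def outcome_after_failures(data_profile, metadata_profile, failed_devices=1):
    """Hypothetical helper: roughly classify what is left after device loss."""
    meta_ok = TOLERATED_DEVICE_FAILURES[metadata_profile.lower()] >= failed_devices
    data_ok = TOLERATED_DEVICE_FAILURES[data_profile.lower()] >= failed_devices
    if not meta_ok:
        return "filesystem likely unrecoverable"
    if not data_ok:
        return "filesystem recoverable, data on failed device lost"
    return "filesystem and data recoverable"


# The layout discussed in this thread: single data, RAID1 metadata.
print(outcome_after_failures("single", "RAID1"))
```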

lrcressy commented 2 months ago


My usage currently looks like this, minus the returned 20 TB drive:

$ sudo btrfs filesystem usage /home/video
Overall:
    Device size:                  50.93TiB
    Device allocated:             13.46TiB
    Device unallocated:           37.47TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         13.41TiB
    Free (estimated):             37.53TiB      (min: 18.79TiB)
    Free (statfs, df):            37.53TiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:13.44TiB, Used:13.38TiB (99.57%)
   /dev/mapper/video-1     8.51TiB
   /dev/mapper/video-2     4.87TiB
   /dev/mapper/video-3    51.00GiB

Metadata,RAID1: Size:15.00GiB, Used:14.41GiB (96.08%)
   /dev/mapper/video-1    15.00GiB
   /dev/mapper/video-2    15.00GiB

System,RAID1: Size:8.00MiB, Used:1.42MiB (17.77%)
   /dev/mapper/video-1     8.00MiB
   /dev/mapper/video-2     8.00MiB

Unallocated:
   /dev/mapper/video-1     9.66TiB
   /dev/mapper/video-2     9.66TiB
   /dev/mapper/video-3    18.14TiB
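With single data, the amount allocated on each device is roughly what is at risk if that device dies. As a rough illustration, the Python sketch below pulls the per-device figures out of the Data section of the usage output. It assumes the usual layout with one device per line under each section header; the function name `data_bytes_per_device` is hypothetical, and the byte values are approximate because the tool rounds the sizes it prints.

```python
import re

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
# A device line looks like: "   /dev/mapper/video-1     8.51TiB"
DEVICE_LINE = re.compile(r"^\s*(/dev/\S+)\s+([\d.]+)(B|KiB|MiB|GiB|TiB)\s*$")


def data_bytes_per_device(usage_text):
    """Return {device: approximate bytes allocated to Data block groups}."""
    per_device = {}
    in_data_section = False
    for line in usage_text.splitlines():
        if line.startswith("Data,"):
            in_data_section = True
            continue
        if not in_data_section:
            continue
        match = DEVICE_LINE.match(line)
        if match:
            device, value, unit = match.groups()
            size = int(float(value) * UNITS[unit])
            per_device[device] = per_device.get(device, 0) + size
        elif line.strip():
            # Any other non-blank line starts a new section (Metadata, System, ...).
            in_data_section = False
    return per_device
```

Feeding it the output above gives one entry each for video-1, video-2 and video-3, with the Metadata and System lines ignored.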

I'm a paranoid old man and use cryptsetup to encrypt my /home partition and other data partitions before adding a file system. /dev/mapper/video-4 was the first Western Digital drive that ever failed on me. In the past, many Seagate drives have failed, so I never buy them anymore.

When I attempted to back up my directory /dvd/ to an external hard drive using rsync, I received read errors like the following:

2024/04/07 13:43:10 [216956] rsync: [sender] read errors mapping "/home/video/archived videos/dvd/The Shawshank Redemption/THE_SHAWSHANK_REDEMPTION.iso": Input/output error (5)

2024/04/07 13:43:56 [216956] rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1336) [sender=3.2.7]
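To work out which files need restoring from backup, the affected paths can be collected from the rsync log. This is only a sketch in Python: the pattern is based on the log line quoted above, and the function name `files_with_read_errors` is made up for illustration. Other rsync versions or error types may word the message differently.

```python
import re

# Matches sender-side read failures such as:
#   rsync: [sender] read errors mapping "/path/file.iso": Input/output error (5)
READ_ERROR = re.compile(
    r'rsync: \[sender\] read errors mapping "(?P<path>.+)": '
    r"Input/output error \(5\)"
)


def files_with_read_errors(log_text):
    """Return the unique paths rsync could not read, in first-seen order."""
    paths = []
    for line in log_text.splitlines():
        match = READ_ERROR.search(line)
        if match and match.group("path") not in paths:
            paths.append(match.group("path"))
    return paths
```

The summary line with code 23 is skipped, since it does not name a file.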

I attempted to remove the device with the read errors:

    btrfs device remove /dev/mapper/video-4 /home/video/

This gave me a bunch of read errors when it attempted to copy the data off the device.

I then attempted to delete all of the files with read errors and shrink the device, which also ended in failure.

What I would like to see is an option to force the removal of a failing device with bad blocks or read/write errors, turning off the copy process to the other devices in the array.

I mount my btrfs file systems with the following options: autodefrag,rw,relatime,x-systemd.mount-timeout=5min


Rev. LeRoy D. Cressy
