tytso / e2fsprogs

Ext2/3/4 file system utilities
http://ext4.wiki.kernel.org

Badblocks fails on really large disks, can take e2fsck down as well #111

Open hamishmb opened 2 years ago

hamishmb commented 2 years ago

Hi there,

I'm running Linux Mint 20.3 with e2fsprogs 1.45.5-2ubuntu1. I realise this isn't the latest version, so hopefully I'm reporting a bug you already found and fixed. I couldn't see a way to check easily, but couldn't find anything that looked relevant with a quick search of the commit history since early 2020 (Mint 20.3 is based on Ubuntu 20.04 LTS).

Since I got an 8TB USB backup hard drive, I found that when I run the badblocks program on it periodically, I have to specify a bigger blocksize (-b 8192) in order for it to work.

Otherwise I receive the error:

badblocks: Value too large for defined data type invalid end block (7814023168): must be 32-bit value
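
To put numbers on that: at badblocks' default 1024-byte block size an 8TB drive is 7814023168 blocks, which is larger than the 32-bit maximum of 4294967295, while at 8192 bytes per block the same drive is 976752896 blocks and fits comfortably. A rough sketch of the two invocations (the device name is a placeholder):

sudo badblocks -sv /dev/sdX            # default 1024-byte blocks: 7814023168 > 2^32 - 1, so the count overflows
sudo badblocks -b 8192 -sv /dev/sdX    # 8192-byte blocks: 7814023168 / 8 = 976752896, which fits in 32 bits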

In and of itself, this isn't a huge issue, but when I run e2fsck on it to check the EXT4 file system, if I use the bad sector check option (-c) to launch badblocks, e2fsck just hangs forever at that point, with the drive making a brrrrrrrrrr sound, as if it is trying to launch badblocks over and over again without checking the exit value.

So I suggest that e2fsck should check and handle this return value (I can provide the specific value by running it again if you need me to), and badblocks should use 64-bit values for the number of blocks in a disk.
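
If it helps, the exit status badblocks returns in this situation can be captured directly (device name is a placeholder; I haven't re-run it yet, so the exact value is still to be confirmed):

sudo badblocks -sv /dev/sdX   # fails as above on the 8TB drive
echo $?                       # this is the value e2fsck could check instead of retrying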

hamishmb commented 2 years ago

Perhaps also worth mentioning is that it probably shouldn't be assumed that disks have 512-byte sectors any more, a lot of them use 4k sectors now.
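
The kernel already reports both sizes, so nothing needs to assume 512 bytes; a minimal sketch with a placeholder device name:

sudo blockdev --getss /dev/sdX     # logical sector size, often still 512
sudo blockdev --getpbsz /dev/sdX   # physical sector size, 4096 on many newer drives
cat /sys/block/sdX/queue/logical_block_size    # same values, readable without root via sysfs
cat /sys/block/sdX/queue/physical_block_size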

tytso commented 2 years ago

What I will probably do in the next major release is to deprecate the e2fsck -c and mke2fs -c options. The reason for this is something I explained a year or so ago:

I will say that for modern disks, the usefulness of badblocks has decreased significantly over time. That's because for modern-sized disks, it can often take more than 24 hours to do a full read on the entire disk surface --- and the factory testing done by HDD manufacturers is far more comprehensive.

In addition, SMART (see the smartctl package) is a much more reliable and efficient way of judging disk health.

The badblocks program was written over two decades ago, before the days of SATA, and even IDE disks, when disk controllers and HDDs were far more primitive. These days, modern HDDs and SSDs will do their own bad block redirection from a built-in bad block sparing pool, and the usefulness of using badblocks has decreased significantly.

https://www.spinics.net/lists/linux-ext4/msg76847.html

If someone wants to send patches to make badblocks work better on large disks, including automatically reading the physical block size and using it to optimize how it works, that's great. But it's not high priority for me.
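
For the SMART route mentioned above, the usual smartmontools workflow is roughly the following (device name is a placeholder):

sudo smartctl -H /dev/sdX            # quick overall health verdict
sudo smartctl -a /dev/sdX            # full attribute and error-log dump
sudo smartctl -t long /dev/sdX       # start an extended (full-surface) self-test
sudo smartctl -l selftest /dev/sdX   # check the self-test result once it finishes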

hamishmb commented 2 years ago

Okay, that seems fair enough.

At any rate, it'd be good to keep badblocks around in its current form even if it doesn't change. I have a use case for it: I run the read-write test once a year on any HDDs holding important data, after some of them apparently demagnetised last year.

fennectech commented 2 months ago

SMART can miss a lot of things, and it will only notice an issue if the problem area is actually read. Using badblocks -nsv to refresh the entire surface of a disk is a very effective way to get SMART to actually notice and take action on problematic areas of the disk. The only way to truly validate a disk is to read every byte of it, and badblocks is a very nice way to do this. I've also had badblocks find issues on SSDs with a clean SMART report, and SMART did not even flag the issue after a badblocks pass. So saying SMART is the most reliable way to find defective disks is simply a non-starter. The only way to truly verify a drive is to read back the data stored on it and validate it against what it should be, and badblocks' non-destructive read-write test is a perfect way to do this on drives with a dumb filesystem like FAT32.
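
For reference, a surface-refresh pass along those lines looks roughly like this (device name and block size are placeholders; the drive must be unmounted for the non-destructive mode):

sudo umount /dev/sdX1                   # nothing on the drive may be mounted
sudo badblocks -nsv -b 4096 /dev/sdX    # read each block, write test patterns, verify, restore original data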