axet opened 5 months ago
Fsck only tries to read (and then remap, if there are I/O errors) those sectors which contain pre-existing file system metadata blocks. We can't just skip those blocks without leaving the file system inconsistent, and if fsck needs to read them, then at some point the kernel will run into the same problem, when you try to read or stat some file, or stat or list some directory, depending on which metadata block is bad. So if fsck is trying to read some block which is bad, you will have some amount of data loss and file system corruption. The only question is how bad it is going to be. So for fsck to ignore those bad sectors means leaving the file system in a bad state. Are you really willing to live with that?
How much is your data worth? How much is your time worth? What is the cost of just buying another disk and replacing the one which is that far gone?
Not everything in this world has a price. Not every person, not every product. Like an open-source project, right?
I'm speaking about my backup / games drive here. It is big, and expensive, and I won't replace it because of a few bad blocks. I can keep using it until it is 100% dead.
My strategy is to extend its life by only relocating bad blocks in the most important areas of the drive, like the first and last MB (which hold the GPT/MBR partition information), and to leave the rest as it is. That would extend its life, since the SMART spare sector area is limited and would be consumed much more slowly. Good?
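To illustrate, this is roughly how I handle the drive edges by hand (just a sketch; /dev/sdX, the sector number and the sizes are examples, and overwriting a sector destroys whatever was stored there):

```sh
# Read-only scan of the first 1 MB (2048 x 512-byte sectors).
badblocks -b 512 -sv -o head_bad.txt /dev/sdX 2047 0

# Overwrite one unreadable sector so the drive firmware remaps it from the
# spare area; the old contents of that sector are lost.
dd if=/dev/zero of=/dev/sdX bs=512 seek=<bad_sector> count=1 oflag=direct

# The last 1 MB can be handled the same way, using
#   blockdev --getsz /dev/sdX
# to find the total number of 512-byte sectors.
```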
For that purpose I chose ext4 and started running badblocks periodically, to avoid rewriting bad blocks in the middle of the drive while keeping the filesystem healthy, without spending those valuable replaceable SMART sectors. It turns out that ext4 does not support such a strategy and re-reads and re-writes those sectors for me every time I run fsck.
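For clarity, the workflow I was trying to use is roughly this (a sketch; the device name and block size are just examples):

```sh
# Read-only surface scan; -b 4096 matches the ext4 block size so the
# block numbers mean the same thing to e2fsck.
badblocks -b 4096 -sv -o bad.txt /dev/sdX1

# Record those blocks in the filesystem's bad block inode.
e2fsck -f -l bad.txt /dev/sdX1
```

The second step is where fsck starts re-reading and re-writing the listed sectors for me.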
Right now, fsck is not just taking a lot of time trying to read bad sectors, without any positive result, but is also trying to write to bad sectors, forcing SMART to relocate them.
Since you are the fsck developer, I assume that is how the utility was designed. Having an option for the new behavior, which could extend HDD life, would be great.
Also, I do not think "reading bad blocks will recover some portion of the data/sector" is a correct statement. Most likely it will return a sector full of 0xff or zeros, after a delay of a few minutes per sector. And I do not think this is a correct strategy.
The problem is that if there is a block that has become bad, and that block was part of ext4's file system metadata, there is a very good chance that you have already lost data. And if fsck tries to read that block, then the kernel might try to read that block too --- and it is fsck's job to get the metadata into a state where the kernel won't trip over a bad or corrupted sector. Ignoring a bad block doesn't mean that we will have avoided a problem. It's like putting your fingers in your ears and saying LALALA, I can't hear the car that's headed straight at me and blaring its horn, so I must be safe.
As far as cost is concerned, a 1TB HDD or SSD is about $50 to $60 USD. That's not a lot of money. And maybe you don't care about the value of your time, but I care about the value of my time. And being a maintainer doesn't mean that you do free feature development for any random user who feels entitled enough to demand their own pet feature. That's not how open source works. Open source means that if you want to make a change that will put your data at risk, you are free to do so. And if it causes your data to get scrambled, well, you get to keep both pieces.
Now if someone sends me patches for something that is a good idea (as opposed to being a risk for other users' data), then I might integrate that contribution into the upstream open source code base; that's also what open source is about. But as the saying goes, free software is not free as in beer, it's free as in freedom. And this includes the freedom to shoot off your own foot. But you can't force someone to shoot off your foot for you, or to give you the bullets to shoot off your own foot; that's not how it works.
Here is a patch for extending HDD life by reducing spare sector usage (the so-called "my pet feature", I guess, because only I care about extending HDD life and saving money and time):
1) Adds skipping of bad block checks when run with the -k flag (why re-check already known bad blocks if they are going to be ignored anyway?).
2) Adds a new option, -K (--ignore-badblocks), which ignores read errors and does not re-write bad sectors.
https://gitlab.com/axet/homebin/-/blob/debian/dbuild.d/trixie/e2fsprogs/ignore_readrewrite.patch
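With the patch applied, the intended usage is roughly this (a sketch; -K / --ignore-badblocks only exists in the patch above, not in upstream e2fsprogs, and /dev/sdX1 is an example):

```sh
# -f  force a full check
# -c  scan for bad blocks using badblocks(8)
# -k  keep the blocks already recorded in the bad blocks inode
#     (with the patch, also skip re-checking them)
# -K  ignore read errors and do not re-write bad sectors (patch-only)
e2fsck -f -c -k -K /dev/sdX1
```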
Hello!
Every time I run 'fsck -c' or 'fsck -l', fsck tries to re-read and re-write bad sectors. That takes a very long time and causes SMART to remap bad sectors on the drive. I prefer to remap bad sectors manually, only in the most important HDD areas (the first and last MB), not over the entire drive surface.
Can we have an option which forces fsck to ignore bad sectors?
For example, if the 'badblocks -i' option is used, badblocks will skip those sectors, as described in its man page.
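Something along these lines (a sketch; the file and device names are just examples):

```sh
# -i  skip the blocks already listed in known_bad.txt instead of testing
#     them again
# -o  write newly found bad blocks to new_bad.txt
badblocks -b 4096 -sv -i known_bad.txt -o new_bad.txt /dev/sdX1
```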
EDIT: the same report, 5 years old, is here: