koverstreet / bcachefs

Feature: support single-drive redundancy to be resilient against bad sectors #696

Closed feature-engineer closed 2 weeks ago

feature-engineer commented 2 weeks ago

As far as I understand it, bcachefs only supports redundancy in a RAID configuration. It would be useful to have ECC data written to the same disk (when not configured in RAID) to mitigate data corruption caused by bad sectors.

YellowOnion commented 2 weeks ago

Hard drives already do this internally; they also automatically reallocate bad sectors. Set up a SMART monitoring service and you can see when bad sectors are recovered.
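
As a rough illustration of the monitoring suggested above, here is a minimal Python sketch that reads the attribute table printed by `smartctl -A` (from smartmontools) and picks out the counters usually associated with bad sectors. The function names, the `WATCHED` set, and the choice of attributes are assumptions made for this example; attribute names vary between drive vendors, so treat this as a starting point rather than a complete monitor.

```python
import subprocess

# Attributes commonly tied to bad-sector handling on ATA drives (assumed set;
# not every drive reports all of them under these names).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_smart_attributes(text):
    """Extract watched attributes and their raw values from `smartctl -A` output."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            try:
                found[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return found


def read_smart(device):
    """Run smartctl on a device path such as /dev/sda (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_smart_attributes(result.stdout)
```

A growing `Reallocated_Sector_Ct` indicates sectors the drive has remapped, while a non-zero `Current_Pending_Sector` indicates sectors it could not read and has not yet remapped.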

feature-engineer commented 2 weeks ago

@YellowOnion Hard drives do this at the bit level within each sector, and if a sector is damaged beyond recovery, the result is a read error.

My suggestion was to do this at the sector level in the filesystem, i.e. to cover the case where a read error occurs because the hard drive cannot recover that sector.

I recently encountered a hard drive that had unrecoverable bad sectors uniformly distributed all over it. There were a few thousand of them, so in terms of data lost it was just a few MB, but because they were scattered everywhere and the files were big, many of the files were corrupted beyond recovery.

If the filesystem had stored ECC data for these files, it could have recovered them.
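
To make the idea concrete, here is a minimal Python sketch of sector-level redundancy on a single disk. It uses plain XOR parity over a group of sectors, which is the simplest possible scheme and is chosen here only for illustration; it is not how bcachefs works, and a real design would more likely use a stronger code such as Reed-Solomon. `SECTOR_SIZE`, `xor_parity`, and `recover` are hypothetical names, and an unreadable sector is modelled as `None`.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def xor_parity(sectors):
    """Return a parity sector: the byte-wise XOR of all given sectors."""
    parity = bytearray(SECTOR_SIZE)
    for sector in sectors:
        for i, byte in enumerate(sector):
            parity[i] ^= byte
    return bytes(parity)


def recover(sectors, parity):
    """Rebuild the one sector marked None (a read error) from the rest plus parity."""
    missing = [i for i, s in enumerate(sectors) if s is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one lost sector per group")
    survivors = [s for s in sectors if s is not None]
    # XOR of the surviving sectors and the parity cancels everything
    # except the contents of the missing sector.
    return missing[0], xor_parity(survivors + [parity])


# Usage: four data sectors, one of which later becomes unreadable.
group = [bytes([n]) * SECTOR_SIZE for n in (1, 2, 3, 4)]
parity = xor_parity(group)
damaged = [group[0], None, group[2], group[3]]
index, rebuilt = recover(damaged, parity)
assert index == 1 and rebuilt == group[1]
```

With one parity sector per group, a group survives a single unreadable sector, so this only helps when bad sectors are spread thinly enough that two rarely land in the same group. The cost is one extra sector of space and one extra write per group.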