kdave / btrfsmaintenance

Scripts for btrfs maintenance tasks like periodic scrub, balance, trim or defrag on selected mountpoints or directories.
GNU General Public License v2.0

Disable scrub by default? #73

Open: mikhailnov opened this issue 5 years ago

mikhailnov commented 5 years ago

I would suggest disabling scrub by default. It loads the CPU very heavily, causes a very high load average, and, I think, is not really needed when there is no RAID.

awerlang commented 4 years ago

For single-device setups, I find scrub is still helpful for detecting bitrot as early as possible, so that you can then restore from a backup.

mikhailnov commented 4 years ago

How does running scrub in the background help to detect problems?

awerlang commented 4 years ago

I was referring to how it is helpful. It is not needed in the sense that no other native Linux filesystem provides it, but it's a feature that has its worth. I'm not on board with running these routines in the background, though. Here scrub has run pretty fast so far, but in your case I'd run it manually at an appropriate moment.

kdave commented 4 years ago

With the default it's hard to please everybody. The load caused by scrub appeared after the CFQ I/O scheduler was replaced by mq-deadline, which does not implement ionice priorities. The BFQ scheduler does, so this can be "fixed" by switching to it. Alternatively, disable running scrub in the sysconfig and run it manually. On newer kernels it might be possible to confine scrub in a cgroup, with all its bells and whistles, to limit its resources; this is being evaluated.
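To see which case applies on a given machine, the active scheduler can be read from sysfs. The sketch below is a minimal pair of shell functions; the names active_scheduler and warn_if_ionice_ignored are hypothetical and not part of btrfsmaintenance. It assumes the usual format of /sys/block/<dev>/queue/scheduler, where the kernel lists the available schedulers and brackets the active one. It takes the file's text as an argument so the parsing can be tried without root.

```shell
#!/bin/sh
# Print the active I/O scheduler, given the text of
# /sys/block/<dev>/queue/scheduler (assumed format: available schedulers
# separated by spaces, active one in brackets, e.g. "[mq-deadline] kyber bfq none").
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# kdave notes that BFQ honors ionice priorities while mq-deadline does not.
# Warn (and return 1) when the active scheduler is anything other than bfq.
warn_if_ionice_ignored() {
    sched=$(active_scheduler "$1")
    if [ "$sched" != bfq ]; then
        echo "warning: scheduler '$sched' may ignore ionice; scrub can hog I/O" >&2
        return 1
    fi
}
```

For example, run warn_if_ionice_ignored "$(cat /sys/block/sdX/queue/scheduler)" for each device of the filesystem before starting a scrub.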

barak commented 4 years ago

At least on a setup with duplication, not running scrub seems like a bad idea. I've had it fix many single-bit cosmic-ray errors.

On a setup without duplication, yeah, that's more complex. It doesn't really do any good to run a scrub and then just throw away any error messages. What's really needed is a way to get the information to a human being who is in a position to make use of it. This has two components. (1) Reporting the fact that an unrecoverable storage corruption event has occurred. This seems straightforward, although I must admit that it doesn't seem to be properly available. It seems like something that should have an official, unified API, which desktop systems etc. can then query and present appropriately. But the killer is (2): the information should be presented in a way that lets more humans make use of it. An inode number is not very helpful to your typical non-expert. It should be resolved to a filename, and the options for dealing with it should be automated and presented in a pleasant way: delete the file; test the problematic block and add it to a badblocks list if it still fails; recover the file from backup; or, if it's a system file, download and reinstall it from the appropriate package; etc.

That's all beyond the scope of btrfsmaintenance, I suppose. But the lack of any such infrastructure puts btrfsmaintenance in an awkward position: its job is to use approved mechanisms for finding problems, but it has no good way of actually dealing with them.
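A small step toward component (1) is to stop discarding the result of a scrub. The sketch below assumes the exit codes documented in the btrfs-scrub(8) man page (0 success, 1 scrub could not be performed, 2 nothing to resume, 3 uncorrectable errors found) and that the scrub runs in the foreground with -B, so that the exit status reflects the outcome. The function name scrub_verdict is hypothetical; check the code meanings against your btrfs-progs version.

```shell
#!/bin/sh
# Hypothetical helper: translate the exit status of "btrfs scrub start -B"
# into a human-readable message. Code meanings are assumed from the
# btrfs-scrub(8) man page and may differ between btrfs-progs versions.
scrub_verdict() {
    case "$1" in
        0) echo "scrub completed successfully" ;;
        1) echo "scrub could not be performed" ;;
        2) echo "scrub: nothing to resume" ;;
        3) echo "scrub found UNCORRECTABLE errors; see the kernel log and restore affected files from backup" ;;
        *) echo "scrub: unexpected exit status $1" ;;
    esac
}
```

A caller could run btrfs scrub start -B on the mount point, pass $? to scrub_verdict, and forward the message with logger -t btrfs-scrub or a desktop notification instead of throwing it away. For component (2), btrfs inspect-internal inode-resolve can map an inode number reported in the kernel log back to a path within a given subvolume.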

montvid commented 3 years ago

Regarding kdave's comment that the BFQ scheduler honors ionice priorities, so the scrub load can be "fixed" by switching to it:

Maybe it is a good idea to state that in the README, i.e. recommend changing to the BFQ scheduler?

montvid commented 3 years ago

Or, better still, include a script to change to BFQ?

ronnystandtke commented 3 years ago

Replying to montvid ("Or, better still, include a script to change to BFQ?"):

Yes, please! Btrfs scrub is killing my system every month for a whole day. If btrfsmaintenance could run

modprobe bfq
echo bfq > /sys/block/sdX/queue/scheduler

before starting the btrfs scrub (and maybe reset the scheduler to its original value afterwards), that would be really helpful!
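The requested switch-and-restore step could look roughly like the sketch below. with_scheduler is a hypothetical helper, not something btrfsmaintenance ships; it assumes the bracketed sysfs format of the scheduler file and that writing a scheduler name to that file selects it, which requires root and fails if the scheduler is not available. The scheduler file is a parameter so the logic can be tried on an ordinary file.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfsmaintenance): run a command with a
# temporarily changed I/O scheduler, then restore the previous one.
# Usage: with_scheduler <scheduler-file> <scheduler> <command> [args...]
with_scheduler() {
    sched_file=$1
    wanted=$2
    shift 2
    # The active scheduler is the bracketed entry, e.g. "[mq-deadline] bfq none".
    previous=$(sed -n 's/.*\[\([^]]*\)].*/\1/p' "$sched_file")
    if [ -z "$previous" ]; then
        echo "cannot find the active scheduler in $sched_file" >&2
        return 1
    fi
    # Writing a scheduler name selects it; this fails if it is not available.
    echo "$wanted" > "$sched_file" || return 1
    "$@"
    rc=$?
    # Restore the original scheduler whether or not the command succeeded.
    echo "$previous" > "$sched_file"
    return "$rc"
}
```

For example, as root: modprobe bfq, then with_scheduler /sys/block/sdX/queue/scheduler bfq btrfs scrub start -B -c 3 /mnt. The -B flag matters here: btrfs scrub start runs in the background by default, so without it the scheduler would be restored before the scrub finishes. The -c 3 option requests the idle I/O priority class, which only has an effect under a scheduler that honors it, such as BFQ. A multi-device filesystem would need the switch on every member device.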

barak commented 3 years ago

Re: btrfs scrub killing the system, and dealing with it by enabling BFQ or similar: this seems like something the btrfs scrub command should handle internally. It should have an option, like sudo btrfs scrub --bring-system-to-its-knees=no ... Anything else just works around the original sin: btrfs scrub itself should be well-behaved by default.