kdave / btrfsmaintenance

Scripts for btrfs maintenance tasks like periodic scrub, balance, trim or defrag on selected mountpoints or directories.
GNU General Public License v2.0

btrfs-defrag.sh: defrag hunks of multiple files #76

Closed barak closed 3 years ago

barak commented 4 years ago

Instead of invoking "btrfs filesystem defrag" once per individual file/dir, tell find to pass it as many paths at once as practical.
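A minimal, self-contained illustration of the difference (the `sh -c 'echo run'` here is a stand-in for the real `btrfs filesystem defrag` call): find's `+` terminator packs as many paths as fit into a single invocation, while `\;` forks once per file.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$tmp/file$i"; done

# One fork per file: the command runs five times.
per_file=$(find "$tmp" -type f -exec sh -c 'echo run' _ {} \; | wc -l)

# As many paths per fork as practical: the command runs once here.
batched=$(find "$tmp" -type f -exec sh -c 'echo run' _ {} + | wc -l)

echo "per-file invocations: $per_file, batched invocations: $batched"
rm -rf "$tmp"
```

In the script this corresponds to changing the terminator of the find -exec clause that runs "btrfs filesystem defrag" from `\;` to `+`.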

kdave commented 4 years ago

With -f on the command line there's effectively no difference: it waits until all the defragmented data from one file is flushed before starting the next. Your patch would save forking a new process for each file; I'm not sure that makes a big difference.

Without -f, the amount of data to defragment can increase very quickly, with a big impact on system load.

kdave commented 4 years ago

Right now there's no way to throttle the amount of defragmented data on the submission side, i.e. in the command. The ioctl implementation in the kernel simply has the full information about how much of the file is fragmented and fits the parameters (start, length, target size, ...). We'd need some kind of feedback from kernel to userspace in parallel to the ioctl, which might still be tricky to get right. The simple throttling that -f provides seems to be universal. Possibly the find command could filter files by size and submit them in batches with a known total size. Is there a use case you were addressing with the change, or is it more of a style change where the use of '+' is recommended?
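The size-aware batching idea can be sketched in shell. Everything below is illustrative, not part of the actual script: the 5 MiB cap, the generated sample files, the GNU find `-printf` usage, and the `echo defrag` stand-in for a real `btrfs filesystem defrag -f` invocation. A real version would also need to cope with whitespace in file names (e.g. via `-print0`), which the naive word splitting here does not.

```shell
#!/bin/sh
# Sketch: accumulate files until a known total size is reached, then submit
# the whole batch in one invocation. Hypothetical names and values throughout.
set -e
dir=$(mktemp -d)
# Three 3 MiB sample files standing in for real data on a btrfs mount.
for i in 1 2 3; do head -c $((3 * 1024 * 1024)) /dev/zero > "$dir/f$i"; done

DEFRAG_CMD=${DEFRAG_CMD:-echo defrag}   # real use: "btrfs filesystem defrag -f"
BATCH_BYTES=$((5 * 1024 * 1024))        # submit once ~5 MiB is queued

batches=$(
  find "$dir" -type f -printf '%s %p\n' | {
    total=0 batch=
    while read -r size path; do
      batch="$batch $path"
      total=$((total + size))
      if [ "$total" -ge "$BATCH_BYTES" ]; then
        $DEFRAG_CMD $batch              # one line of output per submission
        total=0 batch=
      fi
    done
    if [ -n "$batch" ]; then
      $DEFRAG_CMD $batch                # flush the remainder
    fi
  } | wc -l
)
echo "batches submitted: $batches"
rm -rf "$dir"
```

With three 3 MiB files and a 5 MiB cap, the first batch closes after two files and the third file goes out as the remainder, so two batches are submitted.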

barak commented 4 years ago

I was assuming that (a) fewer invocations of btrfs are better, and (b) putting many files in a single invocation gives the underlying layer more leeway to throttle I/O, whereas many invocations would result in parallel threads contending. But as pointed out above, it's really btrfs itself that needs to know how much I/O bandwidth to allocate to defragmentation, and how to keep it from slowing the system to a crawl.