trapexit / mergerfs-tools

Optional tools to help manage data in a mergerfs pool
ISC License

Balance performance on small files #108

Open progtologist opened 3 years ago

progtologist commented 3 years ago

I am running mergerfs.balance on a filesystem made up of lots of big files but even more small ones (source code, possibly compile artifacts too). The moment balance starts to move the small files, the whole process turns into unbelievably slow torture: CPU utilization jumps to 100% and disk I/O drops to almost 0%. It's been 2 days now and barely 10GB have been moved! I switched from CPython to PyPy3 to see if that would improve things; I think it helped slightly, but not by a huge margin. Is there something I could do to help this process? Is there some logic that, if added to the script, would improve small-file transfer performance? E.g. use os or shutil to check whether a folder consists of a large number of small files, then tar them all, move the tar, and extract it at the new target. If I implemented something like that and filed a PR, would it be of interest / likely to be accepted?
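For illustration, a minimal sketch of that idea. The 64 KiB cutoff, the 90% ratio, and the function names are my own assumptions for the example, not anything in mergerfs.balance:

```python
import os
import tarfile
import tempfile

SMALL_FILE_LIMIT = 64 * 1024  # hypothetical cutoff for a "small" file
SMALL_RATIO = 0.9             # hypothetical: 90%+ small files triggers batching

def mostly_small_files(path):
    """Heuristic: does this directory consist mostly of small files?"""
    sizes = [e.stat(follow_symlinks=False).st_size
             for e in os.scandir(path)
             if e.is_file(follow_symlinks=False)]
    if not sizes:
        return False
    small = sum(1 for s in sizes if s < SMALL_FILE_LIMIT)
    return small / len(sizes) >= SMALL_RATIO

def move_dir_via_tar(srcdir, dstdir):
    """Pack srcdir into one tar archive, then unpack it at dstdir,
    turning many small transfers into one sequential write and read."""
    fd, archive = tempfile.mkstemp(suffix='.tar')
    os.close(fd)
    try:
        with tarfile.open(archive, 'w') as tf:
            tf.add(srcdir, arcname='.')
        os.makedirs(dstdir, exist_ok=True)
        with tarfile.open(archive) as tf:
            tf.extractall(dstdir)
    finally:
        os.remove(archive)
```

A real version would also have to delete the source files after a verified copy and preserve ownership and xattrs, which tarfile only partially handles.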

trapexit commented 3 years ago

The tool mostly just walks over the filesystem and calls rsync to copy files (because while I can certainly recreate the behavior of rsync, rsync is well trusted). What kind of system do you have?

Small files will always be higher cost. You could add a file size filter, which may help (though the paths still all have to be walked). A better solution, already planned, is to decide what to move where all at once, write some temp files, and then use rsync's --files-from. That limits tree walking and the number of rsync executions, but is more complicated to do.
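A sketch of that batched approach. The function names and the exact flag set are illustrative, not the planned implementation; --files-from itself is a real rsync option that reads source paths from a file and implies --relative:

```python
import subprocess
import tempfile

def write_file_list(relpaths):
    """Write the relative paths chosen for one source/target disk pair
    to a temp file that rsync can consume via --files-from."""
    tmp = tempfile.NamedTemporaryFile('w', suffix='.list', delete=False)
    tmp.write('\n'.join(relpaths) + '\n')
    tmp.close()
    return tmp.name

def build_rsync_cmd(listpath, srcmount, dstmount):
    """One rsync invocation for the whole batch instead of one per file.
    --files-from implies --relative, so each listed path is recreated
    under dstmount."""
    return ['rsync', '-a', '--files-from=' + listpath,
            srcmount + '/', dstmount + '/']

def balance_batch(relpaths, srcmount, dstmount):
    listpath = write_file_list(relpaths)
    subprocess.run(build_rsync_cmd(listpath, srcmount, dstmount), check=True)
```

The win is that the fork/exec and startup cost of rsync is paid once per batch rather than once per file, which is exactly where small files hurt.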

progtologist commented 3 years ago

The system is an Intel i7 2760QM (mobile chip) with 16GB of DDR3 RAM. The disks are 8 and 12TB WD ones, low RPM but capable of 150MB/sec sequential read/write. They are all connected through a Dell Perc H310 flashed in IT mode. To my understanding, it is not rsync that is causing the slowdown (100% CPU usage); if it were, I would have seen similar issues with the large files (where rsync is doing all the heavy lifting). So it must be the tree walking in Python that is bringing the system to its knees.
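One thing worth checking is how the tree is walked. With os.scandir, DirEntry.is_dir()/is_file() can usually answer from the type information the OS returned with the directory listing, avoiding an extra stat() per entry (on Linux, reading the size still costs one stat() per file, but the type checks are free). A sketch of such a walk, not the actual mergerfs.balance code:

```python
import os

def iter_files(root):
    """Yield (path, size) for every regular file under root using an
    explicit stack and os.scandir; is_dir/is_file checks reuse the
    d_type info from readdir instead of issuing a separate stat()."""
    stack = [root]
    while stack:
        d = stack.pop()
        with os.scandir(d) as it:
            for e in it:
                if e.is_dir(follow_symlinks=False):
                    stack.append(e.path)
                elif e.is_file(follow_symlinks=False):
                    yield e.path, e.stat(follow_symlinks=False).st_size
```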

trapexit commented 3 years ago

It blocks on the execution of rsync, which should limit CPU usage... I can't get anywhere close to 100% usage if I change rsync to "bash -c true", but maybe my system is just faster. Regardless, the change I described is basically a full rewrite.
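To isolate the fixed per-file cost, one can time spawning a no-op process, in the spirit of the "bash -c true" substitution above. A rough sketch; the numbers will vary by system, but this is a floor on the per-file overhead when rsync is launched once per file:

```python
import subprocess
import time

def exec_overhead(n=200):
    """Average wall-clock seconds to fork/exec a no-op process.
    With one rsync per file, every small file pays at least this much
    on top of the actual copy."""
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run(['true'], check=True)
    return (time.perf_counter() - start) / n
```

If this comes out at a few milliseconds, tens of thousands of small files translate into minutes of pure process-spawn overhead, before any data moves.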