sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

Deleting millions of files is very slow #518

Open ejezel opened 2 years ago

ejezel commented 2 years ago

I wanted to remove all duplicates from two nearly identical folders with millions of files. Using the removal script created by rmlint takes way more time than finding the duplicates. I believe the issue is that the rm process is started individually for each file. Is there some kind of workaround for this?
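To illustrate what I mean, the removal script seems to boil down to one rm invocation per duplicate, roughly this shape (not the literal script contents, and the paths are made up):

```sh
rm -f '/data/a/file000001'
rm -f '/data/a/file000002'
# ...repeated once per duplicate, i.e. millions of separate process startups
```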

SeeSpotRun commented 2 years ago

Duplicate directories are probably faster to delete than individual files, so you could try rmlint -T dd,df and then run the generated script. Or you can just let it run overnight...
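Concretely, something along these lines (the folder paths below are placeholders; rmlint.sh is the default name of the generated script):

```sh
# Treat matching directory trees as duplicate directories (dd) as well as
# duplicate files (df), so the script can remove whole directories at once.
rmlint -T dd,df /path/to/folder-a /path/to/folder-b

# Then run the generated removal script.
./rmlint.sh
```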

Not sure why rm is so slow, but others have reported this issue, e.g. https://unix.stackexchange.com/questions/37329/efficiently-delete-large-directory-containing-thousands-of-files. There is some discussion of the underlying issues at https://serverfault.com/questions/183821/rm-on-a-directory-with-millions-of-files/328305#328305
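If you want to stay with per-file duplicates, a possible workaround is to batch the deletions yourself instead of starting one rm process per file, e.g. via the json output formatter. This is only a rough sketch: the json field names (type, path, is_original) are from memory and should be checked against your rmlint version, and it deletes whatever it is fed, so review the path list (or swap rm for echo) before running it for real.

```sh
# Write the scan results as JSON via the json output formatter.
rmlint -o json:dupes.json /path/to/folder-a /path/to/folder-b

# Pick out the non-original duplicate files and hand them to rm in large
# batches, so only a handful of rm processes are started in total.
# Caveats: breaks on paths containing newlines; GNU xargs is assumed for -d.
jq -r '.[] | select(.type == "duplicate_file" and .is_original == false) | .path' dupes.json |
  xargs -d '\n' rm -f --
```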