markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

feature request: process largest chunks first #292

Open MaxRower opened 1 year ago

MaxRower commented 1 year ago

Is there a reason why the smallest chunks get deduplicated first? Sometimes there is limited time for deduplication, and it would be nice if the largest chunks, which have the biggest impact on free space, could be deduplicated first.
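Conceptually, the request boils down to ordering the dedupe work queue by potential space savings before any extents are submitted. A rough sketch with hypothetical types (duperemove's real data structures will differ):

/* Hypothetical candidate record: one group of duplicate extents.
 * 'len' is the extent length, 'nshared' how many files share it
 * (>= 2 for a real duplicate), so len * (nshared - 1) approximates
 * the space a dedupe of this group would free. */
#include <stdint.h>
#include <stdlib.h>

struct dupe_candidate {
	uint64_t len;
	uint32_t nshared;
	/* ... file refs, offsets ... */
};

static uint64_t potential_savings(const struct dupe_candidate *c)
{
	return c->len * (c->nshared - 1);
}

/* qsort comparator: largest potential savings first. */
static int cmp_savings_desc(const void *a, const void *b)
{
	uint64_t sa = potential_savings((const struct dupe_candidate *)a);
	uint64_t sb = potential_savings((const struct dupe_candidate *)b);

	if (sa > sb) return -1;
	if (sa < sb) return 1;
	return 0;
}

/* Sort the work queue before deduping so that, if the run is cut
 * short, the extents with the biggest impact on free space have
 * already been processed. */
void order_by_impact(struct dupe_candidate *cands, size_t n)
{
	qsort(cands, n, sizeof(*cands), cmp_savings_desc);
}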

GottZ commented 3 months ago

totally agree..

I'm running duperemove on a 16 TB SSD volume right now and it has been stuck on a single 15 GB file for a couple of days:

duperemove -drh --io-threads=8 --cpu-threads=8 -b 256k --dedupe-options=partial --hashfile=/mnt/spark/dupehash --exclude "/mnt/spark/dupehash*" /mnt/spark

[progress screenshot]

This is a RAID 0 made of two 8 TB SSDs [screenshot], resulting in this volume [screenshot]. Apparently running duperemove on it has also increased the recorded read sectors by a lot [screenshot]. Before it started they were pretty equal, at around 30 TB per SSD, so duperemove has read a TON of data to dedupe one file while being only 1/7 of the way through it. This is 20 minutes later: [screenshot]

So I assume duperemove re-reads file metadata over and over after each dedupe operation, even though it could technically chain the operations together in one go (see the sketch below).
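For what it's worth, the kernel's FIDEDUPERANGE ioctl already accepts several destination ranges in a single call, so batching the work for one source extent is possible in principle. A minimal sketch, not duperemove's actual code, and the helper name is made up:

/* Deduplicate one source range against several destination files
 * in a single FIDEDUPERANGE call, instead of one ioctl per target. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* FIDEDUPERANGE, struct file_dedupe_range */

int dedupe_batch(int src_fd, off_t src_off, uint64_t len,
		 const int *dest_fds, const off_t *dest_offs, int ndest)
{
	struct file_dedupe_range *arg;
	size_t sz = sizeof(*arg) + ndest * sizeof(struct file_dedupe_range_info);
	int i, ret;

	arg = calloc(1, sz);
	if (!arg)
		return -1;

	arg->src_offset = src_off;
	arg->src_length = len;
	arg->dest_count = ndest;
	for (i = 0; i < ndest; i++) {
		arg->info[i].dest_fd = dest_fds[i];
		arg->info[i].dest_offset = dest_offs[i];
	}

	ret = ioctl(src_fd, FIDEDUPERANGE, arg);

	/* Each destination reports its own status and bytes_deduped. */
	for (i = 0; i < ndest; i++) {
		if (arg->info[i].status < 0)
			fprintf(stderr, "dest %d: %s\n",
				i, strerror(-arg->info[i].status));
	}

	free(arg);
	return ret;
}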

At least filefrag confirms that the file is being chopped into more and more extents: filefrag -v idontknow.img [output screenshot]

this is just one of six million files..

At least the read rate is consistent [graph], and as you can see, the drive can do a lot more than that.

IOPS went up a lot, though: [graph]

Edit, a month later: [screenshot]