MaxRower opened 1 year ago
Totally agree.
I'm running duperemove on a 16 TB SSD volume right now, and it's been stuck on this 15 GB file for a couple of days now:
duperemove -drh --io-threads=8 --cpu-threads=8 -b 256k --dedupe-options=partial --hashfile=/mnt/spark/dupehash --exclude "/mnt/spark/dupehash*" /mnt/spark
This is a RAID 0 with two 8 TB SSDs:
resulting in this volume:
Apparently, running duperemove on this also increased the recorded read sectors by a lot.
Before it started, they were roughly equal at around 30 TB for each SSD, so duperemove read a ton of data to dedupe a single file, while being only 1/7 of the way through.
This is 20 minutes later:
So I assume duperemove re-reads the file metadata over and over after each dedupe operation, even though it could technically chain the operations together in one go.
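For what it's worth, the kernel's FIDEDUPERANGE ioctl already accepts several destination ranges in a single call, so batching at least the submissions against one source extent is possible at the API level. Here is a minimal sketch of that interface (not duperemove's actual code; the paths, offsets, and the 256k length are made up for illustration):

```c
/* Sketch: dedupe two target ranges against one source extent in a single
 * FIDEDUPERANGE call instead of one ioctl per target. Hypothetical paths
 * and offsets; requires Linux 4.5+ and a filesystem supporting dedupe. */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

int main(void)
{
    int src = open("/mnt/spark/source.img", O_RDONLY);
    int dst = open("/mnt/spark/idontknow.img", O_RDWR);
    if (src < 0 || dst < 0) { perror("open"); return 1; }

    enum { NDEST = 2 };                /* two destination ranges at once */
    size_t sz = sizeof(struct file_dedupe_range)
              + NDEST * sizeof(struct file_dedupe_range_info);
    struct file_dedupe_range *r = calloc(1, sz);

    r->src_offset = 0;                 /* hypothetical source extent */
    r->src_length = 256 * 1024;        /* one 256k block, matching -b 256k */
    r->dest_count = NDEST;
    r->info[0].dest_fd = dst;
    r->info[0].dest_offset = 0;        /* hypothetical duplicate offsets */
    r->info[1].dest_fd = dst;
    r->info[1].dest_offset = 1024 * 1024;

    if (ioctl(src, FIDEDUPERANGE, r) < 0)
        perror("FIDEDUPERANGE");
    for (int i = 0; i < NDEST; i++)
        printf("dest %d: status=%d bytes_deduped=%llu\n",
               i, r->info[i].status,
               (unsigned long long)r->info[i].bytes_deduped);

    free(r);
    return 0;
}
```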
At least filefrag confirms that it's chopping the file up into more and more extents:
filefrag -v idontknow.img
This is just one of six million files.
At least the read rate is consistent, and as you can see, the drives can do a lot more than that.
IOPS went up a lot, though:
Edit, a month later:
Is there a reason why the smallest chunks get deduplicated first? Sometimes there is limited time for deduplication, and it would be nice if it were possible to deduplicate the largest chunks, with the biggest impact on free space, first.
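To illustrate the request: the ordering could simply sort duplicate groups by reclaimable bytes (extent length times the number of extra copies) and submit the biggest wins first. A hypothetical sketch, not duperemove's actual data structures:

```c
/* Sketch: order duplicate groups by how much space deduping them would
 * reclaim, largest first. Struct layout and sample numbers are made up. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

struct dupe_group {
    uint64_t extent_len;   /* size of the shared block, e.g. 256 KiB */
    unsigned copies;       /* how many extents share this content */
};

static uint64_t reclaimable(const struct dupe_group *g)
{
    return g->copies > 1 ? g->extent_len * (g->copies - 1) : 0;
}

static int by_reclaimable_desc(const void *a, const void *b)
{
    uint64_t ra = reclaimable(a), rb = reclaimable(b);
    return (ra < rb) - (ra > rb);   /* descending order */
}

int main(void)
{
    struct dupe_group groups[] = {
        { 256 * 1024, 2 },           /* small win */
        { 64ULL << 20, 120 },        /* big win: 64 MiB shared 120 times */
        { 1ULL << 20, 5 },
    };
    size_t n = sizeof(groups) / sizeof(groups[0]);

    qsort(groups, n, sizeof(groups[0]), by_reclaimable_desc);

    for (size_t i = 0; i < n; i++)
        printf("group %zu: reclaims %llu bytes\n", i,
               (unsigned long long)reclaimable(&groups[i]));
    return 0;
}
```

With time-limited runs, processing in that order would mean the pass that gets cut off has already recovered most of the available space.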