Open ytrezq opened 8 years ago
Hmm, we should be printing out the total number of dedupe requests and which request is currently being processed. To be fair, that number is a bit wonky because we might break a large request up into multiple ones.
Generally though, I have a feeling that you're hitting this issue:
https://github.com/markfasheh/duperemove/issues/156
Does performance improve when you run with --dedupe-options=nofiemap?
We need an FAQ entry for this at least so I'll work something up when I get a chance. I don't want to disable fiemap during dedupe because it has the downside of disabling our space savings estimate.
EDIT: FYI, you wanted --io-threads. Perhaps I could be clearer in the man page: --cpu-threads only affects the optional find-dupes stage.
@markfasheh: I used --io-thread=8. I got 8 threads, but only 3 of them were running, taking turns.
Does performance improve when you run with --dedupe-options=nofiemap?
Unfortunately, the duperemove process did not react well to my SIGKILL.
So the filesystem is damaged and can’t be fixed because of that btrfsck bug. Of course, this prevents me from running duperemove again safely.
Generally though, I have a feeling that you're hitting this issue: #156
No, because I don’t have 2 identical files larger than 10 MB. However, the duperemove process reported that it had been deduping only one file (the same one) since the beginning. That file was 170 GB in size and full of zeros (I was able to delete it before shutting down the machine because of duperemove).
Here’s the capture from btrfs-image -w -c 0 -t 8 /dev/dm-7: https://web.archive.org/web/20161020220914/https://filebin.net/7ni8kfpog1dxw4jc/btrfs-image_capture.xz
I ran duperemove on a 298 GB volume containing about 200 000 files. One of them is a 170 GB file full of zeros, which I used for finding CVE-2016-2315.
I wanted to reduce its size with --dedupe-options=same. After 11 hours I computed with gdb that it would require at least 35 000 000 seconds (405 days) to finish, because it was deduping 4096 bytes of that file every 0.7 seconds. The other strange thing was that the process pool was using only 3 of its 8 threads, even with the --cpu-threads=5 option (I have a 4-core hyperthreaded processor). Please add an automatic estimate of the time to finish to duperemove.
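
For anyone who wants to redo this kind of estimate by hand, here is a minimal Python sketch of the arithmetic. It is not duperemove code: the function names `eta_seconds` and `format_days` are hypothetical, and the file size is assumed to be exactly 170 GiB. With 4096 bytes deduped per 0.7 seconds it comes out at roughly 361 days, the same order of magnitude as the gdb-based figure above.

```python
def eta_seconds(total_bytes, bytes_done, elapsed_seconds):
    """Estimate the remaining time from the average throughput observed so far.

    Returns None when there is no progress yet, since no rate can be derived.
    """
    if bytes_done <= 0 or elapsed_seconds <= 0:
        return None
    rate = bytes_done / elapsed_seconds          # bytes per second so far
    remaining = max(total_bytes - bytes_done, 0)  # never report a negative ETA
    return remaining / rate


def format_days(seconds):
    """Render a duration in seconds as a number of days."""
    return f"{seconds / 86400:.1f} days"


# Assumption: the zero-filled file is exactly 170 GiB.
FILE_SIZE = 170 * 2**30

# Observed rate from the report: one 4096-byte block every 0.7 seconds.
eta = eta_seconds(FILE_SIZE, bytes_done=4096, elapsed_seconds=0.7)
print(format_days(eta))
```

A real progress display would probably average the rate over a sliding window rather than over the whole run, so that the estimate follows changes in throughput.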