markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

feature request: display total space savings #268

Closed: khimaros closed this issue 2 years ago

khimaros commented 3 years ago

i'm running the following to dry-run deduplication on some virtual machine images:

# duperemove -Arh --dedupe-options=same --hashfile=duperemove.hashes /var/lib/libvirt/images/

duperemove displays piecemeal information about potential savings, one line per group of identical extents:

Showing 3 identical extents of length 52.0M with id 03df3e0a
Showing 3 identical extents of length 128.0M with id 86919a3b

it would be helpful to accumulate all of these numbers into a single report of saved space.
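
As a rough illustration of the accumulation being requested, the sketch below parses `Showing N identical extents of length X with id Y` lines and sums `(N - 1) * length` for each group. This is not part of duperemove: `estimate_savings`, `LINE_RE`, and `UNITS` are hypothetical names, the K/M/G/T suffixes are assumed to be 1024-based, and the result is only an upper-bound estimate, since extents that are already shared would not free any further space.

```python
import re

# Assumption: the suffixes printed with -h are binary (1024-based) multiples.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Matches lines such as:
#   Showing 3 identical extents of length 52.0M with id 03df3e0a
LINE_RE = re.compile(
    r"Showing (\d+) identical extents of length ([\d.]+)([KMGT]?) with id (\w+)"
)


def estimate_savings(lines):
    """Return an estimated byte count: sum of (copies - 1) * extent length."""
    total = 0
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip filenames, headers, and other output
        copies = int(match.group(1))
        length = int(float(match.group(2)) * UNITS[match.group(3)])
        # One copy must remain on disk, so at most (copies - 1) can be freed.
        total += (copies - 1) * length
    return total


sample = [
    "Showing 3 identical extents of length 52.0M with id 03df3e0a",
    "Showing 3 identical extents of length 128.0M with id 86919a3b",
]
print(estimate_savings(sample) / 1024**2)  # 360.0 (MiB)
```

For the two lines quoted above this gives 2 * 52 + 2 * 128 = 360 MiB. The same function could be fed the captured output of a dry run line by line.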

lorddoskias commented 3 years ago

This information is currently printed only when an actual dedupe is run, for example:

Showing 2 identical extents of length 4.0M with id c18ad329
Start       Filename
8.0M    "/home/nborisov/projects/kernel/duperemove/mnt-test/test-file"
8.0M    "/home/nborisov/projects/kernel/duperemove/mnt-test/test-file-1"
Showing 2 identical extents of length 4.0M with id d226c865
Start       Filename
0.0 "/home/nborisov/projects/kernel/duperemove/mnt-test/test-file"
0.0 "/home/nborisov/projects/kernel/duperemove/mnt-test/test-file-1"
Using 24 threads for dedupe phase
[0x562fcd275060] (1/2) Try to dedupe extents with id c18ad329
[0x562fcd275060] Dedupe 1 extents (id: c18ad329) with target: (8.0M, 4.0M), "/home/nborisov/projects/kernel/duperemove/mnt-test/test-file"
[0x562fcd2750c0] (2/2) Try to dedupe extents with id d226c865
[0x562fcd2750c0] Dedupe 1 extents (id: d226c865) with target: (0.0, 4.0M), "/home/nborisov/projects/kernel/duperemove/mnt-test/test-file"
Comparison of extent info shows a net change in shared extents of: 8.0M

The last line shows that we have saved 8 megabytes. But for this output to be produced, one needs to run duperemove with -d, i.e. perform an actual dedupe.

l29ah commented 2 years ago

Even if an actual dedupe is run, the savings apparently only show up with verbose printing, and the totals reported afterwards are cryptic:

Sep 16 04:41:50 [duperemove] Simple read and compare of file data found 8713 instances of extents that might benefit from deduplication.
Sep 16 04:41:50 [duperemove] Comparison of extent info shows a net change in shared extents of: 10390905271

l29ah commented 2 years ago

Okay, apparently if one adds -h for a human-readable output, the result looks much more appealing, and this issue is void:

Simple read and compare of file data found 8678 instances of extents that might benefit from deduplication.
Comparison of extent info shows a net change in shared extents of: 9.7G

The only thing to be desired is s,G,GB,
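
To make the G versus GB point concrete, here is a minimal sketch that formats the raw byte count from the earlier log with 1024-based units. `human_size` is a hypothetical helper, not duperemove's own formatting code, and the binary interpretation of the suffix is an assumption. The two log excerpts also come from different runs (8713 versus 8678 instances), so the match is suggestive rather than proof.

```python
def human_size(num_bytes, suffixes=("", "K", "M", "G", "T")):
    """Format a byte count with one decimal place and a 1024-based suffix."""
    value = float(num_bytes)
    for suffix in suffixes:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or suffix == suffixes[-1]:
            return f"{value:.1f}{suffix}"
        value /= 1024


raw = 10390905271  # byte count from the non -h log line above
print(human_size(raw))           # 9.7G
print(f"{raw / 1000**3:.1f}GB")  # 10.4GB with decimal (1000-based) units
```

Under that assumption the "G" in the output behaves like GiB, so the same byte count would read as 10.4 rather than 9.7 if it were expressed in decimal gigabytes.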

lorddoskias commented 2 years ago

> Okay, apparently if one adds -h for a human-readable output, the result looks much more appealing, and this issue is void:
>
> Simple read and compare of file data found 8678 instances of extents that might benefit from deduplication.
> Comparison of extent info shows a net change in shared extents of: 9.7G
>
> The only thing to be desired is s,G,GB,

Is adding one letter really so critical to understanding the output?