markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

Strange behaviour with XFS #208

Closed grifferz closed 4 years ago

grifferz commented 6 years ago

Trying out the current git head on an XFS filesystem, I see behaviour I don't understand: it appears to deduplicate very little. Example:

$ sudo dd if=/dev/urandom of=/mnt/reflink/test bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.72869 s, 288 MB/s
$ sudo cp -v --reflink=always /mnt/reflink/test{,_reflink}
'/mnt/reflink/test' -> '/mnt/reflink/test_reflink'
$ sudo cp -v /mnt/reflink/test{,_copy}
'/mnt/reflink/test' -> '/mnt/reflink/test_copy'
$ df -h /mnt/reflink
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/stonevg-reflink   10G  2.2G  7.9G  22% /mnt/reflink
$ sudo filefrag -v /mnt/reflink/test*
Filesystem type is: 58465342
File size of /mnt/reflink/test is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  162684:     163850..    326534: 162685:             shared
   1:   162685..  262143:     327808..    427266:  99459:     326535: last,shared,eof
/mnt/reflink/test: 2 extents found
File size of /mnt/reflink/test_copy is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  162684:     491530..    654214: 162685:            
   1:   162685..  262143:     655488..    754946:  99459:     654215: last,eof
/mnt/reflink/test_copy: 2 extents found
File size of /mnt/reflink/test_reflink is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  162684:     163850..    326534: 162685:             shared
   1:   162685..  262143:     327808..    427266:  99459:     326535: last,shared,eof
/mnt/reflink/test_reflink: 2 extents found
$ sudo ./duperemove -hdr --hashfile /var/lib/duperemove.sqlite /mnt/reflink
Gathering file list...
Using 8 threads for file hashing phase
[1/3] (33.33%) csum: /mnt/reflink/test
[2/3] (66.67%) csum: /mnt/reflink/test_reflink
[3/3] (100.00%) csum: /mnt/reflink/test_copy
Total files:  3
Total extent hashes: 3
Loading only duplicated hashes from hashfile.
Simple read and compare of file data found 1 instances of extents that might benefit from deduplication.
Showing 3 identical extents of length 116.0K with id 1af555b0
Start           Filename
0.0     "/mnt/reflink/test_copy"
0.0     "/mnt/reflink/test"
0.0     "/mnt/reflink/test_reflink"
Using 8 threads for dedupe phase
[0x55d57da35ca0] (1/1) Try to dedupe extents with id 1af555b0
[0x55d57da35ca0] Dedupe 1 extents (id: 1af555b0) with target: (0.0, 116.0K), "/mnt/reflink/test_copy"
Comparison of extent info shows a net change in shared extents of: 116.0K
$ df -h /mnt/reflink
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/stonevg-reflink   10G  2.2G  7.9G  22% /mnt/reflink
$ sudo filefrag -v /mnt/reflink/test*
Filesystem type is: 58465342
File size of /mnt/reflink/test is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      28:     491530..    491558:     29:             shared
   1:       29..  162684:     163879..    326534: 162656:     491559: shared
   2:   162685..  262143:     327808..    427266:  99459:     326535: last,shared,eof
/mnt/reflink/test: 3 extents found
File size of /mnt/reflink/test_copy is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      28:     491530..    491558:     29:             shared
   1:       29..  162684:     491559..    654214: 162656:            
   2:   162685..  262143:     655488..    754946:  99459:     654215: last,eof
/mnt/reflink/test_copy: 2 extents found
File size of /mnt/reflink/test_reflink is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      28:     163850..    163878:     29:            
   1:       29..  162684:     163879..    326534: 162656:             shared
   2:   162685..  262143:     327808..    427266:  99459:     326535: last,shared,eof
/mnt/reflink/test_reflink: 2 extents found
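
To make the filefrag listings above easier to compare, here is a minimal sketch that totals the blocks flagged `shared` for each file. The function name `shared_blocks` is made up for illustration, and the parsing assumes the `filefrag -v` column layout shown above (colon-separated columns with the flags last); other e2fsprogs versions may format the output differently.

```shell
# Sum the blocks that `filefrag -v` marks as "shared", per file.
# Reads filefrag -v output on stdin. shared_blocks is a hypothetical helper.
shared_blocks() {
  awk '
    # A new file section starts: reset the counters.
    /^File size of / { shared = 0; total = 0 }

    # Extent rows start with an index and a colon, e.g. "   0:".
    /^ *[0-9]+:/ {
      n = split($0, f, ":")
      len = f[4] + 0                 # 4th column is the extent length in blocks
      total += len
      if (f[n] ~ /shared/)           # last column holds the flags
        shared += len
    }

    # The per-file summary line ends the section: print the totals.
    / extents? found$/ {
      name = $0
      sub(/: [0-9]+ extents? found$/, "", name)
      printf "%s: %d of %d blocks shared\n", name, shared, total
    }
  '
}

# Example (needs a real file, so it is left commented out):
#   sudo filefrag -v /mnt/reflink/test* | shared_blocks
```

Fed the post-dedupe listing for `/mnt/reflink/test_copy`, this reports 29 of 262144 blocks shared. At 4096 bytes per block that is 118784 bytes, i.e. 116 KiB, which matches the single 116.0K extent in the duperemove log.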

I had previously been using v0.11 and things do still work there:

$ sudo rm /var/lib/duperemove.sqlite
$ sudo ../duperemove-0.11/duperemove -hdr --hashfile /var/lib/duperemove.sqlite /mnt/reflink
Using 128K blocks
Using hash: murmur3
Gathering file list...
Adding files from database for hashing.
Loading only duplicated hashes from hashfile.
Using 8 threads for dedupe phase
[0x562f03f27e30] (0001/8192) Try to dedupe extents with id fffdf8be
[0x562f03f27d40] (0003/8192) Try to dedupe extents with id fff6f025
[0x562f03f27cf0] (0002/8192) Try to dedupe extents with id fff7dcea
[0x562f03f27c00] (0004/8192) Try to dedupe extents with id fff36ace
[0x562f03f27ca0] (0005/8192) Try to dedupe extents with id ffee4889
[0x562f03f27c50] (0006/8192) Try to dedupe extents with id ffeb6028
[0x562f03f27d90] (0007/8192) Try to dedupe extents with id ffe1f840
[0x562f03f27de0] (0008/8192) Try to dedupe extents with id ffdac66e
[0x562f03f27c50] Dedupe 2 extents (id: ffeb6028) with target: (950.5M, 128.0K), "/mnt/reflink/test_copy"
[0x562f03f27c00] Dedupe 2 extents (id: fff36ace) with target: (847.4M, 128.0K), "/mnt/reflink/test_copy"
[0x562f03f27e30] Dedupe 2 extents (id: fffdf8be) with target: (814.8M, 128.0K), "/mnt/reflink/test_copy"
.
.
[0x562f03f27cf0] (8189/8192) Try to dedupe extents with id 001d9ef9
[0x562f03f27cf0] Dedupe 2 extents (id: 001d9ef9) with target: (953.4M, 128.0K), "/mnt/reflink/test"
Kernel processed data (excludes target files): 4.0G
Comparison of extent info shows a net change in shared extents of: 944.0M
$ df -h /mnt/reflink
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/stonevg-reflink   10G  1.2G  8.9G  12% /mnt/reflink
$ sudo filefrag -v /mnt/reflink/test*
Filesystem type is: 58465342
File size of /mnt/reflink/test is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  162684:     491530..    654214: 162685:             shared
   1:   162685..  262143:     655488..    754946:  99459:     654215: last,shared,eof
/mnt/reflink/test: 2 extents found
File size of /mnt/reflink/test_copy is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  162684:     491530..    654214: 162685:             shared
   1:   162685..  262143:     655488..    754946:  99459:     654215: last,shared,eof
/mnt/reflink/test_copy: 2 extents found
File size of /mnt/reflink/test_reflink is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  162684:     491530..    654214: 162685:             shared
   1:   162685..  262143:     655488..    754946:  99459:     654215: last,shared,eof
/mnt/reflink/test_reflink: 2 extents found
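
As a quick sanity check on the v0.11 run, the `(n/8192)` counter in its log is consistent with fixed 128K blocks over a 1 GiB file. The variable names below are only illustrative; the numbers come from the output above.

```shell
file_size=1073741824          # bytes, from the filefrag output
block_size=$((128 * 1024))    # "Using 128K blocks" in the v0.11 log
blocks=$((file_size / block_size))
echo "$blocks"                # prints 8192
```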

Am I doing something wrong with the git head version?

Also, is there a preferred place to ask usage questions which aren't bugs or feature requests? I have read the manual page and the FAQ but still have some questions.

Thanks!

lorddoskias commented 4 years ago

There was a bug in git master that caused the file scan to cover only the first extent of each file. Can you retest with the latest master?
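
One rough way to spot this symptom in a log is to compare the `Total files` and `Total extent hashes` lines: in the failing run above both are 3, even though filefrag showed two extents per file. The sketch below assumes those two log lines keep the format shown in the report; `check_extent_hashes` is a made-up name, and this is only a heuristic, since a set of genuinely single-extent files would also give one hash per file.

```shell
# Read duperemove output on stdin and compare the file and hash counts.
# check_extent_hashes is a hypothetical helper, not part of duperemove.
check_extent_hashes() {
  awk '
    /^Total files:/         { files = $3 }
    /^Total extent hashes:/ { hashes = $4 }
    END {
      printf "%d files, %d extent hashes\n", files, hashes
      # Exit 1 when no file contributed more than one hash.
      exit (hashes <= files ? 1 : 0)
    }
  '
}

# Example (needs a real filesystem, so it is left commented out):
#   sudo ./duperemove -hdr --hashfile /var/lib/duperemove.sqlite /mnt/reflink \
#     | check_extent_hashes || echo "possibly only one extent scanned per file"
```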

grifferz commented 4 years ago

Yes, working again now, thanks.

lorddoskias commented 4 years ago

OK, closing. Should you have more problems, don't hesitate to open another issue.