sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0
1.86k stars 128 forks source link

--is-reflink false positive: rm_util_link_type can miss holes before extents #611

Closed cebtenzzre closed 1 year ago

cebtenzzre commented 1 year ago

Description

rm_offset_get_from_fd quietly skips over holes until it finds an extent, so unless there are other differences between the files rmlint will not notice them.

Steps to reproduce

Build rmlint from the master branch with _RM_OFFSET_DEBUG=1 and fixes for #527, #528, and #529 applied. Then run these commands:

$ dd if=/dev/urandom of=foo bs=200K oflag=sync count=2
$ truncate -s 400K bar
$ xfs_io -f -c 'copy_range -l 196K -d 4K foo' -c 'fsync' bar
$ xfs_io -f -c 'copy_range -l 200K -s 200K -d 200K foo' -c 'fsync' bar
$ filefrag -vb1 foo bar
Filesystem type is: 9123683e
File size of foo is 409600 (409600 blocks of 1 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  204799: 1387178913792..1387179118591: 204800:             shared
   1:   204800..  409599: 1399459041280..1399459246079: 204800: 1387179118592: last,shared,eof
foo: 2 extents found
File size of bar is 409600 (409600 blocks of 1 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:     4096..  204799: 1387178913792..1387179114495: 200704:       4096: shared
   1:   204800..  409599: 1399459041280..1399459246079: 204800: 1387179114496: last,shared,eof
bar: 2 extents found
$ cmp foo bar
foo bar differ: byte 1, line 1
$ rmlint --is-reflink -vv foo bar
DEBUG: Testing if /home/cebtenzzre/bar is clone of /home/cebtenzzre/foo
DEBUG: Checking link type for /home/cebtenzzre/bar vs /home/cebtenzzre/foo
DEBUG: rm_offset_get_fiemap: fd=5, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=5, n_extents=1, file_offset=204800
DEBUG: rm_offset_get_fiemap: fd=6, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=6, n_extents=1, file_offset=204800
DEBUG: Offsets match at fd1=5, fd2=6, logical=0, physical=1387178913792
DEBUG: rm_offset_get_fiemap: fd=5, n_extents=1, file_offset=204800
DEBUG: rm_offset_get_fiemap: fd=6, n_extents=1, file_offset=204800
DEBUG: Offsets match at fd1=5, fd2=6, logical=204800, physical=1399459041280
DEBUG: Files are clones (share same data)
DEBUG: Offsets match
$ echo $?
0

Actual result

--is-reflink exits with status 0 (Files are reflinked) despite cmp clearly showing that the files have different content. Note that according to filefrag the physical offsets match and the extents logically end at the same place. Since those are the only things rmlint checks for, it thinks the files are reflinked.

Expected result

I would expect rmlint to report that the files are not reflinked - they have different content, after all.

$ rmlint --is-reflink -vv foo bar
DEBUG: Testing if /home/cebtenzzre/bar is clone of /home/cebtenzzre/foo
DEBUG: Checking link type for /home/cebtenzzre/bar vs /home/cebtenzzre/foo
DEBUG: rm_offset_get_fiemap: fd=5, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=5, n_extents=1, file_offset=204800
DEBUG: rm_offset_get_fiemap: fd=6, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=6, n_extents=1, file_offset=204800
DEBUG: Logical offsets differ after 0 bytes: 4096<> 0
DEBUG: Offsets differ
$ echo $?
1