sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

rm_offset_get_from_fd merges logically separate extents #530

Status: Closed (cebtenzzre closed this issue 1 year ago)

cebtenzzre commented 2 years ago

Description

When two extents are physically adjacent, rmlint assumes they can safely be merged, even if they are not logically adjacent (that is, even if there is a hole in the file between them).
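Below is a minimal sketch of the merge condition this description implies, written in C (the language rmlint is written in). The `Extent` struct and the helper names are hypothetical and do not come from rmlint's source. The only point is that two extents can be collapsed into one only when the second continues the first both on disk and within the file. Take the values filefrag reports for `foo` in the reproduction. Extent 0 ends on disk at 734768922624 + 102400 = 734769025024, exactly where extent 1 starts. In the file, however, it ends at 102400 while extent 1 starts at 204800. A merge that checks only physical adjacency would therefore swallow the 102400-byte hole.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extent record; field names are illustrative and are not
 * taken from rmlint's source. All values are in bytes. */
typedef struct {
    uint64_t logical;  /* offset of the extent within the file */
    uint64_t physical; /* offset of the extent on the device */
    uint64_t length;   /* length of the extent */
} Extent;

/* True if b starts exactly where a ends on disk. */
static bool physically_adjacent(const Extent *a, const Extent *b) {
    return a->physical + a->length == b->physical;
}

/* True if b starts exactly where a ends within the file (no hole between). */
static bool logically_adjacent(const Extent *a, const Extent *b) {
    return a->logical + a->length == b->logical;
}

/* Two extents can be represented as one only if both conditions hold. */
static bool extents_mergeable(const Extent *a, const Extent *b) {
    return physically_adjacent(a, b) && logically_adjacent(a, b);
}

/* Extend a to cover b; callers must check extents_mergeable() first. */
static void extent_merge(Extent *a, const Extent *b) {
    assert(extents_mergeable(a, b));
    a->length += b->length;
}
```

For example, with `Extent e0 = {0, 734768922624, 102400}` and `Extent e1 = {204800, 734769025024, 102400}`, `physically_adjacent(&e0, &e1)` is true but `extents_mergeable(&e0, &e1)` is false, so the two extents stay separate.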

Steps to reproduce

Build rmlint from the develop branch with _RM_OFFSET_DEBUG=1 and fixes for #527, #528, and #529 applied. Then run these commands:

$ dd if=/dev/urandom of=foo_src bs=200K oflag=sync count=1
$ xfs_io -f -c 'copy_range -l 100K foo_src' -c 'fsync' foo
$ xfs_io -f -c 'copy_range -l 100K -s 100K -d 200K foo_src' -c 'fsync' foo
$ cp --reflink foo bar
$ filefrag -vb1 foo bar
Filesystem type is: 9123683e
File size of foo is 307200 (307200 blocks of 1 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  102399: 734768922624..734769025023: 102400:             shared
   1:   204800..  307199: 734769025024..734769127423: 102400:             last,shared,eof
foo: 1 extent found
File size of bar is 307200 (307200 blocks of 1 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  102399: 734768922624..734769025023: 102400:             shared
   1:   204800..  307199: 734769025024..734769127423: 102400:             last,shared,eof
bar: 1 extent found
$ rmlint --is-reflink -vv foo bar
DEBUG: Testing if foo is clone of bar
DEBUG: Checking link type for foo vs bar
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=102400
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=102400
DEBUG: Offsets match at fd1=3, fd2=4, logical=0, physical=734768922624
DEBUG: Files are clones (share same data)
Link type for 'foo' and 'bar', result:
Reflink

Actual result

rmlint only checks one pair of physical offsets. In this case that is probably harmless for --is-reflink, but it shows that rm_offset_get_fiemap merges these two extents into a single representation. That is confusing and incorrect, and in other cases it can cause --is-reflink false positives.
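Here is a hypothetical C sketch of how merging on physical adjacency alone could produce an --is-reflink false positive. It assumes the faulty merge simply extends the first extent's length. That is an assumption about the shape of the bug, not rmlint's actual code. The offsets are made up. File A has data at logical [0, L) and [2L, 3L) with a hole in between. File B maps the same two physical blocks at logical [0, L) and [L, 2L), followed by a sparse tail so that it is also 3L bytes long. The two files have different contents. Yet after a physical-only merge, both collapse to the single extent {logical 0, physical P, length 2L} and compare equal. Requiring logical adjacency as well keeps A's two extents apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record (bytes); not rmlint's own type. */
typedef struct {
    uint64_t logical, physical, length;
} Extent;

/* Collapse runs of adjacent extents in place and return the new count.
 * With check_logical == false only physical adjacency is required, which
 * models the behaviour described in this issue (assumed, not rmlint code). */
static size_t coalesce(Extent *ext, size_t n, bool check_logical) {
    if (n == 0)
        return 0;
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        Extent *cur = &ext[out];
        assert(cur->logical <= ext[i].logical); /* input sorted by file offset */
        bool phys = cur->physical + cur->length == ext[i].physical;
        bool logi = cur->logical + cur->length == ext[i].logical;
        if (phys && (logi || !check_logical)) {
            cur->length += ext[i].length; /* swallow ext[i] into cur */
        } else {
            ext[++out] = ext[i]; /* start a new output extent */
        }
    }
    return out + 1;
}

/* Compare two extent maps entry by entry. */
static bool maps_equal(const Extent *a, size_t na, const Extent *b, size_t nb) {
    if (na != nb)
        return false;
    for (size_t i = 0; i < na; i++) {
        if (a[i].logical != b[i].logical || a[i].physical != b[i].physical ||
            a[i].length != b[i].length)
            return false;
    }
    return true;
}
```

Calling `coalesce(a, 2, false)` and `coalesce(b, 2, false)` yields one identical extent for each file, so `maps_equal` reports a match. With `check_logical` set to true, A keeps two extents, B collapses to one, and the maps no longer compare equal.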

Expected result

$ rmlint --is-reflink -vv foo bar
DEBUG: Testing if foo is clone of bar
DEBUG: Checking link type for foo vs bar
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=102400
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=102400
DEBUG: Offsets match at fd1=3, fd2=4, logical=0, physical=734768922624
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=102400
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=102400
DEBUG: Offsets match at fd1=3, fd2=4, logical=102400, physical=734769025024
DEBUG: Files are clones (share same data)
Link type for 'foo' and 'bar', result:
Reflink

Note that here rmlint reports two matching offsets, just as filefrag lists two extents.

Versions

rmlint version 2.10.1 built from develop commit bdb591f4, with _RM_OFFSET_DEBUG=1 and fixes for #527, #528, and #529 applied.