sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org

--is-reflink gets stuck in holes #529

Status: Closed (cebtenzzre closed this issue 1 year ago)

cebtenzzre commented 3 years ago

Description

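When a file contains a hole, rmlint --is-reflink does redundant work: instead of skipping past the hole, it steps through it and repeatedly queries and compares the extent that follows the hole.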

Steps to reproduce

Build rmlint from the develop branch with _RM_OFFSET_DEBUG=1 and fixes for #527 and #528 applied. Then run these commands:

$ dd if=/dev/urandom of=foo bs=100K oflag=sync count=1
$ dd if=/dev/urandom of=foo bs=100K oflag=sync count=2 seek=4
$ cp --reflink foo bar
$ filefrag -vb1 foo bar
Filesystem type is: 9123683e
File size of foo is 614400 (614400 blocks of 1 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  102399: 699643879424..699643981823: 102400:             shared
   1:   409600..  511999: 700793454592..700793556991: 102400: 699644289024: shared
   2:   512000..  614399: 700861808640..700861911039: 102400: 700793556992: last,shared,eof
foo: 3 extents found
File size of bar is 614400 (614400 blocks of 1 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  102399: 699643879424..699643981823: 102400:             shared
   1:   409600..  511999: 700793454592..700793556991: 102400: 699644289024: shared
   2:   512000..  614399: 700861808640..700861911039: 102400: 700793556992: last,shared,eof
bar: 3 extents found
$ rmlint --is-reflink -vv foo bar
DEBUG: Testing if foo is clone of bar
DEBUG: Checking link type for foo vs bar
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=102400
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=0
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=102400
DEBUG: Offsets match at fd1=3, fd2=4, logical=0, physical=699643879424
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=102400
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=204800
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=102400
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=204800
DEBUG: Offsets match at fd1=3, fd2=4, logical=102400, physical=700793454592
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=204800
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=307200
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=204800
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=307200
DEBUG: Offsets match at fd1=3, fd2=4, logical=204800, physical=700793454592
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=307200
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=409600
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=307200
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=409600
DEBUG: Offsets match at fd1=3, fd2=4, logical=307200, physical=700793454592
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=409600
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=512000
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=409600
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=512000
DEBUG: Offsets match at fd1=3, fd2=4, logical=409600, physical=700793454592
DEBUG: rm_offset_get_fiemap: fd=3, n_extents=1, file_offset=512000
DEBUG: rm_offset_get_fiemap: fd=4, n_extents=1, file_offset=512000
DEBUG: Offsets match at fd1=3, fd2=4, logical=512000, physical=700861808640
DEBUG: Files are clones (share same data)
Link type for 'foo' and 'bar', result:
Reflink

Actual result

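rmlint checks the physical offset of the second extent (700793454592) four times: at logical offsets 102400, 204800, 307200 and 409600. The first three of these lie inside the hole (102400..409599), so the file is compared six times in total instead of three.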

Expected result

rmlint checks each pair of offsets only once, for a total of three physical offset comparisons.
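
The difference between the actual and expected behaviour can be illustrated with a small simulation. This is only a sketch, not rmlint's code: it assumes that a single-extent FIEMAP query starting inside a hole returns the next extent after the hole (which is what the debug log above suggests), and that the redundant work comes from advancing the query offset by the extent's length rather than jumping to the end of the returned extent. The names Extent, fiemap_lookup and walk are hypothetical; the extent table is copied from the filefrag output for foo.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    logical: int   # logical start offset in bytes
    physical: int  # physical start offset in bytes
    length: int    # extent length in bytes


# Extents of foo, taken from the filefrag output in this report.
EXTENTS = [
    Extent(0, 699643879424, 102400),
    Extent(409600, 700793454592, 102400),
    Extent(512000, 700861808640, 102400),
]
FILE_SIZE = 614400


def fiemap_lookup(extents: List[Extent], offset: int) -> Optional[Extent]:
    """Assumed FIEMAP behaviour with n_extents=1: return the first extent
    that ends after `offset`, skipping over any hole at `offset`."""
    for ext in extents:
        if ext.logical + ext.length > offset:
            return ext
    return None


def walk(extents: List[Extent], size: int, hole_aware: bool) -> List[int]:
    """Return the physical offsets visited while stepping through a file."""
    offset = 0
    visited = []
    while offset < size:
        ext = fiemap_lookup(extents, offset)
        if ext is None:
            break
        visited.append(ext.physical)
        if hole_aware:
            # Jump to the end of the extent actually returned.
            offset = ext.logical + ext.length
        else:
            # Hypothesised faulty step: ignores where the extent starts.
            offset += ext.length
    return visited


naive = walk(EXTENTS, FILE_SIZE, hole_aware=False)
fixed = walk(EXTENTS, FILE_SIZE, hole_aware=True)
print(len(naive), naive.count(700793454592))
print(len(fixed), fixed.count(700793454592))
```

Under these assumptions the naive walk visits the second extent four times, matching the log, while the hole-aware walk visits each of the three extents once.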

Versions

rmlint version 2.10.1 built from develop commit bdb591f4, with _RM_OFFSET_DEBUG=1 and fixes for #527 and #528 applied.