sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

Behaviour of `--no-hardlinked` #495

Open SeeSpotRun opened 3 years ago

SeeSpotRun commented 3 years ago

This was discussed earlier in #248 and mostly resolved, but I came across a slight inconsistency which I would like to resolve before implementing a similar interface for --no-reflinks etc. (#328).

Firstly, a question was asked there about the use case for --no-hardlinked. I didn't respond at the time, but I see the use case as basically de-cluttering the rmlint output. Consider this test case:

$ mkdir dir && echo data > dir/file && echo data > dir/same
$ for i in {1..4}; do ln dir/file dir/link$i; done

The default output of rmlint is:

$ rmlint dir
# Duplicate(s):
    ls '<pwd>/dir/file'
    rm '<pwd>/dir/link1'
    rm '<pwd>/dir/link2'
    rm '<pwd>/dir/link3'
    rm '<pwd>/dir/link4'
    rm '<pwd>/dir/same'

Since hardlinks don't take up space (other than dir and inode entries) we have --keep-hardlinked:

$ rmlint dir --keep-hardlinked
# Duplicate(s):
    ls '<pwd>/dir/file'
    ls '<pwd>/dir/link1'
    ls '<pwd>/dir/link2'
    ls '<pwd>/dir/link3'
    ls '<pwd>/dir/link4'
    rm '<pwd>/dir/same'

But that's a lot of output, so we also have --no-hardlinked:

$ rmlint dir --no-hardlinked
# Duplicate(s):
    ls '<pwd>/dir/file'
    rm '<pwd>/dir/same'

It's the same file deletions as --keep-hardlinked but with more concise output. So far so good.
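As an aside, the point above that hardlinks cost nothing beyond directory and inode entries is easy to confirm: all the links share one inode, and hence one set of data blocks. A small Python sketch (temporary paths, for illustration only):

```python
import os
import tempfile

# Create one file and four hardlinks to it, mirroring the test case above.
d = tempfile.mkdtemp()
orig = os.path.join(d, "file")
with open(orig, "w") as f:
    f.write("data\n")
for i in range(1, 5):
    os.link(orig, os.path.join(d, "link%d" % i))

# All five directory entries point at the same inode; the link count is 5.
inodes = {os.stat(os.path.join(d, p)).st_ino for p in os.listdir(d)}
print(len(inodes), os.stat(orig).st_nlink)  # → 1 5
```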

But...

$ # add some hardlinks of "same":
$ for i in {1..2}; do ln dir/same dir/same_link$i; done

Default behaviour is as expected:

$ rmlint dir
# Duplicate(s):
    ls '<pwd>/dir/file'
    rm '<pwd>/dir/link1'
    rm '<pwd>/dir/link2'
    rm '<pwd>/dir/link3'
    rm '<pwd>/dir/link4'
    rm '<pwd>/dir/same'
    rm '<pwd>/dir/same_link1'
    rm '<pwd>/dir/same_link2'

--keep-hardlinked also looks OK to me: it preserves any hardlinks of the original and deletes everything else:

$ rmlint dir --keep-hardlinked
# Duplicate(s):
    ls '<pwd>/dir/file'
    ls '<pwd>/dir/link1'
    ls '<pwd>/dir/link2'
    ls '<pwd>/dir/link3'
    ls '<pwd>/dir/link4'
    rm '<pwd>/dir/same'
    rm '<pwd>/dir/same_link1'
    rm '<pwd>/dir/same_link2'

But --no-hardlinked gives this:

$ rmlint dir --no-hardlinked
# Duplicate(s):
    ls '<pwd>/dir/file'
    rm '<pwd>/dir/same'

And if we run the generated shell script to delete the dupes, we are left with:

$ ls dir
file  link1  link2  link3  link4  same_link1  same_link2
$ rmlint dir --no-hardlinked
# Duplicate(s):
    ls '<pwd>/dir/file'
    rm '<pwd>/dir/same_link1'

It would take 3 successive runs to actually free up any space.
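The run count can be made concrete with a hypothetical model of the current behaviour (not rmlint's actual code): if each run reports only the first path seen per inode, then deleting the reported duplicates removes just one name from the `same` cluster per run:

```python
# Hypothetical model: within a duplicate group, only the first path seen
# for each inode is reported, so each run frees at most one name per
# extra hardlink cluster.

def report(group):
    """Return (original, duplicates-to-delete), one path per inode."""
    seen = set()
    out = []
    for inode, path in group:
        if inode not in seen:
            seen.add(inode)
            out.append(path)
    return out[0], out[1:]

# inode 1 = file (plus its kept links), inode 2 = same + same_link1..2
group = [(1, "file"), (2, "same"), (2, "same_link1"), (2, "same_link2")]

runs = 0
while len({i for i, _ in group}) > 1:   # another inode still duplicated
    _, dupes = report(group)
    group = [(i, p) for i, p in group if p not in dupes]
    runs += 1
print(runs)  # → 3
```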

So I suggest the desired behaviour for --no-hardlinked in this case should be:

$ rmlint dir --no-hardlinked
# Duplicate(s):
    ls '<pwd>/dir/file'
    rm '<pwd>/dir/same'
    rm '<pwd>/dir/same_link1'
    rm '<pwd>/dir/same_link2'

So essentially --no-hardlinked behaves exactly like --keep-hardlinked, except that it doesn't print out the hardlinks that are being kept.
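In other words (a hypothetical sketch of the proposed semantics, not rmlint's actual code): both options delete the same files, and the only difference is whether the kept hardlinks appear as ls lines in the output:

```python
# Hypothetical sketch of the proposed semantics for a duplicate group.

def plan(group, keep_hardlinked):
    """group: list of (inode, path); the first entry is the original."""
    orig_inode, orig_path = group[0]
    lines = ["ls %s" % orig_path]
    for inode, path in group[1:]:
        if inode == orig_inode:
            if keep_hardlinked:          # only difference: print kept links
                lines.append("ls %s" % path)
        else:
            lines.append("rm %s" % path)  # delete every non-cluster dupe
    return lines

group = [(1, "dir/file"), (1, "dir/link1"), (1, "dir/link2"),
         (2, "dir/same"), (2, "dir/same_link1"), (2, "dir/same_link2")]

print(plan(group, keep_hardlinked=True))
print(plan(group, keep_hardlinked=False))
```

The rm lines are identical either way; --no-hardlinked simply suppresses the ls lines for link1, link2, etc.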

I'll keep this issue open for a couple of weeks for comment/input. If I hear nothing, I'll go ahead as above.