sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

--merge-directories with --followlinks mishandles path doubles #586

Open cebtenzzre opened 1 year ago

cebtenzzre commented 1 year ago
$ mkdir dir dir_bind
$ sudo mount --bind dir dir_bind
$ mkdir dir/a dir/b
$ echo xxx >dir/a/z
$ echo xxx >dir/b/z
$ ln -s ../../dir_bind/a/z dir/a/x
$ ln -s ../../dir_bind/b/z dir/b/x
$ tree
.
├── dir
│   ├── a
│   │   ├── x -> ../../dir_bind/a/z
│   │   └── z
│   └── b
│       ├── x -> ../../dir_bind/b/z
│       └── z
└── dir_bind
    ├── a
    │   ├── x -> ../../dir_bind/a/z
    │   └── z
    └── b
        ├── x -> ../../dir_bind/b/z
        └── z

6 directories, 8 files
$ rmlint dir -D -S a -f -o pretty -o summary
Empty directory or weird RmFile encountered; rejecting.

# Duplicate(s):
    ls '/tmp/rmlint-test/dir/a/z'
    rm '/tmp/rmlint-test/dir_bind/b/z'

==> In total 4 files, whereof 1 are duplicates in 1 groups.
==> This equals 4 B of duplicates which could be removed.
==> Scanning took in total 0.104s.

I would expect rmlint to recognize that dir/a and dir/b are duplicates, like this:

# Duplicate Directorie(s):
    ls -la '/tmp/rmlint-test/dir/a'
    rm -rf '/tmp/rmlint-test/dir/b'

I discovered this on Cygwin, where making a symlink (with CYGWIN=winsymlinks:nativestrict) can produce a path double: the resolved target is an absolute UNIX path such as /tmp/..., while the path supplied to rmlint may start from a cygdrive, such as /c/msys64/tmp/....
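For anyone reproducing this without root, here is a minimal sketch of what "path double" means: two different path strings that resolve to the same device and inode. It assumes a shell whose `[` supports `-ef` (bash and dash do). A directory symlink stands in for the bind mount, and `is_path_double` is a hypothetical helper name, not part of rmlint.

```shell
# Sketch: check whether two paths name the same underlying file.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/dir/a"
echo xxx > "$work/dir/a/z"
# Assumption: a directory symlink replaces `mount --bind`, which needs root.
ln -s dir "$work/dir_alias"

is_path_double() {
    # `-ef` is true when both operands resolve to the same device and inode.
    [ "$1" -ef "$2" ]
}

if is_path_double "$work/dir/a/z" "$work/dir_alias/a/z"; then
    result="same file"
else
    result="different files"
fi
echo "$result"
```

The two paths differ as strings, so a tool has to compare device and inode rather than names before deciding whether they are separate files.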

cebtenzzre commented 1 year ago

rmlint -Df prints the "Empty directory or weird RmFile encountered" message only when a symlink points outside of the traversed paths, though not every such symlink triggers it. It's usually harmless. For example:

$ mkdir a b
$ echo xxx >a/z
$ echo xxx >b/z
$ ln -s ../b/z a/x
$ tree
.
├── a
│   ├── x -> ../b/z
│   └── z
└── b
    └── z

2 directories, 3 files
$ rmlint a -D -S a -f -o summary -o pretty
Empty directory or weird RmFile encountered; rejecting.

# Duplicate(s):
    ls '/tmp/rmlint-test/b/z'
    rm '/tmp/rmlint-test/a/z'

==> In total 2 files, whereof 1 are duplicates in 1 groups.
==> This equals 4 B of duplicates which could be removed.
==> Scanning took in total 0.104s.
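
To find the symlinks that might trigger the message before running rmlint, a rough sketch like the following lists every symlink under a directory whose resolved target lies outside it. It assumes `realpath` is available (GNU coreutils and recent BSD/macOS ship it). The directory layout mirrors the example above, plus a hypothetical `inside` link for contrast.

```shell
# Sketch: list symlinks under a scan root whose targets resolve outside it.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/a" "$work/b"
echo xxx > "$work/a/z"
echo xxx > "$work/b/z"
ln -s ../b/z "$work/a/x"      # resolves outside a/
ln -s z "$work/a/inside"      # resolves inside a/

root=$(realpath "$work/a")

outside=$(
    find "$root" -type l | sort | while IFS= read -r link; do
        # Skip broken links, which realpath cannot resolve.
        target=$(realpath "$link") || continue
        case "$target" in
            "$root"/*) ;;                                # stays within the root
            *) printf '%s\n' "${link#"$root"/}" ;;       # escapes the root
        esac
    done
)
echo "$outside"
```

Only `x` is reported here, which matches the link that leads rmlint out of `a` in the transcript.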

cebtenzzre commented 1 year ago

IMO, both --followlinks and --see-symlinks should be used carefully. Passing --no-followlinks by default is a good habit.

--followlinks is at best a way to include files and directories you didn't explicitly pass to rmlint, and at worst will make --merge-directories spit out these "weird RmFile" errors or completely ignore symlinks.

$ mkdir -p dir/a dir/b
$ echo xxx >dir/a/x
$ echo xxx >dir/b/x
$ echo yyy >dir/a/y
$ echo yyy >dir/b/y
$ echo foo >foo
$ ln -s ../../foo dir/a/foo
$ tree
.
├── dir
│   ├── a
│   │   ├── foo -> ../../foo
│   │   ├── x
│   │   └── y
│   └── b
│       ├── x
│       └── y
└── foo

3 directories, 6 files
$ rmlint dir -T dd -o pretty -F   # -F: --no-followlinks
$ rmlint dir -T dd -o pretty -@   # -@: --see-symlinks
$ rmlint dir -T dd -o pretty -f   # -f: --followlinks

# Duplicate Directorie(s):
    ls -la '/tmp/rmlint-test/dir/b'
    rm -rf '/tmp/rmlint-test/dir/a'


--see-symlinks is at best a way to find symlinks that actually point to the same location and a hack to get --merge-directories to care about symlinks, and at worst will delete completely unrelated symlinks and files.

$ tree
.
├── a
│   ├── c
│   └── y -> c
├── c
├── x -> c
└── z

1 directory, 5 files
$ cat x
xxx
$ cat a/y
yyy
$ cat z; echo
c
$ rmlint -o pretty -@

# Duplicate(s):
    ls '/tmp/rmlint-test/z'
    rm '/tmp/rmlint-test/x'
    rm '/tmp/rmlint-test/a/y'
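
My reading of the result above is that --see-symlinks compares the link text itself (the target path) rather than the data behind the link, so a regular file containing the same text lands in the same group. The sketch below rebuilds the tree and checks both comparisons with plain `readlink` and `cat`. It does not call rmlint, and the explanation is an assumption drawn from the transcript, not from rmlint's source.

```shell
# Sketch: the link text of x and a/y equals the content of z,
# while the files the links resolve to hold different data.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir a
echo xxx > c
echo yyy > a/c
ln -s c x          # resolves to ./c   (xxx)
ln -s c a/y        # resolves to a/c   (yyy)
printf c > z       # regular file holding the single byte "c"

if [ "$(readlink x)" = "$(cat z)" ] && [ "$(readlink a/y)" = "$(cat z)" ]; then
    same_text=yes
else
    same_text=no
fi

if [ "$(cat x)" = "$(cat a/y)" ]; then
    same_data=yes
else
    same_data=no
fi

echo "link text matches z: $same_text"
echo "resolved contents match: $same_data"
```

The links share their target string with `z` but resolve to different files, which is why removing `x` and `a/y` as duplicates of `z` would be wrong.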