sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

--must-match-tagged option seems to be ignored when inputting multiple source directories #620

Open rikrdo89 opened 1 year ago

rikrdo89 commented 1 year ago

When using more than one directory as the source and setting --must-match-tagged (as below), I found duplicate directories marked for removal because they are present in both dir1 and dir2. I expected rmlint to remove only directories or files that are also duplicated in the reference directory.

rmlint -km -g -T "dd,df" ~/dir1 ~/dir2 // ~/reference/dir

Is inputting more than one source directory not supported?

rikrdo89 commented 1 year ago

I think I know what is happening... All the files match the reference directory, but the reference directory contains additional files, so I guess the criterion is met and rmlint is trying to deduplicate at least one of the folders from the source directories... A bit odd, because IMO it should remove both, since all the data is already in the reference directory.

cebtenzzre commented 1 year ago

Could you provide a little more information to help me reproduce the problem? It would be especially helpful if you could provide a specific example of a directory layout that causes the issue, and say which directories you think should be considered duplicates. The way '-km' interacts with --merge-directories isn't obvious. Most importantly, '-m' is applied to individual files before duplicate files are grouped into duplicate directories, so I believe it's possible for duplicate directories to appear in the output that don't actually have a tagged match at the directory level.

And --merge-directories never considers directories that only partially match to be duplicates; its only flexibility is layout (--honour-dir-layout is disabled by default).
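
To make the ordering described above concrete, here is a toy Python model. It is not rmlint's code and makes several simplifying assumptions: the directory layout and the "hashes" are invented, only a file's immediate parent directory is considered, and the function names are hypothetical. It only shows how a per-file tagged-match filter followed by directory grouping can yield a duplicate-directory group that contains no tagged directory.

```python
from collections import defaultdict
import posixpath

# Hypothetical layout: path -> content digest. Everything under
# "reference/" plays the role of the tagged paths after "//".
FILES = {
    "dir1/a/x.txt": "h1",
    "dir1/a/y.txt": "h2",
    "dir2/a/x.txt": "h1",
    "dir2/a/y.txt": "h2",
    "reference/dir/misc/x.txt": "h1",
    "reference/dir/other/y.txt": "h2",
}
TAGGED_PREFIX = "reference/"


def is_tagged(path):
    return path.startswith(TAGGED_PREFIX)


def files_passing_must_match_tagged(files):
    """Step 1: keep files whose duplicate group has a tagged member."""
    by_digest = defaultdict(list)
    for path, digest in files.items():
        by_digest[digest].append(path)
    kept = set()
    for paths in by_digest.values():
        if len(paths) > 1 and any(is_tagged(p) for p in paths):
            kept.update(paths)
    return kept


def duplicate_directory_groups(files, kept):
    """Step 2: group directories whose files all survived step 1.

    A directory's signature is the sorted tuple of its file digests, so
    file names and layout are ignored but partial matches never count.
    """
    contents = defaultdict(list)
    complete = defaultdict(lambda: True)
    for path, digest in files.items():
        parent = posixpath.dirname(path)
        contents[parent].append(digest)
        if path not in kept:
            complete[parent] = False
    by_signature = defaultdict(list)
    for directory, digests in contents.items():
        if complete[directory]:
            by_signature[tuple(sorted(digests))].append(directory)
    return sorted(sorted(d) for d in by_signature.values() if len(d) > 1)


kept = files_passing_must_match_tagged(FILES)
groups = duplicate_directory_groups(FILES, kept)
print(groups)  # [['dir1/a', 'dir2/a']]
```

In this sketch every file has a tagged twin, so all files pass the per-file filter, yet the only duplicate-directory group is dir1/a with dir2/a: the reference tree holds the same data split across two directories, and neither of those fully matches.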