pkolaczk / fclones

Efficient Duplicate File Finder
MIT License

Identical files under the same root are returned despite `--isolate` option #91

Closed by chenxiaolong 2 years ago

chenxiaolong commented 2 years ago

When running fclones group -I, it seems to be finding duplicate files underneath the same root (path argument on command line). For example, if I construct a tree like:

echo hi > source.txt
mkdir -p {a,b}/{1,2}
for i in {a,b}/{1,2}/test; do cp source.txt "${i}"; done

and then run fclones group -I a b, I get:

[2021-11-17 16:08:01.482] fclones:  info: Started grouping
[2021-11-17 16:08:02.019] fclones:  info: Scanned 10 file entries
[2021-11-17 16:08:02.019] fclones:  info: Found 4 (12 B) files matching selection criteria
[2021-11-17 16:08:02.019] fclones:  info: Found 3 (9 B) candidates after grouping by size
[2021-11-17 16:08:02.019] fclones:  info: Found 3 (9 B) candidates after grouping by paths and file identifiers
[2021-11-17 16:08:02.033] fclones:  info: Found 3 (9 B) candidates after grouping by prefix
[2021-11-17 16:08:02.033] fclones:  info: Found 3 (9 B) candidates after grouping by suffix
[2021-11-17 16:08:02.034] fclones:  info: Found 3 (9 B) redundant files
# Report by fclones 0.17.1
# Timestamp: 2021-11-17 16:08:02.036 -0500
# Command: fclones group -I a b
# Found 1 file groups
# 9 B (9 B) in 3 redundant files can be removed
e872d4a1bdc12e1262820a95eebb530a, 3 B (3 B) * 4:
    /tmp/tree/a/1/test
    /tmp/tree/a/2/test
    /tmp/tree/b/1/test
    /tmp/tree/b/2/test

pkolaczk commented 2 years ago

This works as designed (and as documented). Duplicates under the same root are all counted as 1, but not filtered out. Here you have 2 roots, and because identical files are under both of them, fclones assumes the number of replicas to be 2 (without -I that would be 4). Because 2 is still greater than 1, they are reported.

And it reports all of them because it has no idea which ones are more important to you, so it lets you choose later (you can edit these output files before further processing).
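The counting rule described above can be modeled in a few lines of Python. This is only a sketch of the rule as stated in this thread, not fclones' actual implementation; the function name `replica_count` and the `group` and `roots` variables are hypothetical, with paths taken from the report above.

```python
from pathlib import PurePosixPath


def replica_count(files, roots, isolate):
    """Count replicas in one group of identical files.

    Hypothetical model of the rule described in this thread:
    with isolation, all files under the same root count as one replica.
    """
    if not isolate:
        # Without -I every file is its own replica.
        return len(files)
    seen_roots = set()
    for f in files:
        for r in roots:
            # A file belongs to a root if the root is one of its ancestors.
            if PurePosixPath(r) in PurePosixPath(f).parents:
                seen_roots.add(r)
                break
    return len(seen_roots)


roots = ["/tmp/tree/a", "/tmp/tree/b"]
group = [
    "/tmp/tree/a/1/test",
    "/tmp/tree/a/2/test",
    "/tmp/tree/b/1/test",
    "/tmp/tree/b/2/test",
]

# The group is reported whenever the replica count is greater than 1.
print(replica_count(group, roots, isolate=True))       # 2, so reported
print(replica_count(group, roots, isolate=False))      # 4
print(replica_count(group[:2], roots, isolate=True))   # 1, so not reported
```

Under this model, the two copies under `a` alone would not be reported with -I, but once a copy also exists under `b`, the whole group of four is listed.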

chenxiaolong commented 2 years ago

Thanks for the information. I was thrown off based on the --help output:

Don't count matching files found within the same directory argument as duplicates

but what you described makes perfect sense (especially given fclones' separate group phase compared to other one-shot tools).