pkolaczk / fclones

Efficient Duplicate File Finder
MIT License

Add directory clone detection #51

Open nicolasboulay opened 3 years ago

nicolasboulay commented 3 years ago

Sometimes I keep two copies of the contents of a CF card full of videos or photos.

It would be great to detect directories whose entire contents are already present somewhere else.

felciano commented 1 year ago

Agreed -- this would be a massive convenience for deduplication efforts.

ashaikh commented 1 year ago

This would be great to have, along with a feature to merge very similar directories. Your tool is fantastic, the fastest I have ever seen, and I have tried most of them on macOS, Windows, and Linux.

For feature ideas, you may want to look at Duplicate File Finder for Mac, which has the most usable set of features for combining and eliminating duplicates during a system cleanup. It is much slower than yours, but it lets you clean things up very intuitively.

shakfu commented 1 year ago

Totally agree. I also had duplicated folders, which I had to figure out on a file-by-file basis.

I guess that after all file checks are completed, it would involve sorting by path and then recursively comparing hashes to determine which directories are duplicates.
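
A rough sketch of that idea, not fclones code: assuming the file scan has already produced a mapping from relative path to content hash (the `file_hashes` dict, the function names, and the `h1`-style hashes below are all made up for illustration), each directory can get a digest built from the sorted names and digests of its children. Directories with equal digests then have identical contents.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePosixPath


def directory_digests(file_hashes):
    """Map each directory to a digest of its children's names and digests."""
    # children[dir] -> {entry name: file hash, or None for a subdirectory}
    children = defaultdict(dict)
    for path, file_hash in file_hashes.items():
        p = PurePosixPath(path)
        children[str(p.parent)][p.name] = file_hash
        # Register every ancestor directory as an entry of its own parent.
        for d in p.parents:
            if d != d.parent:  # the root is its own parent, so skip it
                children[str(d.parent)].setdefault(d.name, None)

    digests = {}

    def digest_of(directory):
        if directory in digests:
            return digests[directory]
        h = hashlib.sha256()
        # Sorting by name makes the digest independent of scan order.
        for name in sorted(children[directory]):
            child = children[directory][name]
            if child is None:
                child = "d:" + digest_of(str(PurePosixPath(directory) / name))
            else:
                child = "f:" + child
            h.update(f"{name}\0{child}\n".encode())
        digests[directory] = h.hexdigest()
        return digests[directory]

    for directory in list(children):
        digest_of(directory)
    return digests


def duplicate_directories(file_hashes):
    """Group directories that share a digest, i.e. have identical contents."""
    groups = defaultdict(list)
    for directory, digest in directory_digests(file_hashes).items():
        groups[digest].append(directory)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical scan result: path -> content hash.
files = {
    "card/DCIM/a.jpg": "h1",
    "card/DCIM/b.jpg": "h2",
    "backup/DCIM/a.jpg": "h1",
    "backup/DCIM/b.jpg": "h2",
    "backup/notes.txt": "h3",
}
print(duplicate_directories(files))  # [['backup/DCIM', 'card/DCIM']]
```

This version only matches directories whose entry names are also identical, and it reports nested matches separately (if two parents match, their subdirectories show up as matching groups too), so a real implementation would probably want to keep only the topmost match.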

JohniFi commented 3 weeks ago

First of all, great software, thanks!

This would indeed be very helpful with big collections (I just did a scan for a friend of mine: "Total: 5288645275774 B (5.3 TB) in 919622 files in 238361 groups"). In a case like this, I wouldn't just let fclones automatically delete thousands of files. I would rather find the biggest "whole" folders that match and delete them manually in the file explorer after reviewing them, to be sure not to destroy anything (e.g. deleting files from video/music software project folders would corrupt the project).

Idea: I guess for this kind of tool you have to scan for both duplicates and uniques, maybe store the folder structure in a BTree, then find folders that contain only redundant content and no unique content, and then traverse upwards in the BTree as far as possible to find the biggest folders that match.
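
A minimal sketch of what I mean, just to illustrate (this is not how fclones works, and the `file_hashes` input, the function name, and the example hashes are assumptions): a folder counts as redundant if every file below it has at least one copy outside that folder, and only the topmost redundant folders are reported. It uses a plain dict keyed by path instead of a real tree.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def redundant_directories(file_hashes):
    """Return the topmost directories whose every file also exists outside them."""
    paths_by_hash = defaultdict(list)
    for path, file_hash in file_hashes.items():
        paths_by_hash[file_hash].append(PurePosixPath(path))

    # A directory stays redundant only while every file below it has
    # at least one copy that lives outside that directory.
    redundant = {}
    for path, file_hash in file_hashes.items():
        copies = paths_by_hash[file_hash]
        for directory in PurePosixPath(path).parents:
            has_outside_copy = any(directory not in c.parents for c in copies)
            redundant[directory] = redundant.get(directory, True) and has_outside_copy

    # "Traverse upwards": keep a directory only if its parent is not redundant.
    return sorted(
        str(d)
        for d, ok in redundant.items()
        if ok and not redundant.get(d.parent, False)
    )


# Hypothetical scan result: path -> content hash.
files = {
    "card/DCIM/a.jpg": "h1",
    "card/DCIM/b.jpg": "h2",
    "backup/DCIM/a.jpg": "h1",
    "backup/DCIM/b.jpg": "h2",
    "backup/notes.txt": "h3",
}
print(redundant_directories(files))  # ['backup/DCIM', 'card']
```

Note that both `card` and `backup/DCIM` are reported here because each one is covered by the other. The result is a list of candidates for manual review, and after deleting one of them the scan would have to be repeated, since the other folder may then hold the only remaining copy.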