jvirkki / dupd

CLI utility to find duplicate files
http://www.virkki.com/dupd
GNU General Public License v3.0

Reporting duplicate directories #15

Closed: ivanperez-keera closed this issue 6 years ago

ivanperez-keera commented 6 years ago

An advanced feature I've been missing in these duplicate-finding programs (fdupes, jdupes, and I think dupd) is finding duplicate directories.

Often, when backups are desynced or when things are moved around, one would like to know if a whole directory is the same as some other directory elsewhere.

I suspect this might be possible by "just" creating a hash for each directory from the list of hashes of everything inside it, sorted alphabetically and built recursively from the bottom up.

Two directories could count as duplicates if they have the same contents, or, more strictly, if they have the same contents under the same file names.
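The bottom-up hashing idea, including the choice of whether file names count, can be sketched in Python. This is a hypothetical illustration, not a dupd feature: `file_digest`, `dir_digest` and the `include_names` flag are invented names, SHA-256 is an arbitrary choice, and symlinks and special files are simply skipped.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 16):
    """SHA-256 of a file's contents, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dir_digest(path, include_names=False):
    """Merkle-style digest of a directory tree, computed bottom-up.

    Each child contributes a tagged digest ("f:" for files, "d:" for
    subdirectories). With include_names=True the child's name is hashed
    along with it, so renamed copies no longer match. Children are sorted
    so the result does not depend on directory listing order. Symlinks and
    special files are skipped (a hypothetical policy to keep this simple).
    """
    items = []
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_symlink():
                continue  # never follow links, so symlink loops cannot occur
            if entry.is_dir():
                child = b"d:" + dir_digest(entry.path, include_names).encode()
            elif entry.is_file():
                child = b"f:" + file_digest(entry.path).encode()
            else:
                continue
            if include_names:
                child = os.fsencode(entry.name) + b"\0" + child
            items.append(child)
    h = hashlib.sha256()
    for item in sorted(items):
        h.update(item + b"\0")  # NUL cannot appear in file names
    return h.hexdigest()
```

Two directories with equal `dir_digest` values are candidate duplicates. Every empty directory hashes to the same value, which is one reason to discard them.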

I suspect something smart would be needed to handle cycles and symlinks. Discarding empty directories might also be useful.
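One way to address the cycle and empty-directory concerns, sketched hypothetically in Python (`list_dirs` is an invented helper, not a dupd feature): remember the `(st_dev, st_ino)` pair of every directory already entered, so a symlink loop or a second link to the same directory is walked only once, and report only directories that have at least one regular file somewhere below them.

```python
import os


def list_dirs(root, follow_symlinks=False):
    """Return directories under root in post-order (children first).

    Each physical directory is entered at most once, tracked by its
    (st_dev, st_ino) pair, so symlink cycles terminate. Directories with no
    regular file anywhere beneath them are left out.
    """
    visited = set()

    def visit(path):
        st = os.stat(path)  # follows symlinks, so we key on the real target
        key = (st.st_dev, st.st_ino)
        if key in visited:
            return False, []  # cycle, or another link to a directory seen before
        visited.add(key)
        has_files = False
        found = []
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_symlink() and not follow_symlinks:
                    continue
                if entry.is_dir():
                    child_has_files, child_found = visit(entry.path)
                    has_files = has_files or child_has_files
                    found.extend(child_found)
                elif entry.is_file():
                    has_files = True
        if has_files:
            found.append(path)  # post-order: after all of its children
        return has_files, found

    return visit(root)[1]
```

Skipping an already-visited directory also keeps a symlinked directory from being reported as a duplicate of its own target. Each returned path could then be hashed and grouped by digest.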

jvirkki commented 6 years ago

You can get this information with dupd using the uniques command.

uniques reports all the unique files in a given directory tree. If there are none, every file in that directory tree has a duplicate.

Quick example:

% ls
dir1  dir2  dir3

dir1 has 3 files, dir2 has the exact same 3 files.

% dupd scan -q
% cd dir2
% dupd uniques

The output of dupd uniques is empty, in other words there is nothing unique within dir2.
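A rough Python model of what `uniques` reports, assuming for illustration that files are compared by a SHA-256 of their full contents. This is not how dupd implements it, and `content_index` and `uniques` are hypothetical names: a file is unique when no other file in the scanned tree has the same content, so an empty result for a subtree means every file in it has a copy somewhere in the scan.

```python
import hashlib
import os
from collections import defaultdict


def content_index(scan_root):
    """Map content digest -> list of regular-file paths under scan_root."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(scan_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # only regular files take part
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()  # whole file, for brevity
            index[digest].append(path)
    return index


def uniques(index, subtree):
    """Files under subtree whose content occurs exactly once in the index."""
    prefix = os.path.join(os.path.abspath(subtree), "")  # trailing separator
    return sorted(
        paths[0]
        for paths in index.values()
        if len(paths) == 1 and os.path.abspath(paths[0]).startswith(prefix)
    )
```

The other copy may live inside the same subtree: in the `dups -v` output, dir2/hi and dir2/hey are duplicates of each other.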

You'll probably want to know where the other copies of these files are if you are organizing/consolidating. The complementary command to 'uniques' is 'dups', which lists all the duplicate files. In the case of dir2 that's all of them, but it's more interesting to run dups -v to get pointers to where these duplicates live:

% dupd dups -v
/tmp/files/dir2/hi
             DUP: /tmp/files/dir1/hi
             DUP: /tmp/files/dir1/hey
             DUP: /tmp/files/dir3/hi
             DUP: /tmp/files/dir2/hey
/tmp/files/dir2/hey
             DUP: /tmp/files/dir1/hi
             DUP: /tmp/files/dir1/hey
             DUP: /tmp/files/dir3/hi
             DUP: /tmp/files/dir2/hi
/tmp/files/dir2/hello
             DUP: /tmp/files/dir1/hello
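A report of that shape can be produced from a digest-to-paths index with a short Python helper. This is a hypothetical sketch: `dups_report`, the index format, and the exact indentation are assumptions, not dupd internals. Every file under the chosen subtree that shares its content with at least one other file is listed, followed by its other copies.

```python
import os

DUP_PREFIX = "    DUP: "  # indentation chosen loosely; dupd's exact spacing may differ


def dups_report(index, subtree):
    """Lines for each duplicated file under subtree, then its other copies.

    index maps a content digest to the list of paths sharing that content.
    """
    prefix = os.path.join(os.path.abspath(subtree), "")  # trailing separator
    lines = []
    for paths in index.values():
        if len(paths) < 2:
            continue  # unique content: nothing to report
        for path in sorted(paths):
            if not os.path.abspath(path).startswith(prefix):
                continue  # only report files inside the requested subtree
            lines.append(path)
            lines.extend(DUP_PREFIX + other for other in sorted(paths) if other != path)
    return lines
```

For example, with `{"h1": ["/tmp/files/dir1/hello", "/tmp/files/dir2/hello"]}` as the index and `/tmp/files/dir2` as the subtree, the report is the dir2/hello entry followed by a single DUP line for dir1/hello.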

The document at https://github.com/jvirkki/dupd/blob/master/docs/examples.md gives more examples of exploring files interactively with these commands. It could probably use more examples, but hopefully it helps.

ivanperez-keera commented 6 years ago

Cool. Thanks for taking the time to explain it (again).

jvirkki commented 6 years ago

Cool, I hope it covers your needs. If anything comes up that could be better, file an issue. Thanks.