jvirkki / dupd

CLI utility to find duplicate files
http://www.virkki.com/dupd
GNU General Public License v3.0

No way for reports to ignore hard linked file pairs #7

Closed: jbruchon closed this issue 7 years ago

jbruchon commented 8 years ago

When generating reports, there is no way to specify that hard-linked file pairs should be treated as non-duplicates. Hard linking is extremely useful for huge data sets that are never modified, only added to, and where every exact duplicate must still appear in its original location. However, dupd reports the hard links as duplicates, which can trigger unnecessary re-linking or deletion of directory entries that aren't actually consuming extra disk space.
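Hard links are separate directory entries that point to the same inode, so two names can be recognized as one file by comparing their device and inode numbers. The following is a minimal Python sketch of that check, not dupd's own code; the function names `distinct_inodes` and `is_real_duplicate_set` are hypothetical. It treats a set of identical-content paths as a real duplicate only if it spans more than one inode.

```python
import os


def distinct_inodes(paths):
    """Group paths by (device, inode). Hard links to one file share a key."""
    groups = {}
    for path in paths:
        st = os.stat(path)
        # st_dev + st_ino together identify a file across mounted filesystems.
        groups.setdefault((st.st_dev, st.st_ino), []).append(path)
    return groups


def is_real_duplicate_set(paths):
    """True only if the identical-content paths occupy more than one inode,
    i.e. removing or re-linking some of them would actually free space."""
    return len(distinct_inodes(paths)) > 1
```

With this check, a group made up only of hard links to one file is dropped from the report, while a group that also contains an independent copy is still reported.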

jvirkki commented 8 years ago

Yes, this is a known limitation. It's an artifact of having written dupd for my own needs, since I never use hard links ;-)

I've toyed with tracking inodes, which would allow handling this, but that isn't done currently.

jvirkki commented 7 years ago

Added an option to ignore subsequent names for a file (inode) that has already been seen. This can be done either during the scan or with the interactive commands.
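The scan-time behavior described above can be sketched by remembering each `(st_dev, st_ino)` pair and skipping any later name that maps to one already seen. This is a hedged Python illustration, not dupd's implementation or its option syntax; `scan_unique_files` is a hypothetical name, and the choice to keep the first name in sorted traversal order is an assumption made here for determinism.

```python
import os
import stat


def scan_unique_files(root):
    """Walk root and yield one path per regular file (inode).

    Later names for an inode already seen (hard links) are skipped,
    so they never become duplicate candidates.
    """
    seen = set()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk descends in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets, etc.
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another name for a file we already have
            seen.add(key)
            yield path
```

Only the first name of each hard-linked file reaches the duplicate comparison, so hard links alone never produce a match; an independent copy with the same content still does.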