Open DarrienG opened 3 years ago
Most [all] of the issues related to that feature would be around UI/UX.
Do I just delete all duplicates? Obviously not, so I have to somehow give the user control over what gets deleted, and there are different ways to go about that. Show each group of duplicates to the user and let them choose which files get deleted? Then how do I present each group? A group can grow quite large and become cumbersome for a human to handle. Just expose a set of options, flags, and switches to act as criteria for deletion? But those would surely be different for each set of duplicates.
And then there are the easy technical aspects: do I build an interactive mode into the main tool, or do I output a dedicated script like rmlint does? rmlint's script is my preferred way; I find it quite clever in fact, though I have style issues with the script it outputs.
I haven't thought of a good way to solve all that. I'm open to bouncing ideas around.
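To make the script-output idea concrete, here is a minimal Python sketch. It assumes the tool already produces groups of duplicate paths as lists of strings; the function name `build_removal_script` and the keep-the-first-entry rule are illustrative assumptions, and the output is not rmlint's actual script format.

```python
import shlex


def build_removal_script(groups):
    """Return a POSIX shell script that removes all but the first file of each group.

    `groups` is assumed to be a list of lists of paths (hypothetical input shape).
    """
    lines = ["#!/bin/sh", "# Review and edit this script before running it.", ""]
    for group in groups:
        if len(group) < 2:
            continue  # nothing to remove for a group with a single file
        keep, *extras = group
        # ':' is the shell no-op, so the kept path is shown without being touched.
        lines.append(f": keep {shlex.quote(keep)}")
        for path in extras:
            # shlex.quote protects spaces and other shell metacharacters.
            lines.append(f"rm -- {shlex.quote(path)}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_removal_script([["a.jpg", "copy of a.jpg"], ["unique.txt"]]))
```

The appeal of this design is that nothing is deleted by the tool itself: the user can read the script, remove lines, and only then run it.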
Honestly, if there were just a --delete-all-dupes option without input, I would be OK with that. Nice and simple, just delete them all.
In my case it would be nice to keep just the oldest one and remove everything else. I'm trying to clean up a huge drive full of family photos. They are heavily cluttered and duplicated, so I would look up the EXIF creation date.
But that is just one case. I would be fine with some kind of interface inside the code, so we can extend the behaviour on our own: a function which receives a group of duplicate files and returns a new list with the filenames that need to be deleted.
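A rough Python sketch of such an interface might look like the following. Everything here is hypothetical: the names `Selector`, `keep_oldest`, and `delete_duplicates` are invented for illustration, and the file modification time stands in for the EXIF creation date, which would need a third-party library to read.

```python
import os
from typing import Callable, List

# A selector receives one group of duplicate paths and returns the paths to delete.
Selector = Callable[[List[str]], List[str]]


def keep_oldest(group: List[str]) -> List[str]:
    """Return every path except the one with the earliest modification time."""
    if len(group) < 2:
        return []
    oldest = min(group, key=lambda p: os.stat(p).st_mtime)
    return [p for p in group if p != oldest]


def delete_duplicates(groups, select: Selector, dry_run: bool = True) -> List[str]:
    """Apply `select` to each group; delete the result unless dry_run is set.

    Returns the list of paths that were (or would be) removed.
    """
    removed = []
    for group in groups:
        for path in select(group):
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

A caller with different needs would pass their own selector, for example one that compares EXIF dates, without the tool having to know about photos at all.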
Thank you for your feedback, it adds to the list of items I'll keep in mind in the future.
I'm still not sure how to proceed (or if at all) with this feature. I have been thinking (in the back of my mind) about it for quite some time now.
In my case I'd prefer hardlinking the duplicate files so only 1 remains on disk.
EDIT: Maybe have a --merge-mode flag?
Then have a few options like:
delete-older
delete-newer
hardlink-older
hardlink-newer
softlink-older
softlink-newer
Though this would lead into the issue of "What's 'older' and what's 'newer'?" Do we check creation time, modification time or access time?
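One way to sidestep the ambiguity is to make the timestamp an explicit choice. The Python sketch below is only an illustration under assumptions: `merge_group` and its `action`, `keep`, and `time_key` parameters are hypothetical names, and splitting each proposed mode into an action plus a keep rule is one possible reading of the options above. Note that `st_ctime` is the inode change time on Unix, not the creation time.

```python
import os

# Which stat field defines "older" and "newer" (hypothetical option names).
TIME_KEYS = {
    "mtime": lambda st: st.st_mtime,  # last content modification
    "atime": lambda st: st.st_atime,  # last access
    "ctime": lambda st: st.st_ctime,  # inode change on Unix, creation on Windows
}


def merge_group(group, action="hardlink", keep="older", time_key="mtime"):
    """Keep one file of a duplicate group and delete or relink the others.

    Returns the path that was kept.
    """
    if action not in ("delete", "hardlink", "softlink"):
        raise ValueError(f"unknown action: {action}")
    if keep not in ("older", "newer"):
        raise ValueError(f"unknown keep rule: {keep}")
    ordered = sorted(group, key=lambda p: TIME_KEYS[time_key](os.stat(p)))
    kept = ordered[0] if keep == "older" else ordered[-1]
    for path in group:
        if path == kept:
            continue
        if action == "delete":
            os.remove(path)
            continue
        # Create the link under a temporary name first, then swap it in, so the
        # duplicate is only replaced once the link exists.
        tmp = path + ".dedup-tmp"
        if action == "hardlink":
            os.link(kept, tmp)  # fails across filesystems, leaving `path` intact
        else:
            os.symlink(os.path.abspath(kept), tmp)
        os.replace(tmp, path)
    return kept
```

With this shape, something like "hardlink-older" could map to `merge_group(group, action="hardlink", keep="older")`, though which file the suffix refers to would still need to be decided.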
Having an all-in-one binary would be amazing. If this supported deleting all dupes after finding them, it would be great.