jRimbault / yadf

Yet Another Dupes Finder
MIT License

Add delete option? #2

Open DarrienG opened 3 years ago

DarrienG commented 3 years ago

Having an all in one binary would be amazing. If this supported deleting all dupes after finding, it would be great.

jRimbault commented 3 years ago

Most [all] of the issues related to that feature would be around UI/UX.

Do I just delete all duplicates? Obviously not, so I have to somehow defer control to the user over what gets deleted or not, and there are different ways to go about that:

- Show each group of duplicates to the user and let them choose which get deleted? How do I present each group? A group can grow quite large and cumbersome for a human to handle.
- Just expose a set of options, flags and switches to act as criteria for deletion? But those would surely be different for each set of duplicates.

And then there are the easy technical aspects: do I build an interactive mode into the main tool, or do I output a dedicated script like rmlint does? rmlint's script is my preferred way, I find it quite clever in fact, though I have style issues with the script it outputs.
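For illustration, the rmlint-style approach could already be prototyped on top of yadf's existing ldjson output. This is a hypothetical sketch, not anything yadf ships: a small Python filter that turns each duplicate group into a reviewable shell script, with every deletion commented out so the user opts in line by line.

```python
#!/usr/bin/env python3
# Hypothetical sketch of the "emit a script for review" approach:
#   yadf path/to/files -f ldjson | this_script.py > cleanup.sh
# Each group keeps its first file; deletions start commented out.
import fileinput
import json
import shlex


def main():
    print("#!/bin/sh")
    for line in fileinput.input():
        files = json.loads(line)
        print(f"# group: keeping {shlex.quote(files[0])}")
        for filename in files[1:]:
            print(f"# rm -- {shlex.quote(filename)}")


if __name__ == "__main__":
    main()
```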

I haven't thought of a good way to solve all that yet. I'm open to bouncing ideas.

DarrienG commented 3 years ago

Honestly, if there were just a `--delete-all-dupes` option without input I would be OK with that. Nice and simple, just delete them all.

maluramichael commented 3 years ago

For my case it would be nice to keep just the oldest one and remove everything else. I'm trying to clean up a huge drive full of family photos. They are heavily cluttered and duplicated. So I would look up the EXIF creation date.

But that is just one case. I would be fine with some kind of interface inside the code, so we can extend the behaviour on our own.

A function which gets a list of duplicates and returns a new list of filenames that need to be deleted.
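For what it's worth, such a hook could look like the Python sketch below. The names are illustrative, not anything yadf provides, and the file's modification time stands in for the EXIF creation date.

```python
import os
from typing import Callable, List

# Hypothetical shape of the hook: one group of duplicate paths in,
# the subset that should be deleted out.
Selector = Callable[[List[str]], List[str]]


def all_but_oldest(files: List[str]) -> List[str]:
    # mtime stands in for the EXIF creation date in this sketch
    oldest = min(files, key=os.path.getmtime)
    return [f for f in files if f != oldest]
```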

jRimbault commented 3 years ago

Thank you for your feedback, it adds to the list of items I'll keep in mind in the future.

I'm still not sure how to proceed (or if at all) with this feature. I have been thinking (in the back of my mind) about it for quite some time now.

In the meantime, would you be able to make this kind of solution work? Running `yadf path/to/your/files -f ldjson | python_script.py`, piping the line-delimited JSON output to a Python script doing the deletion with your own criteria. ~~Untested~~ Tested a bit:

```python
#!/usr/bin/env python3
import fileinput
import json
import os


def main():
    for line in fileinput.input():
        files = json.loads(line)
        files.sort(key=exifdate)
        for filename in files[1:]:
            os.remove(filename)


def exifdate(filename):
    # get the exif date for each file
    # I don't know how to extract that information with the python stdlib
    # I'd expect PIL/Pillow has something for that, but it's a third party package
    return filename


if __name__ == "__main__":
    main()
```

Or a more elaborate script in this [example](https://github.com/jRimbault/yadf/blob/main/examples/keep_oldest.py).
GGG-KILLER commented 2 years ago

In my case I'd prefer hardlinking the duplicate files so only 1 remains on disk.

EDIT: Maybe have a `--merge-mode` flag? Then have a few options, like hardlinking everything to the oldest copy or to the newest one.

Though this would lead into the issue of "What's 'older' and what's 'newer'?" Do we check creation time, modification time or access time?
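For completeness, the hardlink variant is easy to prototype on top of the same ldjson pipe. This sketch assumes "oldest" means smallest modification time, and that all duplicates live on one filesystem (a hard requirement for `os.link`).

```python
#!/usr/bin/env python3
# Hypothetical sketch: yadf path/to/files -f ldjson | hardlink_dupes.py
# Keeps the oldest file of each group (by mtime) and replaces every
# other duplicate with a hardlink to it.
import fileinput
import json
import os


def main():
    for line in fileinput.input():
        files = json.loads(line)
        files.sort(key=os.path.getmtime)
        keep = files[0]
        for filename in files[1:]:
            os.remove(filename)      # drop the duplicate...
            os.link(keep, filename)  # ...and point its path at the kept copy


if __name__ == "__main__":
    main()
```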