pixelb / fslint

Linux file system lint checker/cleaner

Enhancement: dupe-deletion with a set of triggers #56

Open pixelb opened 9 years ago

pixelb commented 9 years ago

Original issue 57 created by pixelb on 2010-10-24T11:45:19.000Z:

What steps will reproduce the problem?

  1. Just perform any dupe-search with the intention of deleting dupes (don't trigger deletion)
  2. When the search completes, no option lets you control which files are marked for deletion; the choice has to be made manually
  3. This applies to GUI and command-line version of "findup".

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?

official: 2.26

  root@morpheus:/usr/share/fslint/fslint# uname -a
  Linux morpheus 2.6.31-9-rt #152-Ubuntu SMP PREEMPT RT Thu Oct 15 05:01:14 UTC 2009 i686 GNU/Linux

Please provide any additional information below.

At this time, there is little the user can do to influence which dupes will be deleted. A set of options could be implemented to give the user more control over which files are deleted.

To explain these options, I would like to create the following -- very plausible and near real-life -- scenario. (The scenario can be changed to match other requirements in the future.)

Scenario: User John Doe is collecting images of nature. He uses an automated news-robot to extract images from a usenet-news-feed. It is within the nature of these usenet groups that images are reposted over time. Sometimes they have a different name, sometimes they have the same name. Most newsgroup-robots will rename them with a certain algorithm. For example, if "tree.jpg" exists, then the next image called "tree.jpg" would be renamed to "tree(1).jpg", then "tree(2).jpg" etc. John has moved his collection of images that he wants to keep to "~/keep", and the new incoming images go to "~/incoming".

From this, users might want to have the choice to delete (or not delete) by the following options:

  1. Never delete anything from a certain path - the master-path (here "~/keep")
  2. Delete by path (here "~/incoming")
  3. Delete by filename (length of filename - longer or shorter)
  4. Delete by filename-pattern (delete "*(digit).jpg")
  5. Delete by date (keep oldest or keep newest)
  6. Delete by Case (keep uppercase or keep lowercase)

When applying these options, one has to recognize that not every duplicate group will match the deletion criteria. The user therefore has to decide what to do in that case:

  A. Mark-All-Groups (don't leave any duplicate group unmarked)
  B. Mark-Matching-Groups-Only (leave unmarked those duplicate groups that do not match any deletion criteria)
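To make the proposal concrete, here is a minimal sketch of how options 1, 2 and 4 and the A/B choice could combine for a single duplicate group. It is not fslint code: the names `under`, `name_matches` and `mark_group` are hypothetical, paths are assumed to be already expanded (no `~`), and the fallback used for mode A (keep one file, mark the rest) is only one possible interpretation.

```python
import fnmatch
import os


def under(directory):
    """Rule (hypothetical): match files located below `directory`."""
    prefix = os.path.join(os.path.normpath(directory), "")
    return lambda path: os.path.normpath(path).startswith(prefix)


def name_matches(pattern):
    """Rule (hypothetical): match base names against a shell-style pattern."""
    return lambda path: fnmatch.fnmatchcase(os.path.basename(path), pattern)


def mark_group(group, rules, protected=(), mark_all=False):
    """Return the paths of one duplicate group that would be marked.

    group     -- list of paths with identical content
    rules     -- predicates; a file matching any of them is marked
    protected -- master paths that are never marked (option 1)
    mark_all  -- True for mode A, False for mode B
    """
    guards = [under(d) for d in protected]
    candidates = [p for p in group if not any(g(p) for g in guards)]
    marked = [p for p in candidates if any(rule(p) for rule in rules)]
    if not marked and mark_all:
        # Mode A: no rule matched, so fall back to keeping a single copy.
        keeper_exists = len(candidates) < len(group)
        marked = candidates if keeper_exists else candidates[1:]
    if len(marked) == len(group):
        # Never mark every copy in a group; keep the first one.
        marked = marked[1:]
    return marked


if __name__ == "__main__":
    group = ["/home/john/keep/tree.jpg", "/home/john/incoming/tree(1).jpg"]
    rules = [under("/home/john/incoming"), name_matches("*([0-9]).jpg")]
    print(mark_group(group, rules, protected=["/home/john/keep"]))
```

With `mark_all=False` (mode B), a group in which no file matches any rule is returned empty and stays unmarked; with `mark_all=True` (mode A), the same group still gets all but one file marked.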

pixelb commented 9 years ago

Comment #1 originally posted by pixelb on 2010-10-25T23:07:55.000Z:

1 & 2 should be handled by wildcard selection (globbing). 5 is already handled as a selection option.

3, 4, 6 would need regex support. As a stopgap, one can save the list and operate on it with sed, grep, et al.
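The same stopgap can be sketched in Python instead of sed/grep. This assumes the saved list holds one path per line with blank lines separating duplicate groups; that layout is an assumption and should be checked against the actual saved file. `read_groups` and `select` are hypothetical helper names, not part of fslint.

```python
import os
import re


def read_groups(lines):
    """Split saved lines into duplicate groups (lists of paths).

    Assumes one path per line and a blank line between groups.
    """
    groups, current = [], []
    for line in lines:
        path = line.rstrip("\n")
        if path:
            current.append(path)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def select(groups, regex):
    """Return paths whose base name matches `regex`.

    A group in which every file matches is skipped, so that at
    least one copy of each file always survives.
    """
    pattern = re.compile(regex)
    selected = []
    for group in groups:
        hits = [p for p in group if pattern.search(os.path.basename(p))]
        if len(hits) < len(group):
            selected.extend(hits)
    return selected


if __name__ == "__main__":
    saved = [
        "/home/john/keep/tree.jpg\n",
        "/home/john/incoming/tree(1).jpg\n",
        "\n",
        "/home/john/incoming/oak(1).jpg\n",
        "/home/john/incoming/oak(2).jpg\n",
    ]
    # Option 4: select names ending in "(digits).jpg".
    for path in select(read_groups(saved), r"\(\d+\)\.jpg$"):
        print(path)
```

The function only selects paths; reviewing the output before deleting anything keeps the manual safety step that the GUI provides.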

As for A and B, fslint currently does B.