sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

Feature request: Ability to specify how the "original" file is determined #45

Closed: stevenhoneyman closed this issue 10 years ago

stevenhoneyman commented 10 years ago

I saw issue #10 but it seems to be closed.

I just found rmlint through a forum post and tried it out on a few files I had lying around, making two duplicates with different names. It did not detect the (genuine) original; it chose the one with the lowest inode.

fdupes chose the right one as the original :)

It should (or could) choose the original based on the user's preference (oldest modification date, alphabetical order, etc.), but do this only after the program has finished searching. I think that would avoid the "spawn a process every time a dupe is found" issue mentioned in #10.
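A minimal sketch of the "decide after searching" idea, assuming a two-phase design: first group every scanned file by a hash of its content, then pick one original per group in a single pass. `FileEntry`, `find_duplicate_groups`, and `choose_originals` are hypothetical names, not rmlint internals, and file contents are held in memory only to keep the example self-contained.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    mtime: float    # modification time, seconds since the epoch
    content: bytes  # stands in for the file's data in this sketch


def find_duplicate_groups(files):
    """Phase 1: scan everything and group files by a hash of their content."""
    groups = defaultdict(list)
    for f in files:
        groups[hashlib.sha256(f.content).hexdigest()].append(f)
    # Only groups with two or more members contain duplicates.
    return [g for g in groups.values() if len(g) > 1]


def choose_originals(groups):
    """Phase 2: runs once, after the scan; keeps the oldest file of each group."""
    result = []
    for group in groups:
        original = min(group, key=lambda f: f.mtime)  # oldest mtime wins
        dupes = [f for f in group if f is not original]
        result.append((original, dupes))
    return result


files = [
    FileEntry("copy_of_photo.jpg", 200.0, b"abc"),
    FileEntry("photo.jpg", 100.0, b"abc"),
    FileEntry("notes.txt", 150.0, b"xyz"),
]
for original, dupes in choose_originals(find_duplicate_groups(files)):
    print("keep", original.path, "remove", [d.path for d in dupes])
# keep photo.jpg remove ['copy_of_photo.jpg']
```

Because the choice happens once per group after the full scan, no extra work is triggered each time an individual duplicate is found.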

Brottweiler commented 10 years ago

You can already do this, though it's not a good way IMO. It works, but it should be improved.

Let's say you have a folder, and inside of it you have TWO folders. You want to compare the two folders, and delete the dupes in the second folder. So you run

rmlint '//originalFolder' 'duplicateFolder'

originalFolder holds the original files that you want to look for inside duplicateFolder; duplicateFolder holds the duplicate files that you want to delete.
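The `//` convention above can be sketched as follows, assuming that a leading `//` on a command-line path simply marks that path as holding originals, and that within a duplicate group any copy located under a marked path is kept. `parse_paths`, `in_original_dir`, and `split_group` are hypothetical helpers, not rmlint's actual argument parser; the sketch also assumes all paths are given in the same form (all relative or all absolute), since `os.path.commonpath` raises `ValueError` when they are mixed.

```python
import os


def parse_paths(args):
    """Split command-line paths into (path, is_original) pairs.

    Assumption: a leading "//" marks the path as holding originals.
    """
    parsed = []
    for arg in args:
        if arg.startswith("//"):
            parsed.append((arg[2:], True))
        else:
            parsed.append((arg, False))
    return parsed


def in_original_dir(file_path, parsed):
    """True if file_path lies under any path that was marked with //."""
    for directory, is_original in parsed:
        if not is_original:
            continue
        directory = os.path.normpath(directory)
        # commonpath compares whole components, so "originalFolderX" does not match.
        if os.path.commonpath([directory, file_path]) == directory:
            return True
    return False


def split_group(group_paths, parsed):
    """Split one duplicate group into (keep, delete) lists."""
    keep = [p for p in group_paths if in_original_dir(p, parsed)]
    delete = [p for p in group_paths if not in_original_dir(p, parsed)]
    return keep, delete


parsed = parse_paths(["//originalFolder", "duplicateFolder"])
print(split_group(["duplicateFolder/a.txt", "originalFolder/a.txt"], parsed))
# (['originalFolder/a.txt'], ['duplicateFolder/a.txt'])
```

If no copy in a group lives under a marked path, this sketch keeps nothing, so a real tool would need a fallback rule such as the sort criteria discussed later in this thread.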

SeeSpotRun commented 10 years ago

So something like -D / --sortcriteria: when selecting the original, sort in order of: m = keep lowest mtime (oldest), M = keep newest mtime, a = keep first in alphabetical order, A = keep last in alphabetical order. Combinations are applied in turn; e.g. "-D am" would choose the first in alphabetical order and, if more than one file has the same name, keep the oldest.
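One way to apply combined criteria "in turn" is a chain of stable sorts: sort on the least significant letter first and the most significant letter last, reversing the order for uppercase letters. This is a sketch under assumptions, not rmlint's code: `Candidate` and `sort_candidates` are hypothetical names, and "alphabetical" is taken to mean the file's basename.

```python
import os
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    mtime: float  # modification time, seconds since the epoch


# Lowercase letter = ascending order, uppercase letter = descending order.
KEYS = {
    "m": lambda c: c.mtime,                   # oldest first
    "a": lambda c: os.path.basename(c.path),  # alphabetical on file name
}


def sort_candidates(candidates, criteria):
    """Order duplicates so the preferred original comes first.

    criteria is a string such as "am"; the first letter is the most
    significant. Python's sort is stable (also with reverse=True), so
    sorting by the least significant key first and the most significant
    key last produces the combined order.
    """
    ordered = list(candidates)
    for letter in reversed(criteria):
        key = KEYS.get(letter.lower())
        if key is None:
            raise ValueError(f"unknown criterion: {letter!r}")
        ordered.sort(key=key, reverse=letter.isupper())
    return ordered


files = [
    Candidate("b/report.txt", 300.0),
    Candidate("a/report.txt", 100.0),
    Candidate("c/summary.txt", 50.0),
]
print(sort_candidates(files, "am")[0].path)  # a/report.txt
print(sort_candidates(files, "aM")[0].path)  # b/report.txt
print(sort_candidates(files, "m")[0].path)   # c/summary.txt
```

With "-D am", both report.txt copies beat summary.txt alphabetically, and the tie between them is broken by the older mtime.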

Anything else apart from mtime and alphabetical that anyone would realistically want?

Brottweiler commented 10 years ago

IMO, the first folder you specify should be considered the original. It should be that simple.

stevenhoneyman commented 10 years ago

What if the files you want checked for dupes are in the same directory? Mine always are.

@SeeSpotRun that would be ideal!

Brottweiler commented 10 years ago

Sure, there should be a simpler or better way of specifying the original directory than using two slashes.

SeeSpotRun commented 10 years ago

We can have all of the above, i.e.: m/M (modification time); a/A (alphabetical); p = if more than one path is given on the rmlint command line, keep the file from the first one; P = if more than one path is given, keep the file from the later one. So "-D pam" would prioritise first on the first-named path, then alphabetically on basename, then, if still tied, on mtime. Note that paths tagged with // still take precedence over the above: if one or more paths are prefixed with //, then "-D m" would keep the oldest copy in the // path(s) as the original.
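The precedence described above can be sketched as a single comparator: `//`-tagged paths always win, and only then are the `-D` letters consulted left to right. This is an illustration under assumptions, not the code in the devel branch; `Candidate`, `path_index` (the position of the search path on the command line), `tagged`, and `pick_original` are hypothetical names.

```python
import os
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class Candidate:
    path: str
    mtime: float     # modification time, seconds since the epoch
    path_index: int  # position of its search path (0 = first named)
    tagged: bool     # True if that search path was prefixed with //


def _cmp(a, b):
    return (a > b) - (a < b)


# Negative result means x is preferred as the original over y.
CRITERIA = {
    "p": lambda x, y: _cmp(x.path_index, y.path_index),  # first-named path wins
    "P": lambda x, y: _cmp(y.path_index, x.path_index),  # later path wins
    "a": lambda x, y: _cmp(os.path.basename(x.path), os.path.basename(y.path)),
    "A": lambda x, y: _cmp(os.path.basename(y.path), os.path.basename(x.path)),
    "m": lambda x, y: _cmp(x.mtime, y.mtime),             # oldest wins
    "M": lambda x, y: _cmp(y.mtime, x.mtime),             # newest wins
}


def pick_original(candidates, criteria):
    def compare(x, y):
        # //-tagged paths take precedence over every -D criterion.
        r = _cmp(not x.tagged, not y.tagged)
        if r != 0:
            return r
        for letter in criteria:
            r = CRITERIA[letter](x, y)
            if r != 0:
                return r
        return 0

    return min(candidates, key=cmp_to_key(compare))


files = [
    Candidate("/backup/a.mp3", 100.0, 1, False),
    Candidate("/music/b.mp3", 300.0, 0, False),
    Candidate("/music/c.mp3", 200.0, 0, False),
]
print(pick_original(files, "pam").path)  # /music/b.mp3
print(pick_original(files, "m").path)    # /backup/a.mp3
```

If /music had been given as //music, "-D m" would keep /music/c.mp3 (the oldest copy inside the tagged path) even though /backup/a.mp3 is older overall.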

If you want to try it out, try https://github.com/SeeSpotRun/rmlint/tree/devel. Note this includes some other changes I've been working on and has had no real testing as yet.

stevenhoneyman commented 10 years ago

@SeeSpotRun Whoa, that was quick! Thanks :)

OK, I've been running a few tests, and the sort order is working perfectly. But which version should I be using for testing, i.e. where would you like bugs reported for the devel/develop branch?

There are three that look similar to the "untrained eye" now :laughing:: sahib/rmlint/tree/develop, SeeSpotRun/rmlint/tree/devel, and SeeSpotRun/rmlint/tree/master.

Thanks, Steven

SeeSpotRun commented 10 years ago

sahib/rmlint/tree/develop. Sahib is still the sahib! Daniel. Edit: I will be working in SeeSpotRun/rmlint/tree/devel, but others may also contribute, so sahib/rmlint/tree/develop is probably the best one to clone. Chris has had a lot more experience at this than I have.

stevenhoneyman commented 10 years ago

Thanks, will do. Anyway - feature added, so closing this :+1: