pixelb / fslint

Linux file system lint checker/cleaner

Request to keep oldest modification date when merging, or a choice of oldest,newest,etc. #80

Open pixelb opened 9 years ago

pixelb commented 9 years ago

Original issue 81 created by pixelb on 2012-06-28T01:25:10.000Z:

Hi! Thanks for a very useful program!

Would it be possible to add a feature to select which modification time is kept for a group of files to be merged?

This would be very useful for old photos, videos, etc when there isn't any time/date stored in the file itself.

I've been gathering family photos and videos which have made it onto the hard drive in various ways, which means I get duplicates very often, but which have different modification times because the times were often not preserved when they were transferred.

I think it would make the most sense to default to keeping the oldest date found among the files when merging, since the oldest time is likely the closest to when the file was actually created (except maybe when the date is for some reason set to the beginning of the Epoch in 1970; then it could maybe choose to do the opposite).
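The rule described above could be sketched roughly as follows. This is not fslint code: the function names, the temporary-file suffix, and the choice to treat an mtime of zero or less as the "Epoch" case are all assumptions made for illustration. The group is merged by hardlinking everything to the first path, and the shared inode is then stamped with the oldest usable modification time.

```python
import os


def oldest_mtime(paths, ignore_epoch=True):
    """Return the oldest modification time (in ns) among paths.

    With ignore_epoch set, files stamped at or before the Unix epoch
    (mtime <= 0, an assumed definition of "bogus") are skipped, unless
    every file in the group carries such a stamp.
    """
    times = [os.stat(p).st_mtime_ns for p in paths]
    usable = [t for t in times if t > 0] if ignore_epoch else times
    return min(usable or times)


def merge_keep_oldest(paths):
    """Hardlink every path to paths[0], then apply the oldest mtime."""
    keep_ns = oldest_mtime(paths)  # read the times before anything is replaced
    keeper = paths[0]
    for dup in paths[1:]:
        if os.path.samefile(keeper, dup):
            continue  # already the same inode, nothing to do
        tmp = dup + ".merge-tmp"  # hypothetical temporary name
        os.link(keeper, tmp)      # create the new link first...
        os.replace(tmp, dup)      # ...then swap it in place of the duplicate
    atime_ns = os.stat(keeper).st_atime_ns
    os.utime(keeper, ns=(atime_ns, keep_ns))  # one inode, so one timestamp
```

Because hardlinked names share a single inode, the order of the paths only decides which file's data survives; the timestamp is set explicitly afterwards, so it does not depend on argument order.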

In version 2.42, I know that we can select "all but newest" or "all but oldest", but I can't find any info on what actually happens after that. And, even if selected files are "not" merged, then it still doesn't solve the problem of the merged files possibly ending up with the wrong date.

What do you think about adding some kind of option to handle this? Would it just need the file arguments in a certain order on the command line to end up with the intended date?

Thank you and have a great day! ^^

pixelb commented 9 years ago

Comment #1 originally posted by pixelb on 2012-06-28T08:45:51.000Z:

Interesting. If you wanted to delete the duplicate photos, then selecting "all but oldest" would be useful.

But keeping the oldest or newest inode as an option when merging. That could be useful I suppose, but is a bit esoteric and the first request I've had for that.

A related feature request is to support giving priorities to selected folders, so you could order from oldest to newest, though that might not be possible or practical.

pixelb commented 9 years ago

Comment #2 originally posted by pixelb on 2012-06-28T08:46:23.000Z:

<empty>

pixelb commented 9 years ago

Comment #3 originally posted by pixelb on 2012-06-28T09:59:15.000Z:

I honestly keep forgetting fslint has an option to delete duplicates, haha~ :P I think it's because I've found hardlinks so useful for my setup.

Since I'm trying to organize so many photos, videos, etc, the best way (for me at least) I've found so far is to use folders more like tags/labels/categories/whatever by just hardlinking files to them.

This lets me, for example, have a picture of somebody/something hardlinked into folders such as ByPerson/PersonsName/, ByDate/SomeYear/Month/, ByPlace/Somewhere/, ByEvent/SomeEvent/, etc. It doesn't need symlinks, so I don't need to worry about broken symlinks if any filenames change in any folders. Whenever I receive more files to archive, I just use my file manager's modified copy ("cp -an --link") on large selections at a time, and a script which hardlinks files to the ByDate folders based on modification time (since many files don't have EXIF). This is where the feature request would be a great help.
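A script of the kind mentioned here might look roughly like the sketch below. The function name, the `ByDate/YYYY/MM` layout with numeric months, and the use of local time are assumptions; the actual script is not shown in this thread.

```python
import os
import time


def link_by_date(src, root="ByDate"):
    """Hardlink src into root/YYYY/MM/ according to its modification time."""
    t = time.localtime(os.stat(src).st_mtime)
    dest_dir = os.path.join(root, f"{t.tm_year:04d}", f"{t.tm_mon:02d}")
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src))
    if not os.path.exists(dest):  # never overwrite, similar to cp -n
        os.link(src, dest)
    return dest
```

Since the folder is derived from the modification time, a duplicate that lost its original timestamp during transfer ends up under the wrong year or month, which is why the merged date matters for this workflow.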

It probably sounds complicated, but I've found it simpler so far than having 3rd-party tagging software, easily-broken symlinks, or renaming thousands of files.

Anyway, thanks for thinking about it! :) It'd be a great help to me. In the meantime, I should be able to use your suggestion in some cases to just delete duplicates and then hardlink/re-hardlink them elsewhere.

Thanks again!

sixtyfive commented 7 years ago

For reasons beyond my control, I can neither use hardlinking nor delete duplicates outright; instead I need to symlink to the oldest duplicate. So far that doesn't seem to be possible with fslint (it always symlinks to the newest duplicate). One thing I tried (out of an unreflected expectation for it to work) was to click the "Date" column header in the results. I thought that perhaps it's the first duplicate within each group that gets symlinked to, and so changing the sorting would give you control over what is considered worthy of keeping (in my case, it's about preserving the correct filename). Anyway, perhaps that'd be an idea?

Edit: after playing around with touch -d '1 month ago' on all the files I do not want to keep and touch on all the ones I do want to keep, I'm now confused. It seems fslint doesn't go by file modification date after all? Sometimes it does, now, but in (most) other cases it seems to opt for an alphabetical order instead. What's more, that pattern is not reflected in the GUI: once you confirm that, "Yes, I want to symlink all files", it selects (in gray, not blue) all but the first duplicate within each group. I think that comprises two bugs in one or something...

pixelb commented 7 years ago

Well, when you "select all but oldest", the symlinking/hardlinking is just done within that selection, i.e. the oldest file will not be linked to the group. Really, "select all but ..." is only useful for subsequent deletion.

As you suggest, we'd have to support ordering by date, and then have the link operation always keep the first entry.
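The idea of ordering by date and always keeping the first entry could be sketched like this. It is only an illustration under stated assumptions, not how fslint is implemented: the function name and temporary suffix are made up, the group is assumed to contain regular files, and ties on modification time are broken by path.

```python
import os


def symlink_to_oldest(paths):
    """Replace all but the oldest file in a duplicate group with symlinks to it."""
    # Order by modification time; the path is a tie-breaker so the result is stable.
    ordered = sorted(paths, key=lambda p: (os.stat(p).st_mtime_ns, p))
    keeper = os.path.abspath(ordered[0])  # first entry is the one that is kept
    for dup in ordered[1:]:
        tmp = dup + ".link-tmp"   # hypothetical temporary name
        os.symlink(keeper, tmp)   # create the symlink beside the duplicate
        os.replace(tmp, dup)      # then swap it over the duplicate
    return keeper
```

Sorting with `reverse=True` would keep the newest file instead, so the same ordering step could back an "oldest or newest" choice.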

sixtyfive commented 7 years ago

> Well, when you "select all but oldest", the symlinking/hardlinking is just done within that selection, i.e. the oldest file will not be linked to the group. Really, "select all but ..." is only useful for subsequent deletion.

Yes, I tried that too, and found out that it works like you describe.

> As you suggest, we'd have to support ordering by date, and then have the link operation always keep the first entry.

Is that something you'll do at some point?

pixelb commented 7 years ago

Yes, but probably not soon. It's a bit involved.

endolith commented 4 years ago

> That could be useful I suppose, but is a bit esoteric and the first request I've had for that.

I use this feature all the time in AllDup. I am almost always deleting all but oldest:

[Screenshot: AllDup 4 4 22 - Search Result 'File content Byte by Byte (100%)']