pixelb / fslint

Linux file system lint checker/cleaner

[Enhancement/Suggestion] Modification of the duplicate file selection: keep the first file of a group, but also keep those files in the group with distinct names #104

Closed Wikinaut closed 9 years ago

Wikinaut commented 9 years ago

FSlint can mark all files but the first file in each group of files with the same hash for deletion.

I propose a new option, so that files with the same hash will not be selected if they have different filenames.

pixelb commented 9 years ago

So a new option in the "within groups" section. "Select with matching names"

Wikinaut commented 9 years ago

@pixelb I meant:

Purpose:

I have two (slightly different) copies of a Windows hard disk, one made with a tool and one made by copying to a USB drive. They live under /work/copy1 and /work/copy2. Files in copy1 take precedence over files in copy2. I want to delete every file in copy2 that is identical to a file in copy1, but I want to preserve files in copy1 that have different names, even if they have the same hash. (Your current version marks those with the same hash; in my example below, b.file and c.file.)

I want to delete all duplicates that have the same name and the same hash, but keep the one that lives in the first path/drive.

/copy1
  a.file hash = aaa
  b.file hash = bbb
  c.file hash = bbb
/copy2
  a.file hash = aaa
  b.file hash = bbb
  c.file hash = bbb

Wanted result:
/copy1
  a.file (unmarked, because it is the first file in group aaa)
  b.file (unmarked, because it is the first file in group bbb; a file with a different name, c.file, also exists)
  c.file (unmarked: it is in group bbb but not the first file, and its name differs from b.file)
/copy2
  a.file (marked, because it is in group aaa but not the first file. The first file is /copy1/a.file)
  b.file (marked, because it is in group bbb but not the first file. The first file is /copy1/b.file)
  c.file (marked, because it is in group bbb but not the first file. The first file is /copy1/b.file)
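
The wanted result above can be read as one rule: within a hash group, keep the first file seen for each distinct file name and mark every later file with the same name. The following is a minimal sketch of that rule, not FSlint code. The function name `select_for_deletion` and the `(path, hash)` input format are assumptions made for illustration, and the input order is assumed to be the scan order (first path first).

```python
import os
from collections import defaultdict


def select_for_deletion(files):
    """Return the paths to mark for deletion.

    `files` is a list of (path, digest) pairs in scan order
    (hypothetical input format, not FSlint's internal one).
    A file is marked only if an earlier file in the same hash
    group already has the same base name.
    """
    kept_names = defaultdict(set)  # digest -> base names already kept
    marked = []
    for path, digest in files:
        name = os.path.basename(path)
        if name in kept_names[digest]:
            marked.append(path)            # same hash and same name: duplicate
        else:
            kept_names[digest].add(name)   # first file with this name: keep
    return marked


listing = [
    ("/copy1/a.file", "aaa"),
    ("/copy1/b.file", "bbb"),
    ("/copy1/c.file", "bbb"),
    ("/copy2/a.file", "aaa"),
    ("/copy2/b.file", "bbb"),
    ("/copy2/c.file", "bbb"),
]
marked = select_for_deletion(listing)
print(marked)  # ['/copy2/a.file', '/copy2/b.file', '/copy2/c.file']
```

With this rule /copy1/c.file stays unmarked even though it shares hash bbb with /copy1/b.file, which is the difference from marking everything but the first file of a group.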


pixelb commented 9 years ago

Thanks for the extra info. I think this is another case of issue #24. Note you can "select using wildcard" (or unselect), to give preference to /copy1 or /copy2
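
To illustrate what a wildcard selection amounts to for the /copy1 and /copy2 example, here is a rough sketch in Python. It is not FSlint's implementation: the function name `select_by_wildcard`, the list-of-groups input, and the safeguard that skips a group when every member matches are all assumptions made for this illustration.

```python
from fnmatch import fnmatchcase


def select_by_wildcard(groups, pattern):
    """Return paths matching `pattern` across duplicate groups.

    `groups` is a list of lists of paths, one inner list per set of
    identical files (hypothetical input format). A group in which
    every path matches is skipped, so that at least one copy of each
    file is always left unselected (an assumed safeguard).
    """
    selected = []
    for group in groups:
        matches = [p for p in group if fnmatchcase(p, pattern)]
        if len(matches) < len(group):
            selected.extend(matches)
    return selected


groups = [
    ["/copy1/a.file", "/copy2/a.file"],
    ["/copy1/b.file", "/copy1/c.file", "/copy2/b.file", "/copy2/c.file"],
]
selected = select_by_wildcard(groups, "/copy2/*")
print(selected)  # ['/copy2/a.file', '/copy2/b.file', '/copy2/c.file']
```

Selecting "/copy2/*" gives preference to /copy1, and in this example it leaves both /copy1/b.file and /copy1/c.file untouched.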

Wikinaut commented 9 years ago

Thanks.

The following may be off-topic, but it is relevant for my case above (may be similar to issue #24).

I also noticed that the (now empty) subdirectories in /copy2 were not deleted. On the other hand, deletion of empty directories is not mentioned in the selection menu. In another run it looked as if the empty directories were deleted, so a different problem may exist (perhaps I should file a separate issue about this).

Wikinaut commented 9 years ago

@pixelb wrote

Thanks for the extra info. I think this is another case of issue #24. Note you can "select using wildcard" (or unselect), to give preference to /copy1 or /copy2

The wildcard info was a good hint and is helpful, but it is cumbersome when you often compare two paths. Perhaps later I will add the options "In the result list, select all files of the first path" and "In the result list, select all files of the second path".

Basically I use FSlint as "find and delete all files in the second path which are identical to files in the first path, and which have equal names. Do not delete identical files if they have different names."
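
Outside FSlint, the same use case can be approximated with a short standalone script. The sketch below rests on several assumptions: "equal names" is taken to mean equal base names (not equal relative paths), SHA-256 is an arbitrary choice of hash, and all function names are hypothetical. It only reports candidates and deletes nothing.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Hash a file's content in chunks (SHA-256 is an arbitrary choice)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def walk_files(root):
    """Yield (base name, full path) for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skips broken symlinks
                yield name, path


def redundant_in_second(first_root, second_root):
    """List files under second_root whose base name AND content both
    match some file under first_root. Nothing is deleted here."""
    keep = {(name, file_digest(path)) for name, path in walk_files(first_root)}
    return sorted(
        path
        for name, path in walk_files(second_root)
        if (name, file_digest(path)) in keep
    )
```

A call such as `redundant_in_second("/work/copy1", "/work/copy2")` would return the paths in the second tree that are safe to remove under these assumptions; a file in the second tree with identical content but a different name is not listed.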

Wikinaut commented 9 years ago

Closing now, since the general problem has been solved, or at least a workaround has been found.