Provide functionality to find and display duplicates

martinatime / SuperDeDuper_archived

A program for cataloging, updating, and reducing duplicates of MP3 files


Provide functionality to find and display duplicates #14

Open martinatime opened 11 years ago

martinatime commented 11 years ago

Using the information in the catalog there should be functionality to compare records and figure out if two files are possible duplicates. The functionality should:

display a rating, percentage, or confidence level for the duplicates
display each record to the user to review

evnjones commented 11 years ago

Let's make the matching algorithm configurable. By default, we'll identify duplicates using a standard set of fields (hash, MusicBrainz/other service ID, etc.), and a user can change which fields are used to identify duplicate suspects.
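As a rough illustration of configurable matching fields, here is a minimal Python sketch. It assumes catalog records are plain dicts; the function name `find_duplicate_suspects`, the field names, and the default field set are hypothetical and not taken from the project.

```python
from collections import defaultdict

# Hypothetical default set of fields; the thread only mentions a hash and a
# MusicBrainz/other service ID as examples.
DEFAULT_MATCH_FIELDS = ("hash", "musicbrainz_id")


def find_duplicate_suspects(records, match_fields=DEFAULT_MATCH_FIELDS):
    """Group records whose values agree on every field in match_fields."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record.get(field) for field in match_fields)
        # A record missing any chosen field cannot be compared on it, so skip it.
        if any(value is None for value in key):
            continue
        groups[key].append(record)
    # Only groups with more than one record are duplicate suspects.
    return [group for group in groups.values() if len(group) > 1]
```

A user who wanted looser matching could pass something like `match_fields=("artist", "title")` instead of the default.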

martinatime commented 11 years ago

I agree. Both the fields to compare and their weight/importance should be configurable. For instance, the hash would have the most importance, and the artist and song title would carry more weight than the release date or comments. Over the years I have copied and merged my collection a couple of times, so I have a lot of files that are going to be 100% identical except that one is named "song.mp3" and the other "song 1.mp3".
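One way the weighting idea could work, sketched in Python: each field carries a weight, and the confidence is the share of total weight carried by fields that match. The weights, field names, and the function `duplicate_confidence` are assumptions made for illustration only; the filename is deliberately left out of the weights so that "song.mp3" and "song 1.mp3" can still score 100%.

```python
# Hypothetical weights: the hash matters most, artist and title more than
# release date or comments. The actual numbers are placeholders.
DEFAULT_WEIGHTS = {
    "hash": 10.0,
    "artist": 3.0,
    "title": 3.0,
    "release_date": 1.0,
    "comments": 0.5,
}


def _normalize(value):
    """Compare values case-insensitively and ignore surrounding whitespace."""
    return str(value).strip().casefold()


def duplicate_confidence(a, b, weights=DEFAULT_WEIGHTS):
    """Return a 0-100 percentage: matched weight over total compared weight."""
    matched = 0.0
    total = 0.0
    for field, weight in weights.items():
        # Only score fields that both records actually have.
        if field not in a or field not in b:
            continue
        total += weight
        if _normalize(a[field]) == _normalize(b[field]):
            matched += weight
    return 100.0 * matched / total if total else 0.0
```

Two records with the same hash, artist, and title but different filenames score 100 under this sketch, while records that share only artist and title score well below that.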

I was thinking that maybe in the future the algorithm's resulting percentage could be used to determine whether to automatically de-dupe some files.
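A possible shape for that decision, as a small Python sketch: the percentage is compared against two thresholds to decide between automatic de-duping, manual review, or no action. The function name `triage` and both threshold values are hypothetical.

```python
def triage(confidence, auto_threshold=100.0, review_threshold=50.0):
    """Map a 0-100 confidence percentage to an action label.

    Both thresholds are illustrative defaults, not values from the project.
    """
    if confidence >= auto_threshold:
        return "auto"      # confident enough to de-dupe without asking
    if confidence >= review_threshold:
        return "review"    # show the records to the user to decide
    return "ignore"        # too dissimilar to flag as duplicates
```

Keeping the automatic threshold at 100 by default means only fully matching records would ever be de-duped without review.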

evnjones commented 11 years ago

Ah, weighting, I didn't even think of that. That will provide a good way to have fine-grained control over how the matching is done.

The automatic de-duping scenario is, I think, where backing up files by default makes sense (piggybacking on the conversation in #12). Metadata changes wouldn't need to default to a full backup, as the catalog would retain the historical changes.