qarmin / czkawka

Multi functional app to find duplicates, empty folders, similar images etc.

Use FLAC audio MD5 for music duplicates #1285

Open RollingStar opened 1 month ago

RollingStar commented 1 month ago

Feature Description ... FLAC stores an MD5 of the unencoded audio data in the STREAMINFO metadata block as part of the spec (it is a header field, not a tag). I've found ~1% of my FLACs have a fake hash: 00000000000000000000000000000000, which the spec uses to mean the MD5 is not known. The other 99% have valid MD5s that can be used to dedupe the music.
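A minimal sketch of reading that field without an external tool, assuming the file starts directly with the `fLaC` marker followed by STREAMINFO (a file with an ID3v2 block prepended would need that skipped first, which is not handled here). The function names `flac_audio_md5` and `to_hex` are made up for illustration and are not part of czkawka:

```rust
/// Extract the 16-byte audio MD5 from the first bytes of a FLAC file.
/// Returns None if the data does not start with "fLaC" + STREAMINFO,
/// or if the MD5 field is all zeros (the "not known" placeholder).
fn flac_audio_md5(data: &[u8]) -> Option<[u8; 16]> {
    // 4-byte marker + 4-byte block header + 34-byte STREAMINFO body = 42 bytes.
    if data.len() < 42 || !data.starts_with(b"fLaC") {
        return None;
    }
    // Header byte: top bit is the "last block" flag, low 7 bits are the
    // block type. Type 0 is STREAMINFO, which must come first.
    if data[4] & 0x7F != 0 {
        return None;
    }
    // 24-bit big-endian block length; STREAMINFO is always 34 bytes.
    let len = u32::from_be_bytes([0, data[5], data[6], data[7]]) as usize;
    if len != 34 {
        return None;
    }
    // The MD5 is the last 16 bytes of the 34-byte body: file offsets 26..42.
    let mut md5 = [0u8; 16];
    md5.copy_from_slice(&data[26..42]);
    if md5.iter().all(|&b| b == 0) {
        None // placeholder hash, cannot be used for deduplication
    } else {
        Some(md5)
    }
}

/// Format bytes as lowercase hex, matching how the hash is usually printed.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

Only the first 42 bytes of each file are needed, so a scan along these lines would not have to read or decode the audio itself.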

metaflac can read the MD5 with its --show-md5sum option: https://xiph.org/flac/documentation_tools_metaflac.html
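For comparison, a rough sketch of shelling out to metaflac instead of parsing the file. It assumes `metaflac` is on `PATH` and that, for a single input file, `--show-md5sum` prints just the 32-character hex digest. The helper names `parse_md5_hex` and `metaflac_md5` are hypothetical:

```rust
use std::path::Path;
use std::process::Command;

/// Parse a 32-character hex digest (surrounding whitespace allowed).
fn parse_md5_hex(s: &str) -> Option<[u8; 16]> {
    let s = s.trim();
    if s.len() != 32 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 16];
    for (i, byte) in out.iter_mut().enumerate() {
        // Each output byte comes from two hex characters.
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

/// Run `metaflac --show-md5sum <path>` and parse its output.
/// Returns None if metaflac is missing, fails, or prints something unexpected.
/// Note: an all-zero digest is returned as-is; the caller decides to skip it.
fn metaflac_md5(path: &Path) -> Option<[u8; 16]> {
    let output = Command::new("metaflac")
        .arg("--show-md5sum")
        .arg(path)
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    parse_md5_hex(&String::from_utf8_lossy(&output.stdout))
}
```

Spawning a process per file is likely slower than reading the header directly, so this is probably more useful for cross-checking results than for a full library scan.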

Implementation checklist:
❌ Determine UI/UX for interacting with duplicate MD5s. In my view, duplicate MD5s should be treated as duplicate files entirely. The file with more tag entries, or a longer total text length of tag contents, should win the tie. Files with embedded art etc. should win the tie.
❌ Detect and ignore fake hashes
❌ Read the MD5 from FLAC files
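The grouping and tie-break idea from the checklist could be sketched roughly as below. The `FlacEntry` struct and its fields are hypothetical, and the priority order (embedded art first, then tag count, then total tag text length) is an assumption, since the checklist does not rank the criteria:

```rust
use std::collections::HashMap;

/// Hypothetical per-file record; czkawka's real types will differ.
#[derive(Debug, Clone, PartialEq)]
struct FlacEntry {
    path: String,
    audio_md5: [u8; 16],
    tag_count: usize,
    tag_text_len: usize,
    has_embedded_art: bool,
}

/// Higher tuple = better candidate to keep.
/// Assumed order: embedded art, then number of tags, then tag text length.
fn keep_score(e: &FlacEntry) -> (bool, usize, usize) {
    (e.has_embedded_art, e.tag_count, e.tag_text_len)
}

/// Group files by audio MD5, skipping all-zero (fake) hashes.
/// For each group of two or more files, returns (file to keep, duplicates).
fn group_duplicates(entries: &[FlacEntry]) -> Vec<(&FlacEntry, Vec<&FlacEntry>)> {
    let mut by_md5: HashMap<[u8; 16], Vec<&FlacEntry>> = HashMap::new();
    for e in entries {
        if e.audio_md5 == [0u8; 16] {
            continue; // fake hash: says nothing about the audio
        }
        by_md5.entry(e.audio_md5).or_default().push(e);
    }

    let mut groups = Vec::new();
    for (_, mut files) in by_md5 {
        if files.len() < 2 {
            continue; // unique audio, not a duplicate
        }
        // Best candidate first; the sort is stable, so exact ties keep
        // their input order.
        files.sort_by(|a, b| keep_score(b).cmp(&keep_score(a)));
        let keeper = files.remove(0);
        groups.push((keeper, files));
    }
    groups
}
```

Groups come back in arbitrary order because of the `HashMap`; a UI would likely sort them by path or size before display.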

see e.g. https://discourse.beets.io/t/basic-script-to-dedupe-identical-flacs/2001