Feature Description
...
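FLAC stores an MD5 of the unencoded audio in the STREAMINFO metadata block as part of the spec (it is in the stream header, not in the tags). I've found ~1% of my FLACs have a fake hash: 00000000000000000000000000000000, which typically means the encoder never stored a checksum. The other 99% have valid MD5s that can be used to dedupe the music.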
Implementation checklist:
❌ Determine UI/UX for handling duplicate MD5s. In my view, files with the same audio MD5 should be treated as complete duplicates. The tie should go to the file with more tag entries or a longer total text length of tag contents; embedded art etc. should also count in a file's favor.
❌ Detect and ignore fake hashes
❌ Read the MD5 from FLAC files
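
To make the last two checklist items concrete, here is a minimal sketch of reading the MD5 and ignoring the all-zero value using only the Python standard library. It assumes the file starts with the `fLaC` marker followed directly by a STREAMINFO block, which is what the format requires; `read_flac_md5` and `UNSET_MD5` are names I made up for illustration, not anything that exists in beets.

```python
UNSET_MD5 = "0" * 32  # all-zero signature: no usable checksum stored


def read_flac_md5(path):
    """Return the STREAMINFO MD5 as a hex string, or None if unusable.

    Hypothetical helper for illustration; not part of beets or any library.
    """
    with open(path, "rb") as f:
        if f.read(4) != b"fLaC":
            return None  # not a bare FLAC stream (e.g. an ID3v2 tag is prepended)
        header = f.read(4)
        if len(header) < 4:
            return None
        block_type = header[0] & 0x7F  # top bit is the "last block" flag
        length = int.from_bytes(header[1:4], "big")
        if block_type != 0 or length < 34:
            return None  # first metadata block should be a 34-byte STREAMINFO
        streaminfo = f.read(34)
        if len(streaminfo) < 34:
            return None
    # The MD5 occupies the final 16 bytes of the 34-byte STREAMINFO body.
    md5 = streaminfo[18:34].hex()
    return None if md5 == UNSET_MD5 else md5
```

Returning `None` for the all-zero value means callers can skip those files in one check instead of grouping them all together as one giant "duplicate". This sketch does not try to skip a prepended ID3v2 tag; files like that would simply come back as `None`.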
Metaflac can access the md5: https://xiph.org/flac/documentation_tools_metaflac.html
see e.g. https://discourse.beets.io/t/basic-script-to-dedupe-identical-flacs/2001
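
For the tie-breaking idea, a rough sketch of how grouping and picking a winner might look. Everything here is hypothetical: `Track` is a stand-in record rather than a beets type, and the priority order (embedded art first, then tag count, then total tag text length) is just one possible reading of the preference described in the checklist; the actual order is part of the UI/UX question that still needs deciding.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:  # hypothetical record for illustration, not a beets type
    path: str
    md5: Optional[str]  # None when the stored MD5 is missing or all zeros
    tag_count: int
    tag_text_len: int
    has_art: bool


def keep_score(track):
    # Assumed priority: art, then number of tags, then total tag text length.
    return (track.has_art, track.tag_count, track.tag_text_len)


def find_duplicates(tracks):
    """Map each shared MD5 to (track_to_keep, [tracks_to_drop])."""
    groups = defaultdict(list)
    for track in tracks:
        if track.md5 is not None:  # never group files without a real hash
            groups[track.md5].append(track)
    result = {}
    for md5, group in groups.items():
        if len(group) > 1:
            ranked = sorted(group, key=keep_score, reverse=True)
            result[md5] = (ranked[0], ranked[1:])
    return result
```

Files whose `md5` is `None` are left out entirely, so they can never be flagged as duplicates of each other.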