xiph / flac

Free Lossless Audio Codec
https://xiph.org/flac/
GNU Free Documentation License v1.3

flac -t and potential false negative #624

Closed: jaredmo closed this issue 1 year ago

jaredmo commented 1 year ago

I am trying to gain a better understanding of flac -t and came across some odd behavior. When I run the test on a specific file, it passes. However, when I recalculate the MD5 using ffmpeg and compare it to the output of metaflac --show-md5sum, the values do not match. See below.

NOTE: The files with this behavior were downloaded from Bandcamp. The checksums match on my CDs ripped with EAC. Not sure if that is relevant.

Expected Behavior: flac -t reports an error if the MD5 of the audio stream doesn't match the internal checksum.

Actual Behavior: The flac test passes. However, the internal checksum from metaflac and the checksum recalculated with ffmpeg do not match.

[[name]@[machine] HOTLINE]$ flac -t "80s Stallone - HOTLINE - 01 Driven.flac" 

flac 1.4.2
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

80s Stallone - HOTLINE - 01 Driven.flac: ok 
[[name]@[machine] HOTLINE]$ metaflac --show-md5sum "80s Stallone - HOTLINE - 01 Driven.flac" 
13cee21900b0ca923e7661a5b7e050e3
[[name]@[machine] HOTLINE]$ ffmpeg -i "80s Stallone - HOTLINE - 01 Driven.flac" -map 0:0 -f md5 - 2>/dev/null | sed s/.*=//g
d64d472628202180e635881ef6032b86

ktmf01 commented 1 year ago

Could it be this file is not 16-bit PCM? By default ffmpeg produces an MD5 sum calculated over pcm_s16le unless requested otherwise. If this file contains 24-bit PCM for example, it will truncate all samples to 16-bit and calculate the MD5 of that.

So, perhaps you could provide the output of ffprobe or metaflac --list?
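A small illustration of why the two hashes diverge, assuming md5sum from GNU coreutils is available: the same sample hashed as three 24-bit little-endian bytes and as two 16-bit bytes gives unrelated digests. The sample value 0x123456 is made up, and dropping the low byte is only a stand-in for whatever conversion ffmpeg actually applies.

```shell
# One hypothetical 24-bit sample, 0x123456, stored little-endian:
# bytes 56 34 12 (octal 126 064 022).
md5_24=$(printf '\126\064\022' | md5sum | cut -d' ' -f1)

# The same sample reduced to its top 16 bits, 0x1234:
# bytes 34 12 (octal 064 022). This mimics a 24-bit to 16-bit reduction.
md5_16=$(printf '\064\022' | md5sum | cut -d' ' -f1)

echo "24-bit PCM bytes: $md5_24"
echo "16-bit PCM bytes: $md5_16"
```

Because MD5 is computed over the raw PCM bytes, any change in sample width changes the digest, even when the audio is otherwise the same.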

jaredmo commented 1 year ago

That's the issue! I tested on a 16-bit file from Bandcamp, and the md5 hash matches.

The original question is resolved. However, do you know a method to force ffmpeg to calculate the hash on the original PCM rather than converting to 16-bit?

ktmf01 commented 1 year ago

Yes, sure. Off the top of my head, you need to add -c:a pcm_s24le. So that would make:

ffmpeg -i file.flac -map 0:0 -c:a pcm_s24le -f md5 - 2>/dev/null | sed 's/.*=//'

For other bit depths you'll need to change it accordingly.

Edit: as far as I know, there is no way to have ffmpeg select the right bit depth automatically, so you'll need to specify it manually.
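One way to avoid picking the codec by hand is to read the bit depth with metaflac --show-bps and map it to a codec name. The sketch below is only that: the function names codec_for_bps and ffmpeg_audio_md5 are hypothetical, and only 16-bit and 24-bit are mapped because those are the depths confirmed in this thread. Other depths are rejected rather than guessed.

```shell
# Map a FLAC bit depth to the ffmpeg PCM codec used for the MD5 comparison.
# Only the two depths discussed in this thread are covered (assumption:
# anything else needs to be checked by hand).
codec_for_bps() {
    case "$1" in
        16) echo pcm_s16le ;;
        24) echo pcm_s24le ;;
        *)  echo "unsupported bit depth: $1" >&2; return 1 ;;
    esac
}

# Recompute the audio MD5 with ffmpeg, choosing the codec from the
# bit depth stored in the file. Requires metaflac and ffmpeg on PATH.
ffmpeg_audio_md5() {
    md5_file=$1
    md5_bps=$(metaflac --show-bps "$md5_file") || return 1
    md5_codec=$(codec_for_bps "$md5_bps") || return 1
    ffmpeg -i "$md5_file" -map 0:0 -c:a "$md5_codec" -f md5 - 2>/dev/null \
        | sed 's/.*=//'
}
```

Usage might look like ffmpeg_audio_md5 file.flac, with the result compared against metaflac --show-md5sum file.flac.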

jaredmo commented 1 year ago

Perfect. I forgot that ffmpeg converts by default if -c isn't specified.

One more and then I will close the ticket. Is there documentation on what flac -t is testing?

I am assuming it's the md5 and maybe more?

ktmf01 commented 1 year ago

As stated at https://xiph.org/flac/documentation_tools_flac.html:

Test a flac encoded file (same as -d except no decoded file is written)

This means some of the metadata blocks and all audio frames of the FLAC file are parsed and an MD5 sum is calculated, as is the case with decoding.

However, recently I've coded one addition: flac -t will also check for an ID3v2 tag and warn the user that this is non-standard. This is not yet in any released version. I plan to add more specific tests to flac -t, like parsing of all metadata blocks.
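Since flac -t is a decode with no output file, it can be scripted over a set of files by checking its exit status. A minimal sketch, assuming flac exits non-zero when a test fails; the function name check_flacs and the FLAC_CMD override are hypothetical, the latter added only so the loop can be tried without flac installed.

```shell
# Run "flac -t" on every file given and report which ones fail.
# FLAC_CMD is a hypothetical override for the flac binary.
check_flacs() {
    check_cmd=${FLAC_CMD:-flac}
    check_failed=0
    for check_file in "$@"; do
        # -s suppresses the runtime statistics; -t tests without writing output.
        if "$check_cmd" -s -t "$check_file"; then
            echo "ok: $check_file"
        else
            echo "FAILED: $check_file"
            check_failed=1
        fi
    done
    # Non-zero if at least one file failed the test.
    return "$check_failed"
}
```

Usage might look like check_flacs *.flac. A passing result only says the file decodes and matches its stored MD5, not that it matches a hash computed by another tool at a different bit depth.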