xiph / flac

Free Lossless Audio Codec
https://xiph.org/flac/
GNU Free Documentation License v1.3
1.67k stars 281 forks source link

flac -t shows FLAC__STREAM_DECODER_END_OF_STREAM since version 1.4.0 for some files #487

Open tillschaefer opened 2 years ago

tillschaefer commented 2 years ago

I regularly run a flac -t check for my music collection. After upgrading from version 1.3.4 to version 1.4.1 (1.4.0 is also affected) some tests fail with FLAC__STREAM_DECODER_END_OF_STREAM that where ok before.

Only a minority of my files are affected. Most of them are still returning OK as test result. This happens for the complete Album at once, i.e., if one file on an album is affected every other file on the album is affected as well. Thus, this might has to do with specific encoders or parameters of the encoder.

I am running a Linux build using the gentoo packages of flac. Since the files have a copyright I cannot upload an afffected file here. However, I am willing to execute further tests if needed.

$ flac -t 01\ -\ Tomzen\ -\ Namaste.flac 

flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

01 - Tomzen - Namaste.flac: *** Got error code 0:FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC

01 - Tomzen - Namaste.flac: ERROR during decoding
                            state = FLAC__STREAM_DECODER_END_OF_STREAM
$ flac -t 01\ -\ Tomzen\ -\ Namaste.flac 

flac 1.3.4
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

01 - Tomzen - Namaste.flac: ok
$ metaflac --list 01\ -\ Tomzen\ -\ Namaste.flac 
METADATA block #0
  type: 0 (STREAMINFO)
  is last: false
  length: 34
  minimum blocksize: 4096 samples
  maximum blocksize: 4096 samples
  minimum framesize: 2636 bytes
  maximum framesize: 13872 bytes
  sample_rate: 44100 Hz
  channels: 2
  bits-per-sample: 16
  total samples: 20815200
  MD5 signature: 7bc6b20fdbb16a7c1b9c13c27d810ce7
METADATA block #1
  type: 3 (SEEKTABLE)
  is last: false
  length: 864
  seek points: 48
    point 0: sample_number=0, stream_offset=0, frame_samples=4096
    point 1: sample_number=438272, stream_offset=751796, frame_samples=4096
    point 2: sample_number=880640, stream_offset=1812843, frame_samples=4096
    point 3: sample_number=1318912, stream_offset=2793114, frame_samples=4096
    point 4: sample_number=1761280, stream_offset=3765922, frame_samples=4096
    point 5: sample_number=2203648, stream_offset=4786522, frame_samples=4096
    point 6: sample_number=2641920, stream_offset=5930373, frame_samples=4096
    point 7: sample_number=3084288, stream_offset=7001573, frame_samples=4096
    point 8: sample_number=3526656, stream_offset=8158135, frame_samples=4096
    point 9: sample_number=3964928, stream_offset=9199557, frame_samples=4096
    point 10: sample_number=4407296, stream_offset=10227280, frame_samples=4096
    point 11: sample_number=4849664, stream_offset=11312059, frame_samples=4096
    point 12: sample_number=5287936, stream_offset=12403741, frame_samples=4096
    point 13: sample_number=5730304, stream_offset=13497013, frame_samples=4096
    point 14: sample_number=6172672, stream_offset=14601250, frame_samples=4096
    point 15: sample_number=6610944, stream_offset=15648734, frame_samples=4096
    point 16: sample_number=7053312, stream_offset=16555210, frame_samples=4096
    point 17: sample_number=7495680, stream_offset=17623860, frame_samples=4096
    point 18: sample_number=7933952, stream_offset=18731708, frame_samples=4096
    point 19: sample_number=8376320, stream_offset=19815916, frame_samples=4096
    point 20: sample_number=8818688, stream_offset=20967909, frame_samples=4096
    point 21: sample_number=9256960, stream_offset=22240334, frame_samples=4096
    point 22: sample_number=9699328, stream_offset=23374822, frame_samples=4096
    point 23: sample_number=10141696, stream_offset=24605882, frame_samples=4096
    point 24: sample_number=10579968, stream_offset=25792228, frame_samples=4096
    point 25: sample_number=11022336, stream_offset=26990143, frame_samples=4096
    point 26: sample_number=11464704, stream_offset=28178133, frame_samples=4096
    point 27: sample_number=11902976, stream_offset=29364966, frame_samples=4096
    point 28: sample_number=12345344, stream_offset=30427390, frame_samples=4096
    point 29: sample_number=12787712, stream_offset=31501062, frame_samples=4096
    point 30: sample_number=13225984, stream_offset=32594946, frame_samples=4096
    point 31: sample_number=13668352, stream_offset=33642831, frame_samples=4096
    point 32: sample_number=14110720, stream_offset=34697026, frame_samples=4096
    point 33: sample_number=14548992, stream_offset=35825976, frame_samples=4096
    point 34: sample_number=14991360, stream_offset=36994595, frame_samples=4096
    point 35: sample_number=15433728, stream_offset=38107034, frame_samples=4096
    point 36: sample_number=15872000, stream_offset=39228709, frame_samples=4096
    point 37: sample_number=16314368, stream_offset=40430342, frame_samples=4096
    point 38: sample_number=16756736, stream_offset=41596540, frame_samples=4096
    point 39: sample_number=17195008, stream_offset=42705300, frame_samples=4096
    point 40: sample_number=17637376, stream_offset=43864950, frame_samples=4096
    point 41: sample_number=18079744, stream_offset=45076327, frame_samples=4096
    point 42: sample_number=18518016, stream_offset=46180114, frame_samples=4096
    point 43: sample_number=18960384, stream_offset=47280556, frame_samples=4096
    point 44: sample_number=19402752, stream_offset=48289111, frame_samples=4096
    point 45: sample_number=19841024, stream_offset=49117444, frame_samples=4096
    point 46: sample_number=20283392, stream_offset=49863974, frame_samples=4096
    point 47: sample_number=20725760, stream_offset=50504294, frame_samples=4096
METADATA block #2
  type: 4 (VORBIS_COMMENT)
  is last: false
  length: 275
  vendor string: reference libFLAC 1.1.3 20061120
  comments: 11
    comment[0]: TITLE=Namaste
    comment[1]: ARTIST=Tomzen
    comment[2]: ALBUMARTIST=Various Artists
    comment[3]: ALBUM=Floating Point 01
    comment[4]: DISCNUMBER=1
    comment[5]: TRACKNUMBER=1
    comment[6]: GENRE=Ambient
    comment[7]: DATE=2003
    comment[8]: PUBLISHER=Iboga Records
    comment[9]: ORGANIZATION=Iboga Records
    comment[10]: LABEL=Iboga Records
METADATA block #3
  type: 1 (PADDING)
  is last: true
  length: 8081
ktmf01 commented 2 years ago

Metadata parsing became more strict in FLAC 1.4.0. I would expect these files have corrupt metadata in some way.

Would it be possible to re-encode these files with FLAC 1.3.4 and see what FLAC 1.4.0 then makes of them? It could very well be FLAC 1.3.4 does pass flac -t just fine but fails at re-encoding too.

tillschaefer commented 2 years ago

Re-encoding with flac 1.3.4 works flawlessly with the following command. The re-encoded files pass the integrity check with version 1.4.x.

mkdir .reencode
for f in *.flac; do
        flac "$f" -V -s --best --no-error-on-compression-fail -o ".reencode/$f"
done

Here is the metadata of the re-encoded file

$ metaflac --list .reencode/01\ -\ Tomzen\ -\ Namaste.flac 
METADATA block #0
  type: 0 (STREAMINFO)
  is last: false
  length: 34
  minimum blocksize: 4096 samples
  maximum blocksize: 4096 samples
  minimum framesize: 2647 bytes
  maximum framesize: 13868 bytes
  sample_rate: 44100 Hz
  channels: 2
  bits-per-sample: 16
  total samples: 20815200
  MD5 signature: 7bc6b20fdbb16a7c1b9c13c27d810ce7
METADATA block #1
  type: 3 (SEEKTABLE)
  is last: false
  length: 864
  seek points: 48
    point 0: sample_number=0, stream_offset=0, frame_samples=4096
    point 1: sample_number=438272, stream_offset=737269, frame_samples=4096
    point 2: sample_number=880640, stream_offset=1790096, frame_samples=4096
    point 3: sample_number=1318912, stream_offset=2758598, frame_samples=4096
    point 4: sample_number=1761280, stream_offset=3717904, frame_samples=4096
    point 5: sample_number=2203648, stream_offset=4724772, frame_samples=4096
    point 6: sample_number=2641920, stream_offset=5856461, frame_samples=4096
    point 7: sample_number=3084288, stream_offset=6915466, frame_samples=4096
    point 8: sample_number=3526656, stream_offset=8061731, frame_samples=4096
    point 9: sample_number=3964928, stream_offset=9088914, frame_samples=4096
    point 10: sample_number=4407296, stream_offset=10099450, frame_samples=4096
    point 11: sample_number=4849664, stream_offset=11166749, frame_samples=4096
    point 12: sample_number=5287936, stream_offset=12239218, frame_samples=4096
    point 13: sample_number=5730304, stream_offset=13313639, frame_samples=4096
    point 14: sample_number=6172672, stream_offset=14398022, frame_samples=4096
    point 15: sample_number=6610944, stream_offset=15429506, frame_samples=4096
    point 16: sample_number=7053312, stream_offset=16319995, frame_samples=4096
    point 17: sample_number=7495680, stream_offset=17373662, frame_samples=4096
    point 18: sample_number=7933952, stream_offset=18467602, frame_samples=4096
    point 19: sample_number=8376320, stream_offset=19537340, frame_samples=4096
    point 20: sample_number=8818688, stream_offset=20676960, frame_samples=4096
    point 21: sample_number=9256960, stream_offset=21943037, frame_samples=4096
    point 22: sample_number=9699328, stream_offset=23064363, frame_samples=4096
    point 23: sample_number=10141696, stream_offset=24285094, frame_samples=4096
    point 24: sample_number=10579968, stream_offset=25458404, frame_samples=4096
    point 25: sample_number=11022336, stream_offset=26644128, frame_samples=4096
    point 26: sample_number=11464704, stream_offset=27818335, frame_samples=4096
    point 27: sample_number=11902976, stream_offset=28990588, frame_samples=4096
    point 28: sample_number=12345344, stream_offset=30038451, frame_samples=4096
    point 29: sample_number=12787712, stream_offset=31099365, frame_samples=4096
    point 30: sample_number=13225984, stream_offset=32180620, frame_samples=4096
    point 31: sample_number=13668352, stream_offset=33214815, frame_samples=4096
    point 32: sample_number=14110720, stream_offset=34255735, frame_samples=4096
    point 33: sample_number=14548992, stream_offset=35373065, frame_samples=4096
    point 34: sample_number=14991360, stream_offset=36528468, frame_samples=4096
    point 35: sample_number=15433728, stream_offset=37625517, frame_samples=4096
    point 36: sample_number=15872000, stream_offset=38735277, frame_samples=4096
    point 37: sample_number=16314368, stream_offset=39924311, frame_samples=4096
    point 38: sample_number=16756736, stream_offset=41077405, frame_samples=4096
    point 39: sample_number=17195008, stream_offset=42172639, frame_samples=4096
    point 40: sample_number=17637376, stream_offset=43319613, frame_samples=4096
    point 41: sample_number=18079744, stream_offset=44522865, frame_samples=4096
    point 42: sample_number=18518016, stream_offset=45615810, frame_samples=4096
    point 43: sample_number=18960384, stream_offset=46701824, frame_samples=4096
    point 44: sample_number=19402752, stream_offset=47683477, frame_samples=4096
    point 45: sample_number=19841024, stream_offset=48467121, frame_samples=4096
    point 46: sample_number=20283392, stream_offset=49137456, frame_samples=4096
    point 47: sample_number=20725760, stream_offset=49723761, frame_samples=4096
METADATA block #2
  type: 4 (VORBIS_COMMENT)
  is last: false
  length: 275
  vendor string: reference libFLAC 1.3.4 20220220
  comments: 11
    comment[0]: TITLE=Namaste
    comment[1]: ARTIST=Tomzen
    comment[2]: ALBUMARTIST=Various Artists
    comment[3]: ALBUM=Floating Point 01
    comment[4]: DISCNUMBER=1
    comment[5]: TRACKNUMBER=1
    comment[6]: GENRE=Ambient
    comment[7]: DATE=2003
    comment[8]: PUBLISHER=Iboga Records
    comment[9]: ORGANIZATION=Iboga Records
    comment[10]: LABEL=Iboga Records
METADATA block #3
  type: 1 (PADDING)
  is last: true
  length: 8081
tillschaefer commented 2 years ago

Interestingly, also version 1.4.1 can re-encode these files without any problem and the integrity check passes for the re-encoded files.

tillschaefer commented 2 years ago

I have created a backtrace of the point, when the error is printed:

(gdb) break decode.c:1491

(gdb) run -t /home/till/temp/Various\ Artists\ -\ Floating\ Point\ 01/01\ -\ Tomzen\ -\ Namaste.flac
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/flac -t /home/till/temp/Various\ Artists\ -\ Floating\ Point\ 01/01\ -\ Tomzen\ -\ Namaste.flac
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

01 - Tomzen - Namaste.flac: testing, 91% complete
Breakpoint 1, error_callback (decoder=0x5555558555e0, status=FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC, client_data=0x7fffffffcee0) at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/decode.c:1491
1491    in /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/decode.c
(gdb) bt
#0  error_callback (decoder=0x5555558555e0, status=FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC, client_data=0x7fffffffcee0)
    at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/decode.c:1491
#1  0x00007ffff7f6e1a3 in send_error_to_client_ (decoder=0x5555558555e0, status=FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC)
    at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/libFLAC/stream_decoder.c:3226
#2  0x00007ffff7f6ab72 in frame_sync_ (decoder=0x5555558555e0)
    at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/libFLAC/stream_decoder.c:2020
#3  0x00007ffff7f680ae in FLAC__stream_decoder_process_until_end_of_stream (decoder=0x5555558555e0)
    at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/libFLAC/stream_decoder.c:1060
#4  0x000055555555b890 in DecoderSession_process (d=0x7fffffffcee0)
    at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/decode.c:455
#5  0x000055555555adf5 in flac__decode_file (
    infilename=0x555555855580 "/home/till/temp/Various Artists - Floating Point 01/01 - Tomzen - Namaste.flac", outfilename=0x0, 
    analysis_mode=0, aopts=..., options=...) at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/decode.c:183
#6  0x000055555557117e in decode_file (
    infilename=0x555555855580 "/home/till/temp/Various Artists - Floating Point 01/01 - Tomzen - Namaste.flac")
    at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/main.c:2195
#7  0x000055555556bb03 in do_it () at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/main.c:504
#8  0x000055555556b1dd in main (argc=3, argv=0x7fffffffd748) at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/main.c:337
ktmf01 commented 2 years ago

That is unexpected. Would you be able to send only the first 10 kB of the file? That should be enough to analyse this and I don't think such a short part can be considered infringing on copyright.

tillschaefer commented 2 years ago

first_10kib.flac.tar.gz

here they are

ktmf01 commented 2 years ago

Okay, I've found the issue. It wasn't at the start of the file as I suspected, but at the very end.

These files have an ID3v1 tag at the end of the file. Behaviour of FLAC in this regard changed here: https://github.com/xiph/flac/commit/b288acb1099679967ec7c6b72307e8c97927f25a

However, what I do not yet understand is why the encoder does not throw an error. These files are strictly speaking non-compliant FLAC files, so the encoder should throw an error too.

tillschaefer commented 2 years ago

Okay, I've found the issue. It wasn't at the start of the file as I suspected, but at the very end.

These files have an ID3v1 tag at the end of the file. Behaviour of FLAC in this regard changed here: b288acb

Ok, so this is basically not a bug and I should fix my files. DId I understand it right, that the decoder simply stopped at the provided metadata bound before and now continues to the end of file?

However, what I do not yet understand is why the encoder does not throw an error. These files are strictly speaking non-compliant FLAC files, so the encoder should throw an error too.

You mean, the decoding part should fail for re-encode? Otherwise I do not see the encoder problem here. Is the encoder somehow still using the metadata for the length and still converting only part of the file?

In addition to the analysis, I would like to stress, that these files worked perfectly fine until now. So if the re-encode will also error out, there should be a solution to to recover the musical content, e.g., by a switch to mimic the old length calculation for re-encoding. The id3v1 tag with flac files seems not too uncommon. I have around 10 different albums (from about a thousand) with this error. So maybe it would make sense to the check if the metadata is aligned with the end of stream and provide a hint to the user that is more explanatory, i.e., to say that there appears to be additional data after the stream (most probably id3v1).

I am also a bit confused why playback is still working properly with libflac. This album is mixed and there is no gap or glitch when listing the fade from track 1 to 2. Is the decoder simply aborting with an error at the right point?

ktmf01 commented 2 years ago

Okay, I've found the issue. It wasn't at the start of the file as I suspected, but at the very end. These files have an ID3v1 tag at the end of the file. Behaviour of FLAC in this regard changed here: b288acb

Ok, so this is basically not a bug and I should fix my files. DId I understand it right, that the decoder simply stopped at the provided metadata bound before and now continues to the end of file?

Yes. The ID3v1 tag is at the very end of the file, and the decoder used data at the very start of the file to know that it was done. However, there are cases where this data at the start of the file is wrong, and before FLAC 1.4.0, this data would be skipped.

However, what I do not yet understand is why the encoder does not throw an error. These files are strictly speaking non-compliant FLAC files, so the encoder should throw an error too.

You mean, the decoding part should fail for re-encode? Otherwise I do not see the encoder problem here. Is the encoder somehow still using the metadata for the length and still converting only part of the file?

Yes

In addition to the analysis, I would like to stress, that these files worked perfectly fine until now. So if the re-encode will also error out, there should be a solution to to recover the musical content, e.g., by a switch to mimic the old length calculation for re-encoding. The id3v1 tag with flac files seems not too uncommon. I have around 10 different albums (from about a thousand) with this error. So maybe it would make sense to the check if the metadata is aligned with the end of stream and provide a hint to the user that is more explanatory, i.e., to say that there appears to be additional data after the stream (most probably id3v1).

There is a way: the flac command has the -F option that can 'ride through' errors. With that option, these files would encode just fine. This change is necessary, because files of which the number of samples data is wrong cannot be re-encoded now without cutting of the last part of the file without throwing an error.

I am also a bit confused why playback is still working properly with libflac. This album is mixed and there is no gap or glitch when listing the fade from track 1 to 2. Is the decoder simply aborting with an error at the right point?

On playback, errors are ignored. Only on decoding, testing or re-encoding (which is a process of which the result is generally permanent, contrary to general playback) errors are not ignored by default.

tillschaefer commented 2 years ago

Thx for the extensive explanation. Should I close this Ticket then or should I keep it open to track the "re-encoding should fail, too" part?

ktmf01 commented 2 years ago

You can keep it open. Thing that need to be done

dlehammer commented 1 month ago

Regarding

These files have an ID3v1 tag at the end of the file.

+

WARNING, ID3v2 tag found. This is non-standard and strongly discouraged

For future reference I utilized id3v2 0.1.12 to address the ID3 tag issue ala.

id3v2 --delete-all 'problematic track.flac'

Afterwards FLAC 1.4.3 --test reported OK 🤓

ktmf01 commented 1 month ago

ID3v1 and ID3v2 are very different. The first is at the end of the track and has a small chance of misdetection. ID3v2 on the other hand is at the start of the file and has a completely different format. Anyway, detection of ID3v1 was added with #738