Open tillschaefer opened 2 years ago
Metadata parsing became more strict in FLAC 1.4.0. I would expect these files have corrupt metadata in some way.
Would it be possible to re-encode these files with FLAC 1.3.4 and see what FLAC 1.4.0 then makes of them? It could very well be FLAC 1.3.4 does pass flac -t
just fine but fails at re-encoding too.
Re-encoding with flac 1.3.4 works flawlessly with the following command. The re-encoded files pass the integrity check with version 1.4.x.
mkdir .reencode
for f in *.flac; do
flac "$f" -V -s --best --no-error-on-compression-fail -o ".reencode/$f"
done
Here is the metadata of the re-encoded file
$ metaflac --list .reencode/01\ -\ Tomzen\ -\ Namaste.flac
METADATA block #0
type: 0 (STREAMINFO)
is last: false
length: 34
minimum blocksize: 4096 samples
maximum blocksize: 4096 samples
minimum framesize: 2647 bytes
maximum framesize: 13868 bytes
sample_rate: 44100 Hz
channels: 2
bits-per-sample: 16
total samples: 20815200
MD5 signature: 7bc6b20fdbb16a7c1b9c13c27d810ce7
METADATA block #1
type: 3 (SEEKTABLE)
is last: false
length: 864
seek points: 48
point 0: sample_number=0, stream_offset=0, frame_samples=4096
point 1: sample_number=438272, stream_offset=737269, frame_samples=4096
point 2: sample_number=880640, stream_offset=1790096, frame_samples=4096
point 3: sample_number=1318912, stream_offset=2758598, frame_samples=4096
point 4: sample_number=1761280, stream_offset=3717904, frame_samples=4096
point 5: sample_number=2203648, stream_offset=4724772, frame_samples=4096
point 6: sample_number=2641920, stream_offset=5856461, frame_samples=4096
point 7: sample_number=3084288, stream_offset=6915466, frame_samples=4096
point 8: sample_number=3526656, stream_offset=8061731, frame_samples=4096
point 9: sample_number=3964928, stream_offset=9088914, frame_samples=4096
point 10: sample_number=4407296, stream_offset=10099450, frame_samples=4096
point 11: sample_number=4849664, stream_offset=11166749, frame_samples=4096
point 12: sample_number=5287936, stream_offset=12239218, frame_samples=4096
point 13: sample_number=5730304, stream_offset=13313639, frame_samples=4096
point 14: sample_number=6172672, stream_offset=14398022, frame_samples=4096
point 15: sample_number=6610944, stream_offset=15429506, frame_samples=4096
point 16: sample_number=7053312, stream_offset=16319995, frame_samples=4096
point 17: sample_number=7495680, stream_offset=17373662, frame_samples=4096
point 18: sample_number=7933952, stream_offset=18467602, frame_samples=4096
point 19: sample_number=8376320, stream_offset=19537340, frame_samples=4096
point 20: sample_number=8818688, stream_offset=20676960, frame_samples=4096
point 21: sample_number=9256960, stream_offset=21943037, frame_samples=4096
point 22: sample_number=9699328, stream_offset=23064363, frame_samples=4096
point 23: sample_number=10141696, stream_offset=24285094, frame_samples=4096
point 24: sample_number=10579968, stream_offset=25458404, frame_samples=4096
point 25: sample_number=11022336, stream_offset=26644128, frame_samples=4096
point 26: sample_number=11464704, stream_offset=27818335, frame_samples=4096
point 27: sample_number=11902976, stream_offset=28990588, frame_samples=4096
point 28: sample_number=12345344, stream_offset=30038451, frame_samples=4096
point 29: sample_number=12787712, stream_offset=31099365, frame_samples=4096
point 30: sample_number=13225984, stream_offset=32180620, frame_samples=4096
point 31: sample_number=13668352, stream_offset=33214815, frame_samples=4096
point 32: sample_number=14110720, stream_offset=34255735, frame_samples=4096
point 33: sample_number=14548992, stream_offset=35373065, frame_samples=4096
point 34: sample_number=14991360, stream_offset=36528468, frame_samples=4096
point 35: sample_number=15433728, stream_offset=37625517, frame_samples=4096
point 36: sample_number=15872000, stream_offset=38735277, frame_samples=4096
point 37: sample_number=16314368, stream_offset=39924311, frame_samples=4096
point 38: sample_number=16756736, stream_offset=41077405, frame_samples=4096
point 39: sample_number=17195008, stream_offset=42172639, frame_samples=4096
point 40: sample_number=17637376, stream_offset=43319613, frame_samples=4096
point 41: sample_number=18079744, stream_offset=44522865, frame_samples=4096
point 42: sample_number=18518016, stream_offset=45615810, frame_samples=4096
point 43: sample_number=18960384, stream_offset=46701824, frame_samples=4096
point 44: sample_number=19402752, stream_offset=47683477, frame_samples=4096
point 45: sample_number=19841024, stream_offset=48467121, frame_samples=4096
point 46: sample_number=20283392, stream_offset=49137456, frame_samples=4096
point 47: sample_number=20725760, stream_offset=49723761, frame_samples=4096
METADATA block #2
type: 4 (VORBIS_COMMENT)
is last: false
length: 275
vendor string: reference libFLAC 1.3.4 20220220
comments: 11
comment[0]: TITLE=Namaste
comment[1]: ARTIST=Tomzen
comment[2]: ALBUMARTIST=Various Artists
comment[3]: ALBUM=Floating Point 01
comment[4]: DISCNUMBER=1
comment[5]: TRACKNUMBER=1
comment[6]: GENRE=Ambient
comment[7]: DATE=2003
comment[8]: PUBLISHER=Iboga Records
comment[9]: ORGANIZATION=Iboga Records
comment[10]: LABEL=Iboga Records
METADATA block #3
type: 1 (PADDING)
is last: true
length: 8081
Interestingly, also version 1.4.1 can re-encode these files without any problem and the integrity check passes for the re-encoded files.
I have created a backtrace of the point, when the error is printed:
(gdb) break decode.c:1491
(gdb) run -t /home/till/temp/Various\ Artists\ -\ Floating\ Point\ 01/01\ -\ Tomzen\ -\ Namaste.flac
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/flac -t /home/till/temp/Various\ Artists\ -\ Floating\ Point\ 01/01\ -\ Tomzen\ -\ Namaste.flac
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
flac 1.4.1
Copyright (C) 2000-2009 Josh Coalson, 2011-2022 Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY. This is free software, and you are
welcome to redistribute it under certain conditions. Type `flac' for details.
01 - Tomzen - Namaste.flac: testing, 91% complete
Breakpoint 1, error_callback (decoder=0x5555558555e0, status=FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC, client_data=0x7fffffffcee0) at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/decode.c:1491
1491 in /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/decode.c
(gdb) bt
#0 error_callback (decoder=0x5555558555e0, status=FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC, client_data=0x7fffffffcee0)
at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/decode.c:1491
#1 0x00007ffff7f6e1a3 in send_error_to_client_ (decoder=0x5555558555e0, status=FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC)
at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/libFLAC/stream_decoder.c:3226
#2 0x00007ffff7f6ab72 in frame_sync_ (decoder=0x5555558555e0)
at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/libFLAC/stream_decoder.c:2020
#3 0x00007ffff7f680ae in FLAC__stream_decoder_process_until_end_of_stream (decoder=0x5555558555e0)
at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/libFLAC/stream_decoder.c:1060
#4 0x000055555555b890 in DecoderSession_process (d=0x7fffffffcee0)
at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/decode.c:455
#5 0x000055555555adf5 in flac__decode_file (
infilename=0x555555855580 "/home/till/temp/Various Artists - Floating Point 01/01 - Tomzen - Namaste.flac", outfilename=0x0,
analysis_mode=0, aopts=..., options=...) at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/decode.c:183
#6 0x000055555557117e in decode_file (
infilename=0x555555855580 "/home/till/temp/Various Artists - Floating Point 01/01 - Tomzen - Namaste.flac")
at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/main.c:2195
#7 0x000055555556bb03 in do_it () at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/main.c:504
#8 0x000055555556b1dd in main (argc=3, argv=0x7fffffffd748) at /var/tmp/portage/media-libs/flac-1.4.1/work/flac-1.4.1/src/flac/main.c:337
That is unexpected. Would you be able to send only the first 10 kB of the file? That should be enough to analyse this and I don't think such a short part can be considered infringing on copyright.
here they are
Okay, I've found the issue. It wasn't at the start of the file as I suspected, but at the very end.
These files have an ID3v1 tag at the end of the file. Behaviour of FLAC in this regard changed here: https://github.com/xiph/flac/commit/b288acb1099679967ec7c6b72307e8c97927f25a
However, what I do not yet understand is why the encoder does not throw an error. These files are strictly speaking non-compliant FLAC files, so the encoder should throw an error too.
Okay, I've found the issue. It wasn't at the start of the file as I suspected, but at the very end.
These files have an ID3v1 tag at the end of the file. Behaviour of FLAC in this regard changed here: b288acb
Ok, so this is basically not a bug and I should fix my files. DId I understand it right, that the decoder simply stopped at the provided metadata bound before and now continues to the end of file?
However, what I do not yet understand is why the encoder does not throw an error. These files are strictly speaking non-compliant FLAC files, so the encoder should throw an error too.
You mean, the decoding part should fail for re-encode? Otherwise I do not see the encoder problem here. Is the encoder somehow still using the metadata for the length and still converting only part of the file?
In addition to the analysis, I would like to stress, that these files worked perfectly fine until now. So if the re-encode will also error out, there should be a solution to to recover the musical content, e.g., by a switch to mimic the old length calculation for re-encoding. The id3v1 tag with flac files seems not too uncommon. I have around 10 different albums (from about a thousand) with this error. So maybe it would make sense to the check if the metadata is aligned with the end of stream and provide a hint to the user that is more explanatory, i.e., to say that there appears to be additional data after the stream (most probably id3v1).
I am also a bit confused why playback is still working properly with libflac. This album is mixed and there is no gap or glitch when listing the fade from track 1 to 2. Is the decoder simply aborting with an error at the right point?
Okay, I've found the issue. It wasn't at the start of the file as I suspected, but at the very end. These files have an ID3v1 tag at the end of the file. Behaviour of FLAC in this regard changed here: b288acb
Ok, so this is basically not a bug and I should fix my files. DId I understand it right, that the decoder simply stopped at the provided metadata bound before and now continues to the end of file?
Yes. The ID3v1 tag is at the very end of the file, and the decoder used data at the very start of the file to know that it was done. However, there are cases where this data at the start of the file is wrong, and before FLAC 1.4.0, this data would be skipped.
However, what I do not yet understand is why the encoder does not throw an error. These files are strictly speaking non-compliant FLAC files, so the encoder should throw an error too.
You mean, the decoding part should fail for re-encode? Otherwise I do not see the encoder problem here. Is the encoder somehow still using the metadata for the length and still converting only part of the file?
Yes
In addition to the analysis, I would like to stress, that these files worked perfectly fine until now. So if the re-encode will also error out, there should be a solution to to recover the musical content, e.g., by a switch to mimic the old length calculation for re-encoding. The id3v1 tag with flac files seems not too uncommon. I have around 10 different albums (from about a thousand) with this error. So maybe it would make sense to the check if the metadata is aligned with the end of stream and provide a hint to the user that is more explanatory, i.e., to say that there appears to be additional data after the stream (most probably id3v1).
There is a way: the flac
command has the -F
option that can 'ride through' errors. With that option, these files would encode just fine. This change is necessary, because files of which the number of samples data is wrong cannot be re-encoded now without cutting of the last part of the file without throwing an error.
I am also a bit confused why playback is still working properly with libflac. This album is mixed and there is no gap or glitch when listing the fade from track 1 to 2. Is the decoder simply aborting with an error at the right point?
On playback, errors are ignored. Only on decoding, testing or re-encoding (which is a process of which the result is generally permanent, contrary to general playback) errors are not ignored by default.
Thx for the extensive explanation. Should I close this Ticket then or should I keep it open to track the "re-encoding should fail, too" part?
You can keep it open. Thing that need to be done
flac
needs to detect when it has not reached end-of-file despite encoding the target number of samples. Currently it is not possible, not even with -F
, to decode a file of which the total number of samples falls short without editing that number manuallyflac
needs to detect and warn the user about ID3v1 tagsflac
needs to be able to re-encode files with an unknown number of samplesRegarding
These files have an ID3v1 tag at the end of the file.
+
WARNING, ID3v2 tag found. This is non-standard and strongly discouraged
For future reference I utilized id3v2 0.1.12 to address the ID3 tag issue ala.
id3v2 --delete-all 'problematic track.flac'
Afterwards FLAC 1.4.3 --test
reported OK 🤓
ID3v1 and ID3v2 are very different. The first is at the end of the track and has a small chance of misdetection. ID3v2 on the other hand is at the start of the file and has a completely different format. Anyway, detection of ID3v1 was added with #738
I regularly run a flac -t check for my music collection. After upgrading from version 1.3.4 to version 1.4.1 (1.4.0 is also affected) some tests fail with
FLAC__STREAM_DECODER_END_OF_STREAM
that where ok before.Only a minority of my files are affected. Most of them are still returning OK as test result. This happens for the complete Album at once, i.e., if one file on an album is affected every other file on the album is affected as well. Thus, this might has to do with specific encoders or parameters of the encoder.
I am running a Linux build using the gentoo packages of flac. Since the files have a copyright I cannot upload an afffected file here. However, I am willing to execute further tests if needed.