Closed dolkow closed 1 year ago
Thanks for the detailed report. Yes, indeed. Not considering the extended header size seems to be the issue. Looks like ID3 tags with extended header are extremely rare, otherwise we would have seen more reports.
I can provide a patch later today
Looks like ID3 tags with extended header are extremely rare, otherwise we would have seen more reports.
Or it's maybe just that ID3 tags at the end of the file are rare? Audacity writes it at the beginning when exporting mp3:
tmp$ hexdump -C silence.mp3
00000000 49 44 33 04 00 40 00 00 01 68 00 00 00 0c 01 20 |ID3..@...h..... |
00000010 05 06 03 1a 03 7f 54 52 43 4b 00 00 00 02 00 00 |......TRCK......|
00000020 00 31 54 43 4f 4e 00 00 00 11 00 00 00 52 65 6c |.1TCON.......Rel|
00000030 61 78 61 74 69 6f 6e 2e 2e 3f 20 3a 29 54 59 45 |axation..? :)TYE|
00000040 52 00 00 00 05 00 00 00 32 30 32 33 54 44 52 43 |R.......2023TDRC|
00000050 00 00 00 05 00 00 00 32 30 32 33 43 4f 4d 4d 00 |.......2023COMM.|
00000060 00 00 17 00 00 00 00 00 00 00 54 68 69 73 20 69 |..........This i|
00000070 73 20 61 20 63 6f 6d 6d 65 6e 74 21 43 4f 4d 4d |s a comment!COMM|
00000080 00 00 00 17 00 00 00 58 58 58 00 54 68 69 73 20 |.......XXX.This |
00000090 69 73 20 61 20 63 6f 6d 6d 65 6e 74 21 54 50 45 |is a comment!TPE|
000000a0 31 00 00 00 0d 00 00 00 53 6e 69 6c 64 20 44 6f |1.......Snild Do|
000000b0 6c 6b 6f 77 54 41 4c 42 00 00 00 14 00 00 00 4d |lkowTALB.......M|
000000c0 75 74 61 67 65 6e 20 42 75 67 20 52 65 70 6f 72 |utagen Bug Repor|
000000d0 74 73 54 49 54 32 00 00 00 16 00 00 00 4f 6e 65 |tsTIT2.......One|
000000e0 20 53 65 63 6f 6e 64 20 6f 66 20 53 69 6c 65 6e | Second of Silen|
000000f0 63 65 ff fb 90 c4 00 00 00 00 00 00 00 00 00 00 |ce..............|
00000100 00 00 00 00 00 00 00 58 69 6e 67 00 00 00 0f 00 |.......Xing.....|
00000110 00 00 28 00 00 11 e1 00 06 06 0c 0c 13 13 13 19 |..(.............|
00000120 19 20 20 20 26 26 2c 2c 2c 33 33 39 39 39 40 40 |. &&,,,33999@@|
00000130 46 46 46 4c 4c 53 53 53 59 59 60 60 60 66 66 6c |FFFLLSSSYY```ffl|
In that case, the overly-long read will not be noticed.
I wonder what'd happen if you tried to save the modified headers, though. Maybe writes aren't based on that same size variable so it's fine?
As I understand it, adding the ID3 tag to the start of the file is not possible in WAVs. It's probably also uncommon to add ID3 tags to WAVs, which is why I seem to be the first to have stumbled upon this. :)
I can provide a patch later today
To be clear, I'm not in that much of a hurry; whatever time/day that's convenient for you is more than fine.
Or it's maybe just that ID3 tags at the end of the file are rare? Audacity writes it at the beginning when exporting mp3:
Yes, looks like this is what happens. Also ID3 tags in WAVE are non-standard and only supported by a few tools (e.g. MP3Tag, foobar2000 and a few more). The tags don't necessarily are at the end, but the file often ends up like that.
But still I think those extended headers are rare. Which tool did you use to tag this WAVE file with ID3?
I wonder what'd happen if you tried to save the modified headers, though. Maybe writes aren't based on that same size variable so it's fine?
It "works" in the sense that it generates a valid file with proper ID3 tag block, as the size gets recalculated. But actually mutagen does not support extended header and does not write it. So when saving it gets lost. In this particular example the extended header contained the CRC checksum, which the newly written tag will not have.
Extending mutagen to support the extended ID3 header would be a separate story. At least it could be considered preserving existing headers. But even then it must be considered how each flag is handled (and maybe not all flags are supported). E.g. the CRC needs to be recalculated of course. Not sure how to deal with the tag size restriction flags then, probably drop them.
To be clear, I'm not in that much of a hurry; whatever time/day that's convenient for you is more than fine.
Ha, no. All good. It was just that I was investigating this and I had tests and fix already ready, but then had no time to finish. I just wanted to comment so nobody else wasted time doing the same.
Which tool did you use to tag this WAVE file with ID3?
Audacity 3.3.3 -- just the "Export as WAV" option in the menu, which pops up a metadata dialog after choosing the output location. To be very specific, this is what the Build Information tab in About says:
The Build Commit Id: Official openSUSE BuildSTRING:3.3.3|STRING]] of 2023-07-12T00:00:00Z Build type: CMake Release build (debug level 1), 64 bits Compiler: GCC 13.2.1 Installation Prefix: /usr Cache folder: /home/snild/.cache/audacity Settings folder: /home/snild/.config/audacity Data folder: /home/snild/.local/share/audacity State folder: /home/snild/.local/state/audacity Core Libraries wxWidgets (Cross-platform GUI library) 3.2.2 PortAudio (Audio playback and recording) v19 libsoxr (Sample rate conversion) Enabled File Format Support libmpg123 (MP3 Importing) Enabled libvorbis (Ogg Vorbis Import and Export) Enabled libid3tag (ID3 tag support) Enabled libflac (FLAC import and export) Enabled libtwolame (MP2 export) Enabled QuickTime (Import via QuickTime) Disabled ffmpeg (FFmpeg Import/Export) Enabled gstreamer (Import via GStreamer) Disabled Features Nyquist (Plug-in support) Enabled LADSPA (Plug-in support) Enabled Vamp (Plug-in support) Enabled Audio Units (Plug-in support) Disabled VST (Plug-in support) Enabled LV2 (Plug-in support) Enabled PortMixer (Sound card mixer support) Enabled SoundTouch (Pitch and Tempo Change support) Enabled SBSMS (Extreme Pitch and Tempo Change support) Enabled
So... libid3tag maybe?
mutagen does not support extended header and does not write it
That's fine by me. I just want to be able to read the "normal" tags out of WAV files.
even then it must be considered how each flag is handled (and maybe not all flags are supported)
https://mutagen-specs.readthedocs.io/en/latest/id3/id3v2.4.0-structure.html says "All unknown flags MUST be unset and their corresponding data removed when a tag is modified", which seems like a reasonable strategy (and the only one that could work for e.g. checksums).
Mutagen fails to open the attached silence.wav.gz (but gunzip'd, of course; github wouldn't allow the plain .wav to be uploaded). With just a tiny log print added, we see we're 12 bytes short:
My distro's 1.46.0 release has the same behavior.
The file and tags were generated by export from Audacity.
ffprobe
handles it (or at least recognizes that there's an id3 tag -- but ignores it for other metadata?):Based on my (very newly-acquired) understanding of id3v2.4, the header says:
Then comes the extended header:
I suspect that the
extsize_data = read_full(fileobj, 4)
andself._extdata = read_full(fileobj, extsize)
lines inID3Header.__init__()
are the culprits -- they have already eaten those 12 bytes (and they have not been subtracted from the totalsize
value of the header).