Closed ericmurmur closed 4 years ago
Hi! Thank you for your contribution. Can you please quote a text, where it's written? I can't find it. The only information I found is here:
<Header for 'Text information frame', ID: "T000" - "TZZZ",
excluding "TXXX" described in 4.2.6.>
Text encoding $xx
Information <text string(s) according to encoding>
And there is nothing written about termination bytes.
And also this (http://id3.org/id3v2.4.0-frames):
All text information frames supports multiple strings,
stored as a null separated list, where null is
reperesented by the termination code for the character encoding.
There are some argues from google search about this. However,
4. ID3v2 frame overview
... ...
Frames that allow different types of text encoding contains a text
encoding description byte. Possible encodings:
$00 ISO-8859-1 [ISO-8859-1]. Terminated with $00.
$01 UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
strings in the same frame SHALL have the same byteorder.
Terminated with $00 00.
$02 UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
Terminated with $00 00.
$03 UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.
Strings dependent on encoding are represented in frame descriptions
as <text string according to encoding>, or <full text string
according to encoding> if newlines are allowed. Any empty strings of
type $01 which are NULL-terminated may have the Unicode BOM followed
by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).
From 4.2 in http://id3.org/id3v2.4.0-frames
All text information frames supports multiple strings,
stored as a null separated list, where null is
represented by **the termination code** for the character encoding.
"it states text string in frame with encoding byte prefixed and terminated with its corresponding termination bytes"
It doesn't state it actually. It just shows the termination bytes for corresponding encodings, but nothing says that every encoded text should be terminated.
"It also show an empty string example"
It only shows an example for Unicode-16 with BOM and only which are NULL-terminated:
Any empty strings of type $01 which are NULL-terminated
may have the Unicode BOM followed by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).
2.
From 4.2 in http://id3.org/id3v2.4.0-frames,
it says multiple strings are stored as null separated list where null
is represented by the termination code.
It does not clearly state the status of the final termination bytes.
Yes, there is nothing said about using one string in text frame, so if you use only one string in text frame with no termination, you don't violate the standard.
4 - I would like to add the writing the termination bytes for text frames, just for compatibility with Windows File Explorer. But please open the new issue for it with detailed description.
3, 5 - We should not rely on how other programs work, but on standards.
It's really strange situation about the termination bytes of text frames, but just take a look on other frames like USLT or APIC at http://id3.org/id3v2.4.0-frames. You can see the explicit indication about the termination bytes at the end:
... Description is a short description of the picture, represented as a terminated text string ...
<Header for 'Attached picture', ID: "APIC">
Text encoding $xx
MIME type <text string> $00
Picture type $xx
Description <text string according to encoding> $00 (00) <----
Picture data <binary data>
... The 'Content descriptor' is a terminated string ...
<Header for 'Unsynchronised lyrics/text transcription', ID: "USLT">
Text encoding $xx
Language $xx xx xx
Content descriptor <text string according to encoding> $00 (00) <----
Lyrics/text <full text string according to encoding>
In docs about text frames I don't see it:
<Header for 'Text information frame', ID: "T000" - "TZZZ",
excluding "TXXX" described in 4.2.6.>
Text encoding $xx
Information <text string(s) according to encoding> <----
So I don't think WFE works according to standards.
It's really not clear in spec. Hence compatibility will be good.
In addition, in sec 4, frame overview from http://id3.org/id3v2.4.0-structure, it lists
Terminated with $00
or
Terminated with $00 00
for possible encodings of text.
Doesn't it mean to prefix encoding byte and terminate the text with termination bytes?
Frames that allow different types of text encoding contains a text encoding description byte. Possible encodings:
$00 ISO-8859-1 [ISO-8859-1]. Terminated with $00.
$01 UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
strings in the same frame SHALL have the same byteorder.
Terminated with $00 00.
$02 UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
Terminated with $00 00.
$03 UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.
Doesn't it mean to prefix encoding byte and terminate the text with termination bytes?
I think, it doesn't mean so. It just means that if it should be terminated, it terminated with corresponding termination bytes.
@ericmurmur Please open a new issue with detailed description that WFE doesn't correctly read text frames, which are set with id3v2 library.
It just means that if it should be terminated, it terminated with corresponding termination bytes.
Yes. So it means termination bytes shall be there for text string, doesn't it? And since Golang does not include termination bytes for string like c/c++, it shall be added, isn't it? In c/c++ library, termination bytes are already in their string buffer.
Yes. So it means termination bytes shall be there for text string, doesn't it?
And since Golang does not include termination bytes for string like c/c++,
it shall be added, isn't it? In c/c++ library, termination bytes are already in their string buffer.
I don't think so. There is no explicit information in standard that I must terminate strings in every text frame.
Is the encoding byte added? Yes, right? In the standard, it lists all 4 encoding bytes. In every case, it has stated [Terminated with $00] or [Terminated with $00 00]. It is NOT optional. So at least, it shall be interpreted as for those started with encoding byte, it shall be terminated with $00 or $00 00. Or where do you suggest to implement the sentence [Terminated with $00] or [Terminated with $00 00]? In addition, according to 4.2 in http://id3.org/id3v2.4.0-frames,
Information <text string(s) according to encoding>
Doesn't it mean to follow the encoding descriptions in http://id3.org/id3v2.4.0-structure, which has stated [[Terminated with $00] or [Terminated with $00 00] in all encoding cases?
Hi @ericmurmur! I'm very sorry for so late answer, I had a lot of to do.
I think, you were right in this discussion. Many sites and apps terminate text frames with $00 (00)
, although it's not clearly stated in spec. If you'll update tests, I would like to merge this PR :)
Just replace following tests: parse_test.go
// https://github.com/bogem/id3v2/issues/13.
// https://github.com/bogem/id3v2/commit/3845103da5b1698289b82a90f5d2559b770bd996
func TestParseV3UnsafeSize(t *testing.T) {
t.Parallel()
buf := new(bytes.Buffer)
title := strings.Repeat("A", 254)
tag := NewEmptyTag()
tag.SetVersion(3)
tag.SetTitle(title)
if _, err := tag.WriteTo(buf); err != nil {
t.Fatalf("Error while writing tag: %v", err)
}
titleFrameHeader := buf.Bytes()[tagHeaderSize : tagHeaderSize+frameHeaderSize]
expected := []byte{0, 0, 1, 0}
got := titleFrameHeader[4:8]
if !bytes.Equal(got, expected) {
t.Fatalf("Expected %v, got %v", expected, got)
}
parsedTag, err := ParseReader(buf, Options{Parse: true})
if err != nil {
t.Fatalf("Error while parsing tag: %v", err)
}
if parsedTag.Title() != title {
t.Fatalf("Titles are not equal: len(parsedTag.Title()) == %v, len(title) == %v", len(parsedTag.Title()), len(title))
}
}
tag_test.go
framesSize = 211948
And also please push new test.mp3
:)
Closing in favour of #52
According to the spec http://id3.org/id3v2.4.0-structure, TerminationBytes has to be in the text frame. Some ID3 reader like Windows file explorer media file properties cannot handle such unicode text frames without proper TerminationBytes.