UTF16 encoded text data might get damaged when parsing / decoding.

tillt commented 4 years ago

It appears text data, when UTF16 LE encoded, won't get properly parsed for some frame types.

Consider the following user defined text frame:

00000000: 54 58 58 58 00 00 00 25 00 00 01 ff fe 44 00 49 TXXX...%.....D.I 00000010: 00 53 00 43 00 49 00 44 00 00 00 ff fe 36 00 33 .S.C.I.D.....6.3 00000020: 00 30 00 41 00 43 00 30 00 30 00 38 00 00 00 .0.A.C.0.0.8...

Encoding key is 0x01 which is UTF16 with BOM. The description of that TXXX is "DISCID" and its trailing 'D' is encoded by [0x44, 0x00].

The problem I noticed with UTF16 LE encoded, user defined text frame contents that for the description, the trailing character may get garbled, the following value then starts at the wrong offset, adding garbage at the beginning.

readTillDelims will seek through that frame's byte stream and will try to detect the expected two trailing zero bytes without adding them to the respective frame-data - in this case the description. The intend is to stop at the delimiter [0x00, 0x00], we do however incorrectly consider that last byte of 'D' rune data which happens to be 0x00 as being the first byte of the delimiter. We end up with truncated UTF16 data.

The user defined text frame parser will then try to read the value starting one byte too early, adding a prefixing 0x00.

The following output is what I get based on id3v2's functionality;

TXXX: DISCI�: ��630AC008 (encoding: 1)

This nasty workaround -- not a fix!

diff --git a/user_defined_text_frame.go b/user_defined_text_frame.go
index f6cd917..ed722ba 100644
--- a/user_defined_text_frame.go
+++ b/user_defined_text_frame.go
@@ -30,6 +30,7 @@ func (udtf UserDefinedTextFrame) WriteTo(w io.Writer) (n int64, err error) {
 func parseUserDefinedTextFrame(br *bufReader) (Framer, error) {
        encoding := getEncoding(br.ReadByte())
        description := br.ReadText(encoding)
+       description = append(description, br.ReadByte())

        if br.Err() != nil {
                return nil, br.Err()

Then produces the correct output:

TXXX: DISCID: 630AC008 (encoding: 1)

A proper fix needs to be located in readTillDelims, I guess.

n10v commented 4 years ago

Possibly fixed in https://github.com/bogem/id3v2/commit/88f2646e81c1ad6fc4de6006754314e5faaf5634. Can you please test it? 🙏

tillt commented 4 years ago

Thanks a bunch @bogem - I finally got around testing it against some of my previously failed MP3s and all seems fine now - tested v1.1.3.

n10v / id3v2

UTF16 encoded text data might get damaged when parsing / decoding. #51