taglib / taglib

TagLib Audio Meta-Data Library
http://taglib.org/
GNU Lesser General Public License v2.1

Cannot parse UTF16 encoding ID3V2 tags properly #1086

Open nkmimi opened 1 year ago

nkmimi commented 1 year ago

Steps to reproduce: Create a user-defined ID3v2 tag and set its encoding to UTF-16. For example:
frameID: "abcdeabcdef_ID"
Text: "{8CB89ECA-D45A-4491-BE3A-50419125D4A4}"

It is parsed as:
frameID: "abcdeabcdef_I"
Text: "" // Failed to parse

The root cause is in this function:

template<>
int findVector(
  const ByteVector::ConstIterator dataBegin, const ByteVector::ConstIterator dataEnd,
  const ByteVector::ConstIterator patternBegin, const ByteVector::ConstIterator patternEnd,
  unsigned int offset, int byteAlign)

It first calls findVectorFast() to find the pattern, and findVectorFast() does not take byteAlign into account.
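To illustrate why ignoring the alignment matters for UTF-16 data, here is a minimal standalone sketch. It is not TagLib's code: findAligned is a hypothetical helper written for this example, and measuring the alignment relative to the start offset is an assumption. It only shows that a search for the two-byte terminator 00 00 can match one byte too early when alignment is not enforced.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only, not part of TagLib.
// Returns the index of the first occurrence of `pattern` in `data` at or
// after `offset` whose distance from `offset` is a multiple of `byteAlign`,
// or -1 if there is no such occurrence.
int findAligned(const std::vector<std::uint8_t> &data,
                const std::vector<std::uint8_t> &pattern,
                std::size_t offset, std::size_t byteAlign)
{
  if(pattern.empty() || byteAlign == 0 || offset > data.size() ||
     pattern.size() > data.size() - offset)
    return -1;

  // Only positions offset, offset + byteAlign, ... are candidates.
  for(std::size_t i = offset; i + pattern.size() <= data.size(); i += byteAlign) {
    if(std::equal(pattern.begin(), pattern.end(),
                  data.begin() + static_cast<std::ptrdiff_t>(i)))
      return static_cast<int>(i);
  }
  return -1;
}
```

For the bytes 49 00 44 00 00 00 7b 00 ("ID" in UTF-16LE, the terminator, then "{"), searching for 00 00 with byteAlign 1 returns 3, which straddles the high byte of "D" and the first terminator byte. With byteAlign 2 it returns 4, the real terminator.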

ufleisch commented 1 year ago

Could you please give me detailed instructions on how to reproduce this? An example file would be great, or just a hex dump of the beginning of a file that causes the problem. I created an empty file and then added a frame as described in your report, using an alternative library (id3lib):

$ xxd id3lib.mp3 
00000000: 4944 3303 0000 0000 0079 5458 5858 0000  ID3......yTXXX..
00000010: 006f 0000 01ff fe61 0062 0063 0064 0065  .o.....a.b.c.d.e
00000020: 0061 0062 0063 0064 0065 0066 005f 0049  .a.b.c.d.e.f._.I
00000030: 0044 0000 00ff fe7b 0038 0043 0042 0038  .D.....{.8.C.B.8
00000040: 0039 0045 0043 0041 002d 0044 0034 0035  .9.E.C.A.-.D.4.5
00000050: 0041 002d 0034 0034 0039 0031 002d 0042  .A.-.4.4.9.1.-.B
00000060: 0045 0033 0041 002d 0035 0030 0034 0031  .E.3.A.-.5.0.4.1
00000070: 0039 0031 0032 0035 0044 0034 0041 0034  .9.1.2.5.D.4.A.4
00000080: 007d 00                                  .}.
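
As a sanity check on the size fields in this dump, the following sketch decodes them. The helper names are made up for this example and are not TagLib functions. It assumes the standard ID3v2.3 layout: a syncsafe tag size in the 10-byte tag header and a plain big-endian frame size in the 10-byte frame header.

```cpp
#include <array>
#include <cstdint>

using Bytes4 = std::array<std::uint8_t, 4>;

// ID3v2 tag header size: four bytes with 7 significant bits each (syncsafe).
std::uint32_t decodeSyncsafe(const Bytes4 &b)
{
  return (static_cast<std::uint32_t>(b[0] & 0x7f) << 21) |
         (static_cast<std::uint32_t>(b[1] & 0x7f) << 14) |
         (static_cast<std::uint32_t>(b[2] & 0x7f) << 7) |
          static_cast<std::uint32_t>(b[3] & 0x7f);
}

// ID3v2.3 frame size: plain 32-bit big-endian integer.
std::uint32_t decodeBigEndian(const Bytes4 &b)
{
  return (static_cast<std::uint32_t>(b[0]) << 24) |
         (static_cast<std::uint32_t>(b[1]) << 16) |
         (static_cast<std::uint32_t>(b[2]) << 8) |
          static_cast<std::uint32_t>(b[3]);
}
```

The tag size bytes 00 00 00 79 decode to 121 and the TXXX frame size bytes 00 00 00 6f decode to 111, so the tag holds exactly one frame (10 + 111 = 121). The 111 payload bytes break down as 1 encoding byte, a 2-byte BOM, 28 bytes of description, the 2-byte terminator, a second 2-byte BOM and 76 bytes of text.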

If I read this file with TagLib, I do not see a problem:

$ tagreader id3lib.mp3
******************** "id3lib.mp3" ********************
TagLib: MPEG::Properties::read() -- Could not find an MPEG frame in the stream.
-- TAG (basic) --
title   - ""
artist  - ""
album   - ""
year    - "0"
comment - ""
track   - "0"
genre   - ""
-- TAG (properties) --
ABCDEABCDEF_ID - "{8CB89ECA-D45A-4491-BE3A-50419125D4A4}"
-- AUDIO --
bitrate     - 0
sample rate - 0
channels    - 0
length      - 0:00

nkmimi commented 1 year ago

Attachments: parseFields, track.zip

Please see the attached file for a problematic track. I created the tag with Kid3.

ufleisch commented 1 year ago

Which version of TagLib are you using, and on which operating system? Here on Linux, I cannot see any problems with the current master branch of TagLib when running the tagreader example on your attached file:

$ tagreader 16\ Aged\ Manchego.mp3 
******************** "16 Aged Manchego.mp3" ********************
-- TAG (basic) --
title   - ""
artist  - ""
album   - ""
year    - "0"
comment - ""
track   - "0"
genre   - ""
-- TAG (properties) --
ABCDEABCDEF_ID - "{8CB89ECA-D45A-4491-BE3A-50419125D4A4}"
-- AUDIO --
bitrate     - 128
sample rate - 44100
channels    - 2
length      - 0:52

I can also read the TXXX frame correctly when using Kid3 with the TagLibMetadata plugin. The only problem I noticed is that the current version of Kid3 cannot set an encoding other than ISO-8859-1 for TXXX frames when using the TagLibMetadata plugin (it works with the Id3libMetadata plugin, though). This has been fixed in Kid3's master branch.

ufleisch commented 1 year ago

I just noticed that the code in your screenshot does not match the current state of the master branch. Maybe commit 4e7f844e fixes your problem.

ufleisch commented 1 year ago

Could you please give detailed instructions on how to reproduce this, e.g. using an application like tagreader from the TagLib examples? If this is not possible, please provide a failing unit test that reproduces the issue.