n10v / id3v2

🎵 ID3 decoding and encoding library for Go
https://pkg.go.dev/github.com/bogem/id3v2/v2
MIT License

Can't parse v2.3 sizes #13

Closed n10v closed 6 years ago

n10v commented 7 years ago

I've just noticed that a lot of tracks have frames with a non-standard size. For example, in some audio files the APIC frame has the following size: 00000000 00000100 01001101 10101001. The last byte begins with 1, but the specification says that every byte of a frame size should start with 0. id3v2 doesn't read frames that have an invalid size format, but iTunes reads them correctly, so there must be some way to parse these frames. I think reading and processing non-standard size formats would be a nice feature. I tried to implement it, but I always ran into significant problems, so I have no idea how to do it correctly.

snikch commented 6 years ago

I think I'm running into this issue. Have you had any thoughts on this since you posted here?

n10v commented 6 years ago

@snikch Not really. Could you please post a track here that reproduces this issue?

n10v commented 6 years ago

There is some progress on this issue. Hopefully I will release a fix soon.

rstemmer commented 6 years ago

Hi, I just came across this issue and got nervous about whether it was me or you who misunderstood the standard :)

The 7-bit-per-byte encoding of the size is only used in the tag header: "The ID3v2 tag size is encoded with four bytes where the most significant bit (bit 7) is set to zero in every byte [..]" -- id3.org/id3v2.3.0

For the size field of a frame, nothing is mentioned regarding the encoding. But it is implicitly specified that the size is a "regular" integer (big endian, as specified for all multi-byte numbers in ID3v2): "Size $xx xx xx xx" -- id3.org/id3v2.3.0
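To make the difference concrete, here is a minimal Go sketch that decodes the four size bytes from the first comment both ways. The function names and the error value are hypothetical and are not part of the id3v2 library API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errNotSyncsafe is a hypothetical error value used only in this sketch.
var errNotSyncsafe = errors.New("size byte has bit 7 set, not a syncsafe integer")

// decodeSyncsafe reads four bytes as a syncsafe integer: 7 payload bits
// per byte, most significant byte first. It rejects any byte with bit 7 set.
func decodeSyncsafe(b [4]byte) (uint32, error) {
	var n uint32
	for _, c := range b {
		if c&0x80 != 0 {
			return 0, errNotSyncsafe
		}
		n = n<<7 | uint32(c)
	}
	return n, nil
}

// decodeBigEndian reads four bytes as a plain 32-bit big-endian integer.
func decodeBigEndian(b [4]byte) uint32 {
	return binary.BigEndian.Uint32(b[:])
}

func main() {
	// 00000000 00000100 01001101 10101001 from the first comment.
	size := [4]byte{0x00, 0x04, 0x4D, 0xA9}

	if _, err := decodeSyncsafe(size); err != nil {
		fmt.Println("syncsafe:", err)
	}
	fmt.Println("big endian:", decodeBigEndian(size))
}
```

Read as a syncsafe integer, the example is rejected because the last byte has bit 7 set. Read as a regular big-endian integer, it yields 282025 bytes, a plausible size for an embedded picture.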

I hope I could help.

n10v commented 6 years ago

Hi @rstemmer! Yes, you actually helped me. I didn't know that v2.3.0 has a different frame size format than v2.4.0. In v2.4.0 it is defined as follows (http://id3.org/id3v2.4.0-structure):

4.   ID3v2 frame overview

   All ID3v2 frames consists of one frame header followed by one or more
   fields containing the actual information. The header is always 10
   bytes and laid out as follows:

     Frame ID      $xx xx xx xx  (four characters)
     Size      4 * %0xxxxxxx
     Flags         $xx xx

The fix is coming today / tomorrow.

rstemmer commented 6 years ago

Oh :astonished: I didn't know that the size format changed in 2.4.0. I'm glad we talked about it - time to fix my code as well :smile:

tmthrgd commented 6 years ago

@bogem If I'm reading the change correctly, 3845103 isn't a correct fix. v2.3.0 uses 32-bit big endian sizes for frame headers, and v2.4.0 uses syncsafe integers for frame headers; while both use syncsafe integers for tag headers.
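The rule above can be sketched as a single function that picks the decoding from the major version. This is only an illustration: `frameSize` is a hypothetical helper, not the code from the commit under discussion and not the library's API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// frameSize decodes the 4-byte size field of a frame header.
// majorVersion is the major version byte from the tag header (3 or 4).
func frameSize(majorVersion byte, b [4]byte) (uint32, error) {
	switch majorVersion {
	case 3:
		// v2.3.0: regular 32-bit big-endian integer.
		return binary.BigEndian.Uint32(b[:]), nil
	case 4:
		// v2.4.0: syncsafe integer, 7 payload bits per byte.
		var n uint32
		for _, c := range b {
			if c&0x80 != 0 {
				return 0, fmt.Errorf("v2.4 frame size byte %#02x has bit 7 set", c)
			}
			n = n<<7 | uint32(c)
		}
		return n, nil
	default:
		return 0, fmt.Errorf("unsupported ID3v2 major version %d", majorVersion)
	}
}

func main() {
	// The same four bytes give different sizes depending on the version.
	raw := [4]byte{0x00, 0x00, 0x02, 0x01}
	for _, v := range []byte{3, 4} {
		n, err := frameSize(v, raw)
		fmt.Println("version", v, "size", n, "err", err)
	}
}
```

With the bytes `00 00 02 01`, the v2.3 reading is 513 and the v2.4 reading is 257, which is why using the wrong decoding shifts every following frame.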

From the v2.3.0 spec:

The ID3v2 tag header, which should be the first information in the file, is 10 bytes as follows:

ID3v2/file identifier      "ID3"
ID3v2 version              $03 00
ID3v2 flags                %abc00000
ID3v2 size             4 * %0xxxxxxx

From the v2.4.0 spec:

The first part of the ID3v2 tag is the 10 byte tag header, laid out as follows:

ID3v2/file identifier      "ID3"
ID3v2 version              $04 00
ID3v2 flags                %abcd0000
ID3v2 size             4 * %0xxxxxxx

Note that the tag header size is the same (syncsafe) in both cases. The sizes differ between v2.3.0 and v2.4.0 in the extended header and the frame headers. This is how I parse the tags in my id3v2 parser, and it works when tested against my mp3 collection.
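As an illustration of the version-independent part, the following Go sketch reads the 10-byte tag header with a syncsafe size for both v2.3.0 and v2.4.0. It is not the parser mentioned above; the `tagHeader` type and `parseTagHeader` function are hypothetical names chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// tagHeader is a hypothetical struct holding the fields of the tag header.
type tagHeader struct {
	Major, Revision byte
	Flags           byte
	Size            uint32
}

// parseTagHeader reads the 10-byte tag header. The size is decoded as a
// syncsafe integer regardless of the version, as quoted from both specs.
func parseTagHeader(b []byte) (tagHeader, error) {
	if len(b) < 10 || string(b[:3]) != "ID3" {
		return tagHeader{}, errors.New("not an ID3v2 tag header")
	}
	h := tagHeader{Major: b[3], Revision: b[4], Flags: b[5]}
	for _, c := range b[6:10] {
		if c&0x80 != 0 {
			return tagHeader{}, errors.New("tag size is not syncsafe")
		}
		h.Size = h.Size<<7 | uint32(c)
	}
	return h, nil
}

func main() {
	raw := []byte{'I', 'D', '3', 0x03, 0x00, 0x00, 0x00, 0x00, 0x02, 0x01}
	h, err := parseTagHeader(raw)
	fmt.Printf("%+v %v\n", h, err)
}
```

The `Major` field read here is what a parser would then use to decide how to decode the extended header and frame header sizes.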


Extended Header

From the v2.3.0 spec:

The extended header contains information that is not vital to the correct parsing of the tag information, hence the extended header is optional.

Extended header size       $xx xx xx xx
Extended Flags             $xx xx
Size of padding            $xx xx xx xx

From the v2.4.0 spec:

The extended header contains information that can provide further insight in the structure of the tag, but is not vital to the correct parsing of the tag information; hence the extended header is optional.

Extended header size   4 * %0xxxxxxx
Number of flag bytes       $01
Extended Flags             $xx

Frame Header

From the v2.3.0 spec:

As the tag consists of a tag header and a tag body with one or more frames, all the frames consists of a frame header followed by one or more fields containing the actual information. The layout of the frame header:

Frame ID      $xx xx xx xx  (four characters)
Size          $xx xx xx xx
Flags         $xx xx

From the v2.4.0 spec:

All ID3v2 frames consists of one frame header followed by one or more fields containing the actual information. The header is always 10 bytes and laid out as follows:

Frame ID      $xx xx xx xx  (four characters)
Size      4 * %0xxxxxxx
Flags         $xx xx

n10v commented 6 years ago

@tmthrgd thank you. I will fix it now

n10v commented 6 years ago

Fixed in https://github.com/bogem/id3v2/commit/5785be9ad4520277483e10d919bc41d2c040bc8e