n10v / id3v2

🎵 ID3 decoding and encoding library for Go
https://pkg.go.dev/github.com/bogem/id3v2/v2
MIT License
340 stars 52 forks

Convert UTF-16 from e.g. comments into UTF-8? #53

highend closed this issue 4 years ago

highend commented 4 years ago

Hi,

I'd like to print the comment field to the console (on Windows in this case). This doesn't work well if the field was stored as UTF-16 with BOM, e.g. by MP3Tag. Is there any way to tell the library to give me a (converted) UTF-8 string instead? The console supports UTF-8, and with the right font this would result in readable text :)

    comments := tag.GetFrames(tag.CommonID("Comments"))
    for _, v := range comments {
        comment, ok := v.(id3v2.CommentFrame)
        if !ok {
            log.Fatal("Couldn't assert comment frame")
        }
        fmt.Printf("\nComment: %s", comment.Text)
        fmt.Printf("\nComment: %s", comment.Encoding)
    }

ATM it looks like this:

Comment: �C o m m e n t
Comment: UTF-16 encoded Unicode with BOM

The original comment is really "Comment" for testing purposes...

n10v commented 4 years ago

Hi! id3v2 should automatically convert UTF-16 text to UTF-8, so the comment should print correctly. This is a bug, but the latest release should fix some UTF-16 issues. Are you using the latest version? Try running go get -u github.com/bogem/id3v2
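
For illustration only, here is a minimal sketch of what converting UTF-16 with BOM to UTF-8 involves, using just the Go standard library. The helper name decodeUTF16WithBOM is hypothetical and is not part of id3v2; the library is meant to do this conversion itself, so this is only a way to see the steps, not the library's implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16WithBOM is a hypothetical helper (not part of id3v2).
// It strips a leading byte order mark, if present, and decodes the
// remaining bytes as UTF-16 into a Go (UTF-8) string.
// Assumption: data without a BOM is treated as little-endian.
func decodeUTF16WithBOM(b []byte) string {
	var order binary.ByteOrder = binary.LittleEndian
	if len(b) >= 2 {
		switch {
		case b[0] == 0xFF && b[1] == 0xFE: // little-endian BOM
			b = b[2:]
		case b[0] == 0xFE && b[1] == 0xFF: // big-endian BOM
			order = binary.BigEndian
			b = b[2:]
		}
	}

	// Combine each pair of bytes into one 16-bit code unit.
	// A trailing odd byte, if any, is ignored.
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, order.Uint16(b[i:]))
	}

	// utf16.Decode handles surrogate pairs and returns runes;
	// converting []rune to string yields UTF-8.
	return string(utf16.Decode(units))
}

func main() {
	raw := []byte{0xFF, 0xFE, 'C', 0, 'o', 0, 'm', 0, 'm', 0, 'e', 0, 'n', 0, 't', 0}
	fmt.Println(decodeUTF16WithBOM(raw)) // prints: Comment
}
```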

highend commented 4 years ago

Hi,

yeah, I'm already on the latest version:

git log
commit 3da57733453b434afc02a084236bbe86209c6974 (HEAD -> master, tag: v1.1.2, origin/master, origin/HEAD)
Author: Albert Nigmatzianov <albertnigma@gmail.com>
Date:   Tue Mar 10 08:47:07 2020 +0100

    Support older versions and 32bit

n10v commented 4 years ago

Hm okay, that's weird. Can you please send the file, so I can reproduce the bug?

highend commented 4 years ago

Here is the link for the zip file that contains the mp3: https://filebin.net/eemwrdr40a27e876

The ID3v2.3 + ID3v1 tags have been created with MP3Tag. Settings:

[x] ID3v1
[x] ID3v2
    (o) ID3v2.3 UTF-16

Regarding my environment: Go 1.14.1 Compiled for Windows, x64

errisnotnil commented 4 years ago

I have the same problem.

Go 1.14.1 macOS 10.15.3

n10v commented 4 years ago

(Information for me)

Cause of the problem: the comment's description is an empty string encoded in UTF-16 with BOM. id3v2 doesn't handle this case.

The part of the code that causes the bug:

func (br *bufReader) ReadText(encoding Encoding) []byte {
    ...
    if encoding.Equals(EncodingUTF16) {
        text = append(text, br.ReadByte())
    }
    ...

Solution: don't read the next byte if the text equals the BOM.

n10v commented 4 years ago

Should be fixed in https://github.com/bogem/id3v2/commit/051e4b2ea87f4646bab6d86642dce1448346a402. Please check 🙏

errisnotnil commented 4 years ago

This works for me. Thanks!

Please add a new release for use with go mod.

highend commented 4 years ago

Works for me as well. I tested all combinations of MP3Tag options, and the output is encoded correctly (for comments). Thanks!

n10v commented 4 years ago

@qkveri Done!

You're welcome guys 🙂