nicfit / eyeD3

eyeD3 is a Python module and command line program for processing ID3 tags. Information about mp3 files (i.e., bit rate, sample frequency, play time, etc.) is also provided. The formats supported are ID3v1 (1.0/1.1) and ID3v2 (2.3/2.4).
http://eyed3.nicfit.net/
GNU General Public License v3.0

Character Encoding Problems #201

Closed: reprise5 closed this issue 2 years ago

reprise5 commented 6 years ago

Description / Reproduce Bug

Setup Information

Log/Stacktrace

From the moment the bug was caught

SERVICE_AREA-.mp3   [ 2.22 MB ]
-------------------------------------------------------------------------------
Time: 06:30 MPEG1, Layer III    [ 48 kb/s @ 44100 Hz - Stereo ]
-------------------------------------------------------------------------------
Setting artist: 友 & 愛
Setting title: Service Area
Writing tag...
Uncaught exception: 'latin-1' codec can't encode character u'\u53cb' in position 0: ordinal not in range(256)
Traceback (most recent call last):
  File "/usr/bin/eyeD3", line 1265, in <module>
    retval = main();
  File "/usr/bin/eyeD3", line 1242, in main
    retval = app.handleFile(f);
  File "/usr/bin/eyeD3", line 559, in handleFile
    if not self.tag.update():
  File "/usr/lib/python2.7/dist-packages/eyeD3/tag.py", line 526, in update
    self.__saveV2Tag(version);
  File "/usr/lib/python2.7/dist-packages/eyeD3/tag.py", line 1251, in __saveV2Tag
    raw_frame = f.render();
  File "/usr/lib/python2.7/dist-packages/eyeD3/frames.py", line 756, in render
    self.text.encode(id3EncodingToString(self.encoding));
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u53cb' in position 0: ordinal not in range(256)

Traced Execution Log:

Command: $ eyeD3 --debug -t "title" -a "友 & 愛" ~/Desktop/testfile.mp3

testfile.mp3    [ 2.22 MB ]
-------------------------------------------------------------------------------
eyeD3 trace> Linking File: /home/reprise/Desktop/testfile.mp3
eyeD3 trace> Located ID3 v2 tag
eyeD3 trace> TagHeader [major]: 2
eyeD3 trace> TagHeader [minor]: 4
eyeD3 trace> TagHeader [revis]: 0
eyeD3 trace> TagHeader [flags]: unsync(0) extended(0) experimental(0) footer(0)
eyeD3 trace> TagHeader [size string]: 0x00000118
eyeD3 trace> TagHeader [size]: 152 (0x98)
eyeD3 trace> sizeLeft: 152
eyeD3 trace> +++++++++++++++++++++++++++++++++++++++++++++++++
eyeD3 trace> FrameSet: Reading Frame #1
eyeD3 trace> FrameHeader [start byte]: 10 (0xA)
eyeD3 trace> FrameHeader [id]: TXXX (0x54585858)
eyeD3 trace> FrameHeader [data size]: 18 (0x12)
eyeD3 trace> FrameHeader [flags]: ta(0) fa(0) ro(0) co(0) en(0) gr(0) un(0) dl(0)
eyeD3 trace> FrameSet: Reading 18 (0x12) bytes of data from byte pos 20 (0x14)
eyeD3 trace> FrameSet: 18 bytes of data read
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: major_brand
eyeD3 trace> UserTextFrame text: isom
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: major_brand
eyeD3 trace> UserTextFrame text: isom
eyeD3 trace> sizeLeft: 124
eyeD3 trace> +++++++++++++++++++++++++++++++++++++++++++++++++
eyeD3 trace> FrameSet: Reading Frame #2
eyeD3 trace> FrameHeader [start byte]: 38 (0x26)
eyeD3 trace> FrameHeader [id]: TXXX (0x54585858)
eyeD3 trace> FrameHeader [data size]: 19 (0x13)
eyeD3 trace> FrameHeader [flags]: ta(0) fa(0) ro(0) co(0) en(0) gr(0) un(0) dl(0)
eyeD3 trace> FrameSet: Reading 19 (0x13) bytes of data from byte pos 48 (0x30)
eyeD3 trace> FrameSet: 19 bytes of data read
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: minor_version
eyeD3 trace> UserTextFrame text: 512
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: minor_version
eyeD3 trace> UserTextFrame text: 512
eyeD3 trace> sizeLeft: 95
eyeD3 trace> +++++++++++++++++++++++++++++++++++++++++++++++++
eyeD3 trace> FrameSet: Reading Frame #3
eyeD3 trace> FrameHeader [start byte]: 67 (0x43)
eyeD3 trace> FrameHeader [id]: TXXX (0x54585858)
eyeD3 trace> FrameHeader [data size]: 32 (0x20)
eyeD3 trace> FrameHeader [flags]: ta(0) fa(0) ro(0) co(0) en(0) gr(0) un(0) dl(0)
eyeD3 trace> FrameSet: Reading 32 (0x20) bytes of data from byte pos 77 (0x4D)
eyeD3 trace> FrameSet: 32 bytes of data read
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: compatible_brands
eyeD3 trace> UserTextFrame text: isomiso2mp41
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: compatible_brands
eyeD3 trace> UserTextFrame text: isomiso2mp41
eyeD3 trace> sizeLeft: 53
eyeD3 trace> +++++++++++++++++++++++++++++++++++++++++++++++++
eyeD3 trace> FrameSet: Reading Frame #4
eyeD3 trace> FrameHeader [start byte]: 109 (0x6D)
eyeD3 trace> FrameHeader [id]: TDEN (0x5444454e)
eyeD3 trace> FrameHeader [data size]: 21 (0x15)
eyeD3 trace> FrameHeader [flags]: ta(0) fa(0) ro(0) co(0) en(0) gr(0) un(0) dl(0)
eyeD3 trace> FrameSet: Reading 21 (0x15) bytes of data from byte pos 119 (0x77)
eyeD3 trace> FrameSet: 21 bytes of data read
eyeD3 trace> TextFrame encoding: utf_8
eyeD3 trace> TextFrame text: 2018-03-13 06:42:01
eyeD3 trace> TextFrame encoding: utf_8
eyeD3 trace> TextFrame text: 2018-03-13 06:42:01
eyeD3 trace> sizeLeft: 22
eyeD3 trace> +++++++++++++++++++++++++++++++++++++++++++++++++
eyeD3 trace> FrameSet: Reading Frame #5
eyeD3 trace> FrameHeader [start byte]: 140 (0x8C)
eyeD3 trace> FrameHeader [id]: TSSE (0x54535345)
eyeD3 trace> FrameHeader [data size]: 12 (0xC)
eyeD3 trace> FrameHeader [flags]: ta(0) fa(0) ro(0) co(0) en(0) gr(0) un(0) dl(0)
eyeD3 trace> FrameSet: Reading 12 (0xC) bytes of data from byte pos 150 (0x96)
eyeD3 trace> FrameSet: 12 bytes of data read
eyeD3 trace> TextFrame encoding: utf_8
eyeD3 trace> TextFrame text: Lavf56.1.0
eyeD3 trace> Tag contains 0 bytes of padding.
eyeD3 trace> mp3 header search starting @ a2
eyeD3 trace> MPEG audio version: 1.0
eyeD3 trace> MPEG audio layer: III
eyeD3 trace> MPEG sampling frequency: 44100
eyeD3 trace> MPEG bit rate: 48
eyeD3 trace> MPEG channel mode: Stereo
eyeD3 trace> MPEG channel mode extension: 0
eyeD3 trace> MPEG CRC error protection: False
eyeD3 trace> MPEG original: 0
eyeD3 trace> MPEG copyright: 0
eyeD3 trace> MPEG private bit: 0
eyeD3 trace> MPEG padding: 0
eyeD3 trace> MPEG emphasis: None
eyeD3 trace> MPEG frame length: 156
eyeD3 trace> mp3 header fffb3000 found at position: 0xa2
eyeD3 trace> Info header detected @ 24
eyeD3 trace> Info header flags: 0x7
eyeD3 trace> Info numFrames: 5580
eyeD3 trace> Info numBytes: 2332368
eyeD3 trace> Info TOC (100 bytes): PRESENT
Time: 06:30 MPEG1, Layer III    [ 48 kb/s @ 44100 Hz - Stereo ]
-------------------------------------------------------------------------------
Setting artist: 友 & 愛
Setting title: title
Writing tag...
eyeD3 trace> Rendering tag version: v2.4
eyeD3 trace> Linking File: /home/reprise/Desktop/testfile.mp3
eyeD3 trace> Located ID3 v2 tag
eyeD3 trace> TagHeader [major]: 2
eyeD3 trace> TagHeader [minor]: 4
eyeD3 trace> TagHeader [revis]: 0
eyeD3 trace> TagHeader [flags]: unsync(0) extended(0) experimental(0) footer(0)
eyeD3 trace> TagHeader [size string]: 0x00000118
eyeD3 trace> TagHeader [size]: 152 (0x98)
eyeD3 trace> sizeLeft: 152
eyeD3 trace> +++++++++++++++++++++++++++++++++++++++++++++++++
eyeD3 trace> FrameSet: Reading Frame #1
eyeD3 trace> FrameHeader [start byte]: 10 (0xA)
eyeD3 trace> FrameHeader [id]: TXXX (0x54585858)
eyeD3 trace> FrameHeader [data size]: 18 (0x12)
eyeD3 trace> FrameHeader [flags]: ta(0) fa(0) ro(0) co(0) en(0) gr(0) un(0) dl(0)
eyeD3 trace> FrameSet: Reading 18 (0x12) bytes of data from byte pos 20 (0x14)
eyeD3 trace> FrameSet: 18 bytes of data read
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: major_brand
eyeD3 trace> UserTextFrame text: isom
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: major_brand
eyeD3 trace> UserTextFrame text: isom
eyeD3 trace> sizeLeft: 124
eyeD3 trace> +++++++++++++++++++++++++++++++++++++++++++++++++
eyeD3 trace> FrameSet: Reading Frame #2
eyeD3 trace> FrameHeader [start byte]: 38 (0x26)
eyeD3 trace> FrameHeader [id]: TXXX (0x54585858)
eyeD3 trace> FrameHeader [data size]: 19 (0x13)
eyeD3 trace> FrameHeader [flags]: ta(0) fa(0) ro(0) co(0) en(0) gr(0) un(0) dl(0)
eyeD3 trace> FrameSet: Reading 19 (0x13) bytes of data from byte pos 48 (0x30)
eyeD3 trace> FrameSet: 19 bytes of data read
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: minor_version
eyeD3 trace> UserTextFrame text: 512
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: minor_version
eyeD3 trace> UserTextFrame text: 512
eyeD3 trace> sizeLeft: 95
eyeD3 trace> +++++++++++++++++++++++++++++++++++++++++++++++++
eyeD3 trace> FrameSet: Reading Frame #3
eyeD3 trace> FrameHeader [start byte]: 67 (0x43)
eyeD3 trace> FrameHeader [id]: TXXX (0x54585858)
eyeD3 trace> FrameHeader [data size]: 32 (0x20)
eyeD3 trace> FrameHeader [flags]: ta(0) fa(0) ro(0) co(0) en(0) gr(0) un(0) dl(0)
eyeD3 trace> FrameSet: Reading 32 (0x20) bytes of data from byte pos 77 (0x4D)
eyeD3 trace> FrameSet: 32 bytes of data read
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: compatible_brands
eyeD3 trace> UserTextFrame text: isomiso2mp41
eyeD3 trace> UserTextFrame encoding: utf_8
eyeD3 trace> UserTextFrame description: compatible_brands
eyeD3 trace> UserTextFrame text: isomiso2mp41
eyeD3 trace> sizeLeft: 53
eyeD3 trace> +++++++++++++++++++++++++++++++++++++++++++++++++
eyeD3 trace> FrameSet: Reading Frame #4
eyeD3 trace> FrameHeader [start byte]: 109 (0x6D)
eyeD3 trace> FrameHeader [id]: TDEN (0x5444454e)
eyeD3 trace> FrameHeader [data size]: 21 (0x15)
eyeD3 trace> FrameHeader [flags]: ta(0) fa(0) ro(0) co(0) en(0) gr(0) un(0) dl(0)
eyeD3 trace> FrameSet: Reading 21 (0x15) bytes of data from byte pos 119 (0x77)
eyeD3 trace> FrameSet: 21 bytes of data read
eyeD3 trace> TextFrame encoding: utf_8
eyeD3 trace> TextFrame text: 2018-03-13 06:42:01
eyeD3 trace> TextFrame encoding: utf_8
eyeD3 trace> TextFrame text: 2018-03-13 06:42:01
eyeD3 trace> sizeLeft: 22
eyeD3 trace> +++++++++++++++++++++++++++++++++++++++++++++++++
eyeD3 trace> FrameSet: Reading Frame #5
eyeD3 trace> FrameHeader [start byte]: 140 (0x8C)
eyeD3 trace> FrameHeader [id]: TSSE (0x54535345)
eyeD3 trace> FrameHeader [data size]: 12 (0xC)
eyeD3 trace> FrameHeader [flags]: ta(0) fa(0) ro(0) co(0) en(0) gr(0) un(0) dl(0)
eyeD3 trace> FrameSet: Reading 12 (0xC) bytes of data from byte pos 150 (0x96)
eyeD3 trace> FrameSet: 12 bytes of data read
eyeD3 trace> TextFrame encoding: utf_8
eyeD3 trace> TextFrame text: Lavf56.1.0
eyeD3 trace> Tag contains 0 bytes of padding.
eyeD3 trace> Found current v2.x tag:
eyeD3 trace> Current tag size: 152
eyeD3 trace> Current tag padding: 0
eyeD3 trace> Rendering frame: TXXX
eyeD3 trace> Rendered 27 bytes
eyeD3 trace> Rendering frame: TXXX
eyeD3 trace> Rendered 28 bytes
eyeD3 trace> Rendering frame: TXXX
eyeD3 trace> Rendered 41 bytes
eyeD3 trace> Rendering frame: TDEN
eyeD3 trace> Rendered 11 bytes
eyeD3 trace> Rendering frame: TSSE
eyeD3 trace> Rendered 21 bytes
eyeD3 trace> Rendering frame: TPE1
Uncaught exception: 'latin-1' codec can't encode character u'\u53cb' in position 0: ordinal not in range(256)
Traceback (most recent call last):
  File "/usr/bin/eyeD3", line 1265, in <module>
    retval = main();
  File "/usr/bin/eyeD3", line 1242, in main
    retval = app.handleFile(f);
  File "/usr/bin/eyeD3", line 559, in handleFile
    if not self.tag.update():
  File "/usr/lib/python2.7/dist-packages/eyeD3/tag.py", line 526, in update
    self.__saveV2Tag(version);
  File "/usr/lib/python2.7/dist-packages/eyeD3/tag.py", line 1251, in __saveV2Tag
    raw_frame = f.render();
  File "/usr/lib/python2.7/dist-packages/eyeD3/frames.py", line 756, in render
    self.text.encode(id3EncodingToString(self.encoding));
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u53cb' in position 0: ordinal not in range(256)
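A note for reading the trace above: the TagHeader lines show a size string of 0x00000118 but a size of 152 (0x98). That is because the ID3v2 tag-header size is stored as a synchsafe integer, where each byte carries only 7 bits. The minimal Python 3 sketch below decodes it and, assuming the 10-byte frame headers used by v2.3/v2.4, recomputes the frame start bytes from the logged data sizes. The helper names are hypothetical, not eyeD3 functions.

```python
HEADER_SIZE = 10  # the tag header and each v2.3/v2.4 frame header are 10 bytes


def decode_synchsafe(raw):
    """Decode an ID3v2 synchsafe integer: each byte carries 7 bits, MSB is 0."""
    value = 0
    for byte in raw:
        if byte & 0x80:
            raise ValueError("not synchsafe: byte %#04x has its high bit set" % byte)
        value = (value << 7) | byte
    return value


def frame_offsets(data_sizes, start=HEADER_SIZE):
    """Return each frame's start byte and the offset just past the last frame."""
    offsets = []
    offset = start
    for size in data_sizes:
        offsets.append(offset)
        offset += HEADER_SIZE + size  # frame header + frame data
    return offsets, offset


print(decode_synchsafe(bytes.fromhex("00000118")))  # 152
starts, end = frame_offsets([18, 19, 32, 21, 12])    # TXXX, TXXX, TXXX, TDEN, TSSE
print(starts)    # [10, 38, 67, 109, 140]
print(hex(end))  # 0xa2
```

The computed start bytes match the FrameHeader [start byte] lines, and the final offset 0xa2 (10 + 152) matches "mp3 header search starting @ a2", consistent with "Tag contains 0 bytes of padding."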

Other notes

This could be an umbrella issue for other open issues about characters, like #195 with accented e's. I believe the offending characters in my particular issue are 友 and/or 愛.
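To check which characters are the problem, a small Python 3 sketch can try encoding each character of the artist string with latin-1 (ISO-8859-1, the codec named in the traceback), which only covers code points up to U+00FF. The helper `unencodable` is hypothetical, written just for this check.

```python
def unencodable(text, codec="latin-1"):
    """Return the characters in `text` that `codec` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


artist = "友 & 愛"
print(unencodable(artist))                           # ['友', '愛']
print([hex(ord(ch)) for ch in unencodable(artist)])  # ['0x53cb', '0x611b']
print(unencodable(artist, "utf-8"))                  # []
```

Both CJK characters fall outside latin-1 (the traceback reports the first one, U+53CB, at position 0), while " & " encodes fine.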

nicfit commented 6 years ago

Please upgrade eyeD3 to the latest version (eyeD3 0.6 is 6+ years out of date) and attempt to reproduce.

reprise5 commented 6 years ago

Version

Okay, so the old version 0.6.18 came from the apt package database (at the time I installed it). It appears the packaged version is still a bit behind, but that's what I expect of Debian Stable.

Sorting... Done
Full Text Search... Done
eyed3/stable,now 0.7.10-1 all [installed]
  Display and manipulate id3-tags on the command-line

python-eyed3/stable,now 0.7.10-1 all [installed,automatic]
  Python module for id3-tags manipulation

I got version 0.8.5 through python-pip, and can confirm the problem still persists. According to the release history notes, this should be the latest. Here's the latest log:

Log/Stacktrace

$ eyeD3 -t "title" -a "友 & 愛" Multiverse.mp3

/home/reprise/Desktop/Multiverse.mp3                                                       [ 5.50 MB ]
----------------------------------------------------------------------------------------------------
Setting artist: 友 & 愛
Setting title: title
Time: 11:57 MPEG1, Layer III    [ 64 kb/s @ 44100 Hz - Stereo ]
----------------------------------------------------------------------------------------------------
ID3 v2.3:
title: title
artist: 友 & 愛
album: Particle Void
eyed3.id3:WARNING: Invalid numeric genre ID: 2018
track: 1/87     
FRONT_COVER Image: [Size: 49053 bytes] [Type: image/jpeg]
Description: a1380472194_16.jpg

Writing ID3 version v2.3
Uncaught exception: 'latin-1' codec can't encode character u'\u53cb' in position 0: ordinal not in range(256)

eyed3:ERROR: 'latin-1' codec can't encode character u'\u53cb' in position 0: ordinal not in range(256)
Traceback (most recent call last):
  File "/home/reprise/.local/lib/python2.7/site-packages/eyed3/main.py", line 277, in _main
    retval = mainFunc(args, config)
  File "/home/reprise/.local/lib/python2.7/site-packages/eyed3/main.py", line 50, in main
    fs_encoding=args.fs_encoding)
  File "/home/reprise/.local/lib/python2.7/site-packages/eyed3/utils/__init__.py", line 95, in walk
    handler.handleFile(os.path.abspath(path))
  File "/home/reprise/.local/lib/python2.7/site-packages/eyed3/plugins/classic.py", line 513, in handleFile
    max_padding=max_padding)
  File "/home/reprise/.local/lib/python2.7/site-packages/eyed3/id3/tag.py", line 825, in save
    self._saveV2Tag(version, encoding, max_padding)
  File "/home/reprise/.local/lib/python2.7/site-packages/eyed3/id3/tag.py", line 1032, in _saveV2Tag
    max_padding)
  File "/home/reprise/.local/lib/python2.7/site-packages/eyed3/id3/tag.py", line 948, in _render
    raw_frame = f.render()
  File "/home/reprise/.local/lib/python2.7/site-packages/eyed3/id3/frames.py", line 294, in render
    self.text.encode(id3EncodingToString(self.encoding)))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u53cb' in position 0: ordinal not in range(256)
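The failing line, `self.text.encode(id3EncodingToString(self.encoding))`, builds a text frame body as one encoding byte followed by the encoded text. The Python 3 sketch below imitates that step outside eyeD3; `ID3_CODECS` and `render_text_body` are hypothetical stand-ins, and the byte-to-codec table follows the ID3v2.4 text encodings (v2.3 only defines 0x00 and 0x01).

```python
# Hypothetical mapping from the ID3v2 text-encoding byte to a Python codec.
ID3_CODECS = {
    b"\x00": "latin_1",    # ISO-8859-1 (v2.3 and v2.4)
    b"\x01": "utf_16",     # UTF-16 with BOM (v2.3 and v2.4)
    b"\x02": "utf_16_be",  # UTF-16BE without BOM (v2.4 only)
    b"\x03": "utf_8",      # UTF-8 (v2.4 only)
}


def render_text_body(encoding, text):
    """Build a text frame body: one encoding byte followed by the encoded text."""
    return encoding + text.encode(ID3_CODECS[encoding])


print(render_text_body(b"\x03", "友 & 愛"))  # UTF-8 body renders fine

try:
    render_text_body(b"\x00", "友 & 愛")  # latin-1 body, as in the traceback
except UnicodeEncodeError as err:
    print(err)
```

The second call raises the same kind of `UnicodeEncodeError` shown in the log, because the frame's encoding byte selects latin-1 for text that latin-1 cannot hold.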
nicfit commented 6 years ago

This seems to be caused by your environment or by a difference in the tag of the file you are using. Here's what I see with your example:

└> eyeD3 -t "title" -a "友 & 愛" ./test.id3 
/home/travis/devel/eyeD3/git/test.id3                                                                                [ 366.00 Bytes ]
--------------------------------------------------------------------------------------------------------------------------------------
Setting artist: 友 & 愛
Setting title: title
ID3 v2.4:
title: title
artist: 友 & 愛
album: Drei Haselnüsse für Aschenbrödel
track:      
Writing ID3 version v2.4
--------------------------------------------------------------------------------------------------------------------------------------
reprise5 commented 6 years ago

Hm, that's interesting. Can you tell me more about your environment? If I run echo $LANG, it returns en_US.UTF-8. Are you saying your terminal is using a different character set, and that this is to blame for the issue?

nicfit commented 6 years ago

For whatever reason, latin_1 is being chosen as the default encoding for that new TPE1 frame; that does not happen in my environment. We seem to have similar locale settings, but here is mine:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
...
LC_ALL=
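To compare environments beyond `locale`, a short sketch can print the encodings the Python interpreter itself reports. Whether eyeD3 consults any of these when picking a frame encoding is not established in this thread; the sketch only helps spot differences between the two machines. It avoids f-strings so it can also be run with the Python 2.7 interpreter the reporter's eyeD3 uses.

```python
import locale
import sys


def encoding_report():
    """Return (name, value) pairs describing the encodings Python sees."""
    return [
        ("LC_CTYPE locale", locale.setlocale(locale.LC_CTYPE)),  # query only, no change
        ("preferred encoding", locale.getpreferredencoding(False)),
        ("filesystem encoding", sys.getfilesystemencoding()),
        ("stdout encoding", getattr(sys.stdout, "encoding", None)),
    ]


for name, value in encoding_report():
    print("%s: %s" % (name, value))
```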

I just committed some additional logging to help tell a better story; see 53c2fba.

I am able to reproduce the problem, though in a different way than what is going on in your logs. In your logs the problem frame (TPE1) is brand new, since it does not show up among the frames that are loaded. For a new ID3 v2.4 frame the default encoding should be utf8, but that is not happening in your environment.

But when the frame already exists and has an explicit encoding, any new text is encoded using that frame's current encoding. So, to reproduce:

$ touch issue201.id3
$ eyeD3 -a "abc" --encoding latin1 issue201.id3
$ eyeD3 -a "友 & 愛" issue201.id3
<BOOM>, your error

Have you tried using the --encoding option? That will likely fix it for you.

$ eyeD3  -a "友 & 愛" --encoding utf8 issue201.id3

I'm curious why latin1 is being chosen for a new ID3 v2.x frame in your environment. It is not the ideal default, and not what I get.

Also, eyeD3 does have a bug here: at the very least it should not crash with a traceback. It's arguable that the frame's encoding should be honored rather than "upgraded".
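One way to settle the "honor versus upgrade" question without ever crashing is to check, before rendering, whether the frame's current encoding can represent the new text. The Python 3 sketch below is a hypothetical policy, not eyeD3's code: it keeps an existing encoding when it is legal for the tag version and works, and otherwise falls back to a Unicode encoding the version supports. It assumes UTF-8 as the fallback for v2.4 (the default described above for new v2.4 frames) and UTF-16 for v2.3, since v2.3 defines no UTF-8 encoding byte.

```python
LATIN1, UTF16, UTF16BE, UTF8 = 0, 1, 2, 3  # ID3v2 text-encoding byte values

CODECS = {LATIN1: "latin_1", UTF16: "utf_16", UTF16BE: "utf_16_be", UTF8: "utf_8"}

# Encodings each tag version defines (v2.3 has no UTF-16BE or UTF-8).
ALLOWED = {(2, 3): {LATIN1, UTF16}, (2, 4): {LATIN1, UTF16, UTF16BE, UTF8}}

# Assumed fallback when the current encoding cannot be kept.
UNICODE_DEFAULT = {(2, 3): UTF16, (2, 4): UTF8}


def can_encode(text, encoding):
    """True if `text` can be encoded with the given ID3 encoding."""
    try:
        text.encode(CODECS[encoding])
    except UnicodeEncodeError:
        return False
    return True


def choose_encoding(version, text, existing=None):
    """Hypothetical policy: honor a usable existing encoding, else upgrade."""
    if existing in ALLOWED[version] and can_encode(text, existing):
        return existing
    return UNICODE_DEFAULT[version]


print(choose_encoding((2, 4), "abc", existing=LATIN1))      # 0: latin-1 kept
print(choose_encoding((2, 4), "友 & 愛", existing=LATIN1))  # 3: upgraded to UTF-8
print(choose_encoding((2, 3), "友 & 愛", existing=LATIN1))  # 1: UTF-16, v2.3 has no UTF-8
print(choose_encoding((2, 4), "友 & 愛"))                   # 3: brand-new frame
```

A stricter alternative would be to keep the existing encoding and raise a clear error asking the user to pass an explicit encoding, rather than silently changing the frame.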

reprise5 commented 6 years ago

Running with Debugging

I followed the instructions from here to run the current version from source (which contains that commit; I checked), but ran into a small problem with some imports, partly due to my unfamiliarity with Python. I assume they're supposed to be internal modules rather than external ones?

$ python main.py -t "title" -a "友 & 愛" ~/Desktop/file.mp3

Traceback (most recent call last):
  File "main.py", line 25, in <module>
    import eyed3
ImportError: No module named eyed3

Locale Information

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
...
LC_ALL=

--encoding option

As an alternative, I tried the option --encoding utf8, but it came back with a "no such option" error.

EDIT: I went through the manpage, and it seems the option is --set-encoding. Using it prevented the exception from occurring:

$ eyeD3 --set-encoding=utf8 -t "title" -a "友 & 愛" ~/Desktop/file.mp3

(Sorry to keep you waiting on an update!)