Special Characters in ID3-Tag

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Rip a CD with special characters (like äöüßéèâ...) in the description
received from freedb.
2. Special characters appear correctly in file-names and directory-names
but not in the ID3-tags when ripping with lame.

What is the expected output? What do you see instead?
Special characters should also appear correctly in the ID3-tag.

What version of rubyripper are you using? On what operating system? Are you
using the gtk2 or the commandline interface?

Rubyripper 0.3 with gtk2
Debian sid (Sidux)
LAME 32bits version 3.97

Please provide any additional information below.
May correspond to Issue #59, but RRIP is not crashing.

Original issue reported on code.google.com by marc.hue...@gmx.de on 11 Apr 2007 at 8:18

GoogleCodeExporter commented 9 years ago

There are indeed some problems with special characters. Ruby as a language
hasn't got real support for Unicode. I haven't got a clue why the filenames are
handled correctly and the ID3 tagging is not. I'll leave the bug open to see if
anyone comes up with an idea for how to solve this.

Original comment by rubyripp...@gmail.com on 11 Apr 2007 at 9:08

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Filenames would depend on the system locale, I believe... maybe filenames are
handed off to a system call. It looks like for mp3 lame is doing the tag
insertion. Does this happen if the tracks are encoded as ogg or flac as well?

Original comment by mordbr...@gmail.com on 25 Apr 2007 at 8:50

GoogleCodeExporter commented 9 years ago

Looks to be a lame limitation. Yet another reason to use a free codec instead ;)

Original comment by mordbr...@gmail.com on 26 Apr 2007 at 10:54

GoogleCodeExporter commented 9 years ago

I can confirm this and have the same problems. I do not think that it is lame's
fault, as the tags are written correctly using Grip with the same CD and the
same liblame.

Original comment by heinz.weigand@gmail.com on 28 Jul 2007 at 12:19

GoogleCodeExporter commented 9 years ago

I can confirm this issue too. When ripping a disc with special characters to
both lame and flac at the same time, the characters appear fine in the flac tags
but not in the lame tags. So this appears to be a lame-specific issue. I have
not tested other codecs, though.

Original comment by jesper...@gmail.com on 3 Oct 2007 at 12:01

GoogleCodeExporter commented 9 years ago

Flac and Faac show no problems whatsoever. These codecs do not get special
treatment of any kind; they are handled in exactly the same way. So the question
is then: whose bug is it?

Original comment by rubyripp...@gmail.com on 4 Oct 2007 at 8:12

GoogleCodeExporter commented 9 years ago

I have dug a bit deeper into this one. The reason Grip works well is probably
that it uses an internal binding to lame. The lame and oggenc applications are
broken with respect to UTF-8 handling, which is the standard nowadays. Even when
the info is passed by hand, lame and oggenc fail to recognize the UTF-8 encoding.

As you may have noticed, the filenames are handled correctly, even though this
happens in the same command! Alas, this is a bug that the lame and oggenc
developers should solve.

Original comment by rubyripp...@gmail.com on 4 Oct 2007 at 8:36

Changed state: WontFix

GoogleCodeExporter commented 9 years ago

You are right: when called from the command line, lame produces the same (wrong)
behavior. The problem is that lame writes the tags in UTF-8 but does not set the
encoding of the tag to UTF-8. This is clearly a lame bug, but I think it would be
nice if rubyripper could work around it, because it is very annoying in some
cases (e.g., if you want to rip CDs of German artists).
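
To make the mismatch concrete, here is a minimal Ruby sketch (Ruby 1.9 or later). It assumes the tag ends up marked as ISO-8859-1 while its bytes are really UTF-8, and that a tag reader trusts that marker. The variable names are made up for the illustration; none of this is rubyripper or lame code.

```ruby
# encoding: utf-8
# Illustration only: UTF-8 bytes decoded by a reader that expects ISO-8859-1.

artist    = "Die Ärzte"   # UTF-8 string as received from freedb
raw_bytes = artist.b      # the same bytes, with no encoding attached

# The reader treats every byte as one ISO-8859-1 character.
misread = raw_bytes.force_encoding("ISO-8859-1").encode("UTF-8")

puts artist.length    # 9
puts misread.length   # 10, because the two bytes of "Ä" became two characters
puts misread.start_with?("Die Ã")   # true
```

The extra character is why a single umlaut typically shows up as two garbled characters in a player.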

I see two possibilities for a workaround:

1) Convert the strings to latin1 before passing them to lame

2) Fix the ID3v2 tag afterwards with the perl script from the Amarok FAQ at
http://amarok.kde.org/wiki/FAQ#Amarok_is_not_displaying_my_utf-8_id3v2_tags_properly.21

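To implement 1), iconv can probably be used, e.g.

require 'iconv'
# Iconv.new takes the target encoding first, then the source encoding.
k = Iconv.new("ISO-8859-1", "UTF-8")
latin1string = k.iconv(unicodestring)
# Passing nil flushes any remaining conversion state.
latin1string << k.iconv(nil)
k.close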

I think this should be very easy to implement (I can try to implement it if you
are interested). But there should be a config option to enable/disable this
code, because it probably won't be necessary on all systems (e.g., systems that
do not use UTF-8, or maybe later lame versions that have this bug fixed).
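
For readers on newer Ruby versions: the iconv library was removed from the standard library in Ruby 2.0, and String#encode covers the same conversion. The following is only a sketch of workaround 1) with an on/off switch. The method name tag_for_lame and the convert option are hypothetical and do not come from rubyripper.

```ruby
# encoding: utf-8
# Hypothetical helper: converts a tag value from UTF-8 to ISO-8859-1.
# The convert flag stands in for the proposed config option.
def tag_for_lame(value, convert: true)
  return value unless convert
  # Characters without an ISO-8859-1 equivalent become "?" instead of raising.
  value.encode("ISO-8859-1", "UTF-8", undef: :replace, invalid: :replace, replace: "?")
end

title = tag_for_lame("Für immer und ewig")
puts title.encoding   # ISO-8859-1
puts title.bytesize   # 18, one byte per character
```

With convert: false the string is passed through unchanged, which matches the case of a non-UTF-8 system or a lame version without the bug.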

To do solution 2) automatically, rubyripper would need a feature that executes a
user-defined command for each created MP3 file right after the encoding is
finished.

Original comment by peter...@yahoo.de on 5 Oct 2007 at 4:32

GoogleCodeExporter commented 9 years ago

Thanks for these suggestions. Solution no. 1 seems to be a nice approach. I'll
try your code and see if it solves anything. In the meantime I'll reopen the
issue.

Original comment by rubyripp...@gmail.com on 5 Oct 2007 at 4:52

Changed state: Started

GoogleCodeExporter commented 9 years ago

Thanks for reopening the issue. I have attached a quick&dirty test patch against
the file rr_lib.rb (latest SVN version). Note that this patch is only for testing
purposes. It converts the strings for track name, album name, artist, and genre
to ISO-8859-1 before passing them to lame. For me, with this patch, the special
chars in the ID3v2 tags are now correct (currently tested with only one example:
the song "Für immer und ewig" on the album "Das Spiel" by the German folk metal
band "Letzte Instanz").

The new code needs the package libiconv-ruby.

Please test this code (if possible) and tell me if it works for you. If it does,
I will try to tune the code a bit (I don't know what happens when this code is
executed in a non-Unicode environment), and then a config option to
enable/disable this code should be added (I have to look further into how this
can work and will probably need help with it).

Original comment by peter...@yahoo.de on 5 Oct 2007 at 6:29

Attachments:

patch0

GoogleCodeExporter commented 9 years ago

Thanks for the patch, but I chose another implementation. 

With revision 150 in SVN, lame and oggenc shouldn't have any problem with Unicode
characters. Your environment should be set to UTF-8, though, but this is the
standard nowadays.

Without your input this would never get fixed. Thanks a lot :D

Original comment by rubyripp...@gmail.com on 6 Oct 2007 at 2:18

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

In the latest SVN, all files are encoded (using the MP3 lame codec) to a file
named "%o" in the working directory (from where I started rr_gui). I am not sure
if this is caused by the fix for this bug, but I think so, because that fix seems
to be the only difference between revisions 149 and 150.

Another point: I am not sure if I am right here, but I doubt that it is a good
idea to convert the whole command line, including the filenames, to latin1. In a
UTF-8 environment, filenames should be in UTF-8 and not latin1.
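
One way to keep the two concerns apart is to convert only the tag values and leave the paths alone. The sketch below is an assumption about how that could look, not the code from any rubyripper revision. build_lame_args and the tags hash are hypothetical names, while --tt, --ta and --tl are lame's title, artist and album options.

```ruby
# encoding: utf-8
# Sketch only: convert the ID3 tag values to ISO-8859-1, keep paths as they are.
def build_lame_args(input_wav, output_mp3, tags)
  latin1 = ->(s) { s.encode("ISO-8859-1", "UTF-8", undef: :replace) }
  [
    "lame",
    "--tt", latin1.call(tags[:title]),
    "--ta", latin1.call(tags[:artist]),
    "--tl", latin1.call(tags[:album]),
    input_wav,    # untouched, so it still matches the name on disk
    output_mp3    # untouched, so the file is created with a UTF-8 name
  ]
end

args = build_lame_args(
  "Letzte Instanz/01 - Für immer und ewig.wav",
  "Letzte Instanz/01 - Für immer und ewig.mp3",
  { title: "Für immer und ewig", artist: "Letzte Instanz", album: "Das Spiel" }
)
# system(*args) would run lame without a shell, so no quoting is needed.
```

Passing the arguments as an array rather than one command string also means each argument keeps its own encoding.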

Original comment by peter...@yahoo.de on 6 Oct 2007 at 7:57

GoogleCodeExporter commented 9 years ago

Addition: When I replace the %o with #{filename} in the command line in the
function "def mp3(filename, genre)", I get exactly the behavior I expected: the
file is now created correctly again (with the correct name at the correct
location), but the special characters in the filename are now wrong because they
are encoded as latin1 instead of utf8.
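
As a small illustration of why such a filename looks wrong on a UTF-8 system (Ruby 1.9 or later; the filename is just an example), the latin1 bytes are not a valid UTF-8 sequence:

```ruby
# encoding: utf-8
name = "Für immer und ewig.mp3"

utf8_bytes   = name.bytes                       # "ü" is the two bytes 0xC3 0xBC
latin1_bytes = name.encode("ISO-8859-1").bytes  # "ü" is the single byte 0xFC

# Reinterpret the latin1 bytes the way a UTF-8 system would read the filename.
seen_by_system = latin1_bytes.pack("C*").force_encoding("UTF-8")

puts utf8_bytes.length               # 23
puts latin1_bytes.length             # 22
puts seen_by_system.valid_encoding?  # false
```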

Original comment by peter...@yahoo.de on 6 Oct 2007 at 8:15

GoogleCodeExporter commented 9 years ago

Ai, ai, ai. That is what happens when you're about to commit something. And the
next thing that happens is that the doorbell rings. Visitors. You give your code
a quick last look and commit. And there it goes. You forget something obvious.
It should be corrected in commit 151.

Original comment by rubyripp...@gmail.com on 6 Oct 2007 at 8:15

GoogleCodeExporter commented 9 years ago

Thank you, the code in revision 151 looks much more reasonable and fixes the
problem completely for me. Thanks!

Original comment by peter...@yahoo.de on 6 Oct 2007 at 8:23

GoogleCodeExporter commented 9 years ago

I found out that the current code fails if the directory the WAV file is written
to contains special characters. In that case, the MP3 encoder does not find the
input file. The problem is that the path of the input file is converted to
latin1. As with the name of the output file, this must be prevented.

The problem can be reproduced easily with the filename scheme
"%f/%a/%b (%y)/%n - %t" and a CD where the CD artist contains special characters
(e.g. ä,ü,ü,ß,ý).

The attached patch for the file rr_lib.rb fixes this problem.

Original comment by peter...@yahoo.de on 15 Oct 2007 at 1:20

Attachments:

rr_lib-utf8inputfile.diff

GoogleCodeExporter commented 9 years ago

The inputfile now stays in UTF-8. Thanks for the patch.

Original comment by rubyripp...@gmail.com on 15 Oct 2007 at 4:39

GoogleCodeExporter commented 9 years ago

This is not fixed:

lame --ta "Die Ärzte" -V4 --id3v2-only 01-Them\ Bones.wav
results in:
TPE1=Die Ärzte

When using rubyripper I get:
TPE1=Die

Original comment by rasmus.s...@gmail.com on 9 Sep 2012 at 5:39

GoogleCodeExporter commented 9 years ago

> This is not fixed
I can confirm this (see comment #18).
I have the same problem.

Original comment by eniak.i...@gmail.com on 9 Sep 2012 at 5:41

GoogleCodeExporter commented 9 years ago

I guess you no longer use Rubyripper 0.3. Please report a new issue.

Original comment by boukewou...@gmail.com on 9 Sep 2012 at 5:46

GoogleCodeExporter commented 9 years ago

> I guess you no longer use Rubyripper 0.3. Please report a new issue.
see Issue 534

Original comment by eniak.i...@gmail.com on 9 Sep 2012 at 7:37

Special Characters in ID3-Tag #76