ligzy / mp4v2

Automatically exported from code.google.com/p/mp4v2

UNICODE support in tags #108

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

On the Windows command line (cmd.exe), there is no direct way to pass 
Unicode characters to be written in tags.

I tried entering raw UTF-8 bytes and calling mp4tags from a UTF-8 batch file, 
but the characters seem to be interpreted as ANSI according to the local 
codepage, and that ANSI text is then encoded to UTF-8 in the MP4 file.

Under Windows Start>Run, everything works fine.

I know this is not a problem with mp4tags. However, under Windows XP (and, I 
guess, Vista/7 as well), cmd.exe breaks every chance of passing Unicode through. 
The only workaround would be to set a codepage for cmd.exe that contains all the 
characters in use, so that the conversion would not destroy the UTF-8. However, 
if extended characters from multiple languages are used (e.g., Spanish and 
Russian), no single codepage includes the full character set.

These Windows issues break any chance of writing a simple script-driven process 
that automatically extracts audio from CDs (with EAC), encodes it to AAC and 
tags it in a neat MP4.

What version of the product are you using? On what operating system?

mp4tags, latest SVN (rev 474?)

Please provide any additional information below.

A nice and elegant solution would be a new option to write one or all tags 
from the contents of a text file encoded in UTF-8, with or without a BOM. The 
file could contain one tag per line, starting with the tag name, then a 
separator (tab or space), then the actual data, which should not be parsed for 
string tags.
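To make the proposed tag-file format concrete, here is a minimal C++ sketch of 
a parser for it. It assumes the layout described above (tag name, one tab or 
space, then the value), strips an optional UTF-8 BOM and Windows CRLF line 
endings, and keeps the value bytes untouched. The names `TagEntry` and 
`parse_tag_file` are hypothetical; nothing like this exists in mp4tags today, 
and mapping the parsed names onto actual mp4v2 tag setters is left out.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: one parsed line, tag name plus raw UTF-8 value.
using TagEntry = std::pair<std::string, std::string>;

// Hypothetical parser for the proposed format: one tag per line,
// the tag name, a single tab or space, then the value (kept byte-for-byte).
std::vector<TagEntry> parse_tag_file(std::istream& in) {
    std::vector<TagEntry> tags;
    std::string line;
    bool first = true;
    while (std::getline(in, line)) {
        // Strip an optional UTF-8 BOM (EF BB BF) from the first line only.
        if (first && line.compare(0, 3, "\xEF\xBB\xBF") == 0) line.erase(0, 3);
        first = false;
        // Tolerate CRLF line endings produced by Windows editors.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        std::string::size_type sep = line.find_first_of(" \t");
        if (sep == std::string::npos) continue;  // no separator: skip the line
        tags.emplace_back(line.substr(0, sep), line.substr(sep + 1));
    }
    return tags;
}
```

Opening the file in binary mode (`std::ifstream f(path, std::ios::binary)`) and 
passing it to `parse_tag_file` keeps the UTF-8 bytes away from any console 
codepage conversion; a `std::istringstream` works the same way for testing.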

The open source tagger Tag (for MP3/APE/FLAC) includes such an option.

Thanks.

Original issue reported on code.google.com by havj...@gmail.com on 21 Jun 2011 at 1:17

GoogleCodeExporter commented 9 years ago
Actually, I'm pretty sure it is a problem with mp4tags; mp4v2 (the underlying 
library) is Unicode-compatible and accepts UTF-8 strings for most things. The 
command line apps, however, are not. See 
http://code.google.com/p/mp4v2/issues/detail?id=103 for more detail, but 
basically the command line apps need to be modified to support Unicode 
arguments.

Original comment by jnor...@logitech.com on 21 Jun 2011 at 3:42

GoogleCodeExporter commented 9 years ago
I have encountered the same problem with mp4tags, suspect that cmd.exe is at 
fault too, and would much appreciate a solution.

Original comment by CarlEd...@gmail.com on 21 Jun 2011 at 8:21

GoogleCodeExporter commented 9 years ago
Without having any idea of how the program works, I'd dare say it's a codepage 
and encoding conversion problem. Other taggers, such as AtomicParsley, suffer 
from a similar issue. However, AtomicParsley works with Unicode tags from the 
Start>Run dialog, while mp4tags doesn't (contrary to my first statement). 
Strangely enough, when I enter a DOS extended character on the cmd command 
line, the ANSI character (according to the current DOS codepage) is converted 
to Windows ANSI (according to the Windows codepage). What exactly is the 
expected behaviour? Is there any support for Unicode/UTF-8 on the cmd command 
line at all? If the answer is no, then you could either detect and preserve 
UTF-8 multibyte characters, or else convert all other single-byte ANSI 
characters to their UTF-8 equivalents. I'll take a look at the source. Anything 
I should know as a non-C++ programmer?
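The "detect and preserve UTF-8, otherwise convert" idea can be sketched as a 
heuristic: if an argument is already well-formed UTF-8, keep it; otherwise 
assume it is single-byte text and convert it. This minimal C++ sketch uses 
ISO-8859-1 (Latin-1) as a stand-in for the ANSI codepage, which is an 
assumption: Windows-1252 differs from Latin-1 in the 0x80-0x9F range, and a 
Russian system uses Windows-1251, so a real Windows tool would convert with the 
actual codepage (for example via `MultiByteToWideChar(CP_ACP, ...)`). The 
function names are hypothetical, and the heuristic can misfire, since some 
Latin-1 byte sequences happen to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if s is well-formed UTF-8 (rejects overlong
// encodings, UTF-16 surrogates and code points above U+10FFFF).
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0, n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80) { ++i; continue; }  // plain ASCII byte
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;               // stray continuation or invalid lead byte
        if (i + len > n) return false;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// Hypothetical helper: Latin-1 to UTF-8 (each byte is the code point of
// the same value, so bytes >= 0x80 become two-byte sequences).
std::string latin1_to_utf8(const std::string& s) {
    std::string out;
    out.reserve(s.size() * 2);
    for (char ch : s) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Heuristic: keep valid UTF-8 as-is, otherwise assume Latin-1 and convert.
std::string to_utf8_guess(const std::string& s) {
    return is_valid_utf8(s) ? s : latin1_to_utf8(s);
}
```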

Original comment by havj...@gmail.com on 21 Jun 2011 at 10:15

GoogleCodeExporter commented 9 years ago
Again, the issue is the command line tools in mp4v2, which are not set up to 
handle Unicode arguments on Windows. It is not cmd.exe. See issue #103, but 
in a nutshell:

You can divide mp4v2 into two basic parts:

1. The core library itself
2. A set of command line tools that use the core library.

About a year ago, neither handled Unicode on Windows correctly. The core 
library simply expected ASCII strings and puked everywhere if you passed in a 
string with extended characters, as did the command line tools. Another 
developer and I modified the core library to handle UTF-8, but we did not 
update the command line tools as well because that was a lot of extra work, 
mainly because handling the encoding of command line arguments in a 
cross-platform manner is a complete mess:

http://stackoverflow.com/questions/5408730/what-is-the-encoding-of-argv
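On Windows, one common way around the argv encoding problem is to ignore the 
narrow `argv` and ask for the UTF-16 command line instead, then convert each 
argument to UTF-8 for the core library. This minimal sketch uses the Win32 
calls `GetCommandLineW`, `CommandLineToArgvW` (link with Shell32) and 
`WideCharToMultiByte`; declaring `wmain` instead of `main` is an alternative 
with MSVC. On other platforms it simply passes `argv` through, which assumes 
the user's locale is UTF-8. The helper name `utf8_args` is hypothetical and not 
part of mp4v2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW (link with Shell32.lib)
#endif

#ifdef _WIN32
// Convert a NUL-terminated UTF-16 string to UTF-8.
static std::string wide_to_utf8(const wchar_t* w) {
    int len = WideCharToMultiByte(CP_UTF8, 0, w, -1, nullptr, 0, nullptr, nullptr);
    if (len <= 1) return std::string();  // empty string or conversion error
    std::string out(static_cast<std::size_t>(len), '\0');
    WideCharToMultiByte(CP_UTF8, 0, w, -1, &out[0], len, nullptr, nullptr);
    out.resize(static_cast<std::size_t>(len - 1));  // drop the terminating NUL
    return out;
}
#endif

// Hypothetical helper: return the program arguments as UTF-8 strings.
// On Windows the narrow argv is ignored because the C runtime has already
// converted it to the ANSI codepage (lossy); the UTF-16 command line is used.
// Elsewhere argv is passed through unchanged, assuming a UTF-8 locale.
std::vector<std::string> utf8_args(int argc, char** argv) {
    std::vector<std::string> args;
#ifdef _WIN32
    int wargc = 0;
    LPWSTR* wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
    if (wargv != nullptr) {
        for (int i = 0; i < wargc; ++i) args.push_back(wide_to_utf8(wargv[i]));
        LocalFree(wargv);
        return args;
    }
    // If CommandLineToArgvW fails, fall back to the narrow argv below.
#endif
    for (int i = 0; i < argc; ++i) args.emplace_back(argv[i]);
    return args;
}
```

A tool would call `utf8_args(argc, argv)` at the top of `main` and hand the 
resulting UTF-8 strings to the mp4v2 library unchanged.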

...there have been a few other reported issues where this has cropped up (see 
issue 98 and issue 103), so at some point we need a good plan for how to fix 
the command line tools so they properly accept unicode strings.  That said, 
it's not clear to me what the best course of action is.

Original comment by kid...@gmail.com on 24 Jun 2011 at 3:33

GoogleCodeExporter commented 9 years ago
One relatively simple approach, I imagine, would be for the command line tools 
to accept option arguments via the "@file" syntax used by some programs (see, 
e.g., http://docs.python.org/dev/library/argparse.html#fromfile-prefix-chars). 
When the tool encounters an "@file" argument, it would read the file and treat 
each line in it as one command line argument. That avoids the whole shell 
argument-munging issue. I could adapt my tool chain that uses the mp4tools to 
that mechanism with only a little work, and I assume others could too. The only 
open issue would be a newline quoting mechanism to allow multi-line arguments 
(as needed, for example, for mp4tags long descriptions).
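The "@file" idea could look roughly like this minimal C++ sketch: each `@name` 
argument is replaced by one argument per line of the named file, an optional 
UTF-8 BOM and CR line endings are stripped, and other arguments pass through 
unchanged. The function name is hypothetical, nested `@file` references inside 
a file are not expanded, and the multi-line quoting question raised above is 
deliberately left open: every line is exactly one argument.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: expand "@file" arguments into one argument per line.
// The file is read as raw bytes, so UTF-8 content never passes through the
// console codepage. Multi-line values are NOT supported by this sketch, and
// on Windows a non-ASCII file *name* would still need a wide-path open.
bool expand_response_files(const std::vector<std::string>& in,
                           std::vector<std::string>& out) {
    for (const std::string& arg : in) {
        if (arg.size() < 2 || arg[0] != '@') { out.push_back(arg); continue; }
        std::ifstream f(arg.substr(1), std::ios::binary);
        if (!f) return false;  // unreadable file: let the caller report it
        std::string line;
        bool first = true;
        while (std::getline(f, line)) {
            // Strip an optional UTF-8 BOM at the start of the file.
            if (first && line.compare(0, 3, "\xEF\xBB\xBF") == 0) line.erase(0, 3);
            first = false;
            if (!line.empty() && line.back() == '\r') line.pop_back();
            out.push_back(line);  // empty lines become empty arguments
        }
    }
    return true;
}
```

Like Python's argparse default, this treats the whole line as the argument, so 
no shell-style quoting is needed for values containing spaces.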

Original comment by CarlEd...@gmail.com on 27 Jul 2011 at 12:10