I see the problem slightly differently, but I guess I'm biased. From what I
can tell, the library works fine, or at least is conceptually fine and may have
a bug here or there. The command line tools, on the other hand, don't work fine. See http://code.google.com/p/mp4v2/issues/detail?id=98#c12 for where I
left this. If no one comments on whether my plan is a reasonable one, it'll
take longer for me to dive in. Marking this as a duplicate.
PS Option 2 doesn't allow for long (> ~260 character) filename support.
Option 1 can work fine if we ignore the argv passed to main and instead get the
command line args from GetCommandLineW.
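Roughly, Option 1 would look like this (a minimal sketch only, Windows-specific and untested against the actual tools):

```cpp
#include <windows.h>

int main(int, char**)
{
    // Raw, unparsed UTF-16 command line; not limited to MAX_PATH, but quoting
    // still has to be resolved before individual arguments can be extracted.
    const wchar_t* rawCmdLine = GetCommandLineW();
    // ... split into arguments and convert each to UTF-8 for the library ...
    (void)rawCmdLine;
    return 0;
}
```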
Original comment by dbyr...@gmail.com
on 30 May 2011 at 7:13
I am aware that option #2 is limited to 260 characters, and I am sure that users can live with that "limitation", as it applies to 99% of all Windows applications anyway. There is nothing to argue against having 32K support, but when that support only comes for ASCII-7 strings, it's maybe not worth the effort. Anyway, I agree it's a matter of "point of view".
Please do not consider using "GetCommandLineW"; it will lead to a lot of work with quoted arguments, because you will then have to parse them yourself. If you want to go down this path, then at least use "wmain(wchar_t *argv[])", an alternate "main" function intended for Unicode programs, which will give you all command line arguments as Unicode strings. It will also spare you all the hassle of parsing quoted, or even double-quoted, command line arguments.
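A minimal sketch of what wmain() buys you (Windows-only; the UTF-8 conversion shown here is just illustrative, not part of mp4v2):

```cpp
#include <windows.h>
#include <string>

int wmain(int argc, wchar_t* argv[])
{
    for (int i = 1; i < argc; ++i) {
        // Each argv[i] is already a separate UTF-16 argument, quotes resolved
        // by the runtime; convert it to UTF-8 before handing it to the library.
        int len = WideCharToMultiByte(CP_UTF8, 0, argv[i], -1, nullptr, 0, nullptr, nullptr);
        std::string utf8(len, '\0');
        if (len > 0)
            WideCharToMultiByte(CP_UTF8, 0, argv[i], -1, &utf8[0], len, nullptr, nullptr);
        utf8.resize(len > 0 ? len - 1 : 0);   // drop the trailing '\0' written by the API
        // ... pass utf8 on to the existing option handling ...
    }
    return 0;
}
```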
Cheers.
Original comment by Skaa...@gmail.com
on 30 May 2011 at 7:27
I think CommandLineToArgvW resolves the quoting issues when dealing with
GetCommandLineW. I think it provides the same thing as wmain. If not, I'll
certainly look at wmain.
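Something like this minimal sketch (Windows-only; CommandLineToArgvW lives in shellapi.h and needs Shell32):

```cpp
#include <windows.h>
#include <shellapi.h>   // CommandLineToArgvW (link with Shell32.lib)

int main(int, char**)
{
    int argcW = 0;
    // Splits the raw line from GetCommandLineW into a wide argv, handling
    // quoted arguments, so no hand-written parser is needed.
    LPWSTR* argvW = CommandLineToArgvW(GetCommandLineW(), &argcW);
    if (argvW == nullptr)
        return 1;

    for (int i = 1; i < argcW; ++i) {
        // argvW[i] is a UTF-16 argument; convert with
        // WideCharToMultiByte(CP_UTF8, ...) before handing it to the library.
    }

    LocalFree(argvW);   // the caller must free the array returned by the API
    return 0;
}
```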
Original comment by dbyr...@gmail.com
on 30 May 2011 at 7:47
Just to clarify, at some point mp4v2 had _no_ UTF-8 support for any input arguments, in both the core library and the command line tools. Another developer and I added UTF-8 filename support to the core library, but we did not update the command line tools to support UTF-8 encoding. So I agree with dbyron on this one; the library works fine (or is at least conceptually fine), but the command line tools don't work well.
I think this is a perfectly good issue to track, but let's get a more specific
bug: something along the lines of "Command Line tools do not support Unicode"
so we're all on the same page.
Original comment by kid...@gmail.com
on 30 May 2011 at 8:52
Yes, I agree. If the Unicode (UTF-8) support was specified this way, then the concept of the library is fine, and it's my fault that I did not read all the documentation and the other issues and replies about this subject.
I just wonder whether the functions in "Utf8ToFilename" and "Utf8DecodeChar" won't cause trouble. I cannot prove that there is a flaw, nor will I try to do so. But from a general point of view, it's questionable not to use the Windows API functions for converting from/to Unicode. Even if the Windows API functions have flaws or lack support for some code points under certain versions of Windows, they are nevertheless the functions that application code will use to create the UTF-8 encoded file names. So, to achieve the same "correctness" (or the same "flaws") for encoding and decoding, and therefore the same support as the file system of the OS, I recommend using the same function group for both encoding and decoding. This would mean that if the application code is assumed to use "WideCharToMultiByte(CP_UTF8)" to encode a filename, the library is supposed to use "MultiByteToWideChar(CP_UTF8)" to decode it.
Another possibility would be to expose a new "FilenameToUTF8" utility function which application code is then supposed to use. This new "FilenameToUTF8" would be required to use exactly the same conversion functionality (code points, surrogates, etc.) as "Utf8ToFilename". Either way, you could then assume that no information gets lost or changed during the conversion.
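To illustrate the symmetry I mean, a rough sketch (the helper names here are made up for illustration; this is not the library's actual "Utf8ToFilename" code):

```cpp
#include <windows.h>
#include <string>

// Application side: UTF-16 filename from the OS -> UTF-8 handed to the library.
static std::string WideToUtf8(const std::wstring& name)
{
    int len = WideCharToMultiByte(CP_UTF8, 0, name.c_str(), -1, nullptr, 0, nullptr, nullptr);
    std::string utf8(len, '\0');
    if (len > 0)
        WideCharToMultiByte(CP_UTF8, 0, name.c_str(), -1, &utf8[0], len, nullptr, nullptr);
    utf8.resize(len > 0 ? len - 1 : 0);   // drop the trailing '\0'
    return utf8;
}

// Library side: UTF-8 filename back to UTF-16 before calling CreateFileW etc.
static std::wstring Utf8ToWide(const std::string& utf8)
{
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, nullptr, 0);
    std::wstring wide(len, L'\0');
    if (len > 0)
        MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, &wide[0], len);
    wide.resize(len > 0 ? len - 1 : 0);   // drop the trailing L'\0'
    return wide;
}
```

Because both directions go through the same Windows converter, whatever string the OS handed out comes back to the file APIs unchanged.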
If the Windows API gives a certain Unicode file name to the application, that Unicode file name may or may not contain 'flaws' which may be specific to certain versions of Windows. But if the application does not change the content of that Unicode string, it can safely assume that it can pass the file name back to the OS and get access to that file. Now, if that Unicode string gets encoded and decoded with third-party functions, there is a chance that something changes in the representation of the string, which may result in an 'invalid' string on certain Windows versions when it's eventually passed back to some Unicode file API.
Anyway, it might be more of a theoretical issue. The existing code within the library will most likely work for almost all Unicode file names, given that the library is used as intended, from application code other than the provided command line tools.
Original comment by Skaa...@gmail.com
on 31 May 2011 at 11:32
Just to be clear, wmain() is a Windows thing; it won't work on *nix or OSX:
http://stackoverflow.com/questions/2438297/can-we-use-wmain-functions-with-unix-compilers-or-itll-work-only-on-windows
...so even if we went this route, we'd still have to #define out different stuff on Windows and *nix, and things would be sort of messy in the command line tools. See here for more info:
http://stackoverflow.com/questions/5408730/what-is-the-encoding-of-argv
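A rough sketch of how that #ifdef split could look (the utf8_main name is hypothetical; the non-Windows branch assumes argv already arrives as UTF-8):

```cpp
#include <string>
#include <vector>

// Shared entry point that works purely on UTF-8 strings (hypothetical name).
static int utf8_main(const std::vector<std::string>& args)
{
    // ... existing tool logic, operating on UTF-8 filenames ...
    (void)args;
    return 0;
}

#ifdef _WIN32
#include <windows.h>

// Windows: take UTF-16 arguments from the runtime and convert them to UTF-8.
int wmain(int argc, wchar_t* argv[])
{
    std::vector<std::string> args;
    for (int i = 0; i < argc; ++i) {
        int len = WideCharToMultiByte(CP_UTF8, 0, argv[i], -1, nullptr, 0, nullptr, nullptr);
        std::string utf8(len, '\0');
        if (len > 0)
            WideCharToMultiByte(CP_UTF8, 0, argv[i], -1, &utf8[0], len, nullptr, nullptr);
        utf8.resize(len > 0 ? len - 1 : 0);
        args.push_back(utf8);
    }
    return utf8_main(args);
}
#else
// Elsewhere: assume argv is already UTF-8 (typical for modern *nix/OSX locales).
int main(int argc, char* argv[])
{
    return utf8_main(std::vector<std::string>(argv, argv + argc));
}
#endif
```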
Original comment by kid...@gmail.com
on 24 Jun 2011 at 3:30
Original issue reported on code.google.com by
Skaa...@gmail.com
on 30 May 2011 at 7:05