Closed GoogleCodeExporter closed 8 years ago
On windows, I expect the filename argument to be UTF-8 encoded. If that's not
working, are there any log messages that help indicate what's going on?
Original comment by dbyr...@gmail.com
on 2 May 2011 at 2:32
[deleted comment]
Looks like regression is in r453; let me see if I can get a fix in...
to reproduce, use mp4info with any file containing an umlaut on windows. not
sure what the state of things is on *nix, but windows definitely does not work.
Original comment by kid...@gmail.com
on 2 May 2011 at 8:13
I see in r410 of the branch where the code was changed:
- replace MultiByteToWideChar with custom code to remove
differences across versions of windows and for improved
error detection and illegal character handling
...could you explain what differences someone may see? At the moment, I'm in
favor of reverting to the previous code, which is much more straight-forward
(and works--I tested it).
Original comment by kid...@gmail.com
on 2 May 2011 at 8:18
I'll write up more explanation soon...I'm a little squeezed for the next couple
of days. As I look at this particular issue, I think the problem may be in the
way mp4info processes its command line arguments. I'll look more closely
fairly soon. I'd really like to keep the code that's there, but I'll justify
why in more detail, also soon.
Original comment by dbyr...@gmail.com
on 2 May 2011 at 8:43
From the Remarks section of:
http://msdn.microsoft.com/en-us/library/dd319072%28v=vs.85%29.aspx
--
Starting with Windows Vista, this function fully conforms with the Unicode 4.1
specification for UTF-8 and UTF-16. The function used on earlier operating
systems encodes or decodes lone surrogate halves or mismatched surrogate pairs.
Code written in earlier versions of Windows that rely on this behavior to
encode random non-text binary data might run into problems. However, code that
uses this function on valid UTF-8 strings will behave the same way as on
earlier Windows operating systems.
--
In other words, on pre-Vista error handling of malformaed surrogate pairs isn't
right. And since we didn't set MB_ERR_INVALID_CHARS, we don't learn about
other kinds of invalid byte sequences. Reverting to the code that uses
MultiByteToWideChar, but using MB_ERR_INVALID_CHARS is tempting, but even that
has different behavior on different versions of windows. From the notes about
that flag:
--
Starting with Windows Vista, the function does not drop illegal code points if
the application does not set this flag.
Windows 2000 with SP4 and later, Windows XP: Fail if an invalid input character is encountered. If this flag is not set, the function silently drops illegal code points. A call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION.
--
So, the point of the code I wrote is to detect and log invalid byte sequences
in a consistent way across all versions of Windows.
Original comment by dbyr...@gmail.com
on 2 May 2011 at 10:16
But since neither the utility code in the project (mp4file, mp4info, ...)
doesn't work with the new code, nor does the code I wrote in VB.Net (which used
to work) there's been some change in the calling convention. I can easily
debug into the MP4VB2 dll and see that the string that I'm passing in is single
byte for all characters (see the attached image).
However, the a umlaut ([13]) shows as -28, since it's a signed character.
Perhaps this is the problem? Seems to me UTF-8 shouldn't interpret the signed
bit. Should these be declared as unsigned char, or is it something else.
I'll be glad to help debug this if someone can give me an idea of what to look
for.
Dan
Original comment by danahins...@gmail.com
on 2 May 2011 at 10:43
Attachments:
I'm not sure what version of File_win32.cpp you're using, but it's different
than mine. r453 calls CreateFileW in StandardFileProvider::open like this:
_handle = CreateFileW( filename, access, share, NULL, crdisp, flags, NULL );
but the screenshot you attached shows:
_handle = CreateFileW( win32::Utf8ToWideChar( name ), access, share, NULL,
crdisp, NULL );
What version of File_win32.cpp have you got?
Original comment by dbyr...@gmail.com
on 3 May 2011 at 2:30
I guess I hadn't sync'ed in a while. So I just sync'ed up and now when I build
I get unresolved externals for mp4v2::impl::log.* (errorf, setVerbosity, ...)
and mp4v2::platform::win32::Utf8ToWideChar. I see the definition of
Utf8ToWideChar in platform_win32_impl.h, but I can't see where log is defined.
Do I need some new define or something to get this to link?
Original comment by danahins...@gmail.com
on 3 May 2011 at 4:49
I don't see any mention of Utf8ToWideChar in platform_win32_impl.h
(http://code.google.com/p/mp4v2/source/browse/trunk/libplatform/platform_win32_i
mpl.h). Are there other files to sync as well?
Original comment by dbyr...@gmail.com
on 3 May 2011 at 1:13
I'm totally confused now. The file I had after sync'ing didn't match the one
at the link. I see there's a subdir p in the link above. What URL should I be
using to sync? I was using http://mp4v2.googlecode.com/svn/trunk, is that not
the correct one? I've attached the platform_win32_impl.h I got when I sync'ed.
Confused,
Dan
Original comment by danahins...@gmail.com
on 3 May 2011 at 4:40
Attachments:
I think the extra p in the URL is for browsing. What does "svn info" say when
you run it in the root of your local repo?
For what it's worth, I tested mp4info with a file with an umlaut in the name
(U+00E4) and it can't read the file. I invoked mp4info with CreateProcess so
I'm confident I passed a properly UTF-8 encoded version of U+00E4 (which is
0xC3A4) as part of the filename. When I looked at argv inside mp4info, I see
something else (0xE4, which in my case is followed by 0x2E -- the . separating
name from extension). So, when E4 gets intepreted as the first byte of a 3
byte UTF-8 encoding, 2E should be a continuation character but isn't since only
[0x80,0xBF] is valid.
I didn't revert to an older version of the tree to see what happens there, but
the solution as I see it is to use (on windows only) the slightly painful
combination of GetCommandLineW, CommandLineToArgvW, and then convert each
argument from UTF-16 to UTF-8. When I did that, the proper UTF-8 makes it
lower down and the file is opened fine.
My plan is to figure out a good way/place to put the code for retrieving the
the UTF-8 encoding of the command line, and then call it from mp4info and the
other tools. I'd love to hear confirmation from someone else that this is a
good idea, and possibly even suggestions for how to structure the code.
Original comment by dbyr...@gmail.com
on 3 May 2011 at 7:09
First let me say that I'm a novice about SVN. I have only used it with MP4V2,
and then only to sync the changes to the source tree. I use RapidSVN and the
result from the info display is below.
FWIW, I don't really care about the utilities, and fixing them isn't the main
problem. I suspect most people use the library, not the utilities. I need to
figure out what the correct way is to call the program from my VB.Net code.
I'll see if converting from UTF-16 to UTF-8 makes a difference.
But I'm really confused about the correct way to get the files from SVN.
Before when I sync'ed, I got all the correct files, so I'm confused as to what
has changed such that my files now don't match what's in the tree. Any hints
on what I should try here would be most appreciated.
The results from RapidSVN info:
Name:
URL: http://mp4v2.googlecode.com/svn/trunk/util
Repository: http://mp4v2.googlecode.com/svn
Repository UUID: 6e6572fa-98a6-11dd-ad9f-f77439c74b79
Revision: 461
Node Kind: directory
Last Changed Author: kidjan
Last Changed Rev: 461
Last Changed Date: 2011-04-26 19:42 GMT
Text Last Updated: 1970-01-01 00:00 GMT
Properties Last Updated: 1970-01-01 00:00 GMT
Checksum: <None>
Original comment by danahins...@gmail.com
on 4 May 2011 at 12:03
If it helps, after finally getting the latest rev to build, I get an set of log
statements when trying to open a file with an a umlaut:
mp4v2::platform::win32::Utf8ToFilename::Utf8DecodeChar: 0x74 is not a valid
continuation character in a UTF-8 encoding
mp4v2::platform::io::StandardFileProvider::open:
CreateFileW(\\?\C:\Downloads\ätest.m4v) failed (2)
mp4v2::impl::MP4File::Open: open(C:\Downloads\ätest.m4v) failed
(..\..\src\mp4file.cpp,395)
Original comment by danahins...@gmail.com
on 4 May 2011 at 10:58
This is basically the same error I get when I pass a UTF-8 encoded string to
mp4info. I'd still like some feedback on my proposal to fix it
(http://code.google.com/p/mp4v2/issues/detail?id=98#c12).
-DB
Original comment by dbyr...@gmail.com
on 5 May 2011 at 2:20
OK, I've figured out how to make my code work, but I'm not sure what it means
with respect to the correct solution. The problem appears to be that it's not
easy (or at least I don't understand how to) turn a unicode string into a utf-8
string. When I look at my string, it has 228 for the a umlaut, which is fine
for unicode. But if I try to encode that string into utf-8 using
encoding.utf8.getbytes and encoding.utf8.getstring, it turns the a umlaut into
a question mark. What I did was use encoding.utf8.getbytes, then built the
char array using asc(byte(i)) then turning that into a string. Using this, it
works!
Maybe this is similar to what you were talking about dbyron? Now that I
understand how to make this work, I'll do some more research about the
"correct" way to accomplish this in .Net.
dbyron, you seem to be a windows guy, I'd like to talk to you about a
performance issue on windows, that I have a fix for. Should I just open up a
new issue (which I did some time back, I think before you got involved) or
would you rather discuss it via email?
Dan
Original comment by danahins...@gmail.com
on 5 May 2011 at 5:16
I don't know any VB or .Net so I'm not sure I can be much help there. I'm not
sure talking about something as a "unicode string" makes sense though. Any
string is in a particular encoding, but as I understand it, there is no
encoding called "unicode." Natively windows uses (at least at the Win32 API
level), UTF-16, so maybe that's what you're starting with?
228 is 0xE4 and a with an umlaut is U+00E4 so it makes sense up to there.
When you say the umlaut turns into a question mark, what do you mean? What are
the actual values of the bytes that come back? It sounds like you've got a
solution, and it may be the correct one.
Sure, send me an email with other questions. I'll see what I can do.
Original comment by dbyr...@gmail.com
on 5 May 2011 at 5:29
I found the more correct solution, I was using type string in calling
MP4Read/Modify, so I always had to turn it back into a string, which turned it
back into UTF-16, which didn't work. The solution was to change the calling
signature to byte() for the filename and use encoding.utf8.getbytes. It now
works like a charm.
I'll send you (dbyron) and email later today or tomorrow to talk about the
other issue, but I think (from my perspective at least) you can close this. Of
course you'll still need to fix the utilities, but it sounds like you've got a
solution for that.
Dan
Original comment by danahins...@gmail.com
on 5 May 2011 at 5:43
dbyron, the link to your userid is broken. And what's your email address?
Dan
Original comment by danahins...@gmail.com
on 5 May 2011 at 5:45
http://code.google.com/u/dbyron0/ works for me, but only when I'm logged in.
I'd rather not publish my email address here to protect from spam. That link
has a captcha to at least make it a little harder to get to.
If that doesn't work, start another issue.
Thanks.
-DB
Original comment by dbyr...@gmail.com
on 5 May 2011 at 5:52
I'm logged in and when I click on that link I get a 404. I can click on mine
or kidjan and they work.
You can just email me at temp@danhinsley.com and I'll get back to you with my
regular email address.
Dan
Original comment by danahins...@gmail.com
on 5 May 2011 at 6:11
I got your email, but when I tried to reply to it, it wasn't valid. Can you
include your email address in the body of the text?
Original comment by danahins...@gmail.com
on 5 May 2011 at 7:07
Ah sorry for the false alert, dbyron. I forgot that the command line utils
don't handle unicode correctly; I tested within our application, that does call
the API directly, and all is good. Thanks for following through though.
Dana, do you have C# interop definitions defined? if so, would you be willing
to donate those to the library? and yes, when passing strings as utf8 in .NET,
you have to take a few extra steps, but it sounds like you figured that out ;)
Original comment by kid...@gmail.com
on 14 May 2011 at 11:46
I use VB.Net, not C#. I've included the API Definitions I use in VB. Due to
the way that arrays of strings and structures with strings are passed, I had to
create a C++ DLL that sits between some of the API (like MP4GetItems, reading
and setting chapters, ...) and my VB App. I could provide that as well if
people wanted to access the library from VB.
Here are the definitions of the API's I call directly from my VB App.
Friend Declare Function MP4Read Lib "libMP4V2.dll" (ByVal filename As Byte(), ByVal verbosity As Integer) As IntPtr
Friend Declare Function MP4Modify Lib "libMP4V2.dll" (ByVal filename As Byte(), ByVal verbosity As Integer, ByVal flags As Integer) As IntPtr
Friend Declare Sub MP4TagsSetName Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetArtist Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetAlbum Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetReleaseDate Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetGenre Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetDescription Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetLongDescription Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetTVShow Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetTVNetwork Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetTVEpisodeID Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetTVSeason Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Integer)
Friend Declare Sub MP4TagsSetTVEpisode Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Integer)
Friend Declare Sub MP4TagsSetSortName Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetSortArtist Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetSortTVShow Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetCopyright Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetEncodingTool Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetKeywords Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetCategory Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetGrouping Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetComments Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal name As Byte())
Friend Declare Sub MP4TagsSetContentRating Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetPodcast Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetGapless Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetMediaType Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetCompilation Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetTrack Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As trackDisk)
Friend Declare Sub MP4TagsSetDisk Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As trackDisk)
Friend Declare Sub MP4TagsSetHDVideo Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Byte)
Friend Declare Sub MP4TagsSetContentID Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByRef id As Integer)
Friend Declare Sub MP4TagsRemoveArtwork Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal flags As Integer)
Friend Declare Sub MP4TagsStore Lib "libMP4V2.dll" (ByVal tags As IntPtr, ByVal file As IntPtr)
Friend Declare Sub MP4Close Lib "libMP4V2.dll" (ByVal file As IntPtr)
Original comment by danahins...@gmail.com
on 15 May 2011 at 4:23
Going to close this issue now--I think the remaining "need unicode support in
command line tools" aspects of it are probably better represented in issue 108.
Original comment by kid...@gmail.com
on 26 Jun 2011 at 6:58
Sounds good to me. I fixed my app and I'm not sure the command line tools are
all that important.
Original comment by danahins...@gmail.com
on 26 Jun 2011 at 3:51
@dana,
I am interested in your API definitions to use libmp4v2 through
vb.net...specifically how to save artwork and save information for the "rating"
tag (PG, PG-13 etc).
I have made a post on the discussion group regarding this, but it's moderated
so it might appear a little later.
Thanks
Original comment by kari...@gmail.com
on 18 Jan 2012 at 10:34
I've attached two files, one with the routines I use and one with the declares.
Let me know if you have any questions.
Dan
Original comment by danahins...@gmail.com
on 18 Jan 2012 at 10:49
Attachments:
@dana,
Thanks for the attachments, but I'm still a little confused.
I see you have declarations for "MP4V2VBWrapper.dll", but what/where is this
dll?
I guess the functions i'm looking for are:
Friend Declare Sub VBMP4SetRating Lib "MP4V2VBWrapper.dll" (ByVal buff As Byte(), ByVal file As IntPtr)
Friend Declare Sub VBMP4SetMOVI Lib "MP4V2VBWrapper.dll" (ByVal buff As Byte(), ByVal file As IntPtr)
Friend Declare Sub VBMP4SetKind Lib "MP4V2VBWrapper.dll" (ByVal buff As Byte(), ByVal file As IntPtr)
Friend Declare Sub VBMP4SetCoverArt Lib "MP4V2VBWrapper.dll" (ByVal buff() As Byte, ByVal len As Integer)
Right now in my code I only have code to write the coverArt (using .net 4.0 so
I have to use CallingConvention.Cdecl or it throws an error):
<DllImport("libMP4V2.dll", CharSet:=CharSet.Ansi, CallingConvention:=CallingConvention.Cdecl)>
Friend Shared Sub MP4TagsAddArtwork(ByVal tags As IntPtr, ByVal art As CoverArtBox)
End Sub
And then using the following to try to write the art, but it doesn't seem to
work, I see the thumbnail in windows explorer change but to a generic "media"
type icon in windows, so i'm guessing what i'm saving in the data is not
correct.
If Images.Count > 0 Then
Dim art As New libmp4v2.CoverArtBox
Using ms As IO.MemoryStream = New IO.MemoryStream
Images(0).Image.Save(ms, Images(0).Image.RawFormat)
art.size = ms.Length
art.data = ms.ToArray
ms.Close()
End Using
art.type = libmp4v2.MP4TagArtworkType.MP4_ART_UNDEFINED
libmp4v2.MP4TagsAddArtwork(tags, art)
End If
And how would I go about writing the rating?
FYI, I only want to write tags/atoms not read.
Thanks for your help.
Original comment by kari...@gmail.com
on 19 Jan 2012 at 4:25
How about you email me directly. I can either give you the DLL or give you the
DLL and the sources, whichever you prefer.
Dn
Original comment by danahins...@gmail.com
on 19 Jan 2012 at 5:21
Dan, Are you willing to share the DLL and VB files with me as well?
Original comment by rebelzo...@gmail.com
on 12 May 2013 at 4:49
Original issue reported on code.google.com by
danahins...@gmail.com
on 1 May 2011 at 5:32