ocaml / flexdll

a dlopen-like API for Windows
Other
98 stars 30 forks source link

Add Unicode support to flexdll (was: Add wide-character version of flexdll_dlopen) #34

Closed nojb closed 7 years ago

nojb commented 7 years ago

See #33, ocaml/ocaml#1200.

nojb commented 7 years ago

OK, I pushed a different approach.

To avoid duplicating any code I extracted the two functions that need to be adapted (flexdll_dlopen and ll_dlopen) into the file flexdll_dlopen-templ.c with some %%VARIABLE%% placeholders. The Makefile generates flexdll_dlopen.c and flexdll_dlwopen.c by making the necessary substitutions (using sed) and those two files are #included from flexdll.c.

What do you think ?

alainfrisch commented 7 years ago

I'm not a big fan of putting such logic in the build system and introducing a custom syntax. It seems relying on the C preprocessor would work as well (simply include the same file twice, defining macros with different content each time).

Alternatively, you could use a single function taking a flag to choose between A and W variants; the string argument can be a (void), cast to either (char) or (WCHAR*) according to the flag.

nojb commented 7 years ago

OK, I pushed another approach. We make the wide version the "main" one and the existing version becomes a small wrapper around the wide one converting between the two encodings.

I will now be testing this to make sure it works :)

nojb commented 7 years ago

Tested by bootstrapping flexdll and ocaml from the official distribution and using the resulting native-code compiler to compile a small example, all of which seemed to work well.

nojb commented 7 years ago

I added one more commit with a primitive solution for #36, to be discussed.

alainfrisch commented 7 years ago

I think the Cygwin case is broken, as reported by the warning:

i686-pc-cygwin-gcc -c -DCYGWIN -o flexdll_cygwin.o flexdll.c
flexdll.c: In function ‘flexdll_wdlopen’:
flexdll.c:380:22: warning: passing argument 1 of ‘ll_dlopen’ from incompatible pointer type [-Wincompatible-pointer-types]
   handle = ll_dlopen(file, exec);
                      ^
flexdll.c:52:14: note: expected ‘const char *’ but argument is of type ‘const WCHAR * {aka const short unsigned int *}’
 static void *ll_dlopen(const char *libname, int for_execution) {
              ^
alainfrisch commented 7 years ago

Also the tests are failing:

$ make demo_msvc
(cd test && LIB="C:/cygwin/home/frisch/SHARED~2/msvc/v9/Lib;C:/cygwin/home/frisch/SHARED~2/winsdk/v7.0/Lib" INCLUDE="C:/cygwin/home/frisch/SHARED~2/msvc/v9/Include;C:/cygwin/home/frisch/SHARED~2/winsdk/v7.0/Include" make clean demo CHAIN=msvc CC="C:/cygwin/home/frisch/SHARED~2/msvc/v9/Bin/cl.exe /nologo /MD -D_CRT_SECURE_NO_DEPRECATE /GS-" O=obj)
make[1]: Entering directory '/home/frisch/flexdll/test'
rm -f *.o *.obj *.dll *.exe *~ *.manifest
C:/cygwin/home/frisch/SHARED~2/msvc/v9/Bin/cl.exe /nologo /MD -D_CRT_SECURE_NO_DEPRECATE /GS- -I.. -c dump.c
dump.c
..\flexdll.h(25) : error C2143: syntax error : missing ')' before '*'
..\flexdll.h(25) : error C2143: syntax error : missing '{' before '*'
..\flexdll.h(25) : error C2059: syntax error : ','
..\flexdll.h(25) : error C2059: syntax error : ')'
$ make demo_mingw
(cd test && make clean demo CHAIN=mingw CC="i686-w64-mingw32-gcc" O=o)
make[1]: Entering directory '/home/frisch/flexdll/test'
rm -f *.o *.obj *.dll *.exe *~ *.manifest
i686-w64-mingw32-gcc -I.. -c dump.c
In file included from dump.c:14:0:
../flexdll.h:25:29: error: unknown type name ‘WCHAR’
 void *flexdll_wdlopen(const WCHAR *, int);
                             ^
alainfrisch commented 7 years ago

Tests can be fixed by including windows.h in flexdll.h instead of flexdll.c.

alainfrisch commented 7 years ago

The Cygwin problem is more tricky. We want to use Cygwin's dlopen to allow using POSIX paths. But I could not find a "w" variant of it. So it is not clear what to do for flexdll_wdlopen. One could try to recode the WCHAR to the current code page, but this can obviously fail.

nojb commented 7 years ago

Ah yes, sorry about this. I will fix the tests.

Re the Cygwin issue, a simple possibility is to define a macro (say, TCHAR) which stands for char in Cygwin and WCHAR on WIN32, we use that for the signature of flexdll_wdlopen, and and we make flexdll_wdlopen equal to flexdll_dlopen on Cygwin.

alainfrisch commented 7 years ago

For Cygwin, your suggestion means that the caller needs to behave differently if compiled through cygwin or a native compiler. And this means only filenames that can be expressed in the current code page can be used. Why not, but at this point, it think it's better to tell users to use flexdll_dlopen only under Cywin (either hide flexdll_wdlopen or have it fail at runtime). Or we could try to recode to local code page in ll_dlopen for Cygwin (so that flexdll_wdlopen can be used as long as the filename can be encoded). Yet another option is to add a fallback to LoadLibraryW when the filename cannot be encoded (so the caller can use either a POSIX path that can be encoded in the local code page or an arbitrary Windows path).

nojb commented 7 years ago

Yes, you are right - my suggestion was a bit muddled; what I meant is this: in flexdll.h we write the following

void *flexdll_dlopen(const char *, int);
#ifdef _WIN32
void *flexdll_wdlopen(const WCHAR *, int);
#endif

#if defined(__CYGWIN__) || !defined(_UNICODE)
#define flexdll_tdlopen flexdll_dlopen
#else
#define flexdll_tdlopen flexdll_wdlopen
#endif

Then

1) One can use flexdll_dlopen both in Cygwin and native and its behaviour is the same as today. 2) One can use flexdll_wdlopen in native if one wants that. 3) One can use flexdll_tdlopen in both Cygwin and native to automatically choose the wide or narrow version (compatible with the way Windows headers work). Under Cygwin this is always flexdll_dlopen.

Personally I think this is simpler than trying to recode under Cygwin and cleaner than using a fallback.

nojb commented 7 years ago

BTW, is https://github.com/alainfrisch/flexdll/blob/master/flexdll.c#L362 correct? Shouldn't it be __CYGWIN__?

alainfrisch commented 7 years ago

We pass -DCYGWIN (resp. -DMSVC/-DMINGW) in the Makefile when we compile flexdll.c.

alainfrisch commented 7 years ago

I think it's useful to have access to flexdll_wdlopen even under Cygwin. But perhaps it's ok to declare that it doesn't support POSIX paths (so that it can always use LoadLibraryW). Only flexdll_dlopen would support POSIX path (through Cygwin's dlopen).

nojb commented 7 years ago

What would be the usecase of having flexdll_wdlopen under Cygwin ? Is it in case one wants/needs to use both Cygwin's POSIX API and the native Win API ?

alainfrisch commented 7 years ago

The case I had in mind was simply opening a DLL specified by a POSIX path with "non-local" characters (i.e. "/foo/Ђ").

nojb commented 7 years ago

I think it's useful to have access to flexdll_wdlopen even under Cygwin. But perhaps it's ok to declare that it doesn't support POSIX paths (so that it can always use LoadLibraryW). Only flexdll_dlopen would support POSIX path (through Cygwin's dlopen).

Maybe I am missing something, but what is "POSIX path" in this context ? I assumed that Cygwin's dllopen (and other filename-related functions) always assumed the argument to be encoded in the local codepage. They must make an encoding assumption somewhere along the way in order to implement those functions using native Windows APIs, right ?

alainfrisch commented 7 years ago

POSIX path: a path interpreted by Cygwin (supporting "/foo/bar", following Cygwin symlinks, etc).

I share your interpretation that Cygwin has to make an assumption about the encoding of file name when mapping to the Windows API. I guess they rely on the local codepage (probabperhaps by calling A functions, not W ones).

So:

nojb commented 7 years ago

So, it turns out that Cygwin functions assume their arguments are encoded in the "current locale" and translated internally into UTF-16, so that there is no need for flexdll_wdlopen. Here, "current locale" is the one set at process startup (changes during the lifetime of the process do not change the behaviour of the Cygwin API).

See https://cygwin.com/cygwin-ug-net/setup-locale.html for more.

nojb commented 7 years ago

I amended this PR as discussed (no wide version under Cygwin), and cleaned it up. AFAIK, the only remaining issue is that the UTF-16 response file support was not backwards compatible, as it assumes that OCaml strings are UTF-8 encoded. To handle this, I implemented the usual fallback: strings are assumed to be UTF-8 but if this is not the case then we assume they are in the local code page and fall back to the usual non UTF-16-encoded response files.

Let me know if I am forgetting about anything else, otherwise I think this should be good to go!