ocaml / flexdll

a dlopen-like API for Windows

flexlink produces an invalid dll when building lablgtk-2.18.3 on mingw64 #6

Open eternalNight opened 9 years ago

eternalNight commented 9 years ago

Hi all,

I have recently been building lablgtk (a GTK2 wrapper for OCaml) using the mingw64 toolchain provided by msys2. The package uses ocamlmklib (and thus flexlink) to create a DLL called dlllablgtk2.dll. Here are the versions of the tools in my environment:

flexdll    0.34 (from http://alain.frisch.fr/flexdll.html; built from source)
ocaml    4.02.1 (built from source)

Flexlink generates the library without error, but the library is considered invalid by LoadLibraryEx:

Error: Error on dynamically loaded library: .\dlllablgtk2.dll: %1 is not a valid win32 application

The following toy program gives the same result.

$ cat testdll.c
#include <flexdll.h>
#include <stdio.h>
#include <windows.h>

int main(int argc, char *argv[]) {
    void *handle;
    printf("Try open: %s\n", argv[1]);
    handle = flexdll_dlopen(argv[1], FLEXDLL_RTLD_GLOBAL);
    printf("Handle: %p\n", handle);
    if (handle == NULL) {
            printf("Error code: %lu\n", (unsigned long) GetLastError());
            printf("Error message: %s\n", flexdll_dlerror());
    }
    return 0;
}

$ flexlink -chain mingw64 -exe -o testdll testdll.c
$ testdll.exe dlllablgtk2.dll
Try open: dlllablgtk2.dll
Handle: 0000000000000000
Error code: 193
Error message: %1 is not a valid win32 application

The library is created using 24 object files in addition to some system libraries. The command is:

flexlink -v -v -chain mingw64 -LD:/msys64/mingw64/x86_64-w64-mingw32/lib \
-o dlllablgtk2.dll -lpthread -LD:/msys64/mingw64/lib -lgtk-win32-2.0 \
-limm32 -lshell32 -lole32 -lpangocairo-1.0 -lpangoft2-1.0 -lpangowin32-1.0 -lgdi32 \
-lpango-1.0 -lm -latk-1.0 -lcairo -lpixman-1 -lfontconfig -lexpat -lfreetype -lexpat -lfreetype \
-lbz2 -lharfbuzz -lgdk_pixbuf-2.0 -lpng16 -lgio-2.0 -lz -lgmodule-2.0 -lgobject-2.0 -lffi \
-lglib-2.0 -lws2_32 -lole32 -lwinmm -lshlwapi -lintl \
ml_gobject.o ml_gpointer.o ml_gtk.o ml_gtkaction.o ml_gtkbin.o ml_gtkbroken.o ml_gtkbutton.o \
ml_gtkassistant.o ml_gtkedit.o ml_gtkfile.o ml_gtklist.o ml_gtkmenu.o ml_gtkmisc.o ml_gtkpack.o \
ml_gtkrange.o ml_gtkstock.o ml_gtktext.o ml_gtktree.o ml_gdkpixbuf.o ml_gdk.o ml_glib.o \
ml_pango.o ml_gvaluecaml.o wrappers.o

When I remove some of the objects (e.g. ml_gtktree.o), the generated library becomes valid.

$ testdll.exe dlllablgtk2.dll        # ml_gtktree.o removed from the command
Try open: dlllablgtk2.dll
Handle: 0000000000000000
Error code: 1114
Error message: Cannot resolve caml_failwith

It seems the issue is not caused by a single object. The library built without ml_gtktext.o (but with ml_gtktree.o) is also valid.

The binaries from https://github.com/shadinger/flexdll-win64 (version 0.26) do not suffer from this issue.

Here is the verbose log during linking.

** Use cygpath: true
** Search path:
D:/msys64/mingw64/lib
D:/msys64/mingw64/x86_64-w64-mingw32/lib
D:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.9.2
/mingw/lib
/mingw64/x86_64-w64-mingw32/lib
** Default libraries:
dllcrt2.o
-lmingw32
-lgcc
-lmoldname
-lmingwex
-lmsvcrt
-luser32
-lkernel32
-ladvapi32
-lshell32
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\dllcrt2.o
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmingw32.a
** open: D:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.9.2\libgcc.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmoldname.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmingwex.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmsvcrt.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libuser32.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libkernel32.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libadvapi32.a
+ x86_64-w64-mingw32-gcc -mconsole -shared -Wl,-eFlexDLLiniter  -L. -I"D:/msys64/mingw64/lib" -I"D:/msys64/mingw64/x86_64-w64-mingw32/lib" -L"D:/msys64/mingw64/lib" -L"D:/msys64/mingw64/x86_64-w64-mingw32/lib" -o "test.dll" "D:\msys64\tmp\dyndll3ef3ef.o" "D:\msys64\mingw64\bin\flexdll_initer_mingw64.o" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libpthread.dll.a" "D:/msys64/mingw64/lib\libgtk-win32-2.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libimm32.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libshell32.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libole32.a" "D:/msys64/mingw64/lib\libpangocairo-1.0.dll.a" "D:/msys64/mingw64/lib\libpangoft2-1.0.dll.a" "D:/msys64/mingw64/lib\libpangowin32-1.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libgdi32.a" "D:/msys64/mingw64/lib\libpango-1.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libm.a" "D:/msys64/mingw64/lib\libatk-1.0.dll.a" "D:/msys64/mingw64/lib\libcairo.dll.a" "D:/msys64/mingw64/lib\libpixman-1.dll.a" "D:/msys64/mingw64/lib\libfontconfig.dll.a" "D:/msys64/mingw64/lib\libexpat.dll.a" "D:/msys64/mingw64/lib\libfreetype.dll.a" "D:/msys64/mingw64/lib\libbz2.dll.a" "D:/msys64/mingw64/lib\libharfbuzz.dll.a" "D:/msys64/mingw64/lib\libgdk_pixbuf-2.0.dll.a" "D:/msys64/mingw64/lib\libpng16.dll.a" "D:/msys64/mingw64/lib\libgio-2.0.dll.a" "D:/msys64/mingw64/lib\libz.dll.a" "D:/msys64/mingw64/lib\libgmodule-2.0.dll.a" "D:/msys64/mingw64/lib\libgobject-2.0.dll.a" "D:/msys64/mingw64/lib\libffi.dll.a" "D:/msys64/mingw64/lib\libglib-2.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libws2_32.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libwinmm.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libshlwapi.a" "D:/msys64/mingw64/lib\libintl.dll.a" "D:\msys64\tmp\dyndll00be4c.o" "D:\msys64\tmp\dyndlle902c0.o" "D:\msys64\tmp\dyndll54d32d.o" "D:\msys64\tmp\dyndll2e0163.o" "ml_gtkbin.o" "D:\msys64\tmp\dyndll7ac0f6.o" "D:\msys64\tmp\dyndll3f46a1.o" "D:\msys64\tmp\dyndll6e7d00.o" "D:\msys64\tmp\dyndll709dae.o" 
"D:\msys64\tmp\dyndll4b5dee.o" "D:\msys64\tmp\dyndll027612.o" "D:\msys64\tmp\dyndll478b19.o" "D:\msys64\tmp\dyndll0fdffc.o" "D:\msys64\tmp\dyndll533488.o" "D:\msys64\tmp\dyndllc5412c.o" "D:\msys64\tmp\dyndllb81a8b.o" "D:\msys64\tmp\dyndll5f1731.o" "D:\msys64\tmp\dyndll4bc469.o" "D:\msys64\tmp\dyndlleed2db.o" "D:\msys64\tmp\dyndlla2929b.o" "D:\msys64\tmp\dyndll56c73c.o" "D:\msys64\tmp\dyndllde988f.o" "ml_gvaluecaml.o" "D:\msys64\tmp\dyndll049034.o" "D:\msys64\tmp\flexlink250fe6.def"
(call with bash: D:\msys64\tmp\longcmd233aa5)

alainfrisch commented 9 years ago

Is it easy for you to test with OCaml trunk? The win64 backend has been changed to avoid problems when the DLL is loaded too far away in memory from the main process, and this might fix such issues.

eternalNight commented 9 years ago

I have tried the latest ocaml and camlp4 from the github mirror. The problem remains.

eternalNight commented 9 years ago

The issue seems to be related to the cygwin64 COMDAT hacks which were introduced in commit 37e6b5ad904b0d4648cebb09c19ed10e6f8dea28. The library works if the snippets are commented out.

alainfrisch commented 9 years ago

Perhaps the current hacks for Cygwin64 should indeed be restricted to Cygwin64. Can you check which parts of the commit you refer to must be disabled? There are two fragments related to COMDAT sections; do we need to disable both?

eternalNight commented 9 years ago

Disabling the following fragment in add_reloc_table works in my case:

    if sec.sec_opts &&& 0x1000l <> 0l && has_prefix ".rdata$.refptr." sec.sec_name then
      begin
        (* under Cygwin64, gcc introduces mergable (link once) COMDAT sections to store
           indirection pointers to external darta symbols.  Since we don't deal with such section
           properly, we turn them into regular data section, thus loosing sharing (but we don't care). *)
        sec.sec_opts <- 0xc0500040l;
        sec.sec_name <- Printf.sprintf ".flexrefptrsection%i" (Oo.id (object end));
      end;

This should be the first fragment mentioning COMDAT in the patch.

alainfrisch commented 8 years ago

As reported by Andreas Hauptmann on the caml-list:

It either won't solve the issue or it will introduce new ones (I don't remember details, but I've tried it). As a temporary workaround, you can try to strip your invalid dll files (e.g. 'x86_64-w64-mingw32-strip --strip-unneeded dlllablgtk2.dll') or switch to an older version of the gcc-toolchain (4.8 or 4.7).

yselkowitz commented 7 years ago

I'm having what appears to be a related issue trying to build lablgtk 2.18.5 with flexdll 0.35 and ocaml 4.02.3 on cygwin64:

ocamlmktop -I +lablGL -thread -o lablgtktop unix.cma threads.cma lablgl.cma \
  -I . lablgtk.cma lablgtkgl.cma lablglade.cma lablgnomecanvas.cma lablgnomeui.cma lablrsvg.cma lablgtkspell.cma lablgtksourceview2.cma gtkThread.cmo
File "none", line 1:
Error: Error on dynamically loaded library: ./dlllablgtk2.so: Exec format error

yselkowitz commented 7 years ago

FWIW, stripping does help on Cygwin; I was able to get a successful and functional build by adding -ldopt -Wl,-s to the ocamlmklib -o lablgtk command.

MSoegtropIMC commented 7 years ago

Would it be possible to eventually fix this? This issue has been hanging around for more than 2 years now. I just tried it with the source and binary deliveries of version 0.35 as well as the current git master. This is a major source of build unreliability in the Windows builds of INRIA Coq. I currently use an explicit call to strip, which magically fails as well if completely unrelated things in the build script are changed (such as which file messages are redirected to). Even procmon couldn't help me understand why. I will now try the method suggested above instead of the explicit call to strip.

But I would really appreciate a fix for this problem. If there is anything I can do to help, please let me know. E.g. I can send a script which sets up a fresh cygwin and reproduces the error with a single call to a batch file.

Best regards,

Michael

alainfrisch commented 7 years ago

I'm afraid I don't understand the problem well enough to fix it, and I don't have the time and courage to investigate. If you could create a simple reproduction case that doesn't involve a bunch of external libraries, this would definitely make the problem easier to investigate. But the conclusion could also be that there is no easy fix.

I think my recommendation would be to avoid using flexlink with code not generated by OCaml compilers. For your use case, is it an option to link all native libraries statically in the main program?

MSoegtropIMC commented 7 years ago

Dear Alain,

you are right, maybe the best option is to patch the lablgtk build scripts such that they create just a static library and use this. I think for the whole lablgtk library there is no need to link it dynamically, since the GUI tool always needs it and in Coq there is only one GUI tool, so there wouldn't be DLL sharing either.

Also it is an interesting hint that the issues might come from the C code in lablgtk.

I will let you know how it goes along this path.

Best regards,

Michael

yselkowitz commented 7 years ago

FYI this is the patch I used to work around this:

https://github.com/cygwinports/ocaml-lablgtk2/blob/master/2.18.5-flexlink.patch

dra27 commented 6 years ago

I ran into this too, but while doing my usual fumbling in the dark I noticed that one of the fixes above involved reducing the number of .o files, which in turn reduces the number of sections. That got me thinking that the name change removes the rdata$ prefix, which is an instruction to the linker to merge the sections. So I tried this:

diff --git a/reloc.ml b/reloc.ml
index 358f6b9..823021d 100644
--- a/reloc.ml
+++ b/reloc.ml
@@ -434,7 +434,7 @@ let add_reloc_table obj obj_name p =
            indirection pointers to external darta symbols.  Since we don't deal with such section
            properly, we turn them into regular data section, thus loosing sharing (but we don't care). *)
         sec.sec_opts <- 0xc0500040l;
-        sec.sec_name <- Printf.sprintf ".flexrefptrsection%i" (Oo.id (object end));
+        sec.sec_name <- Printf.sprintf ".flex$.flexrefptrsection%i" (Oo.id (object end));
       end;

     let min = ref Int32.max_int and max = ref Int32.min_int in

which appears to be enough to build a working lablgtk2 without having to strip the DLLs (one slight thing which concerned me with the stripping is that the resulting DLLs also crash Microsoft's objdump, though that may be objdump's fault, and it was on the Windows 7 SDK).

I don't really understand @alainfrisch's comment about not dealing with the section properly - what prompted you to write that comment originally? Is this all related to https://github.com/alainfrisch/flexdll/pull/52, and should we therefore be deleting these sections for symbols which flexdll is going to relocate and simply leaving them alone for any other symbols, which presumably the linker is going to deal with? It appears on a vague inspection that the linker will eliminate these in "normal" linking, so I'm guessing they just get folded into the normal relocation process?

Again, fumbling around trying to diagnose the original problem, I can't find a reference to the idea of a problem about having too many sections in the PE header (there's a reference to a limit of 96 for Windows XP but it's increased to 65536 since Vista, so that doesn't seem a likely candidate). Perhaps it's that these sections appear before one of the others and some offset becomes too big or larger than expected. Either way, stripping removes those sections and, on the basis that merging them also seems to fix the problem then it would appear to be the number of them which is the underlying issue.

But it still raises the question (for me at least) of what precisely they're for and what we should really be doing with them...

alainfrisch commented 6 years ago

I don't really understand @alainfrisch's comment about not dealing with the section properly - what prompted you to write that comment originally?

I don't remember exactly, but keeping the COMDAT section resulted in some problems with cygwin64. (Perhaps because the section could be merged by the linker, and this breaks some assumptions made by the flexdll runtime.) I don't think this was related to #52.