ocaml / flexdll

a dlopen-like API for Windows
Other
97 stars 30 forks source link

Patch RELOC_REL32 cases where offset is > 2GiB #52

Closed dra27 closed 3 years ago

dra27 commented 6 years ago

Previously, if a DLL was loaded more than 2GiB away from the executable image in the address space, flexdll_relocate would fail if there were any RELOC_REL32 relocations as the offset couldn't be represented in a 32-bit number.

The solution at the time was to specify a base address of 0x10000 for the MSVC64 and Cygwin64 ports (for some reason, mingw64 got forgotten about). For MSVC64, this generally seemed to work. For Cygwin64, this broke Unix.fork. Recent changes to Cygwin64 also mean that Cygwin DLLs and executables are not supposed to occupy memory below 0x2:00000000 and 0x1:00000000 respectively.

This patch creates a spare executable section in the DLL which is used to create trampolines for relocations where the offset is over 2GiB. As a result, it's no longer necessary to specify the image base address, so this default has been removed and both the Cygwin64 fork issue and RELOC_REL32 are simultaneously solved.

This needs quite a lot of testing which I'm doing, but it would also be handy for the code to start being reviewed. The easiest option would have been to break backwards compatibility (format of symtbl would have changed and the number of parameters for flexdll_relocate). The version I give here allows mixing of DLLs and executables compiled with both this new version and older versions.

alainfrisch commented 6 years ago

Thanks for working on this!

(FTR, #2 has a similar proposal, but the discussion there died quickly. Contrary to the current PR, #2 generates the trampoline at link time (independently of the target distance) rather than at load time (the flexdll runtime does not need to be changed).)

Some comments/questions:

  1. The trampoline code destroys %rax. As far as I understand, this is in line with the Microsoft x64 ABI (volatile, and not used to pass arguments) for C code. But for instance, OCaml would use this register to pass an argument. I think it would be better to make flexdll less dependent on the ABI, even at the cost of an extra indirection. Looking at what gcc -O2generates for:
void (*f)(void);
void foo() { f(); }

=>

foo:
        .seh_endprologue
        rex.W jmp       *f(%rip)
        .seh_endproc

producing:

0000000000000000 <foo>:
   0:   48 ff 25 00 00 00 00    rex.W jmpq *0x0(%rip)        # 7 <foo+0x7>

(not sure if the prefix is required), which is what #2 was proposing to use.

  1. The criterion to determine if a trampoline can be used is to look if the target falls in an executable page. Of course, there can be non-code targets in such a section (I think even OCaml used to put jump tables in the code section). Should we harden the criterion by checking that the instruction just before the relocation is a jump or call? This would still be a heuristic, of course. Btw, have you ever seen a case where either cl or gcc produces relative relocations for non-code targets?

  2. Can you confirm the problem is only with C code? The OCaml backend should go through __caml_imp_XXX indirections for code jmp/call targets, and use movabsq (with a 64-bit literal) to load the address of symbols (=> absolute relocations).

dra27 commented 6 years ago

Hah - I hadn't realised that there was another PR which had worked on this! I'm still working through understanding this all properly - in particular, there's something weird happening with on mingw64 when the DLL is located far away where caml_reify_bytecode is raising an End_of_file exception!

I'm certainly only seeing the trampolines being generated for C primitives in the runtime, yes. AFAICT so far, the default -mcmodel=32 for gcc doesn't generate REL32 instructions for data symbols. The weak heuristic I'd added was when trying to break it by saying -mcmodel=small (which does break it). My reading so far suggests that mcmodel=medium would never generate REL32 instructions for data symbols, because that model doesn't permit the assumption that data is close enough - but I haven't yet found any documentation as to whether the default mcmodel=32 assumption uses this or not. And of course I don't know what MSVC's doing with all this (although it does seem to be working).

FWIW, I think patching the runtime is better - definitely on mingw64 and I think now on msvc64, the default load addresses for executables should be < 0xffffffff (I think that the issue seen with this in 2008 no longer exists). It seems unnecessary to be generating these jump tables when most of the time they're not needed. That said, on Cygwin64 they will always be needed!

This present implementation has a bug at the moment which becomes revealed if you load two DLLs (like ocamldoc, loading str and unix) - the cached trampolines for dllunix can be made to be too far away for dllstr, so I need to use a per-dll symbol cache which should actually make that code a bit neater anyway. It's also unnecessarily doing all this for 32-bit.

I won't have much time on this today, but I'll be back on it with a vengeance tomorrow!

alainfrisch commented 6 years ago

Note: the indirect jump instruction is also what is used for ___imp_XXX symbols, such as:

__declspec(dllimport)  void f(void);
void foo() { f(); }

==>

        rex.W jmp       *__imp_f(%rip)

Btw, adding back __declspec(dllimport) for all C functions in the OCaml runtime (that need to be accessed from .dll) would also solve the problem (in a less general way), right?

dra27 commented 6 years ago

That change still doesn't deal with the caml_reify_bytecode issue (I'm wondering if there's some dodgy pointer arithmetic somewhere...)

dra27 commented 6 years ago

OK, I think this is now functionally complete. https://github.com/alainfrisch/flexdll/pull/52/commits/d0e07c2d7273b2a231625455d01b33c19fcfd9f5 includes building ocamldoc for the two OCaml tests which is a good idea in general, and should go in a separate PR. ocamldoc is a useful bytecode test for flexdll, because it uses both unix and str.

https://github.com/alainfrisch/flexdll/pull/52/commits/a84f9fbbdd1e4f3bf4ec30b5ad3d4f1592a4ba7e artificially places dllunix and dllcamlstr at base addresses which are both more than 2GiB from ocamlrun and also more than 2GiB from each other. I don't think we should commit with this: there are two better tests:

  1. Include a test in the flexdll testsuite which tests plugins on x64 which at distant base addresses and at close base addresses (for mingw64 and msvc64 - close base addresses are irrelevant on cygwin64, because they can't happen)
  2. Add an AppVeyor test to build OCaml on Cygwin64 ... but this would be easier if the Cygwin build also permitted bootstrapping flexdll (which I'm now working on)
dra27 commented 3 years ago

I'm going to close, on the basis that I no longer think we should fix it. There are three reasons for this: