ocaml / flexdll

a dlopen-like API for Windows
Other
97 stars 30 forks source link

Put error global variables into thread-local storage #112

Closed shym closed 1 year ago

shym commented 1 year ago

Global variable error_buffer is used to store a string that is returned to callers, so there is a race condition if dynamic linking is invoked from 2 OCaml domains in parallel Since the error message must be returned, a mutex cannot be used to prevent the race condition GNU libc uses that same solution: keep the last error in thread-local storage; so support for calling dlerror from a different thread than the one calling dlopen is not to be expected

The problem with parallel accesses was discovered investigating segfaults in ocaml-multicore/multicoretests#290. With this patch, these tests do not segfault. I see the following 4 dynlink tests from the ocaml test suite failing, but they seem to fail whether or not this patch is applied and the error message seems not related.

> run_win32.c:365: CreateProcess failed: The system cannot find the file specified.
[…]
List of failed tests:
    tests/lib-dynlink-csharp /'main.ml' with 1.1.4.1.1.1.1 (script)
    tests/lib-dynlink-csharp /'main.ml' with 1.1.3.1.1.1 (script)
    tests/lib-dynlink-csharp /'main.ml' with 1.1.2.1.1.1.1 (script)
    tests/lib-dynlink-csharp /'main.ml' with 1.1.1.1.1.1 (script)
dra27 commented 1 year ago

(closed and re-opened to recompute the merge commit now that #115 is merged, so CI should have something useful to say!)

dra27 commented 1 year ago

Hmm, this is looking tedious... it looks like we're pulling in a runtime library somewhere from the error. I'm guessing that /usr/i686-w64-mingw32/sys-root/mingw/bin (and the x64 equivalent) need adding to PATH for this to work, although we should nail down exactly which DLL it's trying to pull in. That's a bit too heavy for this - we might instead need to hand-roll the native Windows version using TlsAlloc et al.

shym commented 1 year ago

I updated the PR with an implementation using Tls* functions. This turned out a bit more involved than what I first thought, since TlsGetValue modifies the result of GetLastError which we want to preserve. I hope the comments are enough to clarify the implementation choices.

Reproducing locally the test that failed, I get a pop-up saying libwinpthreads-1.dll is missing, maybe too big a dependency for that single feature.

nojb commented 1 year ago

Hello @shym: just a heads-up that I am planning to read this PR soon. Thank you for your patience!

shym commented 1 year ago

Thanks you for your kind and thorough review! I've updated the branch removing the goto. After looking at another PR, I added an entry to CHANGES.

shym commented 1 year ago

Very good idea indeed! I noticed that the documentation for SetLastError explicit states that the last error is stored in TLS and that values with bit 29 set are reserved for user errors. But I didn’t find a trick to reuse that to fully skip using TLS explicitly ourselves, especially since POSIX’s dlerror must report the last error of a dl function, so other functions must not interfere with the result it will report. So I just updated the patch removing the last_error field and merging all the non-resetting behaviours. It’s a lot simpler to read.

shym commented 1 year ago

Just changed the error message, and added due credit! :smile:

nojb commented 1 year ago

Thanks, merged! (I took the liberty of squashing the commits into a single commit; this makes it easier to revert, cherry-pick, etc.)