mingw-w64 / mingw-w64

(Unofficial) Mirror of mingw-w64-code
https://mingw-w64.org/
Other
340 stars 50 forks source link

winpthreads leaks TLS index on library unload #52

Open Saancreed opened 1 month ago

Saancreed commented 1 month ago

Not sure is this is the proper place to report winpthreads issues so please let me know if there is any better issue tracker for this stuff.

When a library linked with static winpthreads is loaded using LoadLibrary, winpthreads will usually allocate a TLS index for the library to use. That index however appears to be never freed, so applications that repeatedly load and unload libraries statically linked to winpthreads will eventually run out of TLS indexes and abort after a few hundred iterations.

The issue can be reproduced trivially with this sample library:

#include <iostream>

void __declspec(dllexport) hello()
{
    std::cout << "Hello, world!" << std::endl;
}

(Build with x86_64-w64-mingw32-g++ -shared hello.cpp -o hello.dll -static)

And this test program:

#include <stdio.h>
#include <processthreadsapi.h>
#include <libloaderapi.h>

int main(void)
{
    for (int i = 0; i < 5000; ++i)
    {
        fprintf(stderr, "Starting %d\n", i);

        if (!(i % 100))
        {
            DWORD tls = TlsAlloc();

            if (tls != TLS_OUT_OF_INDEXES)
            {
                fprintf(stderr, "Next TLS index at %llu\n", (unsigned long long)tls);
                TlsFree(tls);
            }
        }

        HMODULE lib = LoadLibraryA("hello.dll");

        if (!lib)
        {
            fprintf(stderr, "LoadLibrary failed\n");
            return 1;
        }

        fprintf(stderr, "Loaded\n");

        if (!FreeLibrary(lib))
        {
            fprintf(stderr, "FreeLibrary failed\n");
            return 2;
        }

        fprintf(stderr, "Freed\n");
    }
}

(Build with x86_64-w64-mingw32-gcc main.c -o main.exe)

Test program aborts at i = 1086 on Windows 11 and i = 1085 in Wine 9.x, with the stack trace (from Wine) looking like so:

(gdb) bt
#0  raise (sig=22) at ../wine-git/dlls/msvcrt/except.c:711
#1  0x00006ffffefa7f87 in abort () at ../wine-git/dlls/msvcrt/exit.c:268
#2  0x00006fffe3d99f0b in pthread_tls_init ()
    at /build/mingw-w64-winpthreads/src/mingw-w64-v11.0.0/mingw-w64-libraries/winpthreads/src/thread.c:940
#3  _pthread_once_raw (func=0x6fffe3b4e770 <pthread_tls_init>, o=0x6fffe3eaf5b0 <__IAT_end__+2412>)
    at /build/mingw-w64-winpthreads/src/mingw-w64-v11.0.0/mingw-w64-libraries/winpthreads/src/thread.c:633
#4  0x00006fffe3b4f636 in __pthread_self_lite ()
    at /build/mingw-w64-winpthreads/src/mingw-w64-v11.0.0/mingw-w64-libraries/winpthreads/src/thread.c:998
#5  pthread_getclean ()
    at /build/mingw-w64-winpthreads/src/mingw-w64-v11.0.0/mingw-w64-libraries/winpthreads/src/thread.c:1059
#6  pthread_once (o=0x6fffe3da04c0 <std::locale::_S_once>, func=0x6fffe3d04d20 <std::locale::_S_initialize_once()>)
    at /build/mingw-w64-winpthreads/src/mingw-w64-v11.0.0/mingw-w64-libraries/winpthreads/src/thread.c:756
#7  0x00006fffe3d07375 in __gthread_once (__func=0x6fffe3d04d20 <std::locale::_S_initialize_once()>,
    __once=<optimized out>)
    at /build/mingw-w64-gcc/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libstdc++-v3/include/x86_64-w64-mingw32/bits/gthr-default.h:700
#8  std::locale::_S_initialize () at /build/mingw-w64-gcc/src/gcc/libstdc++-v3/src/c++98/locale_init.cc:343
#9  std::locale::locale (this=this@entry=0x6fffe3d9fb18 <__gnu_internal::buf_cout_sync+56>)
    at /build/mingw-w64-gcc/src/gcc/libstdc++-v3/src/c++98/locale_init.cc:271
#10 0x00006fffe3d6a2db in std::basic_streambuf<char, std::char_traits<char> >::basic_streambuf (this=<optimized out>)
    at /build/mingw-w64-gcc/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libstdc++-v3/include/streambuf:473
#11 __gnu_cxx::stdio_sync_filebuf<char, std::char_traits<char> >::stdio_sync_filebuf (__f=0x6fffff01f1f0,
    this=<optimized out>)
    at /build/mingw-w64-gcc/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libstdc++-v3/include/ext/stdio_sync_filebuf.h:81
#12 std::ios_base::Init::Init (this=<optimized out>) at /build/mingw-w64-gcc/src/gcc/libstdc++-v3/src/c++98/ios_init.cc:85
#13 0x00006fffe3d6a21d in std::ios_base::Init::Init (this=<optimized out>)
    at /build/mingw-w64-gcc/src/gcc/libstdc++-v3/src/c++98/ios_init.cc:121
#14 0x00006fffe3d973e0 in __static_initialization_and_destruction_0 ()
    at /build/mingw-w64-gcc/src/gcc/libstdc++-v3/src/c++98/globals_io.cc:109
#15 _GLOBAL__sub_I.00090__ZSt3cin(void) () at /build/mingw-w64-gcc/src/gcc/libstdc++-v3/src/c++98/globals_io.cc:109
#16 0x00006fffe3b414d2 in __do_global_ctors () at /build/mingw-w64-crt/src/mingw-w64-v11.0.0/mingw-w64-crt/crt/gccmain.c:44
#17 0x00006fffe3b4152f in __main () at /build/mingw-w64-crt/src/mingw-w64-v11.0.0/mingw-w64-crt/crt/gccmain.c:58
#18 0x00006fffe3a8127f in __DllMainCRTStartup (hDllHandle=0x6fffe3a80000 <atexit_table>, dwReason=1, lpreserved=0x0)
    at /build/mingw-w64-crt/src/mingw-w64-v11.0.0/mingw-w64-crt/crt/crtdll.c:185
#19 0x00006fffffc6ddf4 in ?? ()
#20 0x00006fffffc7308e in MODULE_InitDLL (wm=wm@entry=0x241170, reason=<optimized out>, reason@entry=1,
    lpReserved=<optimized out>, lpReserved@entry=0x0) at ../wine-git/dlls/ntdll/loader.c:1708
#21 0x00006fffffc7353f in process_attach (node=<optimized out>, lpReserved=lpReserved@entry=0x0)
    at ../wine-git/dlls/ntdll/loader.c:1802
#22 0x00006fffffc73415 in process_attach (lpReserved=0x0, node=<optimized out>) at ../wine-git/dlls/ntdll/loader.c:1771
#23 walk_node_dependencies (callback=<optimized out>, context=0x0, node=0x2424f0) at ../wine-git/dlls/ntdll/loader.c:942
#24 process_attach (node=0x2424f0, lpReserved=lpReserved@entry=0x0) at ../wine-git/dlls/ntdll/loader.c:1789
#25 0x00006fffffc76a28 in process_attach (lpReserved=0x0, node=<optimized out>) at ../wine-git/dlls/ntdll/loader.c:1771
#26 LdrLoadDll (path_name=<optimized out>, flags=0, libname=<optimized out>, hModule=0x21fe38)
    at ../wine-git/dlls/ntdll/loader.c:3517
#27 0x00006fffff465e22 in load_library (libname=libname@entry=0x21fe90, flags=flags@entry=0)
    at ../wine-git/dlls/kernelbase/loader.c:171
#28 0x00006fffff466aa2 in LoadLibraryExW (name=0x7ffc1268, file=<optimized out>, flags=0)
    at ../wine-git/dlls/kernelbase/loader.c:556
#29 0x00000001400015df in _gnu_exception_handler (exception_data=0x8)
    at /build/mingw-w64-crt/src/mingw-w64-v11.0.0/mingw-w64-crt/crt/crt_handler.c:249
#30 0x0000021e00000008 in ?? ()
#31 0x0000000000ba3bf0 in ?? ()
#32 0x00000001400012ee in __write_memory (len=2, src=<synthetic pointer>, addr=0xba3bf8)
    at /build/mingw-w64-crt/src/mingw-w64-v11.0.0/mingw-w64-crt/crt/pseudo-reloc.c:294
#33 __write_memory (len=2, src=<synthetic pointer>, addr=0xba3bf8)
    at /build/mingw-w64-crt/src/mingw-w64-v11.0.0/mingw-w64-crt/crt/pseudo-reloc.c:263
#34 do_pseudo_reloc (start=<optimized out>, end=<optimized out>, base=<optimized out>)
    at /build/mingw-w64-crt/src/mingw-w64-v11.0.0/mingw-w64-crt/crt/pseudo-reloc.c:469
#35 _pei386_runtime_relocator () at /build/mingw-w64-crt/src/mingw-w64-v11.0.0/mingw-w64-crt/crt/pseudo-reloc.c:501
#36 0x0000000000000000 in ?? ()
lhmouse commented 1 month ago

That's right. And if libgcc is linked statically then there will be a similar issue. And yet more about std::cout etc..

I am not saying that we should not do anything about it, but it doesn't cause any issues in practice. If you are building a DLL, then almost every dependency should be linked as a shared library.

Saancreed commented 1 month ago

Actually, as it turns out the issue is not exclusive to static linking. I dropped -static from DLL build command and added libstdc++-6.dll, libgcc_s_seh-1.dll and libwinpthread-1.dll to my PATH and I can observe the same issue. Static linking just makes this easier to catch/observe but if all these libraries are loaded-then-unloaded even in their dynamic link form, TLS indexes will still leak.

I am not saying that we should not do anything about it, but it doesn't cause any issues in practice. If you are building a DLL, then almost every dependency should be linked as a shared library.

It does, we are building dxvk-nvapi using mingw-w64 toolchain and it's causing some games in Proton to abort after they load it, try to initialize (which fails) then unload it a few hundred times. On the other hand, we could not observe this issue on Windows with native NVAPI (which presumably NVIDIA is linking against MSVC runtime) which it can be loaded and unloaded millions of times.

And considering the above, switching to dynamic linking most likely won't help us.

lhmouse commented 1 month ago

Actually, as it turns out the issue is not exclusive to static linking. I dropped -static from DLL build command and added libstdc++-6.dll, libgcc_s_seh-1.dll and libwinpthread-1.dll to my PATH and I can observe the same issue. Static linking just makes this easier to catch/observe but if all these libraries are loaded-then-unloaded even in their dynamic link form, TLS indexes will still leak.

Let me guess. Is it the case that the main executable is not linked against winpthreads, so if a dynamic library is loaded, it loads and unloads winpthreads indirectly?

It does, we are building dxvk-nvapi using mingw-w64 toolchain and it's causing some games in Proton to abort after they load it, try to initialize (which fails) then unload it a few hundred times. On the other hand, we could not observe this issue on Windows with native NVAPI (which presumably NVIDIA is linking against MSVC runtime) which it can be loaded and unloaded millions of times.

And considering the above, switching to dynamic linking most likely won't help us.

A workaround for the above would be to prevent the winpthread DLL from being unloaded. So in one of your addon, you can try this:

HMODULE dummy;
GetModuleHandleExW(
  GET_MODULE_HANDLE_EX_FLAG_PIN
    | GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS,
  (LPCWSTR)(INT_PTR) pthread_create,
  &dummy);
Saancreed commented 1 month ago

Let me guess. Is it the case that the main executable is not linked against winpthreads, so if a dynamic library is loaded, it loads and unloads winpthreads indirectly?

Correct.

A workaround for the above would be to prevent the winpthread DLL from being unloaded.

Sure, and that also applies to static linking scenario if I pinned my own library but it's not really a solution; it's only, as you say, a workaround, not to mention that calling GetModuleHandleEx from DllMain(DLL_PROCESS_ATTACH) (which is probably the only opportunity I have to do anything in cases like the sample above) is probably illegal. I believe it would be better for consumers of winpthreads library if it freed the TLS index it allocated on unload, or at least if I had the ability to do this myself in DllMain(DLL_PROCESS_DETACH) manually.

Or, if winpthreads is not supposed to even support unloading, then why doesn't it already always pin itself?

lhmouse commented 1 month ago

Or, if winpthreads is not supposed to even support unloading, then why doesn't it already always pin itself?

I can propose such a patch. It's probably insufficient, though. As mentioned above winpthreads is not the only library that is affected.

Saancreed commented 1 month ago

I can propose such a patch. It's probably insufficient, though. As mentioned above winpthreads is not the only library that is affected.

Yes, that is unfortunate. But to be fair, leaking an entire library to hide a TLS index leak is not ideal either. And ultimately, leaks in other libraries should probably be addressed somehow as well. I can see how it's much less of a problem for libgcc and libstdc++ on Linux which are shipped by distros as systemwide libraries, are pretty much always dynamically linked to and often by more than one shared library if not the executable itself. But it looks like winpthreads' TLS index is the first resource to we're running out of so let's focus on that one first.

Let me try to give this a bit more context:

The primary use for my and some other libraries I know is providing alternative implementation of some system libraries that are usually present on Windows. I co-maintain a reimplementation of Nvidia's NVAPI library but there is also reimplementation of Direct3D 8 to 11 and there are probably a few others as well, and we are using Mingw toolchain to build those Windows libraries on Linux systems, mostly for use in Wine (a compatibility layer for running Windows applications on Unix systems).

One quite important feature for us is the ability to produce artifacts that are simple drop-in replacements for existing libraries. For example, when you release your own dxgi.dll and d3d11.dll you don't want your installation instructions to be "install dxgi.dll and d3d11.dll, then install libstdc++-6.dll, libgcc_s_seh-1.dll and libwinpthreads-1.dll unless they already exist, in which case have fun resolving conflicts and figuring out which one to keep" because libwinpthreads-1.dll is not a system component and we cannot rely on it being already available, or expect that no program that could use our replacement libraries will ever ship its own. For this reason, I'd expect most libraries to choose to link libstdc++, libgcc and libwinpthreads statically and keep them as private implementation details, which is what we are currently doing to improve everyone's experience (not to mention that dynamic linking wouldn't solve the problem in cases where we're the only library that depends on libwinpthreads-1.dll so it would be unloaded too).

But because we are targeting environments where we are very often the only library in the process using Mingw runtime (while the executable and other libraries are linked to Visual C/C++ runtime and we have no control over them) it makes us suspectible to this bug, ultimately breaking naughty games and other programs that were compiled with MSVC and expect that they can load-and-unload some library dozens of times per second… which, as it happens, they are successfully doing on Windows with original one that is presumably linked to vcruntime, which means that we cannot safely replace it. In Proton, a project that ships a combination of Wine, DXVK and dxvk-nvapi for use with Steam games, we applied a hack to some known problematic applications that prevents them from loading NVAPI library at all. DXVK survives only by luck because dxgi.dll is not a library that's reloaded multiple times in an application's lifetime, but it's still a ticking bomb, so to speak.

The fact that winpthreads leaks such a limited resource and there is currently no way to prevent it makes me doubt the viability of Mingw as a toolchain for delivering Windows shared libraries, or at least those that use winpthreads, even indirectly. You can probably imagine that it's quite a big limitation that library authors using Mingw might not even be aware of. Can we please consider adding some kind of safeguard mechanism that will automagiczlly free the slot on library unload, ideally one that works with both shared and static builds of winpthreads? Perhaps something like this (if you excuse the naivety because it probably won't be that simple):

void __attribute__((destructor)) pthread_tls_destroy(void)
{
  if (_pthread_tls != TLS_OUT_OF_INDEXES)
    TlsFree(_pthread_tls);
}
lhmouse commented 1 month ago

The fact that winpthreads leaks such a limited resource and there is currently no way to prevent it makes me doubt the viability of Mingw as a toolchain for delivering Windows shared libraries, or at least those that use winpthreads, even indirectly.

Yes I totally agree with that.

Perhaps something like this (if you excuse the naivety because it probably won't be that simple):

void __attribute__((destructor)) pthread_tls_destroy(void)
{
  if (_pthread_tls != TLS_OUT_OF_INDEXES)
    TlsFree(_pthread_tls);
}

This is subject to order of destruction if winpthreads is linked statically. I think this had better be done in free_pthread_mem(). Image TLS callbacks are invoked from CRT$XLB to CRT$XLZ, and the winpthreads one is registered in CRT$XLF so it should happen after almost everything.

lhmouse commented 1 month ago

Proposed patch: https://sourceforge.net/p/mingw-w64/mailman/message/58799907/

For the time being, you can also try https://gcc-mcf.lhmouse.com/. This implementation stores the global TLS index in a block of named shared memory, which is never deallocated; but it has a name that is private and unique to each process, so whenever the DLL is loaded again, it can always regain a pointer to the exact same block of memory, without allocating a new one.

Saancreed commented 1 month ago

I applied your patch on top of 12.0.0 release and it appears to have resolved TLS index leak both in my test case and in the library I maintain. Thank you!

Now that they survive more than ~1000 reloads, it's easy to observe a memory leak that's likely happening due to libstdc++ and/or libgcc not cleaning up their allocations which you pointed out earlier, where 100000 reloads leak about 600 MiB of memory in Wine. But I suppose that's a problem for another day and I should report that to the GNU project.

Saancreed commented 1 month ago

Although, hmm, I changed my Wine version to one without debug symbols and apparently now I'm encountering an abort() during the call to FreeLibrary() which seems to be coming from here: https://github.com/mingw-w64/mingw-w64/blob/v12.0.0/mingw-w64-libraries/winpthreads/src/thread.c#L1051 judging from calls to CreateEvent, DuplicateHandle and GetThreadPriority I see in the logs just before exiting.

Perhaps this isn't as safe as we were hoping it would be.