Saancreed opened this issue 4 months ago
That's right. And if libgcc is linked statically, there will be a similar issue. And there is more, e.g. std::cout, etc.
I am not saying that we should not do anything about it, but it doesn't cause any issues in practice. If you are building a DLL, then almost every dependency should be linked as a shared library.
Actually, as it turns out, the issue is not exclusive to static linking. I dropped -static from the DLL build command and added libstdc++-6.dll, libgcc_s_seh-1.dll and libwinpthread-1.dll to my PATH, and I can observe the same issue. Static linking just makes this easier to catch, but if all these libraries are loaded and then unloaded, even in their dynamically linked form, TLS indexes will still leak.
It does cause issues in practice: we are building dxvk-nvapi using the mingw-w64 toolchain, and it's causing some games in Proton to abort after they load it, try to initialize it (which fails) and then unload it, a few hundred times over. On the other hand, we could not observe this issue on Windows with the native NVAPI (which NVIDIA presumably links against the MSVC runtime), which can be loaded and unloaded millions of times.
Considering the above, switching to dynamic linking most likely won't help us.
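The failure mode described above can be modelled with a small portable simulation (plain C, no Windows APIs). A fixed table of 1088 slots stands in for the per-process TLS index table; 1088 is the maximum Microsoft documents for TlsAlloc, but here it is only an assumed constant. Each loop iteration models one load/unload cycle of a DLL that allocates a slot on attach. The names slot_alloc, slot_free and simulate_reloads are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define SLOT_COUNT 1088 /* assumed per-process limit, mirroring Windows TLS */

static bool slot_used[SLOT_COUNT];

/* Returns a free slot index, or -1 when the table is exhausted
   (the analogue of TlsAlloc() returning TLS_OUT_OF_INDEXES). */
static int slot_alloc(void)
{
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (!slot_used[i]) {
            slot_used[i] = true;
            return i;
        }
    }
    return -1;
}

/* The analogue of TlsFree(). */
static void slot_free(int index)
{
    if (index >= 0 && index < SLOT_COUNT)
        slot_used[index] = false;
}

/* Simulates `iterations` load/unload cycles of a DLL that allocates one
   slot on load; `preallocated` slots are already taken by the rest of the
   process. Returns the iteration at which allocation failed, or -1. */
static int simulate_reloads(int iterations, int preallocated, bool free_on_unload)
{
    memset(slot_used, 0, sizeof slot_used);
    for (int i = 0; i < preallocated; i++)
        slot_alloc();

    for (int i = 0; i < iterations; i++) {
        int index = slot_alloc();   /* "DLL_PROCESS_ATTACH" */
        if (index < 0)
            return i;               /* out of slots: where the abort happens */
        if (free_on_unload)
            slot_free(index);       /* "DLL_PROCESS_DETACH" */
    }
    return -1;
}
```

With two slots assumed to be taken by the rest of the process, simulate_reloads(100000, 2, false) runs out at iteration 1086, whereas passing free_on_unload = true never runs out.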
Let me guess. Is it the case that the main executable is not linked against winpthreads, so if a dynamic library is loaded, it loads and unloads winpthreads indirectly?
A workaround for the above would be to prevent the winpthread DLL from being unloaded. So, in one of your add-ons, you can try this:
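```c
#include <windows.h>
#include <pthread.h>

/* Pin the module containing pthread_create (libwinpthread-1.dll when linked
   dynamically) so it stays loaded until the process exits. Helper name is
   illustrative. */
static void pin_winpthreads(void)
{
    HMODULE dummy;
    GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_PIN | GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS,
                       (LPCWSTR)(INT_PTR) pthread_create, &dummy);
}
```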
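Why pinning helps can be shown with a toy reference-count model (a simulation, not the real Windows loader). A module allocates its TLS slot only when it is actually mapped, and a pinned module stays mapped until the process exits no matter how often FreeLibrary is called, so repeated load/unload cycles reuse one slot. All names here (struct module, module_load, module_unload, slots_after_cycles) are hypothetical.

```c
#include <stdbool.h>

/* Toy model of a DLL as seen by the loader. */
struct module {
    int refcount;    /* LoadLibrary increments, FreeLibrary decrements */
    bool pinned;     /* set once, as GET_MODULE_HANDLE_EX_FLAG_PIN would */
    bool mapped;     /* whether the image is currently in memory */
    int tls_allocs;  /* how many TLS slots the module has allocated so far */
};

static void module_load(struct module *m)
{
    if (!m->mapped) {
        m->mapped = true;
        m->tls_allocs++;  /* attach code allocates a fresh TLS index */
    }
    m->refcount++;
}

static void module_unload(struct module *m)
{
    if (m->refcount > 0)
        m->refcount--;
    /* Unmapping leaks the slot, because nothing frees it on detach. */
    if (m->refcount == 0 && !m->pinned)
        m->mapped = false;
}

/* Runs `cycles` load/unload pairs; returns the number of slots consumed. */
static int slots_after_cycles(int cycles, bool pin_on_first_load)
{
    struct module m = {0};
    for (int i = 0; i < cycles; i++) {
        module_load(&m);
        if (pin_on_first_load)
            m.pinned = true;
        module_unload(&m);
    }
    return m.tls_allocs;
}
```

In this model, 500 unpinned cycles consume 500 slots while the pinned variant consumes one, at the cost of the module never being unloaded.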
Correct: the main executable is not linked against winpthreads.
Sure, and that would also work in the static linking scenario if I pinned my own library, but it's not really a solution; it's only, as you say, a workaround. Not to mention that calling GetModuleHandleEx from DllMain(DLL_PROCESS_ATTACH) (which is probably the only opportunity I have to do anything in cases like the sample above) is probably illegal. I believe it would be better for consumers of the winpthreads library if it freed the TLS index it allocated on unload, or at least if I had the ability to do this myself, manually, in DllMain(DLL_PROCESS_DETACH).
Or, if winpthreads is not supposed to even support unloading, then why doesn't it already always pin itself?
I can propose such a patch. It's probably insufficient, though. As mentioned above, winpthreads is not the only library that is affected.
Yes, that is unfortunate. But to be fair, leaking an entire library to hide a TLS index leak is not ideal either. Ultimately, leaks in other libraries should probably be addressed somehow as well. I can see how it's much less of a problem for libgcc and libstdc++ on Linux, where distros ship them as system-wide libraries that are pretty much always linked dynamically, and often by more than one shared library, if not by the executable itself. But it looks like winpthreads' TLS index is the first resource we're running out of, so let's focus on that one first.
Let me try to give this a bit more context:
The primary use for my library and some others I know of is providing alternative implementations of system libraries that are usually present on Windows. I co-maintain a reimplementation of NVIDIA's NVAPI library, but there are also reimplementations of Direct3D 8 through 11, and probably a few others as well. We use the Mingw toolchain to build these Windows libraries on Linux systems, mostly for use in Wine (a compatibility layer for running Windows applications on Unix systems).
One quite important feature for us is the ability to produce artifacts that are simple drop-in replacements for existing libraries. For example, when you release your own dxgi.dll and d3d11.dll, you don't want your installation instructions to be "install dxgi.dll and d3d11.dll, then install libstdc++-6.dll, libgcc_s_seh-1.dll and libwinpthread-1.dll unless they already exist, in which case have fun resolving conflicts and figuring out which one to keep". libwinpthread-1.dll is not a system component, so we cannot rely on it being available already, nor expect that no program that could use our replacement libraries will ever ship its own. For this reason, I'd expect most libraries to link libstdc++, libgcc and winpthreads statically and keep them as private implementation details, which is what we are currently doing to improve everyone's experience (not to mention that dynamic linking wouldn't solve the problem in cases where we're the only library that depends on libwinpthread-1.dll, so it would be unloaded too).
But because we are targeting environments where we are very often the only library in the process using the Mingw runtime (while the executable and other libraries are linked against the Visual C/C++ runtime, and we have no control over them), we are susceptible to this bug. It ultimately breaks naughty games and other programs that were compiled with MSVC and expect to be able to load and unload some library dozens of times per second… which, as it happens, they do successfully on Windows with the original library, presumably linked against vcruntime. This means we cannot safely replace it. In Proton, a project that ships a combination of Wine, DXVK and dxvk-nvapi for use with Steam games, we applied a hack to some known problematic applications that prevents them from loading the NVAPI library at all. DXVK survives only by luck, because dxgi.dll is not a library that gets reloaded multiple times in an application's lifetime, but it's still a ticking bomb, so to speak.
The fact that winpthreads leaks such a limited resource, and that there is currently no way to prevent it, makes me doubt the viability of Mingw as a toolchain for delivering Windows shared libraries, or at least those that use winpthreads, even indirectly. You can probably imagine that it's quite a big limitation, and one that library authors using Mingw might not even be aware of. Could we please consider adding some kind of safeguard that automagically frees the slot on library unload, ideally one that works with both shared and static builds of winpthreads? Perhaps something like this (excuse the naivety; it probably won't be that simple):
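```c
/* Naive proposal: _pthread_tls is winpthreads' global TLS index variable. */
void __attribute__((destructor)) pthread_tls_destroy(void)
{
    if (_pthread_tls != TLS_OUT_OF_INDEXES) {
        TlsFree(_pthread_tls);
        _pthread_tls = TLS_OUT_OF_INDEXES; /* avoid freeing the index twice */
    }
}
```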
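For comparison, the same allocate-on-load, free-on-unload pattern can be tried on Linux with GCC, using POSIX thread-specific keys as the analogue of TlsAlloc/TlsFree. This is only a sketch of the idea, not winpthreads code; the names example_key, example_tls_init, example_tls_destroy and example_tls_ready are hypothetical. Like the proposal itself, a destructor-based cleanup is subject to destruction order when the library is linked statically.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_key_t example_key;
static bool example_key_valid = false;

/* Runs when the shared object is loaded (or before main in an executable). */
static void __attribute__((constructor)) example_tls_init(void)
{
    if (pthread_key_create(&example_key, NULL) == 0)
        example_key_valid = true;
}

/* Runs when the shared object is unloaded (or at exit): release the key so
   repeated dlopen()/dlclose() cycles do not exhaust PTHREAD_KEYS_MAX. */
static void __attribute__((destructor)) example_tls_destroy(void)
{
    if (example_key_valid) {
        pthread_key_delete(example_key);
        example_key_valid = false; /* guard against a second cleanup */
    }
}

/* Stores and reads back a per-thread value; returns true on success. */
static bool example_tls_ready(void)
{
    static int value = 42;
    if (!example_key_valid || pthread_setspecific(example_key, &value) != 0)
        return false;
    return pthread_getspecific(example_key) == &value;
}
```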
Yes, I totally agree that this puts the viability of Mingw for delivering Windows shared libraries in doubt.
A destructor like the proposed pthread_tls_destroy() is subject to the order of destruction if winpthreads is linked statically. I think this had better be done in free_pthread_mem(). Image TLS callbacks are invoked from CRT$XLB to CRT$XLZ, and the winpthreads one is registered in CRT$XLF, so it should happen after almost everything.
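The ordering argument above rests on how the linker merges grouped sections: sections such as .CRT$XLB, .CRT$XLF and .CRT$XLZ are concatenated in alphabetical order of the part after the $, so a callback placed in .CRT$XLF ends up after those in .CRT$XLB through .CRT$XLE in the callback array. The portable sketch below models only that sorting step with qsort; it uses no real section attributes, and the callback names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* One registered callback: the grouped section it lives in and its name. */
struct tls_callback {
    const char *section;
    const char *name;
};

static int by_section(const void *a, const void *b)
{
    const struct tls_callback *x = a, *y = b;
    return strcmp(x->section, y->section);
}

/* Sorts callbacks the way a linker orders grouped sections, then writes
   the resulting call order into `out` as space-separated names. */
static void call_order(struct tls_callback *cbs, size_t n, char *out, size_t out_size)
{
    qsort(cbs, n, sizeof *cbs, by_section);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            strncat(out, " ", out_size - strlen(out) - 1);
        strncat(out, cbs[i].name, out_size - strlen(out) - 1);
    }
}
```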
Proposed patch: https://sourceforge.net/p/mingw-w64/mailman/message/58799907/
For the time being, you can also try https://gcc-mcf.lhmouse.com/. This implementation stores the global TLS index in a block of named shared memory, which is never deallocated; but it has a name that is private and unique to each process, so whenever the DLL is loaded again, it can always regain a pointer to the exact same block of memory, without allocating a new one.
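The idea behind that approach can be modelled portably. In the sketch below, a process-wide table of named blocks stands in for named shared memory (which outlives the DLL), and the block name is derived from the process ID so it is private to each process. On every load the DLL opens or creates the block and allocates a TLS index only the first time. This is a simulation under those assumptions, not the gcc-mcf code; named_block_open, dll_attach and the name format are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BLOCKS 8

/* Stand-in for the kernel's named-object namespace, which outlives the DLL. */
struct named_block {
    char name[64];
    int tls_index; /* -1 until the first load allocates an index */
};

static struct named_block blocks[MAX_BLOCKS];
static int block_count = 0;
static int next_tls_index = 0; /* counts simulated "TlsAlloc" calls */

/* Opens the block with this name, creating it if it does not exist yet. */
static struct named_block *named_block_open(const char *name)
{
    for (int i = 0; i < block_count; i++)
        if (strcmp(blocks[i].name, name) == 0)
            return &blocks[i];
    if (block_count == MAX_BLOCKS)
        return NULL;
    struct named_block *b = &blocks[block_count++];
    snprintf(b->name, sizeof b->name, "%s", name);
    b->tls_index = -1;
    return b;
}

/* Called on each simulated "DLL_PROCESS_ATTACH"; returns the index to use. */
static int dll_attach(unsigned long pid)
{
    char name[64];
    /* Unique per process, so different processes never share a block. */
    snprintf(name, sizeof name, "example-tls-block-%lu", pid);
    struct named_block *b = named_block_open(name);
    if (b == NULL)
        return -1;
    if (b->tls_index < 0)
        b->tls_index = next_tls_index++; /* allocate only once per process */
    return b->tls_index;
}
```

Reloading the DLL any number of times in the same process returns the same index, so only one slot is ever consumed; the trade-off, as described above, is that the block itself is never freed.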
I applied your patch on top of 12.0.0 release and it appears to have resolved TLS index leak both in my test case and in the library I maintain. Thank you!
Now that they survive more than ~1000 reloads, it's easy to observe a memory leak, likely due to libstdc++ and/or libgcc not cleaning up their allocations, as you pointed out earlier: 100000 reloads leak about 600 MiB of memory in Wine. But I suppose that's a problem for another day, and I should report it to the GNU project.
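For scale, taking "about 600 MiB" at face value, that works out to roughly 6 KiB leaked per load/unload cycle:

```latex
\frac{600\,\text{MiB}}{100\,000\ \text{reloads}}
  = \frac{600 \times 2^{20}\,\text{B}}{10^{5}}
  = \frac{629\,145\,600\,\text{B}}{10^{5}}
  \approx 6291\,\text{B}
  \approx 6.1\,\text{KiB per reload}
```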
Although, hmm, I switched my Wine version to one without debug symbols, and apparently now I'm hitting an abort() during the call to FreeLibrary(), which seems to be coming from here: https://github.com/mingw-w64/mingw-w64/blob/v12.0.0/mingw-w64-libraries/winpthreads/src/thread.c#L1051, judging from the calls to CreateEvent, DuplicateHandle and GetThreadPriority I see in the logs just before exiting.
Perhaps this isn't as safe as we were hoping it would be.
Not sure if this is the proper place to report winpthreads issues, so please let me know if there is a better issue tracker for this stuff.
When a library statically linked with winpthreads is loaded using LoadLibrary, winpthreads will usually allocate a TLS index for the library to use. That index, however, appears never to be freed, so applications that repeatedly load and unload libraries statically linked to winpthreads will eventually run out of TLS indexes and abort after a few hundred iterations. The issue can be reproduced trivially with this sample library:
(Build with x86_64-w64-mingw32-g++ -shared hello.cpp -o hello.dll -static)
And this test program:
(Build with x86_64-w64-mingw32-gcc main.c -o main.exe)
The test program aborts at i = 1086 on Windows 11 and at i = 1085 in Wine 9.x, with the stack trace (from Wine) looking like so: