Open ManifoldFR opened 4 months ago
Hi, thanks for a very detailed report!
There's two different things happening here. One is that the symbol isn't exported if Magnum is built as static, which is how it should be (no other "usual" symbols are exported from static builds either). So if you have a static build of Magnum linked into a shared library, the symbols are reachable just from the static library itself and not from outside, i.e. from another shared lib or an executable. Using Magnum from there means linking to its static libraries again, which is what you did (and if you wouldn't, the build would end with a linker error). The consequence of linking the static library to two shared libraries is that its code gets duplicated in both shared libraries, which is wasteful from a code size perspective, but other than that it's a valid usage. For the most part.
The other thing is that, unfortunately, even after I tried very hard to avoid mutable global state, there's still some left, in this particular case for the current GL context and the GL entrypoints. (Another mutable global state is debug/error output redirection or static plugin management in the plugin manager, the in-progress Vulkan wrapper OTOH doesn't rely on any global state.) Apart from this global state needing to be thread-local to avoid races, the main problem is that the globals get duplicated when you link the same static library to two different shared libraries, like you do in your case. So one shared library sees a global that's different from what the other sees, thus if a GL::Context
setup populates the flextGL
entrypoints in one shared library, the other shared library still has its own copy that's all untouched null pointers and :boom:
There's various platform-specific ways to solve this, what the other globals rely on is __attribute__((weak))
, which in all my testing ensured that just one copy of the global gets picked for all shared libraries. That's just Unix-like systems tho, on Windows this is worked around with having to call GetProcAddress()
at runtime and hoping that the call picks the same symbol every time (with some more hacks piled on top because on Windows you can have a thing either thread-local or exported but not both, and GetProcAddress()
needs to explicitly get the name of the DLL it looks into, there's no concept of "look just anywhere" like with dlsym()
), then this is further complicated for example in case of Python bindings where I had to modify the Python's internal library loading flags in order to make them share the symbols instead of isolating every loaded module into its own namespace. Which then caused nasty conflicts with e.g. Pytorch, that started crashing in case something in the pile of code loaded into Python was built against a different libstdc++ than the rest. Etc., etc. So even with all those workarounds, it's still rather brittle, I don't trust it, and projects that use this kind of setup of static libraries linked into multiple shared libraries I'm trying to convince to switch to a fully shared build.
What didn't get this "global deduplication" treatment so far, and what #453 mentions in particular, is this flextGL
struct, and the reason is performance. Because, compared to debug output redirection or shared plugin manager state which is accessed relatively seldom, the application may call into GL thousand times a frame, and the extra indirection caused by these workarounds, thread-local state + GetProcAddress()
in particular, would affect the performance quite a lot I fear. So I didn't do that so far and was waiting for a better idea to arrive. And I'm still waiting.
But maybe being a bit slow is still better than causing nasty segfaults, heh. I'll think about this.
That it did work for you with just exporting the symbol is just a coincidence, I fear. With a slightly different linking order or a more complex library setup it can still result in a different global being picked for every shared lib, so there really needs to be something explicitly for ensuring the deduplication. Any other ideas? :)
Wow, thanks for this very detailed and in-depth answer! I learned a lot of things.
I thought I had the visibility thing figured out but definitely did not think about the overhead when sharing global state. So in the end the lesson is that linking static to shared might be bad practice because downstream dependents could blow up?
I guess I will switch to a build with shared libraries.
I think one solution to your use case would be a static build of Magnum with all symbols exported, that gets linked only to one of your shared libraries, and the rest of the project / other shared libraries would then link to that shared library, not to (static) Magnum. Because linking to static Magnum would cause the duplicate globals issue to appear again.
This could be doable if I would provide a way to supply a custom value of CORRADE_VISIBILITY_STATIC
from the command line. But that is also rather brittle I fear, you'd have to make the rest of your buildsystem think that your shared library is actually Magnum, which means not being able to use any of the CMake support that Magnum provides.
Generally, linking static libraries to shared ones is a valid use case for example if you have a middleware product that ships as a shared library, and you don't want its internals to be known, or used from outside. Then, if a user of given shared library would want to use Magnum on their end as well, they'd have to build its own and link to it again. The two instances (assuming [CORRADE,MAGNUM]_BUILD_STATIC_UNIQUE_GLOBALS
is turned off in the builds) would then be completely independent of each other, having no shared state. This was the case for one project I worked on in the past, where we shipped a shared library that used Magnum for GPU data processing in a thread (and it wasn't desirable for third parties to peek into it), but also shipped a sample application that used Magnum for the GUI.
But besides this one use case, if it's desired to have Magnum used across shared libraries, the best is to either have it build as shared as well, or as static, but limit your code to just one shared library / just one executable. Lot less headaches that way :)
In any case, I'll attempt to fix the flextGL
global in some way, eventually. One project that uses the static+shared build currently works around that in a particularly nasty way, by calling flextGLInit()
directly from "the other" shared library, but that's nothing I would recommend either.
Generally, linking static libraries to shared ones is a valid use case for example if you have a middleware product that ships as a shared library, and you don't want its internals to be known, or used from outside. Then, if a user of given shared library would want to use Magnum on their end as well, they'd have to build its own and link to it again. The two instances (assuming
[CORRADE,MAGNUM]_BUILD_STATIC_UNIQUE_GLOBALS
is turned off in the builds) would then be completely independent of each other, having no shared state. This was the case for one project I worked on in the past, where we shipped a shared library that used Magnum for GPU data processing in a thread (and it wasn't desirable for third parties to peek into it), but also shipped a sample application that used Magnum for the GUI.But besides this one use case, if it's desired to have Magnum used across shared libraries, the best is to either have it build as shared as well, or as static, but limit your code to just one shared library / just one executable. Lot less headaches that way :)
I think I'll go with the shared library option, because we'd want others to be able to look at the internals and extend functionality so it makes sense
Cheers, thanks for answering my questions and enlightening me :)
Hi!
In the following line in flextGL.h, the
flextGL
global variable has the attributeFLEXTGL_EXPORT
which is the emptyCORRADE_VISIBILITY_STATIC
when the build is static (hence the symbol is hidden, following the-fvisibility
compile flag): https://github.com/mosra/magnum/blob/b1419017650c83538d8fe4681de6f0bca524cf49/src/MagnumExternal/OpenGL/GL/flextGL.h#L2629But since it's a global and we could make a shared library against a static build of Magnum, shouldn't we check for
MAGNUM_BUILD_STATIC_UNIQUE_GLOBALS
just like inGL/Context.cpp
? https://github.com/mosra/magnum/blob/b1419017650c83538d8fe4681de6f0bca524cf49/src/Magnum/GL/Context.cpp#L672-L684I tried the following change where the global's visibility is set to default when we require unique globals:
Or maybe the
FLEXTGL_EXPORT
macro should be changed? For me, the above fixed a segmentation fault issue I was having when instantiating a custom shader defined in my shared library (linked against a static build of Magnum, working on Ubuntu with clang). I saw that there's a mention of FlextGL symbols in #453.