Open mstorsjo opened 3 months ago
IIUC this is basically a problem with the platform itself and the fact that DLLs don't exhibit the right behavior w.r.t. link-time replacement of functions, is that right?
If so, I am wary of trying too hard to work around the fundamental limitations of DLLs (option 4) since that feels like just applying a band-aid within the library, but users don't actually get the right behavior if they override functions in a similar way themselves.
Instead, wouldn't we better off acknowledging that this doesn't work with DLLs, adjusting the test suite accordingly and ideally issuing a diagnostic and rejecting any user trying to do this?
IIUC this is basically a problem with the platform itself and the fact that DLLs don't exhibit the right behavior w.r.t. link-time replacement of functions, is that right?
Yes, pretty much.
If so, I am wary of trying too hard to work around the fundamental limitations of DLLs (option 4) since that feels like just applying a band-aid within the library, but users don't actually get the right behavior if they override functions in a similar way themselves.
No, not quite.
Option 4 would properly fix the issue (as far as practically possible on the platform) - users can override operator delete(void*)
in their code, they can call/emit calls to operator delete(void*, size_t)
and have that fall back to their own overridden code - within their own executable. The fix wouldn't be specific to libc++ but would fix the same issue for all users of libc++ that try to override operators.
It would also fix the preexisting issue of fallback between e.g. throw/nothrow and aligned/non-aligned operator new.
So it fixes the surprise that -fsized-deallocation
brings, and takes libc++ to the same level of functionality as MS's own STL, which is the bar for what functionality you can have on this platform.
The only case it won't fix, is that the user's overridden operators won't be visible in other DLLs in the process. (And MS STL doesn't solve that case either.)
Instead, wouldn't we better off acknowledging that this doesn't work with DLLs, adjusting the test suite accordingly and ideally issuing a diagnostic and rejecting any user trying to do this?
Not really sure how we could emit a diagnostic for it, practically. And while some things are impossible to do correctly with DLLs, and there are preexisting cases of operator overloading that doesn't work as expected, this change (defaulting to sized deallocation) makes yet another pretty large regression in what works and what doesn't.
So after having thought about this for a few hours since I posted the options above...
I think that option 4 is the best and most correct, on a technical level. - It's also hugely Windows specific in how it works, so it would need to sit behind a bunch of conditionals in the cmake file, and I'm not sure how interested the rest of libc++ is in having such a kludge in there. There are a few technical details around the implementation of it (in my proof of concept so far), that maybe could be fixed better as well.
However practically, I don't think option 4 is ready to be merged before 19.x, and I wouldn't want to rush such a change.
So pragmatically, I would go with option 1 - with the plan that option 4 would be the goal, at some point, when the implementation is acceptable, and we'd punt on enabling -fsized-deallocation
for this target until then.
I haven't quite read the comments thoroughly, but if option 4 is the same way as how the MSVC STL deals with the issue, I would lean towards it. I agree there isn't nearly enough time to get this solution into 19.x, so I would also vote for option 1 - disabling -fsized-deallocation
on MinGW for the time being.
I agree that option 4 is best long term and 1 seems okay short term.
For the compatibility, AFAIK, the only way to make merging work with binutils ar are MRI scripts. llvm-lib (and MSVC lib.exe) does what we need here by default.
For the compatibility, AFAIK, the only way to make merging work with binutils ar are MRI scripts. llvm-lib (and MSVC lib.exe) does what we need here by default.
Hmm - here I do the merging with the llvm-ar option L
- but do you mean that lib.exe and llvm-lib also do that operation, if you just add an existing static archive to another one (instead of adding the static archive as one member)?
FWIW, one thing I'd still like to improve in the PoC for option 4, is that currently, the import lib first contains the symbols for operator new
and delete
, as they are exported from the DLL (and we still want them to be exported - older executables link against them), but we also add those symbols as regular objects into the merged library, so we have multiple archive members providing the same symbol. This feels borderline underspecified - or I guess it might be well defined, that the linker picks whichever member comes first?
So ideally, I'd like to omit that particular symbol from the import library, but keep it exported. So either an operation on the import library archive to remove those (short import) files before merging, or to have the linking operation produce a def file, remove those symbols from the def file and regenerate an import library with e.g. llvm-dlltool (or llvm-lib). The former seems more straightforward, but I'm not entirely sure how, while the latter seems quite roundabout.
@mstorsjo Ack, thanks for all the information here.
I think I am rallying behind the idea that 4 is the best long term, and I guess 1 is the only viable option in the short term.
Maintainability-wise, I believe that refactorinh libcxxabi into libcxx like we’ve wanted to do for a long time might be a helpful first step since it would clean up the build system a lot. I might have time to put together a RFC for that in the coming weeks.
Maintainability-wise, I believe that refactorinh libcxxabi into libcxx like we’ve wanted to do for a long time might be a helpful first step since it would clean up the build system a lot. I might have time to put together a RFC for that in the coming weeks.
FWIW, that doesn't make much of a difference either way for this fix, but I don't mind such a change - for the mingw builds we essentially treat libcxxabi as an integrated part of libcxx anyway.
I agree with the others. Since the work on option 4 is already advanced, it'll be probably best to buy it some time by going option 1 for the upcoming release. It's not clear to me how beneficial sized deallocation is on Windows, but given MS is already doing it, they probably found it worthwhile, even with DLL shenanigans.
iirc GNU ar doesn't implement those options
This is strictly about libc++ building, right? I wouldn't worry about libc++ compilation too much, at MSYS2 we have stopped trying to build libc++ with GNU tools and use LLVM toolset for that task because the issues were too severe. It's hard to tell how many users care about using libc++ with Binutils, but I wouldn't consider it very popular anyway.
It's not clear to me how beneficial sized deallocation is on Windows, but given MS is already doing it, they probably found it worthwhile, even with DLL shenanigans.
I'm not sure that the sized deallocation itself is beneficial in any way; the way I understand it is just that C++14 says that when you invoke delete
it is supposed to invoke this new sized delete
operator; this affects things like which of the user's provided operators that gets called.
iirc GNU ar doesn't implement those options
This is strictly about libc++ building, right?
Yes, that's true.
I wouldn't worry about libc++ compilation too much, at MSYS2 we have stopped trying to build libc++ with GNU tools and use LLVM toolset for that task because the issues were too severe. It's hard to tell how many users care about using libc++ with Binutils, but I wouldn't consider it very popular anyway.
Yeah it's probably not an important case. But we probably should either skip doing that library merging in that case, or error out cleanly or something.
I made a PR for option 1, reverting to the older behaviour for the MinGW target now, in https://github.com/llvm/llvm-project/pull/97232.
@mstorsjo Any development for a long term fix here?
@mstorsjo Any development for a long term fix here?
No specific progress lately. I did make a quite decent PoC earlier, but I haven’t picked it up to finish it for review yet.
Since 130e93cc26ca9d3ac50ec5a92e3109577ca2e702, included in the upcoming 19.x branch, Clang now enables
-fsized-deallocation
by default when targeting C++14 or newer.Running the libc++ tests with such a clang binary ends up with 27 tests failing, see e.g. https://github.com/mstorsjo/llvm-mingw/actions/runs/9414695194/job/25934131202. (This is in build configurations with libc++ built as a DLL, it's not a problem when libc++ is linked statically.)
The effect of this change, is that when invoking
delete
, it generates a call tooperator delete(void*, size_t)
instead of a call tooperator delete(void*)
.By default, both
operator delete(void*)
andoperator delete(void*, size_t)
are provided by libcx++/libc++abi. If the user overridesoperator delete(void*)
, but the compiler callsoperator delete(void*, size_t)
, it should hit the libc++/libc++abi fallback implementation which callsoperator delete(void*)
.When everything is linked statically, the fallback
operator delete(void*, size_t)
ends up calling the user provided, overriddenoperator delete(void*)
. However, if the fallbackoperator delete(void*, size_t)
resides in libc++.dll, a user providedoperator delete(void*)
won't get called, but it just calls the defaultoperator delete(void*)
.Thus, this change breaks the case when users override only
operator delete(void*)
, notoperator delete(void*, size_t)
(which the user may not even have heard about), as the compiler generated code no longer ends up calling the user's overridden operator.This isn't an entirely new issue; the same issue with operator overloading already exists in a number of cases, e.g. where the user overrides
operator new(size_t)
but the caller callsoperator new(size_t, nothrow_t)
. We have covered these cases in libc++ tests by omitting some individual checks in these cases usingASSERT_WITH_OPERATOR_NEW_FALLBACKS
, see https://github.com/llvm/llvm-project/blob/llvmorg-19-init/libcxx/test/support/test_macros.h#L353-L371There is also another similar situation; if the user overrides operator new/delete, any calls to these functions within libc++.dll will still call the default functions within the DLL, instead of the functions provided by the user code (in another EXE/DLL) - these are waived in tests with
ASSERT_WITH_LIBRARY_INTERNAL_ALLOCATIONS
in the same file above.For the fallbacks between different kinds of operators, this situation actually works correctly with MS STL, when linked as a DLL. This is because the import library to the MS STL DLL,
msvcprt.lib
, isn't a regular pure import library, but it contains a handful of regular object files (which end up statically linked into the user executables, if needed) that provide the fallbackoperator new
/delete
, so that fallbacks from a more specialized operator to a less specialized one do work in user code.CC people involved in MinGW/libc++, @lazka @mati865 @alvinhochun @cjacek @jeremyd2019 - and from the libc++ side @ldionne
With around 4 weeks to go to the 19.x branch date, what direction should we take wrt this for the MinGW target?
I see two aspects of this; we want the libc++ testsuite to still work. But user code, that previously just did override
operator delete(void*)
but didn't overrideoperator delete(void*, size_t)
, no longer works as it did, when using libc++ as a DLL. That's possibly a surprising change (even though the amount of projects overriding operators probably is quite small).I see the following options:
-fno-sized-deallocation
for MinGW targets, just like before. The linked change above does this when targeting zOS already, and when targeting older versions of Darwin. See https://github.com/mstorsjo/llvm-project/commit/clang-no-sized-dealloc-mingw for an example of this.-fno-sized-deallocation
to the MinGW libc++ testsuite configs. This silences the issue in the libc++ tests, but the potential issue for users still exists. See https://github.com/mstorsjo/llvm-project/commit/libcxx-mingw-no-sized-deallocASSERT_WITH_OPERATOR_NEW_FALLBACKS
, see https://github.com/mstorsjo/llvm-project/commit/libcxx-waive-sized-operator-delete-fallback. The issue remains for users here as well.llvm-ar
specifically for merging static libraries, iirc GNU ar doesn't implement those options.) Technically, this is the best solution, as it makes the operator overloading work as expected - it also would allow getting rid of the existingASSERT_WITH_OPERATOR_NEW_FALLBACKS
check waivers for MinGW/DLL configs. But implementation wise it's quite messy.With a couple weeks to go to the branch, which way do we go?