iains / gcc-12-branch

GCC 12 for Darwin with experimental Arm64 support. Current release 12.4-darwin-r0 [June 2024]
GNU General Public License v2.0

Unintended Consequences of Supporting -stdlib=libc++ #21

Open biergaizi opened 1 year ago

biergaizi commented 1 year ago

In the past, Darwin shipped an ancient C++ standard library by default, so most C++ programs needed to be compiled with the option -stdlib=libc++ to use the newer library shipped on macOS. This option was only supported by clang on Darwin, and it was invalid for GCC. Thus, most ./configure scripts check whether -stdlib=libc++ is a valid build-time option using a dummy program: if it is, the flag is passed to the compiler; if not, it is omitted. As a result, when the program is compiled with (an up-to-date version of) GCC, it automatically links to the native C++ runtime for g++. When the program is compiled with clang, it automatically links to Apple's newer C++ library.

@item -stdlib=@var{libstdc++,libc++}
@opindex stdlib
When G++ is configured to support this option, it allows specification of
alternate C++ runtime libraries.  Two options are available: @var{libstdc++}
(the default, native C++ runtime for G++) and @var{libc++} which is the
C++ runtime installed on some operating systems (e.g. Darwin versions from
Darwin11 onwards).  The option switches G++ to use the headers from the
specified library and to emit @code{-lstdc++} or @code{-lc++} respectively,
when a C++ runtime is required for linking.

Unfortunately, since 12.2, GCC Darwin recognizes the option -stdlib=libc++ and attempts to use Darwin's C++ library instead of GCC's native C++ library. As a result, many programs that previously compiled successfully may now fail if libc++'s header files cannot be found. Since the check for -stdlib=libc++ uses a dummy program without any #include statement, the problem escapes detection at configure time and the build fails later, partway through.

The minimum example is:

bash-3.2$ cat main.cpp 
#include <vector>

int main(void)
{
}

bash-3.2$ g++-12.1.0 main.cpp -o main
# OKAY

bash-3.2$ g++-12.1.0 main.cpp -o main -stdlib=libc++
g++-12.1.0: error: unrecognized command-line option '-stdlib=libc++'

bash-3.2$ g++-12.2.0 main.cpp -o main
# OKAY

bash-3.2$ g++-12.2.0 main.cpp -o main -stdlib=libc++
main.cpp:1:10: fatal error: vector: No such file or directory
    1 | #include <vector>
      |          ^~~~~~~~
compilation terminated.

In the past, because -stdlib=libc++ was unsupported, the build system would drop -stdlib=libc++ from the CFLAGS and build the program successfully. But now that -stdlib=libc++ is accepted and passed to GCC, compilation of the same program fails due to missing libc++ system headers.
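A stricter probe would build a program that actually includes a C++ standard header, so that a compiler which accepts the flag but cannot find libc++ headers is rejected at configure time. The following is only a minimal sketch in POSIX shell; the helper name `probe_stdlib_flag` is hypothetical and is not taken from any particular configure script.

```shell
# Hypothetical helper: print "-stdlib=libc++" only if the given compiler
# can compile and link a program that includes a standard header with it.
probe_stdlib_flag() {
    cxx=$1
    tmpdir=$(mktemp -d) || return 1
    cat > "$tmpdir/conftest.cpp" <<'EOF'
#include <vector>
int main() { std::vector<int> v; return static_cast<int>(v.size()); }
EOF
    # An empty test program would not notice missing headers; this one does.
    if "$cxx" -stdlib=libc++ "$tmpdir/conftest.cpp" -o "$tmpdir/conftest" >/dev/null 2>&1; then
        echo "-stdlib=libc++"
    fi
    rm -rf "$tmpdir"
}

# Usage: add the flag only when it is really usable.
# CXXFLAGS="$CXXFLAGS $(probe_stdlib_flag "${CXX:-c++}")"
```

With a probe of this shape, the g++-12.2.0 case shown above would print nothing, so the flag would be omitted just as it was with g++-12.1.0.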

Replicated on both Gentoo Prefix and Homebrew.

It's not strictly a bug, but it's noteworthy enough to be reported here.

biergaizi commented 1 year ago

To summarize, the old behaviors in a large number of build systems are:

But now that GCC supports -stdlib=libc++, the behaviors are:

biergaizi commented 1 year ago

For now, as a workaround for Gentoo Prefix, we will just use --disable-stdlib-option to fall back to the old behavior, which is at least well understood and doesn't create a new compatibility problem.

The old behavior is not perfect, of course: if clang and GCC are mixed when building software libraries, linking libraries built by clang and libraries built by GCC into the same binary can be a complete mess with a ton of conflicts. But the new behavior is equally a headache, since it breaks the default assumption that many programs use to decide between libstdc++ and libc++. Both the old and new behaviors are problematic... The solution is unclear.

biergaizi commented 1 year ago

A related problem is that, at least without changing the build-time options, the default include paths in distributions would be broken:

On macOS 13, using Homebrew's package as an example, the system clang's default include path is:

clang -cc1 version 14.0.0 (clang-1400.0.29.202) default target x86_64-apple-darwin22.3.0
ignoring nonexistent directory "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include"
ignoring nonexistent directory "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/Library/Frameworks"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1
 /Library/Developer/CommandLineTools/usr/lib/clang/14.0.0/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include
 /Library/Developer/CommandLineTools/usr/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks (framework directory)
End of search list.

Meanwhile for GCC, it's:

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/Cellar/gcc/12.2.0/bin/../lib/gcc/current/gcc/x86_64-apple-darwin22/12/../../../../../../include/c++/v1"
ignoring nonexistent directory "/usr/local/Cellar/gcc/12.2.0/bin/../lib/gcc/current/gcc/x86_64-apple-darwin22/12/../../../../../../x86_64-apple-darwin22/include"
ignoring nonexistent directory "/usr/local/Cellar/gcc/12.2.0/bin/../lib/gcc/current/gcc/../../../../lib/gcc/current/gcc/x86_64-apple-darwin22/12/../../../../../../include/c++/v1"
ignoring duplicate directory "/usr/local/Cellar/gcc/12.2.0/bin/../lib/gcc/current/gcc/../../../../lib/gcc/current/gcc/x86_64-apple-darwin22/12/include"
ignoring nonexistent directory "/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/usr/local/include"
ignoring duplicate directory "/usr/local/Cellar/gcc/12.2.0/bin/../lib/gcc/current/gcc/../../../../lib/gcc/current/gcc/x86_64-apple-darwin22/12/include-fixed"
ignoring nonexistent directory "/usr/local/Cellar/gcc/12.2.0/bin/../lib/gcc/current/gcc/../../../../lib/gcc/current/gcc/x86_64-apple-darwin22/12/../../../../../../x86_64-apple-darwin22/include"
ignoring nonexistent directory "/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/Library/Frameworks"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/Cellar/gcc/12.2.0/bin/../lib/gcc/current/gcc/x86_64-apple-darwin22/12/include
 /usr/local/Cellar/gcc/12.2.0/bin/../lib/gcc/current/gcc/x86_64-apple-darwin22/12/include-fixed
 /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/usr/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks
End of search list.

The system C++ library path is not correctly passed to GCC as a build-time option. This is not a bug, but downstreams need to handle it to support -stdlib=libc++ correctly.

biergaizi commented 1 year ago

I'll report it downstream at Homebrew.

iains commented 1 year ago

IFF one intends to configure GCC to support libc++ (which is actually a pretty good idea on modern Darwin) then you need to arrange to install or point the configuration to a suitable set of libc++ headers. There is no magic "it just works" to pick up those headers, since they are not part of GCC's sources (depending on the macOS version you might need to use older versions of libc++).

The libc++ headers need to be configured to be used with GCC (not clang) - since they have different capabilities.

Unfortunately, this is volunteer effort from me and $dayjob is very busy - but at some stage I hope to package up some suitable libc++ headers for GCC. [I did one set back in the gcc-9 era]

biergaizi commented 1 year ago

IFF one intends to configure GCC to support libc++ (which is actually a pretty good idea on modern Darwin) then you need to arrange to install or point the configuration to a suitable set of libc++ headers.

Yes, and I think this problem is the easiest part to solve. It's "just" a matter of supplying the correct default path to GCC at build time.

However, the other problem is harder: many programs rely on the presence or absence of -stdlib to choose between linking against Darwin's libc++ and GCC's libstdc++. It used to be the case that most programs built on Darwin linked to libstdc++ when GCC was used and to libc++ when clang was used. That assumption is now broken, as GCC supports libc++ too.

The real problem for downstream distributions is choosing between:

  1. Enable -stdlib= support, but change the ./configure scripts of every affected program to stop automatically adding -stdlib=libc++, so they can use GCC's libstdc++ by default - which was the old behavior.
  2. Disable -stdlib= support, so we don't have to change the ./configure scripts of hundreds of programs, but users lose the ability to use -stdlib=libc++ when it's necessary.
  3. Enforce a new policy of linking everything to libc++ instead of libstdc++, even for GCC, so that we avoid the compatibility problem of mixing C++ libraries. This can be implemented by forcing -stdlib=libc++ as a default system-wide compiler flag. But it breaks the old and reasonable assumption that GCC programs use GCC libraries.

Tricky question.

iains commented 1 year ago

yeah, abusing -stdlib as a means of detecting the compiler in use is a poor configuration script choice :(.

From my PoV we're trying to make GCC more compatible with modern macOS as time goes on - and supporting the system libc++ is pretty essential to that.

biergaizi commented 1 year ago

Another question: what would happen if multiple stdlib= flags exist in the same command-line? Does the later option override the earlier option? If they can be safely used together, deciding which library to use can be the responsibility of the system package manager by setting global CXXFLAGS when a package is built.

iains commented 1 year ago

The system C++ library path is not correctly passed to GCC as a build-time option. This is not a bug, but downstreams need to handle it to support -stdlib=libc++ correctly.

That is exactly the point - but it ought not to be a problem for the "downstream" - the default configuration for GCC will look for c++/v1 in the compiler's include directory - so the compiler distributor could/should arrange for a suitable set of headers to be found there.
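A packager could check for those headers before enabling or relying on the option. The sketch below assumes, based on the "ignoring nonexistent directory" lines in the GCC output above, that the default location resolves to `<prefix>/include/c++/v1`; the helper name `has_libcxx_headers` is hypothetical. It only checks that a header file exists, not that the headers are suitable for GCC.

```shell
# Hypothetical helper: succeed if <prefix>/include/c++/v1 holds libc++ headers.
# Checks for one representative header only; it says nothing about whether
# the header set was configured for GCC rather than clang.
has_libcxx_headers() {
    prefix=$1
    [ -f "$prefix/include/c++/v1/vector" ]
}

# Usage (the prefix shown is the Homebrew install from the log above):
# has_libcxx_headers /usr/local/Cellar/gcc/12.2.0 \
#     || echo "no libc++ headers found for this GCC install" >&2
```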

iains commented 1 year ago

Another question: what would happen if multiple stdlib= flags exist in the same command-line? Does the later option override the earlier option? If they can be safely used together, deciding which library to use can be the responsibility of the system package manager by setting global CXXFLAGS when a package is built.

Almost all command line switches for GCC take the last value specified, but I'd always recommend checking that before relying on it in a given case.
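One way to check this for a given compiler is to ask the driver which runtime it would pass to the linker, without building anything. This is a sketch only: it assumes the driver supports `-###` (print the commands that would be run) and that the chosen runtime shows up as `-lc++` or `-lstdc++` in that output, as the documentation quoted earlier describes. The helper name `linked_cxx_runtime` is hypothetical.

```shell
# Hypothetical helper: report which C++ runtime the driver would link,
# given a compiler command followed by any extra flags.
linked_cxx_runtime() {
    cxx=$1
    shift
    tmpdir=$(mktemp -d) || return 1
    echo 'int main() { return 0; }' > "$tmpdir/conftest.cpp"
    # -### prints the planned commands (on stderr) instead of running them.
    plan=$("$cxx" -### "$@" "$tmpdir/conftest.cpp" -o "$tmpdir/conftest" 2>&1) || true
    rm -rf "$tmpdir"
    case $plan in
        *-lc++*)    echo "libc++" ;;
        *-lstdc++*) echo "libstdc++" ;;
        *)          echo "unknown" ;;
    esac
}

# Usage: see which of two conflicting options wins.
# linked_cxx_runtime g++ -stdlib=libc++ -stdlib=libstdc++
```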

iains commented 1 year ago

I am going to close this:

  1. it is not actually a bug, but an intentional improvement in support for macOS.
  2. there is a work-around (you can configure --disable-stdlib-option)

However, it would be a very good idea to get any upstream projects fixed that are using this (incorrectly) to decide if the compiler is GCC or clang. As time goes on, IMO it will be increasingly important to allow GCC to use the system /usr/lib/libc++.dylib, especially when code is mixed between GCC and clang.

biergaizi commented 1 year ago

there is a work-around (you can configure --disable-stdlib-option)

I have to disagree... I claimed that in the original report, but it eventually turned out to be a mistake when I was trying to patch Gentoo.

The so-called "option" --disable-stdlib-option does not actually exist. In upstream GCC, the original script defines the macro ENABLE_STDLIB_OPTION based on whether gcc_gxx_libcxx_include_dir is passed as a command-line option; otherwise the macro is disabled. GCC Darwin additionally checks whether we're targeting Darwin, and if so, the macro ENABLE_STDLIB_OPTION is forced on. There is no option to disable it, short of patching configure.ac, which is what I did.

if test x${gcc_gxx_libcxx_include_dir} != x; then
  AC_DEFINE(ENABLE_STDLIB_OPTION, 1,
            [Define if the -stdlib= option should be enabled.])
else
  case $target in
    *-darwin1[[1-9]]* | *-darwin2*)
       # Default this on for Darwin versions which default to libcxx.
       AC_DEFINE(ENABLE_STDLIB_OPTION, 1)
       ;;
    *)
       AC_DEFINE(ENABLE_STDLIB_OPTION, 0)
       ;;
  esac
fi

Please consider an option for explicitly disabling stdlib for systems that need compatibility before other packages are fixed.

iains commented 1 year ago

there is a work-around (you can configure --disable-stdlib-option)

Please consider an option for explicitly disabling stdlib for systems that need compatibility before other packages are fixed.

Yes, sure - the intention would be to have some way to opt out of most things (in this case, a configuration test that tried to use the option would fail - but one that simply checks for its existence will not).

I will take a look at this [we need to do a 13.1r1 anyway]

iains commented 1 year ago

this is what I am testing [on all open branches for the sake of preserving consistency] (I'm OK with making the special path value 'none' rather than 'no' but let's not delay things too long to decide)


configure, Darwin: Adjust handling of stdlib option.

The intent of the configuration choices for -stdlib is that the default setting should choose reasonable options for the target. This should enable -stdlib= for Darwin targets where libc++ is the default on the system (so that it is only necessary to provide the headers).

However, it seems that there are some cases where (external) config scripts are using -stdlib (incorrectly) to determine if the compiler in use is GCC or clang.

In order to allow for these cases, this patch refines the setting like so:

--with-gxx-libcxx-include-dir= is used to configure the path containing libc++ headers; it also controls the enabling of the -stdlib option.

We are adding a special value for path: if --with-gxx-libcxx-include-dir is 'no' we disable the stdlib option.

Otherwise if the --with-gxx-libcxx-include-dir is set we use the path provided, and enable the stdlib option.

If --with-gxx-libcxx-include-dir is unset, we decide on the stdlib option based on the OS type and revision being targeted. The path is set to a fixed position relative to the compiler install (similar logic to that used for libstdc++ headers).

edit: so that the default case (--with-gxx-libcxx-include-dir is unset) should produce the 'correct' behaviour for the defaults, absent the issue mentioned.
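The three cases described above can be sketched as a small decision function. This mirrors the description and the Darwin target pattern from the configure.ac fragment quoted earlier; it is not the actual patch, and the function name `stdlib_option_enabled` is hypothetical.

```shell
# Hypothetical sketch: print 1 if the -stdlib= option would be enabled, else 0.
#   $1: value of --with-gxx-libcxx-include-dir (empty string if unset)
#   $2: target triple
stdlib_option_enabled() {
    with_dir=$1
    target=$2
    case $with_dir in
        no) echo 0 ;;                      # special value: opt out explicitly
        "") case $target in                # unset: decide from the target
                *-darwin1[1-9]* | *-darwin2*) echo 1 ;;
                *) echo 0 ;;
            esac ;;
        *)  echo 1 ;;                      # explicit path: enable and use it
    esac
}

# Usage:
# stdlib_option_enabled "" x86_64-apple-darwin22
# stdlib_option_enabled no x86_64-apple-darwin22
```

Note that the shell pattern uses single brackets; the doubled `[[1-9]]` in configure.ac is m4 quoting for the same bracket expression.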

iains commented 1 year ago

To be repetitive: the scripts using support for -stdlib to determine the compiler in use really need fixing, since I think that there are projects that we want to be able to build with GCC that need libc++ support.

I built around 120 core (from the toolchain perspective) OSS projects using gcc-7.5 + -stdlib=libc++ and the libc++ headers I modified from LLVM 9 (I think). This worked well - and I think it would be even more important when mixing GCC and clang code - having two different (but quite similar) C++ runtimes bound into one executable seems likely to be asking for trouble :)

biergaizi commented 1 year ago

To be repetitive: the scripts using support for -stdlib to determine the compiler in use really need fixing, since I think that there are projects that we want to be able to build with GCC that need libc++ support.

Technically speaking, they are not determining the compiler type, but which C++ standard library they are going to link against. If -stdlib=libc++ is supported, some programs may suddenly start to link to clang's libc++ under GCC, a configuration that was previously untested by both the software maintainer and the distro maintainer (GCC's libc++ support is currently broken in both Gentoo and MacPorts due to missing headers).

But as clang's libc++ has become Darwin's default and this additional flag is now obsolete, I expect fewer problems in the future.
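If a build script needs to know which C++ runtime a given compiler invocation actually selects, it can ask the preprocessor rather than infer it from flag support: libc++ headers define `_LIBCPP_VERSION` and libstdc++ headers define `__GLIBCXX__`. A minimal sketch follows; the helper name `detect_cxx_stdlib` and the marker tokens are hypothetical.

```shell
# Hypothetical helper: print "libc++", "libstdc++" or "unknown" for the given
# compiler command plus any extra flags (for example -stdlib=libc++).
detect_cxx_stdlib() {
    cxx=$1
    shift
    # Preprocess a tiny translation unit read from stdin; the marker token
    # that survives tells us which library's headers were picked up.
    out=$(printf '%s\n' \
        '#include <cstddef>' \
        '#if defined(_LIBCPP_VERSION)' \
        'stdlib_is_libcxx' \
        '#elif defined(__GLIBCXX__)' \
        'stdlib_is_libstdcxx' \
        '#endif' | "$cxx" "$@" -x c++ -E - 2>/dev/null) || true
    case $out in
        *stdlib_is_libcxx*)    echo "libc++" ;;
        *stdlib_is_libstdcxx*) echo "libstdc++" ;;
        *)                     echo "unknown" ;;
    esac
}

# Usage:
# detect_cxx_stdlib "${CXX:-c++}" $CXXFLAGS
```

If the headers for the requested library cannot be found, preprocessing fails and the sketch reports "unknown", which is the case this thread is about.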

iains commented 1 year ago

so, in practice, if we install a GCC-compatible set of libc++ headers, it should all "just work" (my experience [so far] has been that GCC + libc++ headers works fine).

My open questions on that are (1) how many different header sets do we need to cover the OS version range and (2) how to deliver them - since I do not think that we are going to import libc++ into GCC any time soon, there will need to be separate step(s).

iains commented 1 year ago

Although libc++ is the default from 10.8+, we obviously do not use it by default yet (because of needing to install the headers). However, if a package happens to assume that it will be using libc++ but is really using libstdc++ with GCC, that could also cause us subtle [or not so subtle] issues.

iains commented 1 year ago

For the record, we cannot just link to the Xcode libc++ headers, because they were built against clang's internal headers and we then get mismatches in definitions of some entities (because GCC's stddef.h is different from clang's for example). There are also other issues to do with when headers are in experimental/ in clang and not in GCC (or vice versa).

biergaizi commented 1 year ago

Even if we ship GCC with libc++ headers, or default to libc++ in GCC in the future, it can still create its own surprises downstream. In the past, people expected GCC-built programs to use GCC's infrastructure, including libstdc++, and I believe most people still want things to stay this way.

biergaizi commented 1 year ago

When a build script always adds -stdlib=libc++ whenever it's possible to do so, the situation would be pretty messy during the transition...

iains commented 1 year ago

yeah, there's a tension between "keeping the devil you know" and fixing things to work as they are supposed to.


We have some turbulence ahead, since we want to implement __has_feature/__has_extension (patches posted), which will unleash a whole new set of header parses, and then we want to implement the availability attribute (I have patches to post), and then support for block closures (a.k.a. "Apple Blocks"), for which I likewise have prototype patches.

Maintaining the status quo (as of now) will mean that GCC is going to become unusable - we are already seeing some cases where there are no alternate APIs to the blocks ones... for some stuff [which is why libsanitizer is now disabled for Ventura+].

We need to recognise and deal with this - I am not sure that we can plan exactly - since this is still all voluntary at present so the timescales are unknown.

iains commented 1 year ago

one additional note; we do not actually default GCC to use libc++ in any proposed or released patch. The only effect of this patch is to enable the -stdlib option (so that one does not need to rebuild to add the support). What was unexpected was that packages would test the option only (and not that a program could be built successfully with it).

Anyway, the revision in test provides a mechanism to back out of this.

We do need to find a process to introduce things that make GCC a better compiler on macOS - and find ways to iron out the inevitable wrinkles. (std)libc++ is a particularly knotty problem, since the dyld-shared-cache includes the mentioned 'ancient' libstdc++.

I am definitely open to suggestions on how you folks (i.e. my downstream) can help make sure that new work to make GCC more compatible can be integrated into the distro's workflow.