iains / gcc-14-branch

GCC 14 for Darwin with experimental Arm64 support. Current release 14.2-darwin-r0 [August 2024]
GNU General Public License v2.0

Stage 2 failure building generator #3

Open simonjwright opened 4 months ago

simonjwright commented 4 months ago

A colleague is building gcc-14.1-darwin-r0 in a Github runner (macos-14 runs on an M1-based machine) and has problems with the gen_il-main built in stage 2.

The stage1 compiler is aarch64-apple-darwin 13.2.0, which successfully builds and runs:

2024-05-27T16:01:53.6688970Z mkdir -p ada/gen_il
2024-05-27T16:01:53.6836290Z cd ada/gen_il; gnatmake -q -g -gnata -gnat2012 -gnatw.g -gnatyg -gnatU -I/Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/gcc/src/gcc/ada gen_il-main
[...]
2024-05-27T16:01:57.9128640Z cd ada/gen_il; ./gen_il-main
2024-05-27T16:01:58.1111820Z install.texi:261: warning: @anchor should not appear on @item line

The stage2 compiler builds the same:

2024-05-27T16:14:20.4760580Z mkdir -p ada/gen_il
2024-05-27T16:14:20.4823240Z cd ada/gen_il; gnatmake -q -g -gnata -gnat2012 -gnatw.g -gnatyg -gnatU -I/Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/gcc/src/gcc/ada gen_il-main

but when it’s run, this happens:

2024-05-27T16:14:24.8200660Z cd ada/gen_il; ./gen_il-main
2024-05-27T16:14:24.8266370Z dyld[74097]: Symbol not found: ___builtin_nested_func_ptr_created
2024-05-27T16:14:24.8326660Z   Referenced from: <A3324F46-453C-314E-B26B-51C847B1E704> /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/gcc/build/gcc/ada/gen_il/gen_il-main
2024-05-27T16:14:24.8329370Z   Expected in:     <B3E386AD-E6E3-3A23-B4B3-C06CA85CFE57> /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/gcc/build/prev-gcc/libgcc_s.1.1.dylib
2024-05-27T16:14:24.8337720Z /bin/sh: line 1: 74097 Abort trap: 6           ./gen_il-main
2024-05-27T16:14:24.8340570Z make[3]: [ada/stamp-gen_il] Error 134 (ignored)

Could the Xcode/CLT version make a difference? The runner is running 14.4.1, and the default Xcode is 15.0.1.

iains commented 4 months ago

I recall earlier Xcode 15 having quite a few problems (mostly with the new linker) that we worked around by configuring to use "ld-classic" .. If possible, update to Xcode 15.3 which is known to work (and has quite a few wrinkles ironed out). If that is not possible, then perhaps we can find a configure recipe that will use the 'classic' linker.

simonjwright commented 4 months ago

Our problem with 15.3 (well, CLT 15.3) is that it doesn’t provide m4 (it does provide gm4, though, so you can configure GMP with M4=gm4). I just ran the job locally with XC 15.4 (no CLT yet, though) and it succeeded.
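
The M4=gm4 workaround can be made conditional, so that one script works whether the installed tools provide m4 or only gm4. A minimal sketch, assuming a POSIX shell; first_available is a hypothetical helper name, not something GMP or GCC provides:

```shell
# Print the first command name among the arguments that is found on PATH.
# Returns non-zero if none of them is available.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Prefer plain m4, fall back to gm4 (which CLT 15.3 does provide).
if M4=$(first_available m4 gm4); then
    export M4
    echo "using M4=$M4"
else
    echo "neither m4 nor gm4 found; GMP's configure will probably fail" >&2
fi
```

Exporting M4 before running GMP's configure has the same effect as the M4=gm4 setting mentioned above.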

Since this compiler is being provided to users, would it make sense to go for as early an XC version as actually builds the compiler? I’m trying XC 15.1 on Github as I write ...

iains commented 4 months ago

If you are trying to have one build that runs using several different XC installs, we are going to need to be careful - the compiler configuration determines the capabilities of the support toolchain (as, ld, dsymutil) and adjusts accordingly. Similarly, the SDK in use is relevant (since some of the SDKs require special handling).

If you are building the compiler, and providing it as a built item - then perhaps building m4 is not the end of the world.

For the record, I build GMP et al. as "in-tree" sources - they are bootstrapped along with the compiler and I have not run into trouble with XC15.1b. As noted, CLT 15.0 did have (dyld-linker) problems that are show-stoppers .. so .. you need to stop and take pause about what you want to offer and how to communicate that to your users.

edit: note that generally speaking toolchains are considered to consist of all the items in use - compiler, linker, assembler, debug linker. Xcode does not try to marry an arbitrary clang version with any arbitrary linker .. so perhaps you can constrain what you offer without being seen as providing a poor solution?

simonjwright commented 4 months ago

We’re building the compiler, and providing it to our users as a built item.

I build gmp etc in-tree, my colleagues out-of-tree, for reasons. Anyway, I ran export M4=gm4 and all worked for their build (of gmp, since we still have compiler issues).

I ran on Github with XC 15.1 and 15.3 - no change.

I noticed that using their successful x86_64 build of GCC 14.1 to compile this trampoline-using program

with Ada.Text_IO;
procedure Trampo is
   type T is access procedure;
   procedure P (The_T : T) is
   begin
      The_T.all;
   end P;
   procedure A is
   begin
      Ada.Text_IO.Put_Line ("foo.");
   end A;
begin
   P (A'Access);
end Trampo;

generated a reference to libgcc_s.1.1.dylib

$ otool -L trampo
trampo:
    @rpath/libgcc_s.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1345.100.2)

in spite of the fact that it didn’t need to, because of linking with -lheapt_w.

My build of GCC 14.1.0 (using the FSF release) didn’t.
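
To compare two builds quickly, the otool -L output can be reduced to just the @rpath-relative libraries. This is only a sketch; rpath_deps is a hypothetical helper that filters text on stdin, so it does not itself need otool:

```shell
# Read `otool -L` output on stdin and print only the @rpath-relative libraries.
rpath_deps() {
    awk '$1 ~ /^@rpath\// { print $1 }'
}

# Usage on macOS (binary name taken from the example above):
#   otool -L trampo | rpath_deps
```

An empty result would mean the executable has no @rpath dependencies at all, which is what I see with my own build.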

I think this might be down to a specs difference: theirs has

*libgcc:
%{static-libgcc|static:                           %:version-compare(!> 10.6 mmacosx-version-min= -lgcc_eh);          shared-libgcc|fexceptions|fobjc-exceptions|fgnu-runtime:            -lgcc_s.1.1                              %:version-compare(!> 10.3.9 mmacosx-version-min= -lgcc_eh)              %:version-compare(>< 10.3.9 10.5 mmacosx-version-min= -lgcc_s.10.4)       %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5)       } -lgcc

and mine has

*libgcc:
%{static-libgcc|static:                           %:version-compare(!> 10.6 mmacosx-version-min= -lgcc_eh);          shared-libgcc|fexceptions|fobjc-exceptions|fgnu-runtime:            %:version-compare(!> 10.11 mmacosx-version-min= -lgcc_s.1.1)                             %:version-compare(!> 10.3.9 mmacosx-version-min= -lgcc_eh)              %:version-compare(>< 10.3.9 10.5 mmacosx-version-min= -lgcc_s.10.4)       %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5)       } -lgcc

(the difference is that theirs has shared-libgcc|fexceptions|fobjc-exceptions|fgnu-runtime: -lgcc_s.1.1 whereas mine has shared-libgcc|fexceptions|fobjc-exceptions|fgnu-runtime: %:version-compare(!> 10.11 mmacosx-version-min= -lgcc_s.1.1))

Why that should result in their failure to build the aarch64 compiler I don’t know.

I found some interaction between gcc/configure and gcc/config/darwin.h to do with DARWIN_AT_RPATH, which looks as though it's related to the two different specs entries above.
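
For anyone wanting to repeat the comparison, one spec can be pulled out of the gcc -dumpspecs output and diffed between two compilers. A sketch only; spec_body is a hypothetical helper, and it assumes the usual dumpspecs layout of a "*name:" header line followed by the spec text and a blank line:

```shell
# Read `gcc -dumpspecs` output on stdin and print the body of one named spec.
spec_body() {
    awk -v name="*$1:" '
        $0 == name { in_spec = 1; next }   # header line, e.g. "*libgcc:"
        in_spec && $0 == "" { exit }       # a blank line ends the spec
        in_spec { print }
    '
}

# Usage (compiler paths are placeholders):
#   /path/to/theirs/bin/gcc -dumpspecs | spec_body libgcc > theirs.txt
#   /path/to/mine/bin/gcc   -dumpspecs | spec_body libgcc > mine.txt
#   diff theirs.txt mine.txt
```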

iains commented 4 months ago

getting the specs right for the various permutations of Darwin's support for DYLD_LIBRARY_PATH has been somewhat of a labour.

Since (at least what I infer from your report) it seems that you are using different versions/branches/patches (?) it's very hard for me to figure out what might be wrong.

Really, the recommendation (for people who are not using homebrew, macports etc.) would be to use the branch(es) here - and if something does not work properly (or how you need it to) report issues so we can fix (or work around) them .. once we diverge significantly it's outside of what my meagre resources can handle :) and specs are a powerful tool ..

simonjwright commented 4 months ago

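Both I and my colleagues are using gcc-14.1-darwin-r0, unpatched. The build script we are both using

  • downloads a built 13.2.0 aarch64 compiler
  • downloads gmp etc sources
  • downloads gcc-14.1-darwin-r0
  • builds gmp etc
  • builds the 14.1 compiler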

When run on Github, the compiler build crashes with the failure to run the gen_il-main built in stage 2, because of the problem I quoted at the top of this issue.

When run locally, the compiler build succeeds.

I’ve tried XC 15.1 locally and on Github, no change.

What I’m going to do next (after family commitments) is to compare the build logs.

After that, I’ll try to find why my own set of build scripts (again, using unpatched gcc-14.1-darwin-r0) doesn’t build gen_il-main using libgcc_s.1.1.dylib.

iains commented 4 months ago

Both I and my colleagues are using gcc-14.1-darwin-r0, unpatched.

That the specs are different is, in that case, pretty odd.

Having said that the configuration does adjust to the capabilities of the host system - including to whatever is detected in terms of linker capabilities***

The build script we are both using

  • downloads a built 13.2.0 aarch64 compiler
  • downloads gmp etc sources
  • downloads gcc-14.1-darwin-r0
  • builds gmp etc

On the GH/colleague's version this is a separate step - where you (and I) usually just build them in-tree? (not that I expect that to make the difference).

  • builds the 14.1 compiler

When run on Github, the compiler build crashes with the failure to run the gen_il-main built in stage 2, because of the problem I quoted at the top of this issue.

Can we get the output of uname -a on the GH instance? we might also need to get the versions of some key installed utilities - see *** below.

When run locally, the compiler build succeeds.

I’ve tried XC 15.1 locally and on Github, no change.

no change == builds locally, fails on GH?

What I’m going to do next (after family commitments) is to compare the build logs.

After that, I’ll try to find why my own set of build scripts (again, using unpatched gcc-14.1-darwin-r0) doesn’t build gen_il-main using libgcc_s.1.1.dylib.

That could be quite an involved project - now I know you are starting from the same point - the thing is to figure out why the configure / build thinks that there's a difference.

===

*** some configure functionality depends on the installed utilities (e.g. gawk, objdump, otool, etc.); beyond that, the linker used by the build compiler can also be relevant.

Those are areas where there could be deviation between the environments.

I am understanding (hopefully) that the failing gen_il-main is built by the stage #1 compiler (and not the host/bootstrap one)?

iains commented 4 months ago

if it is possible, please could you post the output of:

otool -lv gcc/ada/gen_il/gen_il-main |grep -A3 LC_RPA

from the failing case ..

.. IIUC, this is a $build tool (i.e. intended to run on the $build system which can be different from the $host one) and should be being built with the bootstrap/build-system compiler.

edit: there are several Ada build-time tools.
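
The grep above prints the whole load command; to get just the rpath entries, the same otool output can be filtered a little further. A minimal sketch, with rpaths as a hypothetical helper that reads the otool -l text on stdin:

```shell
# Read `otool -l` output on stdin and print the path of every LC_RPATH command.
# Note: a path containing spaces would be cut at the first space.
rpaths() {
    awk '
        $1 == "cmd"  { in_rpath = ($2 == "LC_RPATH") }   # track the current load command
        $1 == "path" && in_rpath { print $2 }
    '
}

# Usage on macOS:
#   otool -l gcc/ada/gen_il/gen_il-main | rpaths
```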

iains commented 4 months ago

please also post your configure lines (for both cases). I just took a look at my gcc-14.1 build and gen_il-main seems to be correctly built with the bootstrap/build-system compiler at each stage.

simonjwright commented 4 months ago

I’m going to be away for a few days, but in the mean time

simonjwright commented 4 months ago

I am understanding (hopefully) that the failing gen_il-main is built by the stage #1 compiler (and not the host/bootstrap one)?

Yes, it’s the one built in stage 2 by the compiler that was built by the host compiler in stage 1

iains commented 4 months ago

I am understanding (hopefully) that the failing gen_il-main is built by the stage #1 compiler (and not the host/bootstrap one)?

Yes, it’s the one built in stage 2 by the compiler that was built by the host compiler in stage 1

but I don't think that is right .. it is an exe to run on the build system - it should be built with XXX_FOR_BUILD (which is effectively XXX_FOR_HOST when bootstrapping). I will have to see from your configure lines what is confusing the ada build into using the stageN compiler to build for $build (it does not happen for my configure lines, that exe is built with the bootstrap/build-system compiler in each stage)

iains commented 4 months ago

BTW : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79885 is the reason that I avoid --with-build-sysroot= ...

I see both configure lines have a bunch of (what I believe are) unnecessary configure flags - it is important to be very clear about why any non-standard flag is added - the configure script is supposed to get it right for the platform; if it is failing we should fix it and then remove any work-around.

iains commented 4 months ago

I would also suggest : --build=aarch64-apple-darwin2? (edited) for both cases
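
A value of that shape can be derived from uname rather than hard-coded. This is a sketch under assumptions: darwin_triple is a hypothetical helper, and keeping only the major Darwin version (to match the darwin2? pattern) is a choice made here, not something configure requires:

```shell
# Build a triple such as aarch64-apple-darwin23 from `uname -m` and `uname -r` values.
darwin_triple() {
    machine=$1 release=$2
    case $machine in
        arm64) cpu=aarch64 ;;      # GCC spells Apple's arm64 as aarch64
        *)     cpu=$machine ;;
    esac
    # Keep only the major Darwin version, e.g. 23 from 23.5.0.
    printf '%s-apple-darwin%s\n' "$cpu" "${release%%.*}"
}

# Usage on the build machine:
#   BUILD=$(darwin_triple "$(uname -m)" "$(uname -r)")
#   .../configure --build=$BUILD ...
```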

I like your creative specs for the sysroot; but what would be better is an upstream BZ that says what the problem is so that, maybe, we can find a more efficient solution than checking for each invocation.

You are on different OS versions, it seems .. although TBH, I'd hope that does not affect things too much.

Are you using the single additional fix posted to the gcc-14-1-darwin branch? https://github.com/iains/gcc-14-branch/commit/75ff8c390327ac693f6a1c40510bc0d35d7a1e22

that could, potentially fix issues with mishandling the SDK (although I'd expect a different kind of fail from it .. so maybe not relevant).

edit2: note also that gnatmake et al. do not accept a --sysroot option .. so that has to be done at a lower level (hopefully, it is working OK).

simonjwright commented 4 months ago

I was wrong about the compiler used to build gen_il-main - it’s the host compiler (13.2.0 in these builds).

I tried the 14.1.1 patch, no change.

I’ve done my local builds in a new account running zsh without any .zshrc etc.

uname -a -

GH, Darwin Mac-1717606620560.local 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:12:39 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_VMAPPLE arm64

Mine, Darwin ramoth.local 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:16:51 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T8103 arm64

The reason for the different specs is almost certainly that I export MACOSX_DEPLOYMENT_TARGET=12, which has the same effect as --disable-darwin-at-rpath -- a BZ issue? Anyway, not relevant here, so I’ll just discuss building on GH vs building locally.

In both GH and local, the second build of gen_il-main gives (modulo the leading path)

otool -L

otool -L ada/gen_il/gen_il-main
ada/gen_il/gen_il-main:
    @rpath/libgcc_s.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)

otool -l gen_il-main

otool -l ada/gen_il/gen_il-main | grep -A3 LC_RPATH
          cmd LC_RPATH
      cmdsize 32
         path @loader_path (offset 12)
Load command 17
          cmd LC_RPATH
      cmdsize 144
         path /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/base_gcc/install/lib/gcc/aarch64-apple-darwin23.2.0/13.2.0 (offset 12)
Load command 18
          cmd LC_RPATH
      cmdsize 112
         path /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/base_gcc/install/lib/gcc (offset 12)
Load command 19
          cmd LC_RPATH
      cmdsize 104
         path /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/base_gcc/install/lib (offset 12)
Load command 20

.. is the first @loader_path OK? aside from that, looks fine. The last of those should have picked up the libgcc_s.1.1.dylib we need from the host compiler, which presumably it does in stage 1, but in stage 2:

cd ada/gen_il; ./gen_il-main
dyld[75493]: Symbol not found: ___builtin_nested_func_ptr_created
  Referenced from: <A3324F46-453C-314E-B26B-51C847B1E704> /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/gcc/build/gcc/ada/gen_il/gen_il-main
  Expected in:     <B3E386AD-E6E3-3A23-B4B3-C06CA85CFE57> /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/gcc/build/prev-gcc/libgcc_s.1.1.dylib
/bin/sh: line 1: 75493 Abort trap: 6           ./gen_il-main

... where it’s trying to pick it up from gcc/build/prev-gcc, i.e. the new build.
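
One way to confirm which copy of the library actually defines the symbol is to run nm on both and filter the output. A sketch only; defines_symbol is a hypothetical helper reading nm -g text on stdin, and the two library locations in the usage comment are inferred from the rpath and dyld output above:

```shell
# Read `nm -g` output on stdin; succeed only if the given symbol is defined
# (listed with a type other than "U", which marks an undefined reference).
defines_symbol() {
    awk -v sym="$1" '
        $NF == sym && $(NF-1) != "U" { found = 1 }
        END { exit !found }
    '
}

# Usage on macOS (paths relative to the sandbox, assumed from the logs):
#   for lib in base_gcc/install/lib/libgcc_s.1.1.dylib gcc/build/prev-gcc/libgcc_s.1.1.dylib; do
#       if nm -g "$lib" | defines_symbol ___builtin_nested_func_ptr_created
#       then echo "$lib: defined"; else echo "$lib: NOT defined"; fi
#   done
```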

iains commented 4 months ago

I was wrong about the compiler used to build gen_il-main - it’s the host compiler (13.2.0 in these builds).

I tried the 14.1.1 patch, no change.

The reason for the different specs is almost certainly that I export MACOSX_DEPLOYMENT_TARGET=12, which has the same effect as --disable-darwin-at-rpath -- a BZ issue?

That is not correct, indeed, @rpath is needed for correct operation on any OS >= 10.11 (which includes 12) .. that is a bug - please could you try with MACOSX_DEPLOYMENT_TARGET=12.0 to see if that helps narrow down the issue.
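
To make the two points concrete (the 12 vs 12.0 spelling, and the >= 10.11 rule), here is a small sketch. Both function names are hypothetical, and this only models the rule as stated in this comment; it is not the logic GCC's configure uses:

```shell
# Add an explicit minor version if the deployment target has none (12 -> 12.0).
normalize_target() {
    case $1 in
        *.*) printf '%s\n' "$1" ;;
        *)   printf '%s.0\n' "$1" ;;
    esac
}

# Succeed if the deployment target is 10.11 or later, i.e. @rpath is needed.
wants_at_rpath() {
    v=$(normalize_target "$1")
    major=${v%%.*}
    rest=${v#*.}
    minor=${rest%%.*}
    [ "$major" -gt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -ge 11 ]; }
}

# Usage:
#   export MACOSX_DEPLOYMENT_TARGET=$(normalize_target 12)
```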

I'll look at the rest of the report later / tomorrow.

iains commented 4 months ago

I was wrong about the compiler used to build gen_il-main - it’s the host compiler (13.2.0 in these builds).

I tried the 14.1.1 patch, no change.

I’ve done my local builds in a new account running zsh without any .zshrc etc.

uname -a -

GH, Darwin Mac-1717606620560.local 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:12:39 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_VMAPPLE arm64

Mine, Darwin ramoth.local 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:16:51 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T8103 arm64

The reason for the different specs is almost certainly that I export MACOSX_DEPLOYMENT_TARGET=12, which has the same effect as --disable-darwin-at-rpath -- a BZ issue? Anyway, not relevant here, so I’ll just discuss building on GH vs building locally.

In both GH and local, the second build of gen_il-main gives (modulo the leading path)

otool -L

otool -L ada/gen_il/gen_il-main
ada/gen_il/gen_il-main:
  @rpath/libgcc_s.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
  /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)

otool -l gen_il-main

otool -l ada/gen_il/gen_il-main | grep -A3 LC_RPATH
          cmd LC_RPATH
      cmdsize 32
         path @loader_path (offset 12)
Load command 17
          cmd LC_RPATH
      cmdsize 144
         path /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/base_gcc/install/lib/gcc/aarch64-apple-darwin23.2.0/13.2.0 (offset 12)
Load command 18
          cmd LC_RPATH
      cmdsize 112
         path /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/base_gcc/install/lib/gcc (offset 12)
Load command 19
          cmd LC_RPATH
      cmdsize 104
         path /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/base_gcc/install/lib (offset 12)
Load command 20

.. is the first @loader_path OK? aside from that, looks fine.

yes that allows libraries to find dependents that are co-installed (actually not necessary for libgcc_s.1.1.dylib since it's a leaf - but also should be harmless).

The last of those should have picked up the libgcc_s.1.1.dylib we need from the host compiler, which presumably it does in stage 1, but in stage 2:

this looks right and, for my builds (the use of GCC-11.4 c.f. 13.2 should be irrelevant):

$ otool -lv gcc/ada/gen_il/gen_il-main |grep -A3 LC_RP
          cmd LC_RPATH
      cmdsize 32
         path @loader_path (offset 12)
Load command 17
          cmd LC_RPATH
      cmdsize 96
         path /opt/iains/x86_64-apple-darwin23/gcc-11-4Dr1/lib/gcc/x86_64-apple-darwin23/11.4.0 (offset 12)
Load command 18
          cmd LC_RPATH
      cmdsize 64
         path /opt/iains/x86_64-apple-darwin23/gcc-11-4Dr1/lib (offset 12)
Load command 19

$ otool -lv prev-gcc/ada/gen_il/gen_il-main |grep -A3 LC_RP
          cmd LC_RPATH
      cmdsize 32
         path @loader_path (offset 12)
Load command 17
          cmd LC_RPATH
      cmdsize 96
         path /opt/iains/x86_64-apple-darwin23/gcc-11-4Dr1/lib/gcc/x86_64-apple-darwin23/11.4.0 (offset 12)
Load command 18
          cmd LC_RPATH
      cmdsize 64
         path /opt/iains/x86_64-apple-darwin23/gcc-11-4Dr1/lib (offset 12)
Load command 19

$ otool -lv stage1-gcc/ada/gen_il/gen_il-main |grep -A3 LC_RP
          cmd LC_RPATH
      cmdsize 32
         path @loader_path (offset 12)
Load command 17
          cmd LC_RPATH
      cmdsize 96
         path /opt/iains/x86_64-apple-darwin23/gcc-11-4Dr1/lib/gcc/x86_64-apple-darwin23/11.4.0 (offset 12)
Load command 18
          cmd LC_RPATH
      cmdsize 64
         path /opt/iains/x86_64-apple-darwin23/gcc-11-4Dr1/lib (offset 12)
Load command 19

So .. as you can see, the $build system compiler is used in each case and correctly adds its rpaths.

cd ada/gen_il; ./gen_il-main
dyld[75493]: Symbol not found: ___builtin_nested_func_ptr_created
  Referenced from: <A3324F46-453C-314E-B26B-51C847B1E704> /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/gcc/build/gcc/ada/gen_il/gen_il-main
  Expected in:     <B3E386AD-E6E3-3A23-B4B3-C06CA85CFE57> /Users/runner/work/GNAT-FSF-builds/GNAT-FSF-builds/sbx/aarch64-darwin/gcc/build/prev-gcc/libgcc_s.1.1.dylib
/bin/sh: line 1: 75493 Abort trap: 6           ./gen_il-main

Certainly, this is wrong - and, I think, is the result of disabling darwin-at-rpath (although how that is affecting what the $build system compiler is doing is 'interesting').

We need to figure out why but, in the short-term how about confirming this with --enable-darwin-at-rpath in the configure options?

MACOSX_DEPLOYMENT_TARGET is, unfortunately, somewhat "grape shot" in that, if you export it from the top level build - it affects every compiler in use. IFF you are trying to bootstrap a compiler targeting macOS N-M on macOS N, really there are several steps that should be taken, and we should discuss that outside this issue.

iains commented 4 months ago

hmmm I also wonder if this is an odd interaction between gnatmake and the external compiler.

supposing that the gnatmake used to build gen_il is not the $build one, but happens to be the "just built" one .. then perhaps it is passing inappropriate flags to the called GCC version (which is $build) ... it has been the case in the past that unguarded "gnatmake" has been used in Ada build recipes - I have fixed some instances (to make sure that they use GNATMAKE_FOR_BUILD/HOST etc) .. but maybe missed some still .....
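
A quick way to test that supposition is to check, at the point where gen_il is built, whether the gnatmake found first on PATH lives under the bootstrap compiler's prefix. A sketch, with tool_under_prefix as a hypothetical helper and the prefix in the usage comment taken from the rpaths reported earlier:

```shell
# Succeed if the first instance of a tool on PATH lives under the given prefix.
tool_under_prefix() {
    path=$(command -v "$1") || return 1
    case $path in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage (prefix relative to the sandbox, assumed from the logs):
#   tool_under_prefix gnatmake "$PWD/base_gcc/install" \
#       || echo "gnatmake on PATH is not the bootstrap one: $(command -v gnatmake)"
```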

iains commented 4 months ago

one other - possibly tangential - point.

It seems that GH runners have several versions of Xcode installed - but default to the earliest (and broken for GCC) version. Apparently one can use sudo xcode-select ... to pick a known good one.
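
As a sketch of what that could look like in a job step: select_xcode is a hypothetical wrapper, and the application path in the usage comment is an assumption about how the runner image names its Xcode installs, so check what is actually present under /Applications first.

```shell
# Switch the active developer directory, but only if the tool and the target exist.
select_xcode() {
    app=$1
    if ! command -v xcode-select >/dev/null 2>&1; then
        echo "xcode-select not found; not a macOS host?"
        return 0
    fi
    if [ ! -d "$app" ]; then
        echo "no Xcode at $app; leaving the current selection alone"
        return 0
    fi
    # -s sets the active developer directory, -p prints it back.
    sudo xcode-select -s "$app" && xcode-select -p
}

# Usage (path is an assumption, not taken from this thread):
#   select_xcode /Applications/Xcode_15.3.app
```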

simonjwright commented 3 months ago

For info - may be irrelevant - the host compiler (13.2.0) included a shim to link using ld-classic if found. This was largely because of the XC 15.0 issue, but also there was the exception handling issue fixed in GCC 13.3.0 (at least in Iains's version) and 14.1.0.

I've just

Things are greatly improved, because I now get to this:

2024-06-08T12:05:10.1363350Z Comparing stages 2 and 3
2024-06-08T12:05:17.4905580Z Bootstrap comparison failure!
2024-06-08T12:05:17.4918740Z aarch64-apple-darwin23.5.0/libstdc++-v3/src/.libs/libstdc++.6.dylib-master.o differs

I found this comment, so I'll have another go without --without-build-config.

iains commented 3 months ago

I found this comment, so I'll have another go without --without-build-config.

Indeed my reaction was the same in this case too "what problem is that solving?" ... (I will keep repeating the mantra ... "do not add configure options unless you know why and what problem they are solving". We see more than a few cases where some workaround has been cargo-culted forward many times beyond when it was necessary... usually with unintended consequences).

simonjwright commented 3 months ago

The note in their script says this was because of BZ 100340 -- which is RESOLVED FIXED.

And removing it has resulted in a successful build!!

Would you expect a compiler built against XC 15.n to run correctly under XC 15.{m < n}? There doesn’t seem to be a problem with XC 15.{m > n}.

Is it worth trying to find which change actually fixed this problem? (it would be quite tedious)

iains commented 3 months ago

The note in their script says this was because of [BZ 100340] And removing it has resulted in a successful build!!

great!

Would you expect a compiler built against XC 15.n to run correctly under XC 15.{m < n}? There doesn’t seem to be a problem with XC 15.{m > n}.

This is not a reasonable scheme - the compiler is configured to use the facilities of 15.3 (which includes a working linker) - if you put it on a system with 15.0 installed, you are making it use a broken linker .. this will end badly :)

(even apart from that) We (the volunteer devs) are an extremely limited resource. Even CPU-cycles-wise testing the permutations is infeasible. AFAIU, it's possible to configure a GH runner to use 15.3 ..

.. until we qualify a 15.4 CLT (which does not even appear to be released yet) - I think we have to say that the requirement is 15.3 (there are known bugs with earlier CLT versions).
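
A build script can enforce that requirement up front. A minimal sketch: version_at_least is a hypothetical helper doing a plain component-wise numeric comparison, and the pkgutil package id in the usage comment is an assumption about how the CLT install is recorded on the machine:

```shell
# Succeed if dotted version $1 is greater than or equal to dotted version $2.
version_at_least() {
    have=$1 want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        h=${have%%.*}; w=${want%%.*}
        h=${h:-0}; w=${w:-0}              # a missing component counts as 0
        [ "$h" -gt "$w" ] && return 0
        [ "$h" -lt "$w" ] && return 1
        # Drop the component just compared.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    return 0
}

# Usage on macOS (package id is an assumption):
#   clt=$(pkgutil --pkg-info=com.apple.pkg.CLTools_Executables | awk '/^version:/ { print $2 }')
#   version_at_least "$clt" 15.3 || { echo "CLT $clt is older than 15.3" >&2; exit 1; }
```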

FWIW - I would love to have time to validate and include our own "binutils" so that these issues go away (or at least become ones we can solve) .. but that's yet another thing limited by resources

Homebrew has similar policies - so I do not think your users should be too surprised - as previously noted even Xcode does not support this kind of mix and match - a toolchain is an incredibly complex entity with many moving parts .. it needs to be tested as a whole :)

simonjwright commented 3 months ago

(I will keep repeating the mantra ... "do not add configure options unless you know why and what problem they are solving". We see more than a few cases where some workaround has been cargo-culted forward many times beyond when it was necessary... usually with unintended consequences)

My previous build script was

$GCC_SRC/configure                                                       \
    --prefix=$PREFIX                                                     \
    --without-libiconv-prefix                                            \
    --disable-libmudflap                                                 \
    --disable-libstdcxx-pch                                              \
    --disable-libsanitizer                                               \
    --disable-libcc1                                                     \
    --disable-libcilkrts                                                 \
    --disable-multilib                                                   \
    --disable-nls                                                        \
    --enable-languages=c,c++,ada                                         \
    --host=$BUILD                                                        \
    --target=$BUILD                                                      \
    --build=$BUILD                                                       \
    --without-isl                                                        \
    --with-build-sysroot=$SDKROOT                                        \
    --with-sysroot=                                                      \
    --with-specs="%{!sysroot=*:--sysroot=%:if-exists-else($XCODE $CLT)}" \
    --with-bugurl=$BUGURL                                                \
    --$BOOTSTRAP-bootstrap                                               \
    --enable-host-pie                                                    \
    CFLAGS=-Wno-deprecated-declarations                                  \
    CXXFLAGS=-Wno-deprecated-declarations

I just tried

$GCC_SRC/configure                                                       \
    --prefix=$PREFIX                                                     \
    --enable-languages=c,c++,ada                                         \
    --build=$BUILD                                                       \
    --with-build-sysroot=$SDKROOT                                        \
    --with-sysroot=                                                      \
    --with-specs="%{!sysroot=*:--sysroot=%:if-exists-else($XCODE $CLT)}" \
    --with-bugurl=$BUGURL                                                \
    --$BOOTSTRAP-bootstrap

with 14.1.0 (r1) : complete success!
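
For reviewing this kind of clean-up, the options dropped between two configure invocations can be listed mechanically. A sketch only; removed_flags is a hypothetical helper, and because it splits on spaces an option whose value contains a space (such as the --with-specs one) is broken into pieces:

```shell
# Print the options present in the first configure line but absent from the second.
removed_flags() {
    old=$(mktemp) new=$(mktemp)
    # One option per line, sorted in the C locale so that comm accepts the input.
    printf '%s\n' "$1" | tr -s ' ' '\n' | LC_ALL=C sort >"$old"
    printf '%s\n' "$2" | tr -s ' ' '\n' | LC_ALL=C sort >"$new"
    LC_ALL=C comm -23 "$old" "$new"       # lines unique to the first file
    rm -f "$old" "$new"
}

# Usage:
#   removed_flags "--prefix=/p --disable-nls --without-isl" "--prefix=/p"
```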

One surprising thing: libgcc was generated as libgcc.so, a "Mach-O 64-bit bundle arm64".

simonjwright commented 3 months ago

I think we're done here. Thanks for the help!

iains commented 3 months ago

--with-build-sysroot=$SDKROOT is still broken (per the BZ I referenced earlier) - I am not sure why you are using it .. it does not solve any problems, only adds to them... [agreed it would be nice to fix it ..] but...

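IFF you use SDKROOT [environment] the compiler should honour that. IFF you use --with-sysroot=/path/to/sdk [configure] then the built compiler will use that by default .. but... SDKROOT [environment] will override it (i.e. behave the same as clang) ... and... --sysroot= will override that if the user puts it on the c/l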

I think using --with-build-sysroot= is defeating the fixincludes process and we cannot, unfortunately, omit fixincludes for any so-far released SDK ..

One surprising thing: libgcc was generated as libgcc.so, a "Mach-O 64-bit bundle arm64".

That is surprising .. and I expect it is also broken in use... I would like to repeat this, if possible - I fear that libtool is getting confused somehow - into thinking this is not a Darwin system...

--$BOOTSTRAP-bootstrap what values does $BOOTSTRAP take?

What other environment (e.g. SDKROOT, MACOSX_DEPLOYMENT_TARGET)?

simonjwright commented 3 months ago

--with-build-sysroot=$SDKROOT is still broken (per the BZ I referenced earlier) - I am not sure why you are using it .. it does not solve any problems, only adds to them... [agreed it would be nice to fix it ..] but...

I can't find a BZ about this?

Current working setup:

$GCC_SRC/configure                                                       \
    --prefix=$PREFIX                                                     \
    --enable-languages=c,c++,ada                                         \
    --build=$BUILD                                                       \
    --with-sysroot=$SDKROOT                                              \
    --with-specs="%{!sysroot=*:--sysroot=%:if-exists-else($XCODE $CLT)}" \
    --with-bugurl=$BUGURL                                                \
    --$BOOTSTRAP-bootstrap

with

  • BUILD = aarch64-apple-darwin21
  • SDKROOT = Library/Developer/CommandLineTools/SDKs/MacOSX.sdk (15.3)
  • XCODE = /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
  • CLT = /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
  • BOOTSTRAP = enable (default, could be disable)
  • MACOSX_DEPLOYMENT_TARGET = 12 (I set it to 12.0 in a different branch & forgot to copy here)

IFF you use SDKROOT [environment] the compiler should honour that. IFF you use --with-sysroot=/path/to/sdk [configure] then

the built compiler will use that by default .. but... SDKROOT [environment] will override it (i.e. behave the same as clang) ... and... --sysroot= will override that if the user puts it on the c/l

I think using --with-build-sysroot= is defeating the fixincludes process and we cannot, unfortunately, omit fixincludes for any so-far released SDK ..
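
The precedence described in that quote can be written down as a tiny model. This is only an illustration of the stated order (command-line --sysroot=, then SDKROOT, then the configured default); effective_sysroot is a hypothetical function and is not how the compiler driver is implemented:

```shell
# Print the sysroot that would win, given (in this order) the configured
# default, the SDKROOT environment value and a command-line --sysroot= value.
# An empty string means "not set".
effective_sysroot() {
    configured=$1 env_sdkroot=$2 cli_sysroot=$3
    if [ -n "$cli_sysroot" ]; then
        printf '%s\n' "$cli_sysroot"      # --sysroot= on the command line wins
    elif [ -n "$env_sdkroot" ]; then
        printf '%s\n' "$env_sdkroot"      # otherwise SDKROOT from the environment
    else
        printf '%s\n' "$configured"       # otherwise the --with-sysroot= default
    fi
}

# Usage:
#   effective_sysroot /configured/sdk "$SDKROOT" ""
```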

Having built as above,

One surprising thing: libgcc was generated as libgcc.so, a "Mach-O 64-bit bundle arm64".

So sorry, it was actually libcc1.so. libgcc_s.1.1.dylib was "Mach-O universal binary with 1 architecture: [arm64:Mach-O 64-bit dynamically linked shared library arm64]".

libcc1.so is the one that's "Mach-O 64-bit bundle arm64"

iains commented 3 months ago

--with-build-sysroot=$SDKROOT is still broken (per the BZ I referenced earlier) - I am not sure why you are using it .. it does not solve any problems, only adds to them... [agreed it would be nice to fix it ..] but...

I can't find a BZ about this?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79885

So sorry, it was actually libcc1.so. libgcc_s.1.1.dylib was "Mach-O universal binary with 1 architecture: [arm64:Mach-O 64-bit dynamically linked shared library arm64]".

libcc1.so is the one that's "Mach-O 64-bit bundle arm64"

That's expected, it's a plugin.

iains commented 2 months ago

I have now seen this problem reported another time - but with no luck at pinning down what configure or environment was triggering it.

What is your bootstrap gnat version? (i.e. the one on $build, which I guess is the same as $host).

iains commented 2 months ago

please could you try https://github.com/iains/gcc-14-branch/tree/gcc-14-2-darwin-pre-0

iains commented 2 months ago

see also : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116021 (which is about unpatched trunk) - so this is almost certainly not something to do with additional patches on the branch.