iains / gcc-darwin-arm64

GCC master branch for Darwin with experimental support for Arm64. Currently GCC-15.0.0 [September 2024]
GNU General Public License v2.0

Error building libgomp.dylib #125

Closed simonjwright closed 4 months ago

simonjwright commented 9 months ago

This is with commit 31499d1 of 2023-11-22.

Build compiler: GCC 13.1.0, aarch64-apple-darwin21

ld: address=0x0 points to section(3) with no content in '/Volumes/Miscellaneous3/aarch64/14.0.0/gcc/aarch64-apple-darwin21/libgomp/.libs/target-indirect.o'

Configure script, with $BUILD=aarch64-apple-darwin21, $SDKROOT=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk (where CommandLineTools is a symlink to CommandLineTools-15.1b3), and $BOOTSTRAP=disable:

$GCC_SRC/configure                                                       \
    --prefix=$PREFIX                                                     \
    --without-libiconv-prefix                                            \
    --disable-libmudflap                                                 \
    --disable-libstdcxx-pch                                              \
    --disable-libsanitizer                                               \
    --disable-libcc1                                                     \
    --disable-libcilkrts                                                 \
    --disable-multilib                                                   \
    --disable-nls                                                        \
    --enable-languages=c,c++,ada                                         \
    --host=$BUILD                                                        \
    --target=$BUILD                                                      \
    --build=$BUILD                                                       \
    --without-isl                                                        \
    --with-build-sysroot=$SDKROOT                                        \
    --with-sysroot=                                                      \
    --with-specs="%{!sysroot=*:--sysroot=%:if-exists-else($XCODE $CLT)}" \
    --with-as=/usr/bin/as                                                \
    --with-ld=/usr/bin/ld                                                \
    --with-ranlib=/usr/bin/ranlib                                        \
    --with-dsymutil=/usr/bin/dsymutil                                    \
    --$BOOTSTRAP-bootstrap                                               \
    --enable-host-pie                                                    \
    CFLAGS=-Wno-deprecated-declarations                                  \
    CXXFLAGS=-Wno-deprecated-declarations

iains commented 9 months ago

hmm..

simonjwright commented 9 months ago
  • I had a successful bootstrap using GCC-11.4 (including Ada, D, m2 and rust); using XC CLT-14.3 on aarch64-darwin21. I see you have a whole bunch of configure options, some of which are unnecessary and some of which I do not use/test.

Now that I have a working gcc-13.1.0-aarch64 I've stopped building the cross-compiler first; is this wrong?

I guess the --with-as etc. configure options aren't needed; it would be interesting to have a recommended set! There's some cargo-culting going on here, admittedly.

  • non-bootstrap builds from a different compiler version are not really supported - does a bootstrap work correctly?

No, fails exactly the same in stage 1.

  • I have XC CLT 15.1b for which the assembler does not run on darwin21, will have to update if it now works.

I've probably given the wrong impression here; both my aarch64 machines are running Sonoma (14.1.1), I set MACOSX_DEPLOYMENT_TARGET=12 to support users who haven't yet upgraded, for whatever reason.

I think we have an issue with XC CLT 15.1b3 (and possibly earlier), because the libgomp.dylib issue goes away if I build with CLT 14.2.

simonjwright commented 9 months ago

This is with commit 31499d1 of 2023-11-22.

Build compiler: GCC 13.1.0, aarch64-apple-darwin21

ld: address=0x0 points to section(3) with no content in '/Volumes/Miscellaneous3/aarch64/14.0.0/gcc/aarch64-apple-darwin21/libgomp/.libs/target-indirect.o'

It turns out that this is yet another ld-classic problem. Successfully built using a shim:

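#!/bin/sh

# Prefer Apple's legacy linker when the installed tools provide it.
classic=$(xcrun --find ld-classic 2>/dev/null) || true

if [ -n "$classic" ]; then
    # Quote the path so that spaces in it do not split the command.
    exec "$classic" "$@"
else
    # Fall back to the default linker.
    exec ld "$@"
fi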

More to come on this over at https://github.com/iains/gcc-13-branch/issues/10
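To check which linker the shim above would pick on a given machine without actually running a link, the selection step can be pulled out into a function. This is only a sketch: `choose_ld` is a hypothetical name, not something from the original script, and it assumes the same `xcrun --find ld-classic` probe the shim uses.

```shell
#!/bin/sh
# choose_ld (hypothetical helper): print the linker the shim would exec.
choose_ld() {
    # xcrun is absent on non-Apple hosts and fails when ld-classic is not
    # shipped; in both cases the variable ends up empty.
    classic=$(xcrun --find ld-classic 2>/dev/null) || classic=
    if [ -n "$classic" ]; then
        printf '%s\n' "$classic"
    else
        printf '%s\n' ld
    fi
}

choose_ld
```

On a host where `xcrun` cannot find `ld-classic`, this prints plain `ld`, which is the shim's fallback branch.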

fxcoudert commented 9 months ago

Have you reported the issue to Apple?

simonjwright commented 9 months ago

Have you reported the issue to Apple?

FB13416813

Mind, after having submitted the report I dug a bit further. Turns out that target-indirect.c is

void *
GOMP_target_map_indirect_ptr (void *ptr)
{
  /* Calls to this function should not be generated for host code.  */
  __builtin_unreachable ();
}

Compiling this with gcc-13.1.0-x86_64 gives a sensible-looking EH_frame, but compiling with gcc-13.1.0-aarch64 gives this, which looks garbled to me:

$ objdump -h target-indirect.o

target-indirect.o:  file format mach-o arm64

Sections:
Idx Name          Size     VMA              Type
  0 __text        00000000 0000000000000000 TEXT
  1 __text_cold   00000000 0000000000000000 TEXT
  2 __eh_frame    00000038 0000000000000000 DATA

$ objdump -D target-indirect.o

target-indirect.o:  file format mach-o arm64

Disassembly of section __TEXT,__eh_frame:

0000000000000000 <ltmp2>:
       0: 00000014      udf #20
       4: 00000000      udf #0
       8: 00527a01      <unknown>
       c: 011e7801      <unknown>
      10: 001f0c10      <unknown>
      14: 00000000      udf #0
      18: 0000001c      udf #28
      1c: 0000001c      udf #28
      20: ffffffe0      <unknown>
      24: ffffffff      <unknown>
        ...

simonjwright commented 9 months ago

target-indirect.c appears to have been added in a49c7d3; it’s in config/accel/ and config/linux/. We’ve picked up the linux version; the accel version is much more substantial.

simonjwright commented 9 months ago

The feedback (FB13416813) has been updated:

The error is related to the _GOMP_target_map_indirect_ptr symbol, located in the __TEXT,__text_cold section. This entire section is empty, so the symbol has no content, but there’s still a dwarf unwind entry referencing it. You might be able to workaround this error by either removing this symbol, or making sure it has some content.

iains commented 9 months ago

So that looks like an error on our [GCC's] part (or possibly an assumption that something that works with BFD-linkers is OK everywhere), that has not been detected by earlier linkers.

Since this is x86_64, does the problem happen with unpatched 13.2? If so, then we should have an upstream (GCC bugzilla) report for it.

I'm somewhat tied up with other stuff right now, so not really able to suggest a short-term hack.

fxcoudert commented 9 months ago

Using ld-classic is probably a good idea anyway for now. I'll try to debug the issue over the week-end and reduce it to a simple case.

iains commented 9 months ago

Using ld-classic is probably a good idea anyway for now. I'll try to debug the issue over the week-end and reduce it to a simple case.

I wonder if we have a case where there's an empty TU for some targets (but then I don't see why we'd end up with a symbol there). [I've not tried to debug, and will most likely not have a chance this week]

fxcoudert commented 9 months ago

Reduced testcase, with ld being the Xcode 15.1 Release Candidate linker:

$ cat a.c
void * GOMP_target_map_indirect_ptr (void *ptr) {
  __builtin_unreachable ();
}
$ gcc-13 -c a.c -g -O2
$ ld -dynamic -o libtest.dylib a.o -dylib
ld: address=0x0 points to section(3) with no content in '/private/tmp/a.o'

clang output makes ld happy:

$ clang -c a.c -g -O2
$ ld -dynamic -o libtest.dylib a.o -dylib
[no error]

Trying to narrow the difference in what is output:

$ clang -c a.c -g -O2
$ nm a.o
0000000000000000 T _GOMP_target_map_indirect_ptr
0000000000000000 t ltmp0
0000000000000220 s ltmp1
$ gcc-13 -c a.c -g -O2                   
$ nm a.o
0000000000000028 s EH_frame1
0000000000000000 S _GOMP_target_map_indirect_ptr
0000000000000000 t ltmp0
0000000000000000 s ltmp1
0000000000000000 s ltmp2
0000000000000028 s ltmp3
0000000000000164 s ltmp4
0000000000000197 s ltmp5
00000000000001a9 s ltmp6
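
One difference visible in the two listings above is the type letter nm reports for `_GOMP_target_map_indirect_ptr`: `T` for the clang object, `S` for the GCC one. A small sketch for pulling that letter out of nm output follows; `nm_type` is a hypothetical helper, and the sample lines are copied from the listings above rather than produced by running nm.

```shell
#!/bin/sh
# nm_type SYMBOL (hypothetical helper): read `nm` output on stdin and print
# the type letter of SYMBOL. The symbol name is the last field and the
# type letter is the field just before it.
nm_type() {
    awk -v sym="$1" '$NF == sym { print $(NF - 1) }'
}

# Sample lines copied from the thread.
# On a real object: nm a.o | nm_type _GOMP_target_map_indirect_ptr
clang_nm='0000000000000000 T _GOMP_target_map_indirect_ptr
0000000000000000 t ltmp0'
gcc_nm='0000000000000028 s EH_frame1
0000000000000000 S _GOMP_target_map_indirect_ptr'

printf '%s\n' "$clang_nm" | nm_type _GOMP_target_map_indirect_ptr
printf '%s\n' "$gcc_nm"   | nm_type _GOMP_target_map_indirect_ptr
```

In Darwin's nm, `T` marks a symbol in the text section and `S` a symbol in a section other than text, data or bss, so the letter alone already hints that GCC placed the function somewhere other than `__TEXT,__text`.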

iains commented 9 months ago
  • I confirm the bug
  • ld -ld_classic accepts the same object file without even a warning, so clearly it is a regression from Apple

Trying to narrow the difference in what is output:


$ gcc-13 -c a.c -g -O2                   
$ nm a.o
0000000000000028 s EH_frame1
0000000000000000 S _GOMP_target_map_indirect_ptr
0000000000000000 t ltmp0
0000000000000000 s ltmp1
0000000000000000 s ltmp2
0000000000000028 s ltmp3
0000000000000164 s ltmp4
0000000000000197 s ltmp5
00000000000001a9 s ltmp6

please could you show the output of objdump -d -r a.o

On my very quick test, I do not see the section being empty (but, instead, containing a single trap instruction)

edit: FAOD, this is with x86_64, right?

iains commented 9 months ago

specifically, if I compile with -save-temps and look at the assembler:

        .file 1 "f.c"
        .section __TEXT,__text_cold,regular,pure_instructions
        .globl _GOMP_target_map_indirect_ptr
_GOMP_target_map_indirect_ptr:
LFB0:
        .loc 1 1 49
        .loc 1 2 3
        ud2
LFE0:

If, hypothetically, that is also what you see - but the output of objdump -d -r does not look the same, then the issue is with the assembler (i.e. clang -cc1as) rather than ld.

fxcoudert commented 9 months ago

Since I am reporting here, I am testing the aarch64-darwin, with the current branch. I don't have a system with Xcode 15.1 on Intel :(

meau /tmp $ clang -O2 -g a.c -c
meau /tmp $ objdump -d -r a.o

a.o:    file format mach-o arm64

Disassembly of section __TEXT,__text:

0000000000000000 <ltmp0>:
       0: d4200020      brk #0x1
meau /tmp $ gcc-13 -O2 -g a.c -c
meau /tmp $ objdump -d -r a.o   

a.o:    file format mach-o arm64
meau /tmp $ 

The assembly generated by GCC is:

        .arch armv8-a
        .text
Ltext0:
        .file 1 "a.c"
        .section __TEXT,__text_cold,regular,pure_instructions
        .align  2
        .globl _GOMP_target_map_indirect_ptr
_GOMP_target_map_indirect_ptr:
LFB0:
        .loc 1 1 49
        .loc 1 2 3
LFE0:

and by clang:

        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 14, 0     sdk_version 14, 2
        .globl  _GOMP_target_map_indirect_ptr   ; -- Begin function GOMP_target_map_indirect_ptr
        .p2align        2
_GOMP_target_map_indirect_ptr:          ; @GOMP_target_map_indirect_ptr
Lfunc_begin0:
        .file   1 "/tmp" "a.c"
        .loc    1 1 0                           ; a.c:1:0
        .cfi_startproc
; %bb.0:
        .loc    1 2 3 prologue_end              ; a.c:2:3
        brk     #0x1
Ltmp0:
Lfunc_end0:
        .cfi_endproc
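
The empty objdump listing for the GCC object above can also be checked mechanically by counting the lines that look like disassembled instructions. This is a rough sketch: `count_insns` is a hypothetical helper, the pattern assumes the llvm-objdump layout shown in this thread (`offset: encoding mnemonic`), and the sample text is copied from the transcripts above.

```shell
#!/bin/sh
# count_insns (hypothetical helper): read `objdump -d` output on stdin and
# count the lines shaped like "<offset>: <encoding> <mnemonic>".
count_insns() {
    awk '/^[ \t]*[0-9a-f]+:[ \t]+[0-9a-f]+[ \t]/ { n++ } END { print n + 0 }'
}

# Sample text copied from the thread.
# On a real object: objdump -d -r a.o | count_insns
clang_dump='a.o:    file format mach-o arm64

Disassembly of section __TEXT,__text:

0000000000000000 <ltmp0>:
       0: d4200020      brk #0x1'
gcc_dump='a.o:    file format mach-o arm64'

printf '%s\n' "$clang_dump" | count_insns
printf '%s\n' "$gcc_dump"   | count_insns
```

For the two samples this reports one instruction for the clang object and none for the GCC object.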

iains commented 9 months ago

Since I am reporting here, I am testing the aarch64-darwin, with the current branch. I don't have a system with Xcode 15.1 on Intel :(

The assembly generated by GCC is:

        .arch armv8-a
        .text
Ltext0:
        .file 1 "a.c"
        .section __TEXT,__text_cold,regular,pure_instructions
        .align  2
        .globl _GOMP_target_map_indirect_ptr
_GOMP_target_map_indirect_ptr:
LFB0:
        .loc 1 1 49
        .loc 1 2 3
LFE0:

That is different from the x86_64 case (there is indeed no content here) ... whereas....

and by clang:

        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 14, 0     sdk_version 14, 2
        .globl  _GOMP_target_map_indirect_ptr   ; -- Begin function GOMP_target_map_indirect_ptr
        .p2align        2
_GOMP_target_map_indirect_ptr:          ; @GOMP_target_map_indirect_ptr
Lfunc_begin0:
        .file   1 "/tmp" "a.c"
        .loc    1 1 0                           ; a.c:1:0
        .cfi_startproc
; %bb.0:
        .loc    1 2 3 prologue_end              ; a.c:2:3
        brk     #0x1
Ltmp0:
Lfunc_end0:
        .cfi_endproc

.... clang is putting a trap instruction in.

So, on aarch64, we do have a discrepancy - I need to figure out where the aarch64 port decides to/not to insert the trap.
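
The discrepancy can be spotted directly in the `-save-temps` assembly by looking at what sits between the function's begin and end labels. The sketch below assumes the `LFB0`/`LFE0` labels that appear in the GCC listings in this thread; `has_insn` is a hypothetical helper, and the two samples are trimmed copies of the x86_64 and aarch64 output quoted above.

```shell
#!/bin/sh
# has_insn (hypothetical helper): read GCC assembly on stdin and report
# whether anything other than labels and directives appears between the
# LFB0 and LFE0 labels.
has_insn() {
    awk '
        /^LFB0:/ { inside = 1; next }
        /^LFE0:/ { inside = 0 }
        inside && NF > 0 && $1 !~ /^\./ && $1 !~ /:$/ { found = 1 }
        END { print (found ? "non-empty" : "empty") }
    '
}

x86_asm='_GOMP_target_map_indirect_ptr:
LFB0:
        .loc 1 1 49
        .loc 1 2 3
        ud2
LFE0:'
arm_asm='_GOMP_target_map_indirect_ptr:
LFB0:
        .loc 1 1 49
        .loc 1 2 3
LFE0:'

printf '%s\n' "$x86_asm" | has_insn
printf '%s\n' "$arm_asm" | has_insn
```

The x86_64 sample is reported as non-empty because of the `ud2`, while the aarch64 sample has only `.loc` directives between the labels and is reported as empty.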

iains commented 9 months ago

see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109267 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438 (I will have to see if a similar fix can apply to aarch64).

iains commented 9 months ago

Note that the workaround from PR57438 does also appear to work for aarch64: -D__builtin_unreachable=__builtin_trap (maybe add that to the recipe for the affected files, rather than globally, although TBH a trap is more user-friendly than UB).
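
Applying the define only to the affected files could look roughly like the following. This is a sketch under assumptions: `extra_cflags` is a hypothetical helper, the only file listed is the `target-indirect.c` identified earlier in this thread, and how such a helper would be wired into the real libgomp build recipe is not shown here.

```shell
#!/bin/sh
# extra_cflags FILE (hypothetical helper): print the workaround define for
# sources known to hit the problem, and nothing for everything else.
extra_cflags() {
    case "${1##*/}" in
        target-indirect.c)
            printf '%s\n' '-D__builtin_unreachable=__builtin_trap' ;;
        *)
            ;;
    esac
}

extra_cflags config/linux/target-indirect.c   # prints the define
extra_cflags team.c                           # prints nothing
```

A build rule could then compile each source with something like `$CC $(extra_cflags "$src") -c "$src"`, leaving every other translation unit untouched.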

iains commented 5 months ago

unfortunately, this remedy does not seem to work for modula-2 (which also has instances of this issue) - so a proper solution is called for - and it seems to be a bit more tricky for aarch64 than the solutions I did for x86 and powerpc.

iains commented 5 months ago

My testing with a cross to aarch64-darwin from x86_64-darwin on macOS 14 with Xcode 15.3b2 suggests that this is working. There is some fallout from the change, but right now I think that the change has actually identified a second problem, which is not limited to the arm64 port.

Please test the latest master-wip-apple-si branch; if all goes OK then I'll backport for 13.3, 12.4 and 11.5.

iains commented 5 months ago

The feedback (FB13416813) has been updated:

The error is related to the _GOMP_target_map_indirect_ptr symbol, located in the __TEXT,__text_cold section. This entire section is empty, so the symbol has no content, but there’s still a dwarf unwind entry referencing it. You might be able to workaround this error by either removing this symbol, or making sure it has some content.

What is the situation with the FB now?

We have just had a long discussion on IRC and on the gcc-patches list about solutions to the underlying problem (empty functions because of any reason - e.g. macro-conditional content, optimised away etc).

The assertion of the global maintainers is that, generally, empty content is to be expected in real-life code.

 e.g. asm (""); __builtin_unreachable (); will result in that too (or asm which actually has some large template, but either expands just into a different section, or has macros that yield nothing)

but generally, DW_CFA_advance_loc* can always skip over something that is empty and not known at compile time (like inline asms that don't contribute anything to the current section), so generally having something to apply for an empty range is well defined DWARF construct

So, I can fix the current case (i.e. a function optimised to __builtin_unreachable) to produce a trap there, but it seems that there's potentially a wider issue.

Since FBs are not public - please could you update ?

@fxcoudert is this the only FB for the topic? (given that we read it as a regression from ld64)? I wonder if there's some way to either expedite - or if it won't be fixed then to find out soon so that we can try to react in the compiler.

simonjwright commented 5 months ago

Since FBs are not public - please could you update ?

There’s been no update to the FB since my last report.

My report was


The file target-indirect.o is generated as part of libgomp.dylib during GCC 14.0.0 build for arm64 (sources at https://github.com/iains/gcc-darwin-arm64).

While doing the link to produce the dylib, ld reports ld: address=0x0 points to section(3) with no content in '/Users/simon/Developer/bugs/gcc/ld_classic/.libs/target-indirect.o'

I've created an attachment (target-indirect.zip) containing the object file concerned.

With ld:

$ ld -dylib -o libgomp.dylib objs/*.o -no_compact_unwind -syslibroot $(xcrun --show-sdk-path) -lSystem
ld: address=0x0 points to section(3) with no content in '/Users/simon/Developer/bugs/gcc/ld_classic/objs/target-indirect.o'

Using ld-classic:

$ $(xcrun --find ld-classic) -dylib -o libgomp.dylib objs/*.o -no_compact_unwind -syslibroot $(xcrun --show-sdk-path) -lSystem

runs successfully.


The response was


The error is related to the _GOMP_target_map_indirect_ptr symbol, located in the __TEXT,__text_cold section. This entire section is empty, so the symbol has no content, but there’s still a dwarf unwind entry referencing it. You might be able to workaround this error by either removing this symbol, or making sure it has some content.


and the summary at the top is


Recent Similar Reports: None
Resolution: Investigation complete - Works as currently designed

iains commented 5 months ago

and the summary at the top is

Recent Similar Reports: None
Resolution: Investigation complete - Works as currently designed

Which I translate as "we are not going to fix this, it's your problem to generate code that does not cause this".

@fxcoudert do you know of any other FBs in the system?

fxcoudert commented 5 months ago

@fxcoudert do you know of any other FBs in the system?

Not aware of any, will ping that one.

simonjwright commented 5 months ago

Please test the latest master-wip-apple-si branch; if all goes OK then I'll backport for 13.3, 12.4 and 11.5.

I just did a bootstrap (C, C++, Ada) on M1 macOS 14.4.1, CLT 15.3, base compiler 13.2.0-aarch64; built without issues.

iains commented 4 months ago

I believe that this is fixed on the development branch, and will be backported to 13.3, 12.4 and 11.5.