iains / gcc-darwin-arm64

GCC master branch for Darwin with experimental support for Arm64. Currently GCC-15.0.0 [September 2024]
GNU General Public License v2.0
268 stars 33 forks source link

Handling of -march=armv8.4-a+sve #115

Closed fxcoudert closed 7 months ago

fxcoudert commented 1 year ago

From OpenBLAS https://github.com/xianyi/OpenBLAS/issues/4212

We want to compile SVE-enabled code with gcc -march=armv8.4-a+sve -mtune=neoverse-v1. The full compilation line is:

gcc-13 -O2 -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DDYNAMIC_ARCH -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=56 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.24\" -march=armv8.4-a+sve -mtune=neoverse-v1 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=_dgemm_kernel_NEOVERSEV1 -DASMFNAME=_dgemm_kernel_NEOVERSEV1_ -DNAME=dgemm_kernel_NEOVERSEV1_ -DCNAME=dgemm_kernel_NEOVERSEV1 -DCHAR_NAME=\"dgemm_kernel_NEOVERSEV1_\" -DCHAR_CNAME=\"dgemm_kernel_NEOVERSEV1\" -DNO_AFFINITY -DTS=_NEOVERSEV1 -I.. -DBUILD_KERNEL -DTABLE_NAME=gotoblas_NEOVERSEV1 -DDOUBLE -UCOMPLEX -c -DDOUBLE -UCOMPLEX ../kernel/arm64/dgemm_kernel_sve_v2x8.S -o dgemm_kernel_NEOVERSEV1.o

which in turns calls the assembler as:

clang -cc1as -triple arm64-apple-macosx13.0.0 -filetype obj -main-file-name ccMnB12p.s -target-cpu apple-m1 -target-feature +v8.5a -target-feature +crc -target-feature +lse -target-feature +rdm -target-feature +crypto -target-feature +dotprod -target-feature +fp-armv8 -target-feature +neon -target-feature +fp16fml -target-feature +ras -target-feature +rcpc -target-feature +zcm -target-feature +zcz -target-feature +fullfp16 -target-feature +sm4 -target-feature +sha3 -target-feature +sha2 -target-feature +aes -I .. -fdebug-compilation-dir=/private/tmp/openblas-20230905-12149-1dh80k/OpenBLAS-0.3.24/kernel -dwarf-debug-producer "Apple clang version 14.0.3 (clang-1403.0.22.14.1)" -I .. -dwarf-version=4 -mrelocation-model pic --mrelax-relocations -mllvm -disable-aligned-alloc-awareness=1 -o dgemm_kernel_NEOVERSEV1.o /private/tmp/ccMnB12p.s

which errors out with:

dgemm_kernel_NEOVERSEV1.s:21:2: error: instruction requires: sve or sme
 dup z7.d, x18
 ^
dgemm_kernel_NEOVERSEV1.s:22:5: error: instruction requires: sve or sme
    cntd x19
    ^
dgemm_kernel_NEOVERSEV1.s:26:5: error: instruction requires: sve or sme
    ptrue p0.d
    ^
dgemm_kernel_NEOVERSEV1.s:40:2: error: instruction requires: sve or sme
 dup z16.d, #0
 ^

See that SVE was lost. Apple's clang, however, accepts SVE, and actually adding -target-feature +sve makes the assembler succeed.

The source file in this case is (reduced):

$ cat dgemm_kernel_NEOVERSEV1.s
 .text
 .p2align 2
 .global _dgemm_kernel_NEOVERSEV1

_dgemm_kernel_NEOVERSEV1:
 .align 5
 add sp, sp, #-(11 * 16)
 stp d8, d9, [sp, #(0 * 16)]
 stp d10, d11, [sp, #(1 * 16)]
 stp d12, d13, [sp, #(2 * 16)]
 stp d14, d15, [sp, #(3 * 16)]
 stp d16, d17, [sp, #(4 * 16)]
 stp x18, x19, [sp, #(5 * 16)]
 stp x20, x21, [sp, #(6 * 16)]
 stp x22, x23, [sp, #(7 * 16)]
 stp x24, x25, [sp, #(8 * 16)]
 stp x26, x27, [sp, #(9 * 16)]
 str x28, [sp, #(10 * 16)]

 fmov x18, d0
 dup z7.d, x18
    cntd x19
    lsl x20, x19, #1

 lsl x6, x6, #3
    ptrue p0.d

 mov x11, x4
 mov x10, x1
 asr x10, x10, #3

 .align 5
 mov x12, x5
    add x5, x5, x6, lsl #3
 mov x16, x3

 .align 5

 mov x11, x4
 dup z16.d, #0

 ret
fxcoudert commented 1 year ago

Now, I think there is no Apple hardware that supports SVE at this time. Not sure if that will change, and probably not high priority at all, but probably at some point it would be nice to allow compilation if the assembler allows it.

iains commented 1 year ago

I made no attempt to support SVE/SVE2 at this time - because of lack of h/w - I also have no idea if future versions of Apple h/w will use it. I guess supporting the asm output is nice from the PoV of test suites, but it is likely to be quite misleading if a user thinks it works (and then fails at runtime).

Probably the right solution is to make sure that there is a fallback in the package that uses capabilities of the actual h/w? (but I confess to not looking at any details so far).

GCC does support sve/sve2 so I presume we could enable it (if there is some compelling reason).

iains commented 1 year ago

since the assembler that GCC uses is, effectively, clang -cc1as, we could also (probably) cook up a spec that handles the +sve and passes it through to the assembler line. I guess the question is whether the +sve part of the march is visible at that stage.

iains commented 1 year ago

This patch takes some significant liberties (i.e. no configure checks etc.) on the basis that so far we have only used clang -cc1as as our assembler.

the other comments apply - i.e. we should really make sure that we emit code that the hardware can execute :)

edit: however, if this works in the circumstances you need and solves a problem (without breaking anything else) - we can push this into the branch.

diff --git a/gcc/config/aarch64/darwin.h b/gcc/config/aarch64/darwin.h
index aac8228dfea..4db34970c56 100644
--- a/gcc/config/aarch64/darwin.h
+++ b/gcc/config/aarch64/darwin.h
@@ -90,7 +90,7 @@ along with GCC; see the file COPYING3.  If not see
 "%{!mkernel:%{!static:-fPIC}} " DARWIN_CC1_SPEC

 #undef ASM_SPEC
-#define ASM_SPEC "-arch %(darwin_arch) "\
+#define ASM_SPEC "-arch %(darwin_arch) %{march*} %{mtune*} "\
   ASM_OPTIONS " %{static} " ASM_MMACOSX_VERSION_MIN_SPEC

 #undef ENDFILE_SPEC
fxcoudert commented 1 year ago

@iains I have no real urgent need, we have a good workaround anyway, just wanted to flag the issue for consideration at some point

iains commented 1 year ago

@iains I have no real urgent need, we have a good workaround anyway, just wanted to flag the issue for consideration at some point

I'm looking for a good mechanism for deciding what to do in this case; we can (of course) check that the change produces no testsuite regressions - but in terms for whether it does anything actually useful, I will defer to you guys and any testing you are able to do.

iains commented 11 months ago

are you able to check whether the patch above works reliably for your cases ? (it seems a reasonable approach while we are using clang -cc1as as the assembler, if we changed to a different wrapper we'd need to revise the spec).

Actually splitting the march argument up would seem to require code; and so would need to be handled in the driver somewhere (but at the moment I do not see any pressing need to do so if this works for you).

iains commented 7 months ago

I think this is fixed for some time now; @fxcoudert if you agree please close this issue.