Closed fxcoudert closed 7 months ago
Now, I think there is no Apple hardware that supports SVE at this time. Not sure if that will change, and probably not high priority at all, but probably at some point it would be nice to allow compilation if the assembler allows it.
I made no attempt to support SVE/SVE2 at this time - because of lack of h/w - I also have no idea if future versions of Apple h/w will use it. I guess supporting the asm output is nice from the PoV of test suites, but it is likely to be quite misleading if a user thinks it works (and then fails at runtime).
Probably the right solution is to make sure that there is a fallback in the package that uses capabilities of the actual h/w? (but I confess to not looking at any details so far).
GCC does support sve/sve2 so I presume we could enable it (if there is some compelling reason).
since the assembler that GCC uses is, effectively, clang -cc1as
, we could also (probably) cook up a spec that handles the +sve
and passes it through to the assembler line. I guess the question is whether the +sve part of the march is visible at that stage.
This patch takes some significant liberties (i.e. no configure checks etc.) on the basis that so far we have only used clang -cc1as
as our assembler.
the other comments apply - i.e. we should really make sure that we emit code that the hardware can execute :)
edit: however, if this works in the circumstances you need and solves a problem (without breaking anything else) - we can push this into the branch.
diff --git a/gcc/config/aarch64/darwin.h b/gcc/config/aarch64/darwin.h
index aac8228dfea..4db34970c56 100644
--- a/gcc/config/aarch64/darwin.h
+++ b/gcc/config/aarch64/darwin.h
@@ -90,7 +90,7 @@ along with GCC; see the file COPYING3. If not see
"%{!mkernel:%{!static:-fPIC}} " DARWIN_CC1_SPEC
#undef ASM_SPEC
-#define ASM_SPEC "-arch %(darwin_arch) "\
+#define ASM_SPEC "-arch %(darwin_arch) %{march*} %{mtune*} "\
ASM_OPTIONS " %{static} " ASM_MMACOSX_VERSION_MIN_SPEC
#undef ENDFILE_SPEC
@iains I have no real urgent need, we have a good workaround anyway, just wanted to flag the issue for consideration at some point
@iains I have no real urgent need, we have a good workaround anyway, just wanted to flag the issue for consideration at some point
I'm looking for a good mechanism for deciding what to do in this case; we can (of course) check that the change produces no testsuite regressions - but in terms for whether it does anything actually useful, I will defer to you guys and any testing you are able to do.
are you able to check whether the patch above works reliably for your cases ? (it seems a reasonable approach while we are using clang -cc1as as the assembler, if we changed to a different wrapper we'd need to revise the spec).
Actually splitting the march argument up would seem to require code; and so would need to be handled in the driver somewhere (but at the moment I do not see any pressing need to do so if this works for you).
I think this is fixed for some time now; @fxcoudert if you agree please close this issue.
From OpenBLAS https://github.com/xianyi/OpenBLAS/issues/4212
We want to compile SVE-enabled code with
gcc -march=armv8.4-a+sve -mtune=neoverse-v1
. The full compilation line is:gcc-13 -O2 -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DDYNAMIC_ARCH -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=56 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.24\" -march=armv8.4-a+sve -mtune=neoverse-v1 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=_dgemm_kernel_NEOVERSEV1 -DASMFNAME=_dgemm_kernel_NEOVERSEV1_ -DNAME=dgemm_kernel_NEOVERSEV1_ -DCNAME=dgemm_kernel_NEOVERSEV1 -DCHAR_NAME=\"dgemm_kernel_NEOVERSEV1_\" -DCHAR_CNAME=\"dgemm_kernel_NEOVERSEV1\" -DNO_AFFINITY -DTS=_NEOVERSEV1 -I.. -DBUILD_KERNEL -DTABLE_NAME=gotoblas_NEOVERSEV1 -DDOUBLE -UCOMPLEX -c -DDOUBLE -UCOMPLEX ../kernel/arm64/dgemm_kernel_sve_v2x8.S -o dgemm_kernel_NEOVERSEV1.o
which in turns calls the assembler as:
clang -cc1as -triple arm64-apple-macosx13.0.0 -filetype obj -main-file-name ccMnB12p.s -target-cpu apple-m1 -target-feature +v8.5a -target-feature +crc -target-feature +lse -target-feature +rdm -target-feature +crypto -target-feature +dotprod -target-feature +fp-armv8 -target-feature +neon -target-feature +fp16fml -target-feature +ras -target-feature +rcpc -target-feature +zcm -target-feature +zcz -target-feature +fullfp16 -target-feature +sm4 -target-feature +sha3 -target-feature +sha2 -target-feature +aes -I .. -fdebug-compilation-dir=/private/tmp/openblas-20230905-12149-1dh80k/OpenBLAS-0.3.24/kernel -dwarf-debug-producer "Apple clang version 14.0.3 (clang-1403.0.22.14.1)" -I .. -dwarf-version=4 -mrelocation-model pic --mrelax-relocations -mllvm -disable-aligned-alloc-awareness=1 -o dgemm_kernel_NEOVERSEV1.o /private/tmp/ccMnB12p.s
which errors out with:
See that SVE was lost. Apple's clang, however, accepts SVE, and actually adding
-target-feature +sve
makes the assembler succeed.The source file in this case is (reduced):