llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

AArch64 target unconditionally generates SVE instructions for Armv9-A despite them being optional in the architecture #114987

Open willdeacon opened 1 week ago

willdeacon commented 1 week ago

Hi,

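As of this commit (https://github.com/llvm/llvm-project/commit/3550e242fad672696da361f7ddadf53a41114dfd), specifying an Armv9-A architecture will cause Clang to generate SVE instructions unconditionally. However, these instructions are OPTIONAL from version v8.2 of the architecture, as called out in the Arm ARM (https://developer.arm.com/documentation/ddi0487/ka/?lang=en):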

> // ARM DDI 0487K.a, A2-105
> FEAT_SVE is OPTIONAL from Armv8.2.

This is particularly problematic when running in a KVM guest environment, as SVE is disabled by default regardless of the underlying hardware capabilities and must be explicitly enabled by the VMM as an opt-in vCPU feature (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/virt/kvm/api.rst#n3513). Consequently, host binaries compiled with -march=armv9-a cannot execute in guest context on a v9 CPU unless the VMM enables SVE. Of course, these binaries would also fail to execute on a v9 CPU that chose not to implement SVE at all, but the KVM case is what we have run into in Android.
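For readers who want to check what a given guest or host actually exposes, here is a minimal sketch, assuming a Linux arm64 process. It decodes the SVE and SVE2 bits from the ELF hwcap words that the kernel reports through getauxval(). The constants are the Linux arm64 values; the helper names `decode_sve` and `read_hwcaps` are made up for this illustration and are not part of the original report.

```python
import ctypes
import platform

# Linux ELF auxiliary vector keys and arm64 hwcap bits.
AT_HWCAP = 16
AT_HWCAP2 = 26
HWCAP_SVE = 1 << 22    # AT_HWCAP bit reported when SVE is usable
HWCAP2_SVE2 = 1 << 1   # AT_HWCAP2 bit reported when SVE2 is usable


def decode_sve(hwcap: int, hwcap2: int) -> dict:
    """Decode SVE/SVE2 availability from raw arm64 hwcap words."""
    return {
        "sve": bool(hwcap & HWCAP_SVE),
        "sve2": bool(hwcap2 & HWCAP2_SVE2),
    }


def read_hwcaps():
    """Return (AT_HWCAP, AT_HWCAP2) for this process, or None off arm64 Linux.

    Hypothetical helper: the bit meanings above only apply on arm64, so
    other platforms are rejected rather than misreported.
    """
    if platform.system() != "Linux" or platform.machine() != "aarch64":
        return None
    libc = ctypes.CDLL(None)
    libc.getauxval.restype = ctypes.c_ulong
    libc.getauxval.argtypes = [ctypes.c_ulong]
    return libc.getauxval(AT_HWCAP), libc.getauxval(AT_HWCAP2)


if __name__ == "__main__":
    caps = read_hwcaps()
    if caps is None:
        print("not an arm64 Linux process; nothing to report")
    else:
        print(decode_sve(*caps))
```

Run inside a KVM guest whose VMM has not opted in to SVE, this would be expected to report both features as absent even on SVE-capable hardware, which is the situation described above.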

In addition to the above, there is a misleading "note" in the Arm ARM about SVE2 (which implies SVE) specifically:

> // ARM DDI 0487K.a, A1-59
> Note:
> All Armv8-A systems that support standard operating systems with rich application environments provide hardware support for Advanced SIMD and floating-point instructions. All Armv9-A systems that support standard operating systems with rich application environments also provide hardware support for SVE2 instructions. It is a requirement of the ARM Procedure Call Standard for AArch64, see Procedure Call Standard for the Arm 64-bit Architecture.

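It's all very fluffy (who knows what a "rich application environment" really means), but the final sentence gives the wrong impression that the PCS requires support for SVE2. Although the PCS does require hardware support for fpsimd (see this footnote: https://github.com/ARM-software/abi-aa/blob/a82eef0433556b30539c0d4463768d9feb8cfd0b/aapcs64/aapcs64.rst#aapcs64-f1), SVE is still correctly referred to as an optional extension (https://github.com/ARM-software/abi-aa/blob/a82eef0433556b30539c0d4463768d9feb8cfd0b/aapcs64/aapcs64.rst#12appendix-support-for-scalable-vectors).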

Looking back at an older version of the Arm ARM:

> // ARM DDI 0487E.a, A1-51
> Note:
> All systems that support standard operating systems with rich application environments provide hardware support for Advanced SIMD and floating-point. It is a requirement of the ARM Procedure Call Standard for AArch64, see Procedure Call Standard for the Arm 64-bit Architecture.

It seems plausible that the SVE2 text was shoe-horned in a little clumsily and the implication on the PCS was accidental.

Anyway, the tl;dr is that I don't think specifying an Armv9-A target architecture should assume the presence of SVE as this is not guaranteed by the CPU architecture and doesn't match the default behaviour of KVM. Instead, I think SVE should be specified explicitly as e.g. armv9-a+sve on the assumption that the user knows that they are generating non-portable binaries.

ktkachov commented 1 week ago

AFAIK this was a conscious choice in both LLVM and GCC for -march=armv9-a. The reasoning was that SVE2 is optional in Armv9-A in the same way that fpsimd is optional in base Armv8-A: technically optional but present in practice in all userspace uses, so outliers that ship systems without it would have to account for it explicitly in their software stack, e.g. through -march=armv9-a+nosve or similar.

I guess if KVM guest SVE use is indeed opt-in and that is widespread enough, that may be reason to reconsider, though I'd rather not, as this has been a conscious choice for a few years. It would be good to get others' input, but I'll also add that there are many -mcpu options that enable SVE2, as it's important for performance. What happens if code compiled that way runs on such a CPU where SVE is not exposed to the guest?

Another point: if we do change this, we'd want to be consistent with GCC, so this may need discussion there too.
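Whichever default the compilers settle on, a build script can sidestep the question by always spelling out the SVE choice. Below is a small sketch of that idea. The `+sve` and `+nosve` modifiers are the ones mentioned in this thread; the helper names, the target triple and the remaining flags are assumptions chosen for illustration, not a recommended build configuration.

```python
def march_flag(base: str, sve: bool) -> str:
    """Return a -march value that states the SVE choice explicitly.

    Hypothetical helper: it never relies on what the bare architecture
    name implies, so the result reads the same under either default.
    """
    return f"-march={base}+{'sve' if sve else 'nosve'}"


def clang_command(source: str, sve: bool, base: str = "armv9-a") -> list:
    """Build an illustrative clang invocation that writes assembly to stdout."""
    return [
        "clang",
        "--target=aarch64-linux-gnu",  # assumed target triple
        march_flag(base, sve),
        "-O2",
        "-S",
        "-o",
        "-",
        source,
    ]


if __name__ == "__main__":
    # Print the command for a binary meant to run in guests without SVE.
    print(" ".join(clang_command("example.c", sve=False)))
```

Comparing the assembly produced by the two variants for the same source file is one way to confirm whether SVE instructions are being emitted.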

giordano commented 1 week ago

Related discussion in #95478

llvmbot commented 1 week ago

@llvm/issue-subscribers-backend-aarch64

willdeacon commented 1 week ago

Hi, Kyrill! Cheers for responding.

As I mentioned, the KVM case has bitten us in Android, so I think it would be great if you could reconsider the compiler behaviour here. Is there precedent for an AArch64 -march=armv*-a option that enables optional extensions by default?

aemerson commented 1 week ago

> AFAIK this was a conscious choice in both LLVM and GCC for -march=armv9-a. The reasoning was that SVE2 is optional in Armv9-A as much as fpsimd are optional in the base Armv8-A

fpsimd wasn't optional from the beginning of ARMv8a; for quite a while it was mandatory, so the software ecosystem had already established itself.

> technically optional but present in practice in all userspace uses

I would be interested in seeing some data for this. From what I've heard some vendors have been using armv9 capable CPUs but disabling SVE for user space.[1][2] So to me this statement in the reference doesn't reflect the reality of the on-the-ground user experience. Is this untrue?

> All Armv9-A systems that support standard operating systems with rich application environments also provide hardware support for SVE2 instructions.

... seems more like wishful thinking than anything else. If you're targeting Linux/Android and your program compiled with -march=armv9-a running on an ARMv9-a Cortex-X[N] core SIGILLs, what's the fix? Probably you will have to disable SVE.

[1] https://www.reddit.com/r/simd/comments/1c1ozs6/availability_of_sve_on_mobile_devices/
[2] https://community.arm.com/support-forums/f/high-performance-computing-forum/55659/are-there-any-socs-in-phones-with-sve-sve2