Neon types generate bad code if the "neon" target feature is disabled.

jacobbramley commented 10 months ago

Compiled with RUSTFLAGS=-Ctarget_feature=-neon (for aarch64-unknown-linux-gnu):

#![feature(simd_ffi)]

use std::arch::aarch64::*;

fn main() {
    // The target_feature unsafety contract requires us to test this first.
    if std::arch::is_aarch64_feature_detected!("neon") {
        unsafe { test(); }
    }
}

#[target_feature(enable = "neon")]
unsafe fn test() {
    const A: [u32; 4] = [40, 30, 16, 9];
    const B: [u32; 4] = [2, 12, 26, 33];
    let a: uint32x4_t = vld1q_u32(A.as_ptr());
    let b: uint32x4_t = vld1q_u32(B.as_ptr());
    let r = trampoline(a, b);
    println!("{a:?} + {b:?} -> {r:?}");
}

fn trampoline(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t {
    unsafe { add(a, b) }
}

extern "C" {
    // The C implementation is a simple pass-through to `vaddq_u32(a, b)`.
    fn add(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t;
}

Ideally, trampoline would fail to compile, because it does not have Neon and shouldn't be able to represent the vector types.

The call to trampoline(a, b) passes the arguments in memory (using the Rust ABI).
The subsequent call to add(a, b) tries to pass each argument in four w registers (each holding a u32), as if they are tuples (u32, u32, u32, u32).
The C implementation expects arguments in Neon registers (v0 and v1), so the result is unpredictable.

If test() — which has "neon" enabled — calls add(a, b) directly, it uses v0 and v1, as per AAPCS64.

This is the AArch64 counterpart to #116344 and #114479, with the twist that on AArch64, it's preferable for Neon-specific types to fail to compile without the proper features. These aren't general-purpose types. At least some C compilers refuse to compile code that uses Neon types when -mcpu=+nosimd+nofp is specified.

Meta

This came out of a Zulip discussion.

rustc --version --verbose:

rustc 1.76.0-nightly (a1a37735c 2023-11-23)
binary: rustc
commit-hash: a1a37735cbc3db359d0b24ba9085c9fcbe1bc274
commit-date: 2023-11-23
host: x86_64-unknown-linux-gnu
release: 1.76.0-nightly
LLVM version: 17.0.5

jacobbramley commented 10 months ago

@RalfJung, @Amanieu, @workingjubilee

RalfJung commented 10 months ago

IMO the declaration of add is what should fail. You can't declare a function with an ABI that needs target features that are not available. trampoline should compile just fine and call add using the neon calling convention. (That's hard to implement in LLVM so the compiler might have to generate a shim.)

Having a type disappear based on a target feature does not make a ton of sense from a Rust perspective.

RalfJung commented 10 months ago

This is the AArch64 counterpart to https://github.com/rust-lang/rust/issues/116344 and https://github.com/rust-lang/rust/issues/114479

It's the counterpart to the second. The first issue is about the ABI of f32/f64 being different when softfloat features are set (or hardfloat features are disabled); I assume ARM has its own version of that -- but this issue involves SIMD types, not (scalar) float types.

jacobbramley commented 9 months ago

IMO the declaration of add is what should fail. You can't declare a function with an ABI that needs target features that are not available.

Ok, so perhaps the real problem here is that we can't add #[target_feature(...)] to the declaration of add. Otherwise, the only way to make target features available outside a function scope is on the command line, but that doesn't combine well with dynamic feature detection.

Having a type disappear based on a target feature does not make a ton of sense from a Rust perspective.

Is there more material I can read to get a better understanding of the reasoning behind that? From our perspective, trying to expose specific hardware features to low-level Rust code, it makes a lot of sense: uint32x4_t isn't a generic (u32, u32, u32, u32), but rather a specific type that maps onto the Neon hardware.

Notably, having feature-specific types would go some way towards allowing traits implementations to have implied target features. For example, we can't implement Clone (or anything else) for the prototyped SVE types currently.

RalfJung commented 9 months ago

Ok, so perhaps the real problem here is that we can't add #[target_feature(...)] to the declaration of add. Otherwise, the only way to make target features available outside a function scope is on the command line, but that doesn't combine well with dynamic feature detection.

Agreed, such attributes at declarations are needed. The original design didn't think they were needed since there is no codegen for declarations and target features seemingly only affect codegen, but alas, the situations is more messy than that.

Is there more material I can read to get a better understanding of the reasoning behind that? From our perspective, trying to expose specific hardware features to low-level Rust code, it makes a lot of sense: svint32x4_t isn't a generic (u32, u32, u32, u32), but rather a specific type that maps onto the Neon hardware.

Availability of Rust standard library types is determined by cfg attributes that are evaluated when the standard library is built. It can't depend on -C flags that are used when "the crate that imports the standard library" is built.

And even for C, how do you handle per-function enabling of target features? I could build a file without neon support but then declare one function in there to support neon. How do you make the type only available to that one function? That would require special compiler magic, the preprocessor does not suffice.

jacobbramley commented 9 months ago

And even for C, how do you handle per-function enabling of target features? I could build a file without neon support but then declare one function in there to support neon. How do you make the type only available to that one function? That would require special compiler magic, the preprocessor does not suffice.

It appears that the types are actually available to the language, but if you try to use them with the hardware features disabled, you get a compiler error. That probably qualifies as special compiler magic!

With Neon and FP on AArch64 specifically, there are many caveats and corner-cases because most tools (reasonably) assume they're present. I experimented a bit with SVE, since it's genuinely optional. The following compiles fine without "+sve", using both Clang (build from source 3fc30ae297) and GCC (13.2.Rel1):

#include <arm_sve.h>

__attribute__((target("arch=armv8-a+sve")))
svuint32_t add_sve(svuint32_t a, svuint32_t b) {
  return svadd_x(svptrue_b32(), a, b);
}

In both cases, arguments are passed in SVE-specific z0 and z1, but this is backwards-compatible because nothing can call it without handling an SVE type, and that's only possible in a context with "+sve". The quality of error messages varies, and there are a few corner-cases — Clang appears to allow calls to "+sve" functions that return SVE types, as long they as the result is unused — but this generally works intuitively, at least to me. It is always possible to call a "+sve" function that doesn't have SVE types in its prototype, specifically because the ABI is compatible for all other types.

I'd be happy if we could do that in Rust too, rather than falling back onto a different ABI. If someone is using these types, they're saying "I'm using SVE" (or Neon), so anything else is a surprising behaviour, I think.

There's another significant difference in C: it usually compiles and then links several compilation units, and it is easy to compile one or more modules with "+sve", and perhaps call it only after run-time feature checks. To do a similar thing in Rust, I think you'd have to put all the hardware-specific bits into a separate crate with --crate-type=lib, but I've not experimented with doing that.

jacobbramley commented 9 months ago

Availability of Rust standard library types is determined by cfg attributes that are evaluated when the standard library is built. It can't depend on -C flags that are used when "the crate that imports the standard library" is built.

I had in mind something like this:

#[target_feature(require = "sve")]
#[repr(...)]
pub struct svuint32_t { ... }

... then the compiler can check, when the type is used, that the context provides the required target features.

We can fix it for -C with build-std, if we can stabilise that, but we still want to support mixed features in a single compiler invocation, for the use case of picking a fast path based on dynamically-detected features.

rust-lang / rust

Neon types generate bad code if the "neon" target feature is disabled. #118249

Meta