Open jacobbramley opened 10 months ago
@RalfJung, @Amanieu, @workingjubilee
IMO the declaration of add is what should fail. You can't declare a function with an ABI that needs target features that are not available. trampoline should compile just fine and call add using the neon calling convention. (That's hard to implement in LLVM so the compiler might have to generate a shim.)
Having a type disappear based on a target feature does not make a ton of sense from a Rust perspective.
This is the AArch64 counterpart to https://github.com/rust-lang/rust/issues/116344 and https://github.com/rust-lang/rust/issues/114479
It's the counterpart to the second. The first issue is about the ABI of f32/f64 being different when softfloat features are set (or hardfloat features are disabled); I assume ARM has its own version of that, but this issue involves SIMD types, not (scalar) float types.
IMO the declaration of add is what should fail. You can't declare a function with an ABI that needs target features that are not available.
Ok, so perhaps the real problem here is that we can't add #[target_feature(...)] to the declaration of add. Otherwise, the only way to make target features available outside a function scope is on the command line, but that doesn't combine well with dynamic feature detection.
Having a type disappear based on a target feature does not make a ton of sense from a Rust perspective.
Is there more material I can read to get a better understanding of the reasoning behind that? From our perspective, trying to expose specific hardware features to low-level Rust code, it makes a lot of sense: uint32x4_t isn't a generic (u32, u32, u32, u32), but rather a specific type that maps onto the Neon hardware.
Notably, having feature-specific types would go some way towards allowing trait implementations to have implied target features. For example, we currently can't implement Clone (or anything else) for the prototyped SVE types.
Ok, so perhaps the real problem here is that we can't add #[target_feature(...)] to the declaration of add. Otherwise, the only way to make target features available outside a function scope is on the command line, but that doesn't combine well with dynamic feature detection.
Agreed, such attributes at declarations are needed. The original design didn't think they were needed since there is no codegen for declarations and target features seemingly only affect codegen, but alas, the situation is messier than that.
Is there more material I can read to get a better understanding of the reasoning behind that? From our perspective, trying to expose specific hardware features to low-level Rust code, it makes a lot of sense: uint32x4_t isn't a generic (u32, u32, u32, u32), but rather a specific type that maps onto the Neon hardware.
Availability of Rust standard library types is determined by cfg attributes that are evaluated when the standard library is built. It can't depend on -C flags that are used when "the crate that imports the standard library" is built.
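To make the point about cfg concrete, here is a minimal sketch. The names NeonOnly and neon_enabled_at_compile_time are hypothetical, made up for illustration; only the cfg attribute and the cfg! macro are real. A cfg on target_feature is resolved once, from the flags the crate was compiled with, so a type gated this way either exists for the whole crate or does not exist at all.

```rust
/// Hypothetical type, for illustration only: it exists only if the whole
/// crate was compiled with the "neon" target feature enabled.
#[cfg(target_feature = "neon")]
pub struct NeonOnly(pub [u32; 4]);

/// Reports the crate-wide, compile-time answer. The value is fixed when
/// this crate is built; it says nothing about the CPU the binary runs on.
pub fn neon_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "neon")
}
```

For the standard library the same evaluation happens when std itself is built, which is why a downstream crate's -C flags cannot make a std type appear or disappear.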
And even for C, how do you handle per-function enabling of target features? I could build a file without neon support but then declare one function in there to support neon. How do you make the type only available to that one function? That would require special compiler magic, the preprocessor does not suffice.
And even for C, how do you handle per-function enabling of target features? I could build a file without neon support but then declare one function in there to support neon. How do you make the type only available to that one function? That would require special compiler magic, the preprocessor does not suffice.
It appears that the types are actually available to the language, but if you try to use them with the hardware features disabled, you get a compiler error. That probably qualifies as special compiler magic!
With Neon and FP on AArch64 specifically, there are many caveats and corner-cases because most tools (reasonably) assume they're present. I experimented a bit with SVE, since it's genuinely optional. The following compiles fine without "+sve", using both Clang (built from source at 3fc30ae297) and GCC (13.2.Rel1):
#include <arm_sve.h>
__attribute__((target("arch=armv8-a+sve")))
svuint32_t add_sve(svuint32_t a, svuint32_t b) {
    return svadd_x(svptrue_b32(), a, b);
}
In both cases, arguments are passed in SVE-specific z0 and z1, but this is backwards-compatible because nothing can call it without handling an SVE type, and that's only possible in a context with "+sve". The quality of error messages varies, and there are a few corner-cases (Clang appears to allow calls to "+sve" functions that return SVE types, as long as the result is unused), but this generally works intuitively, at least to me. It is always possible to call a "+sve" function that doesn't have SVE types in its prototype, specifically because the ABI is compatible for all other types.
I'd be happy if we could do that in Rust too, rather than falling back onto a different ABI. If someone is using these types, they're saying "I'm using SVE" (or Neon), so anything else is a surprising behaviour, I think.
There's another significant difference in C: it usually compiles and then links several compilation units, and it is easy to compile one or more modules with "+sve", and perhaps call into them only after run-time feature checks. To do a similar thing in Rust, I think you'd have to put all the hardware-specific bits into a separate crate with --crate-type=lib, but I've not experimented with doing that.
Availability of Rust standard library types is determined by cfg attributes that are evaluated when the standard library is built. It can't depend on -C flags that are used when "the crate that imports the standard library" is built.
I had in mind something like this:
#[target_feature(require = "sve")]
#[repr(...)]
pub struct svuint32_t { ... }
... then the compiler can check, when the type is used, that the context provides the required target features.
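The require form of the attribute above does not exist today. As a rough library-level approximation of the same idea, and only as a sketch, the check can be moved to run time with a proof token: a value that can only be obtained after feature detection, and that constructors of the feature-bound type demand. NeonToken and U32x4 are hypothetical names invented for this example, and Neon is used instead of SVE because is_aarch64_feature_detected!("neon") is stable.

```rust
/// Hypothetical proof token: a value of this type can only be obtained
/// after Neon has been detected at run time.
#[derive(Clone, Copy, Debug)]
pub struct NeonToken(());

impl NeonToken {
    /// Returns a token if the running CPU reports Neon, otherwise None.
    /// On non-AArch64 targets the detection block is compiled out.
    pub fn detect() -> Option<Self> {
        #[cfg(target_arch = "aarch64")]
        {
            if std::arch::is_aarch64_feature_detected!("neon") {
                return Some(NeonToken(()));
            }
        }
        None
    }
}

/// Hypothetical feature-bound vector type: it cannot be built without a token.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U32x4 {
    lanes: [u32; 4],
}

impl U32x4 {
    /// Requires proof that Neon is available before the type can be used.
    pub fn new(_proof: NeonToken, lanes: [u32; 4]) -> Self {
        U32x4 { lanes }
    }

    /// Returns the four lanes as a plain array.
    pub fn lanes(self) -> [u32; 4] {
        self.lanes
    }
}
```

This is weaker than a compiler check: it costs a run-time test, and it says nothing about the calling convention used for the type, which is the actual subject of this issue.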
We can fix it for -C with build-std, if we can stabilise that, but we still want to support mixed features in a single compiler invocation, for the use case of picking a fast path based on dynamically-detected features.
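For reference, the dynamically-detected fast path mentioned above looks roughly like this today. This is a sketch, not the code from the issue: add_scalar, add_neon and add_dispatch are names made up for the example, and plain [u32; 4] arrays are used at every function boundary so that no vector type has to cross a call, which sidesteps the ABI question rather than answering it.

```rust
/// Portable fallback: lane-wise wrapping addition.
pub fn add_scalar(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    std::array::from_fn(|i| a[i].wrapping_add(b[i]))
}

/// Neon fast path. The attribute enables "neon" for this function only,
/// independently of the -C flags used for the rest of the crate.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn add_neon(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    use std::arch::aarch64::{vaddq_u32, vld1q_u32, vst1q_u32};
    let mut out = [0u32; 4];
    // SAFETY: each pointer refers to a live array of exactly four u32 lanes.
    unsafe {
        let sum = vaddq_u32(vld1q_u32(a.as_ptr()), vld1q_u32(b.as_ptr()));
        vst1q_u32(out.as_mut_ptr(), sum);
    }
    out
}

/// Picks the Neon path only after run-time detection succeeds.
pub fn add_dispatch(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            // SAFETY: Neon was detected on the running CPU just above.
            return unsafe { add_neon(a, b) };
        }
    }
    add_scalar(a, b)
}
```

Calling add_dispatch([1, 2, 3, 4], [10, 20, 30, 40]) gives [11, 22, 33, 44] on either path, since both wrap on overflow. The limitation discussed in this thread shows up as soon as uint32x4_t itself appears in a signature that is called from code without the feature.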
Compiled with RUSTFLAGS=-Ctarget_feature=-neon (for aarch64-unknown-linux-gnu):
Ideally, trampoline would fail to compile, because it does not have Neon and shouldn't be able to represent the vector types.
trampoline(a, b) passes the arguments in memory (using the Rust ABI).
add(a, b) tries to pass each argument in four w registers (each holding a u32), as if they are tuples (u32, u32, u32, u32). However, add expects them in vector registers (v0 and v1), so the result is unpredictable.
If test() (which has "neon" enabled) calls add(a, b) directly, it uses v0 and v1, as per AAPCS64.
This is the AArch64 counterpart to #116344 and #114479, with the twist that on AArch64, it's preferable for Neon-specific types to fail to compile without the proper features. These aren't general-purpose types. At least some C compilers refuse to compile code that uses Neon types when -mcpu=+nosimd+nofp is specified.
Meta
This came out of a Zulip discussion.
rustc --version --verbose: