Open hkratz opened 3 years ago
It looks like this is because the Rust compiler doesn't support inlining a function labeled with some target_feature(enable = "foo")
into another function labeled with another set of enabled features (unless all the features in foo
are enabled in compiler flags).
I've investigated this problem in this blog post, in particular this section on target_feature
and this section on what it means for recursion. See also similar issues in https://github.com/rust-lang/rust/issues/54353 and https://github.com/rust-lang/rust/issues/53069.
Concretely, the problem is that you're trying to inline a function labeled neon,v7
within a function labeled neon,aes,v8
. If I compile the first code with -C target-feature=+aes,+v7,+v8
, then the intrinsic is inlined.
#[inline] // This doesn't apply to a function labeled with other features.
#[target_feature(enable = "neon,v7")]
pub unsafe fn vld2_s64(a: *const i64) -> int64x1x2_t {
...
}
#[inline]
#[target_feature(enable = "neon,aes,v8"))]
pub unsafe fn vld2_p64(a: *const p64) -> poly64x1x2_t {
transmute(vld2_s64(transmute(a)))
}
#[target_feature(enable = "neon,aes,v8"))]
#[inline(never)]
pub unsafe fn vld2_p64_testshim(ptr: *const p64) -> poly64x1x2_t {
vld2_p64(ptr)
}
The following tests fail the inlining check on arm (but pass on aarch64):
Apparently we need an extra version of the load/store intrinsics we delegate those intrinsics to with matching feature flags, see https://godbolt.org/z/WxozeEPav