rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

std::arch does not implement some Neon SIMD intrinsics #75373

Closed Nufflee closed 2 years ago

Nufflee commented 4 years ago

I tried this code:

#![feature(stdsimd)]
#![feature(arm_target_feature)]

extern crate core;

use std::arch::arm::*;

#[target_feature(enable = "neon")]
#[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
unsafe fn vmovmaskq_u8(input: uint8x16_t) -> i32 {
    // Example input (half scale):
    // 0x89 FF 1D C0 00 10 99 33

    // Shift out everything but the sign bits
    // 0x01 01 00 01 00 00 01 00
    let high_bits = vreinterpretq_u16_u8(vshrq_n_u8(input, 7));

    // Merge the even lanes together with vsra. The '??' bytes are garbage.
    // vsri could also be used, but it is slightly slower on aarch64.
    // 0x??03 ??02 ??00 ??01
    let paired16 = vreinterpretq_u32_u16(vsraq_n_u16(high_bits, high_bits, 7));
    // Repeat with wider lanes.
    // 0x??????0B ??????04
    let paired32 = vreinterpretq_u64_u32(vsraq_n_u32(paired16, paired16, 14));
    // 0x??????????????4B
    let paired64 = vreinterpretq_u8_u64(vsraq_n_u64(paired32, paired32, 28));
    // Extract the low 8 bits from each lane and join.
    // 0x4B
    return vgetq_lane_u8(paired64, 0) | (vgetq_lane_u8(paired64, 8) << 8);
}

Godbolt: https://godbolt.org/z/1no41v

I expected to see this happen: Intrinsics found without compile errors

Instead, this happened: Compile errors about Neon functions and types not being found. I was able to reproduce the same issue on a Raspberry Pi 3 with an ARM v8 CPU running 32-bit Raspbian. Keep in mind that these intrinsics are supported on ARM v7, and some of them, like vreinterpretq_u32_u16, are basically just reinterpret casts.

Edit: upon closer inspection, I realized that most of the intrinsics I am using are not even supported by Rust. I should still be able to use vreinterpretq_u32_u16 because it is part of the Rust standard library.
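To illustrate what "basically just a reinterpret cast" means, here is a minimal portable sketch that uses plain arrays as stand-ins for the Neon vector types. `reinterpret_u32_u16` is a hypothetical helper, not the real intrinsic; it only shows that the 16 bytes are left untouched and regrouped into wider lanes.

```rust
/// Hypothetical stand-in for a `uint16x8_t` -> `uint32x4_t` reinterpret:
/// the same 16 bytes, viewed with a different lane width.
fn reinterpret_u32_u16(lanes: [u16; 8]) -> [u32; 4] {
    // SAFETY: both types are 16 bytes of plain integers, and every bit
    // pattern is a valid value for either of them.
    unsafe { std::mem::transmute(lanes) }
}

fn main() {
    let input: [u16; 8] = [1, 2, 3, 4, 5, 6, 7, 8];
    let output = reinterpret_u32_u16(input);

    // Flatten both sides to native-endian bytes to show nothing was changed.
    let before: Vec<u8> = input.iter().flat_map(|x| x.to_ne_bytes()).collect();
    let after: Vec<u8> = output.iter().flat_map(|x| x.to_ne_bytes()).collect();
    assert_eq!(before, after);

    // The lane values themselves depend on the target's endianness.
    println!("{:x?}", output);
}
```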

Meta

rustc --version --verbose:

rustc 1.47.0-nightly (6c8927b0c 2020-07-26)
binary: rustc
commit-hash: 6c8927b0cf80ceee19386026cf9d7fd4fd9d486f
commit-date: 2020-07-26
host: armv7-unknown-linux-gnueabihf
release: 1.47.0-nightly
LLVM version: 10.0
Backtrace

```
error[E0412]: cannot find type `uint8x16_t` in this scope
  --> :10:31
   |
10 | unsafe fn vmovmaskq_u8(input: uint8x16_t) -> i32 {
   |                               ^^^^^^^^^^ help: a struct with a similar name exists: `uint8x4_t`

error[E0425]: cannot find function `vreinterpretq_u16_u8` in this scope
  --> :16:21
   |
16 |     let high_bits = vreinterpretq_u16_u8(vshrq_n_u8(input, 7));
   |                     ^^^^^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vshrq_n_u8` in this scope
  --> :16:42
   |
16 |     let high_bits = vreinterpretq_u16_u8(vshrq_n_u8(input, 7));
   |                                          ^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vreinterpretq_u32_u16` in this scope
  --> :21:20
   |
21 |     let paired16 = vreinterpretq_u32_u16(vsraq_n_u16(high_bits, high_bits, 7));
   |                    ^^^^^^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vsraq_n_u16` in this scope
  --> :21:42
   |
21 |     let paired16 = vreinterpretq_u32_u16(vsraq_n_u16(high_bits, high_bits, 7));
   |                                          ^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vreinterpretq_u64_u32` in this scope
  --> :24:20
   |
24 |     let paired32 = vreinterpretq_u64_u32(vsraq_n_u32(paired16, paired16, 14));
   |                    ^^^^^^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vsraq_n_u32` in this scope
  --> :24:42
   |
24 |     let paired32 = vreinterpretq_u64_u32(vsraq_n_u32(paired16, paired16, 14));
   |                                          ^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vreinterpretq_u8_u64` in this scope
  --> :26:20
   |
26 |     let paired64 = vreinterpretq_u8_u64(vsraq_n_u64(paired32, paired32, 28));
   |                    ^^^^^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vsraq_n_u64` in this scope
  --> :26:41
   |
26 |     let paired64 = vreinterpretq_u8_u64(vsraq_n_u64(paired32, paired32, 28));
   |                                         ^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vgetq_lane_u8` in this scope
  --> :29:12
   |
29 |     return vgetq_lane_u8(paired64, 0) | (vgetq_lane_u8(paired64, 8) << 8);
   |            ^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `vgetq_lane_u8` in this scope
  --> :29:42
   |
29 |     return vgetq_lane_u8(paired64, 0) | (vgetq_lane_u8(paired64, 8) << 8);
   |                                          ^^^^^^^^^^^^^ not found in this scope

warning: unused import: `std::arch::arm::*`
 --> :6:5
  |
6 | use std::arch::arm::*;
  |     ^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

error: aborting due to 11 previous errors; 1 warning emitted

Some errors have detailed explanations: E0412, E0425.
For more information about an error, try `rustc --explain E0412`.
Compiler returned: 1
```

cc @gnzlbg

JayKickliter commented 4 years ago

I just ran into this exact same problem. I was able to get the SSE/AVX version of my code to compile, but not the Neon one. The docs barely cover this topic, so I'm just shooting in the dark.

Nufflee commented 4 years ago

Yep, SSE and AVX work without issues, but Neon refuses to work.

workingjubilee commented 4 years ago

@rustbot modify labels: +A-simd, +O-ARM

workingjubilee commented 4 years ago

I had a moment to review this issue today in closer detail.

vreinterpretq_u32_u16 is not present. vreinterpretq_u32_u8 is. vgetq_lane_u8 is not present. vget_lane_u8 is.

In other words, this is not a compiler error, or at least, not the compiler error it is suggested to be. This is merely the absence of very common intrinsics for the ARM platform.

bluss commented 3 years ago

The "place to go" to implement these intrinsics is rust-lang/stdarch/issues/148

workingjubilee commented 2 years ago

This example compiles now. I am assuming all the intrinsics were implemented, and probably any need for further intrinsic requests can go to stdarch? I am closing this. I think it's fine to reopen if you can find anything missing, though!
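For anyone landing here later, here is a minimal sketch of how the function from the original report might be written against `std::arch::aarch64` on a recent toolchain. It is not the exact code that was tested above. It assumes the shift and lane intrinsics take their immediate as a const generic, and it widens each extracted lane to `i32` before shifting, which is an adjustment to the original snippet. `movemask_scalar` and `check` are hypothetical helpers added only so the result can be cross-checked; on non-aarch64 targets only the scalar path runs.

```rust
/// Portable reference: bit i of the result is the top bit of byte i.
fn movemask_scalar(bytes: [u8; 16]) -> i32 {
    bytes
        .iter()
        .enumerate()
        .fold(0, |mask, (i, &b)| mask | (((b >> 7) as i32) << i))
}

#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn vmovmaskq_u8(input: std::arch::aarch64::uint8x16_t) -> i32 {
    use std::arch::aarch64::*;

    // Keep only the sign bit of every byte (each byte becomes 0 or 1).
    let high_bits = vreinterpretq_u16_u8(vshrq_n_u8::<7>(input));
    // Merge neighbouring lanes with shift-right-accumulate, widening each step.
    let paired16 = vreinterpretq_u32_u16(vsraq_n_u16::<7>(high_bits, high_bits));
    let paired32 = vreinterpretq_u64_u32(vsraq_n_u32::<14>(paired16, paired16));
    let paired64 = vreinterpretq_u8_u64(vsraq_n_u64::<28>(paired32, paired32));
    // Widen before shifting so the high half is not lost in a u8.
    let lo = vgetq_lane_u8::<0>(paired64) as i32;
    let hi = vgetq_lane_u8::<8>(paired64) as i32;
    lo | (hi << 8)
}

/// Returns the scalar mask; on aarch64 it also asserts that Neon agrees.
fn check(bytes: [u8; 16]) -> i32 {
    let expected = movemask_scalar(bytes);
    #[cfg(target_arch = "aarch64")]
    {
        // SAFETY: `bytes` is a valid 16-byte buffer, and Neon is part of the
        // baseline feature set on aarch64 targets.
        let got = unsafe { vmovmaskq_u8(std::arch::aarch64::vld1q_u8(bytes.as_ptr())) };
        assert_eq!(got, expected);
    }
    expected
}

fn main() {
    let mut bytes = [0u8; 16];
    bytes[0] = 0x80;
    bytes[3] = 0xFF;
    bytes[15] = 0x90;
    // Bytes 0, 3 and 15 have their top bit set, so this prints 0x8009.
    println!("{:#06x}", check(bytes));
}
```

Keeping a scalar reference next to the intrinsic version makes it easy to notice if a particular toolchain is missing an intrinsic or if a shift amount was mistyped.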