rust-lang / portable-simd

The testing ground for the future of portable SIMD in Rust
Apache License 2.0
896 stars 80 forks source link

Support avx512 bitmasks with dynamic feature detection #332

Open reinerp opened 1 year ago

reinerp commented 1 year ago

The choice between full_masks.rs and bitmask.rs seems to be made at compile time (https://github.com/rust-lang/portable-simd/blob/9bd30e77b3a3c699af102ebb3df0f6110f8aa02e/crates/core_simd/src/masks.rs#L5) based on compile-time presence of the avx512f target feature.

When compiling for general x86-64 targets but doing runtime feature detection for avx512f, the current approach will generate suboptimal code for avx512, because it will use the full_masks.rs implementation instead of the bitmask.rs implementation.

I'd like to be able to use bitmasks in avx512 even when using runtime feature detection. It's probably out of scope (and, I think, undesirable) for std::simd to be responsible for runtime feature detection, so one option would be for std::simd to be refactored to expose both Mask and Bitmask as two separate types -- perhaps both implementing the same set of traits. This way, callers could choose between a full-mask implementation and a bitmask implementation based on whatever logic they choose, including e.g. runtime feature detection.

reinerp commented 1 year ago

Actually looking through the implementation of bitmask.rs, I see that in avx512 mode, Mask<T, LANES> is just a wrapper for LANES::BitMask, which is already accessible to users. So I guess the answer to my request is already available: it is "use LANES::BitMask if you want a guarantee of single-bit-per-lane". Perhaps some additional documentation is all that is required.

calebzulawski commented 1 year ago

It's worth noting that when compiling for instruction sets that use bitmasks, the compiler often optimizes these lane-width masks to bitmasks (as far as LLVM is concerned, all masks are truncated to bitmasks anyway). The layout matters less than you might think--only when writing masks to memory or general purpose registers to use them outside of SIMD operations, which is relatively rare.

reinerp commented 1 year ago

That's a really great point Caleb, and I think it pretty much entirely addresses my concern. Thanks!