Open reinerp opened 1 year ago
Actually looking through the implementation of bitmask.rs
, I see that in avx512 mode, Mask<T, LANES>
is just a wrapper for LANES::BitMask
, which is already accessible to users. So I guess the answer to my request is already available: it is "use LANES::BitMask
if you want a guarantee of single-bit-per-lane". Perhaps some additional documentation is all that is required.
It's worth noting that when compiling for instruction sets that use bitmasks, the compiler often optimizes these lane-width masks to bitmasks (as far as LLVM is concerned, all masks are truncated to bitmasks anyway). The layout matters less than you might think--only when writing masks to memory or general purpose registers to use them outside of SIMD operations, which is relatively rare.
That's a really great point Caleb, and I think it pretty much entirely addresses my concern. Thanks!
The choice between
full_masks.rs
andbitmask.rs
seems to be made at compile time (https://github.com/rust-lang/portable-simd/blob/9bd30e77b3a3c699af102ebb3df0f6110f8aa02e/crates/core_simd/src/masks.rs#L5) based on compile-time presence of theavx512f
target feature.When compiling for general x86-64 targets but doing runtime feature detection for
avx512f
, the current approach will generate suboptimal code for avx512, because it will use thefull_masks.rs
implementation instead of thebitmask.rs
implementation.I'd like to be able to use bitmasks in avx512 even when using runtime feature detection. It's probably out of scope (and, I think, undesirable) for
std::simd
to be responsible for runtime feature detection, so one option would be forstd::simd
to be refactored to expose bothMask
andBitmask
as two separate types -- perhaps both implementing the same set of traits. This way, callers could choose between a full-mask implementation and a bitmask implementation based on whatever logic they choose, including e.g. runtime feature detection.