Add support for faster shuffles

rust-lang / packed_simd

Portable Packed SIMD Vectors for Rust standard library

Apache License 2.0


Open velvia opened 4 years ago

velvia commented 4 years ago

Currently, shuffle1_dyn on u32x8 is not optimized: the fallback path is used, which results in a whole mess of extract intrinsics. It is not very fast.

Can we please add support for _mm256_permutevar8x32_epi32 and similar variants at the u32x8 (and f32x8, etc.) levels? It is a fairly large speedup.
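For illustration, here is a rough sketch of what such a fast path might look like when written directly against std::arch. It is not packed_simd code: the function names (permute_scalar, permute_avx2, permute_u32x8) are hypothetical, and plain [u32; 8] arrays stand in for u32x8. The scalar version masks each index to its low 3 bits because that is what the intrinsic does; whether shuffle1_dyn treats out-of-range indices the same way is not assumed here.

```rust
/// Scalar reference: out[i] = values[indices[i] & 7].
/// The mask mirrors the intrinsic, which only reads the low 3 bits of each index.
pub fn permute_scalar(values: [u32; 8], indices: [u32; 8]) -> [u32; 8] {
    let mut out = [0u32; 8];
    for i in 0..8 {
        out[i] = values[(indices[i] & 7) as usize];
    }
    out
}

/// AVX2 path (hypothetical helper): a single cross-lane permute instruction.
/// Safety: the caller must ensure the CPU supports AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn permute_avx2(values: [u32; 8], indices: [u32; 8]) -> [u32; 8] {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    // Unaligned loads, since [u32; 8] is only guaranteed 4-byte alignment.
    let v = _mm256_loadu_si256(values.as_ptr() as *const __m256i);
    let idx = _mm256_loadu_si256(indices.as_ptr() as *const __m256i);
    let r = _mm256_permutevar8x32_epi32(v, idx);

    let mut out = [0u32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
    out
}

/// Uses the AVX2 permute when it is detected at runtime, otherwise the scalar loop.
pub fn permute_u32x8(values: [u32; 8], indices: [u32; 8]) -> [u32; 8] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound because AVX2 support was just checked.
            return unsafe { permute_avx2(values, indices) };
        }
    }
    permute_scalar(values, indices)
}
```

Calling permute_u32x8 with the indices [7, 6, 5, 4, 3, 2, 1, 0] reverses the eight elements. A real implementation inside the library would presumably select the instruction at compile time based on enabled target features rather than through runtime detection; the runtime check is used here only to keep the sketch safe to run on any machine.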

Thanks

aldanor commented 3 years ago

Wondering about this as well (it's 30x slower than what it should be, without warning the user).

(Should this be posted to the stdsimd repo?)

Lokathor commented 3 years ago

Yes, all development has moved there.