The implementation in core::arch for _mm512_set4_epi64 is
pub unsafe fn _mm512_set4_epi64(d: i64, c: i64, b: i64, a: i64) -> __m512i {
    let r = i64x8::new(d, c, b, a, d, c, b, a);
    transmute(r)
}
so the first argument provided becomes the first lane.
However, the Intel Intrinsics Guide defines it as
__m512i _mm512_set4_epi64 (__int64 d, __int64 c, __int64 b, __int64 a)
dst[63:0] := a
dst[127:64] := b
dst[191:128] := c
dst[255:192] := d
dst[319:256] := a
dst[383:320] := b
dst[447:384] := c
dst[511:448] := d
dst[MAX:512] := 0
which means that the last argument provided becomes the first lane.
The implementation for _mm512_set_epi64 is correct though, which leads to a disparity between _mm512_set4_epi64 and _mm512_set_epi64 that doesn't exist in C. I've created this gist to show this difference between C and Rust.
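To make the ordering concrete without needing AVX-512 hardware, here is a minimal sketch that models the eight 64-bit lanes as plain arrays instead of calling the intrinsics. The helper names `set4_per_intel`, `set4_as_quoted` and `set_epi64_model` are hypothetical and not part of core::arch. The sketch assumes, as described above, that the first argument of `i64x8::new` ends up in lane 0.

```rust
// Plain-array model of the lane layouts discussed above.
// Index 0 of each array stands for dst[63:0], index 7 for dst[511:448].

/// Lane order from the Intel pseudocode: the last argument (a) lands in lane 0.
fn set4_per_intel(d: i64, c: i64, b: i64, a: i64) -> [i64; 8] {
    [a, b, c, d, a, b, c, d]
}

/// Lane order of the quoted core::arch body (assumption: the first argument
/// of i64x8::new becomes lane 0), so the first argument (d) lands in lane 0.
fn set4_as_quoted(d: i64, c: i64, b: i64, a: i64) -> [i64; 8] {
    [d, c, b, a, d, c, b, a]
}

/// Model of _mm512_set_epi64(e7, ..., e0) per the Intel guide: e0 lands in lane 0.
fn set_epi64_model(
    e7: i64, e6: i64, e5: i64, e4: i64,
    e3: i64, e2: i64, e1: i64, e0: i64,
) -> [i64; 8] {
    [e0, e1, e2, e3, e4, e5, e6, e7]
}

fn main() {
    let (d, c, b, a) = (4, 3, 2, 1);

    let intel = set4_per_intel(d, c, b, a);
    let quoted = set4_as_quoted(d, c, b, a);
    let full = set_epi64_model(d, c, b, a, d, c, b, a);

    // In C, set4(d, c, b, a) and set(d, c, b, a, d, c, b, a) give the same vector.
    assert_eq!(intel, full);
    // With the quoted body, the two calls disagree: the lanes come out reversed.
    assert_ne!(quoted, full);

    println!("per Intel guide: {:?}", intel);
    println!("as quoted:       {:?}", quoted);
    println!("set_epi64 model: {:?}", full);
}
```

If the model is right, swapping the argument order inside the body to `i64x8::new(a, b, c, d, a, b, c, d)` would bring `_mm512_set4_epi64` in line with both the guide and `_mm512_set_epi64`.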