newpavlov opened this issue 3 years ago.
I was under the impression that we'd deliberately removed the MMX stuff.
Indeed. It requires special handling in the compiler to emit the right type for MMX vectors, as they are a different type from regular vectors. In addition, it is pretty much impossible to use correctly, as LLVM can reorder MMX usage before the intrinsic that enables MMX.
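For code that used MMX's packed 16-bit add on a 64-bit __m64, the usual replacement is to do the same work in the low half of an SSE2 __m128i. Below is a minimal sketch, assuming x86_64 (where SSE2 is baseline). The function name add_pi16_via_sse2 is hypothetical; the intrinsics are the existing std::arch::x86_64 ones.

```rust
// Sketch: emulate MMX-style packed 16-bit addition (PADDW on __m64) with SSE2,
// using only the low 64 bits of an __m128i. The function name is hypothetical.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_add_epi16, _mm_loadl_epi64, _mm_storel_epi64};

/// Adds four i16 lanes with wrapping, as PADDW would.
#[cfg(target_arch = "x86_64")]
pub fn add_pi16_via_sse2(a: [i16; 4], b: [i16; 4]) -> [i16; 4] {
    let mut out = [0i16; 4];
    // SAFETY: SSE2 is part of the x86_64 baseline. Each pointer covers 8 valid bytes,
    // and loadl/storel touch only 64 bits with no alignment requirement.
    unsafe {
        let va = _mm_loadl_epi64(a.as_ptr() as *const __m128i);
        let vb = _mm_loadl_epi64(b.as_ptr() as *const __m128i);
        let sum = _mm_add_epi16(va, vb); // upper 64 bits are zero and never stored
        _mm_storel_epi64(out.as_mut_ptr() as *mut __m128i, sum);
    }
    out
}

/// Portable fallback so the sketch also builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_pi16_via_sse2(a: [i16; 4], b: [i16; 4]) -> [i16; 4] {
    std::array::from_fn(|i| a[i].wrapping_add(b[i]))
}
```

This avoids both problems raised above: there is no separate __m64 type for the compiler to lower, and no MMX register state that has to be cleared before x87 code runs.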
What about the streaming load intrinsics? Is there a reason why they have been omitted?
Some of the streaming ops are already in and stabilized (e.g. _mm_stream_pd).
Given this, I'd guess that any missing streaming ops are likely an oversight (at least for the 128- and 256-bit variants).
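As an illustration of the already-stabilized streaming stores, here is a minimal sketch using _mm_stream_pd. It assumes x86_64 (SSE2 baseline). The Aligned16 wrapper and the stream_pair helper are hypothetical names. Each non-temporal store is followed by _mm_sfence, because such stores are weakly ordered with respect to ordinary memory operations.

```rust
// Sketch: a non-temporal store of two f64 values with the stable _mm_stream_pd.
// Aligned16 and stream_pair are hypothetical names used only for this example.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_set_pd, _mm_sfence, _mm_stream_pd};

/// _mm_stream_pd requires a 16-byte-aligned destination.
#[repr(C, align(16))]
pub struct Aligned16(pub [f64; 2]);

#[cfg(target_arch = "x86_64")]
pub fn stream_pair(dst: &mut Aligned16, lo: f64, hi: f64) {
    // SAFETY: SSE2 is baseline on x86_64, and dst is 16-byte aligned and writable.
    unsafe {
        // _mm_set_pd takes the high lane first, so lane 0 holds `lo`.
        let v = _mm_set_pd(hi, lo);
        _mm_stream_pd(dst.0.as_mut_ptr(), v);
        // Non-temporal stores are weakly ordered; fence before relying on the data.
        _mm_sfence();
    }
}

/// Portable fallback: a plain store.
#[cfg(not(target_arch = "x86_64"))]
pub fn stream_pair(dst: &mut Aligned16, lo: f64, hi: f64) {
    dst.0 = [lo, hi];
}
```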
_mm_broadcastsi128_si256 seems to be an alias for _mm256_broadcastsi128_si256, which is implemented. The Intrinsics Guide lists both as translating to the same instruction, with the same description.
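If the shorter name were added, it could be a thin forwarding shim. The following is a minimal sketch, assuming x86_64 with runtime AVX2 detection. The shim named _mm_broadcastsi128_si256 and the broadcast_lanes helper are hypothetical here. Only _mm256_broadcastsi128_si256 and the load/store intrinsics come from std::arch.

```rust
// Sketch: the Intel name _mm_broadcastsi128_si256 as a hypothetical alias
// that forwards to the implemented _mm256_broadcastsi128_si256.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    __m128i, __m256i, _mm256_broadcastsi128_si256, _mm256_storeu_si256, _mm_loadu_si128,
};

/// Copies the 128-bit input into both halves of a 256-bit vector.
///
/// # Safety
/// AVX2 must be available on the running CPU.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
pub unsafe fn _mm_broadcastsi128_si256(a: __m128i) -> __m256i {
    unsafe { _mm256_broadcastsi128_si256(a) }
}

/// Safe demo wrapper: returns None when AVX2 is unavailable at runtime.
#[cfg(target_arch = "x86_64")]
pub fn broadcast_lanes(x: [i32; 4]) -> Option<[i32; 8]> {
    if !is_x86_feature_detected!("avx2") {
        return None;
    }
    let mut out = [0i32; 8];
    // SAFETY: AVX2 was detected above; the unaligned load/store intrinsics
    // access exactly 16 and 32 bytes of valid memory.
    unsafe {
        let v = _mm_loadu_si128(x.as_ptr() as *const __m128i);
        let w = _mm_broadcastsi128_si256(v);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, w);
    }
    Some(out)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn broadcast_lanes(_x: [i32; 4]) -> Option<[i32; 8]> {
    None
}
```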
_mm_malloc and _mm_free seem like they would need to be implemented in libstd.
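For comparison, here is roughly what those two would look like on top of std::alloc. This is a sketch, not a proposed API: mm_malloc and mm_free are hypothetical names. One visible wrinkle is that Rust's dealloc needs the original Layout back, whereas C's _mm_free takes only the pointer. A faithful _mm_free would therefore have to record the size and alignment somewhere.

```rust
// Sketch: hypothetical _mm_malloc/_mm_free analogues built on std::alloc.
use std::alloc::{alloc, dealloc, Layout};

/// Returns a pointer aligned to `align`, or null if the request is invalid
/// (zero size, non-power-of-two alignment) or the allocation fails.
pub fn mm_malloc(size: usize, align: usize) -> *mut u8 {
    match Layout::from_size_align(size, align) {
        // SAFETY: the layout has a non-zero size, as `alloc` requires.
        Ok(layout) if layout.size() != 0 => unsafe { alloc(layout) },
        _ => std::ptr::null_mut(),
    }
}

/// Unlike C's _mm_free(p), the caller must pass back the same size and alignment.
///
/// # Safety
/// `ptr` must be null or have been returned by `mm_malloc(size, align)` with
/// these exact arguments, and must not be freed twice.
pub unsafe fn mm_free(ptr: *mut u8, size: usize, align: usize) {
    if !ptr.is_null() {
        // SAFETY: upheld by the caller, so this layout matches the allocation.
        unsafe { dealloc(ptr, Layout::from_size_align_unchecked(size, align)) }
    }
}
```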
Note that there are issues with these streaming intrinsics: their non-temporal hints are not properly modelled in Rust's memory model. See: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Non-temporal.20stores
They've since been converted into inline assembly.
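To show what that approach looks like, here is a minimal sketch of a non-temporal 16-byte store written directly as inline assembly (MOVNTDQ) instead of through an LLVM intrinsic. It is an illustration, not the actual std::arch source. It assumes x86_64 and Intel asm syntax, which is Rust's default on x86. AlignedBytes16 and nt_store_16 are hypothetical names.

```rust
// Sketch: a non-temporal 16-byte store via inline MOVNTDQ.
// AlignedBytes16 and nt_store_16 are hypothetical names.
#[cfg(target_arch = "x86_64")]
use std::arch::{asm, x86_64::{__m128i, _mm_loadu_si128, _mm_sfence}};

/// MOVNTDQ requires a 16-byte-aligned destination.
#[repr(C, align(16))]
pub struct AlignedBytes16(pub [u8; 16]);

#[cfg(target_arch = "x86_64")]
pub fn nt_store_16(dst: &mut AlignedBytes16, src: [u8; 16]) {
    // SAFETY: SSE2 is baseline on x86_64, dst is aligned and writable for 16 bytes.
    unsafe {
        let v = _mm_loadu_si128(src.as_ptr() as *const __m128i);
        // No `nomem` option, so the compiler assumes this asm may access memory.
        asm!(
            "movntdq [{p}], {v}",
            p = in(reg) dst.0.as_mut_ptr(),
            v = in(xmm_reg) v,
            options(nostack, preserves_flags),
        );
        // Order the weakly-ordered store before later memory operations.
        _mm_sfence();
    }
}

/// Portable fallback: a plain store.
#[cfg(not(target_arch = "x86_64"))]
pub fn nt_store_16(dst: &mut AlignedBytes16, src: [u8; 16]) {
    dst.0 = src;
}
```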
Previous issue: #40
AVX2
- _mm256_stream_load_si256
- _mm_broadcastsi128_si256

MMX
EDIT(@workingjubilee): Direct MMX support is no longer in scope for std::arch, see:

SSE
- _mm_free
- _mm_storeu_si16
- _mm_loadu_si16
- _mm_malloc
- _mm_storeu_si64

SSE2
- _mm_loadu_si32
- _mm_storeu_si32

SSE4.1
- _mm_stream_load_si128
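Several entries above (the unaligned 16-, 32- and 64-bit loads and stores) have simple semantics. They move a few bytes between possibly unaligned memory and lane 0 of an __m128i, and the load zeroes the other lanes. Here is a minimal emulation sketch for the 32-bit pair, assuming x86_64. The names loadu_si32_emulated, storeu_si32_emulated, load_lanes and copy4 are hypothetical, not std::arch API.

```rust
// Sketch: emulating _mm_loadu_si32 / _mm_storeu_si32 with existing SSE2 intrinsics.
// All function names here are hypothetical.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_cvtsi128_si32, _mm_cvtsi32_si128, _mm_storeu_si128};

/// Loads 4 bytes from any alignment into lane 0; lanes 1..=3 are zeroed.
/// # Safety
/// `mem_addr` must be valid for reading 4 bytes.
#[cfg(target_arch = "x86_64")]
pub unsafe fn loadu_si32_emulated(mem_addr: *const u8) -> __m128i {
    unsafe { _mm_cvtsi32_si128(std::ptr::read_unaligned(mem_addr as *const i32)) }
}

/// Stores lane 0 to any alignment.
/// # Safety
/// `mem_addr` must be valid for writing 4 bytes.
#[cfg(target_arch = "x86_64")]
pub unsafe fn storeu_si32_emulated(mem_addr: *mut u8, a: __m128i) {
    unsafe { std::ptr::write_unaligned(mem_addr as *mut i32, _mm_cvtsi128_si32(a)) }
}

/// Safe demo: loads bytes[offset..offset + 4] and returns all four i32 lanes.
#[cfg(target_arch = "x86_64")]
pub fn load_lanes(bytes: &[u8], offset: usize) -> [i32; 4] {
    let chunk = &bytes[offset..offset + 4]; // bounds-checked, possibly unaligned
    let mut lanes = [0i32; 4];
    // SAFETY: chunk has 4 readable bytes; lanes has 16 writable bytes.
    unsafe {
        let v = loadu_si32_emulated(chunk.as_ptr());
        _mm_storeu_si128(lanes.as_mut_ptr() as *mut __m128i, v);
    }
    lanes
}

/// Safe demo: copies 4 bytes from `src` to `dst` through lane 0 of an XMM register.
#[cfg(target_arch = "x86_64")]
pub fn copy4(src: &[u8], dst: &mut [u8]) {
    let (s, d) = (&src[..4], &mut dst[..4]);
    // SAFETY: both slices are exactly 4 bytes long.
    unsafe { storeu_si32_emulated(d.as_mut_ptr(), loadu_si32_emulated(s.as_ptr())) }
}

/// Portable fallbacks so the sketch also builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn load_lanes(bytes: &[u8], offset: usize) -> [i32; 4] {
    let c = &bytes[offset..offset + 4];
    [i32::from_ne_bytes([c[0], c[1], c[2], c[3]]), 0, 0, 0]
}

#[cfg(not(target_arch = "x86_64"))]
pub fn copy4(src: &[u8], dst: &mut [u8]) {
    dst[..4].copy_from_slice(&src[..4]);
}
```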
Personally, I am only interested in _mm_stream_load_si128 and _mm256_stream_load_si256, but I think it's worth properly tracking all unimplemented intrinsics. Some of those intrinsics (e.g. _mm_malloc and _mm_free) probably should not be exposed, but, in my opinion, the motivation behind such a decision should be explicitly recorded somewhere (ideally in comments in the relevant source files).
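Until streaming loads are exposed, inline assembly can serve as a stopgap. Here is a minimal sketch of a MOVNTDQA-based 128-bit streaming load, assuming x86_64 and runtime SSE4.1 detection. The names stream_load_si128_asm, stream_load_16 and AlignedBytes16 are hypothetical. The non-temporal hint mainly matters for write-combining memory. On ordinary write-back memory the processor may treat the instruction like a regular load, so the loaded value is the same either way.

```rust
// Sketch: a streaming (non-temporal) 16-byte load via inline MOVNTDQA.
// Names are hypothetical; this is not the std::arch implementation.
#[cfg(target_arch = "x86_64")]
use std::arch::{asm, x86_64::{__m128i, _mm_storeu_si128}};

/// MOVNTDQA requires a 16-byte-aligned source.
#[repr(C, align(16))]
pub struct AlignedBytes16(pub [u8; 16]);

/// # Safety
/// SSE4.1 must be available, and `p` must be 16-byte aligned and readable for 16 bytes.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
pub unsafe fn stream_load_si128_asm(p: *const __m128i) -> __m128i {
    let result: __m128i;
    unsafe {
        asm!(
            "movntdqa {r}, [{p}]",
            r = out(xmm_reg) result,
            p = in(reg) p,
            // The asm only reads memory and does not touch the stack or flags.
            options(nostack, preserves_flags, readonly),
        );
    }
    result
}

/// Safe demo wrapper: returns None if SSE4.1 is not detected at runtime.
#[cfg(target_arch = "x86_64")]
pub fn stream_load_16(src: &AlignedBytes16) -> Option<[u8; 16]> {
    if !is_x86_feature_detected!("sse4.1") {
        return None;
    }
    let mut out = [0u8; 16];
    // SAFETY: SSE4.1 was detected, src is 16-byte aligned, and out holds 16 bytes.
    unsafe {
        let v = stream_load_si128_asm(src.0.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, v);
    }
    Some(out)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn stream_load_16(_src: &AlignedBytes16) -> Option<[u8; 16]> {
    None
}
```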