Open Amanieu opened 1 year ago
@rustbot label O-x86
Hello, what are the guidelines to potentially contribute intrinsics?
Cheers
Currently the main blocker for stabilizing AVX-512 intrinsics is that we are still missing some. See these files for the list of missing intrinsics:
There may also be missing intrinsics for some of the other AVX512 subsets; this should be double-checked.
It seems like most of the intrinsics that are not yet implemented are labeled "not in LLVM". Is stabilization blocked on those, or just the ones labeled "need i1"?
The documents were made quite a few years ago, and should be checked against the equivalent intrinsics in the latest version of Clang.
Regarding the "not in llvm", we can skip these since they are supported by neither Clang nor GCC. It seems these are only supported by icc for Xeon Phi targets.
Not sure if this is a good place to ask, but I'm curious if there are any blockers for stabilizing avx512_target_feature, or it just needs a stabilization PR.
I previously asked here without a reply: https://github.com/rust-lang/rust/issues/44839#issuecomment-1883036505
Not sure if this is a good place to ask, but I'm curious if there are any blockers for stabilizing avx512_target_feature, or it just needs a stabilization PR.
Yes, this is the right place to ask: essentially this is blocked on the AVX512 baseline intrinsics still being incomplete, see my comment above.
What is considered baseline?
I see that e.g. _mm512_cvtt_roundpd_epi64 from AVX512DQ is not available today, and I don't see an avx512dq.md file in the core arch dir.
I would consider F + VL/DQ/BW as the baseline for initial stabilization of AVX512 intrinsics. The MD files may be somewhat out of date and need someone to double-check against the full list of intrinsics.
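To make the proposed baseline concrete, here is a minimal sketch of a runtime check for the F + VL/DQ/BW subsets using the standard feature-detection macro. The function name `has_avx512_baseline` is hypothetical and only illustrates the idea; it is not part of stdarch.

```rust
/// Hypothetical helper (not part of stdarch): reports whether the running CPU
/// advertises the F + VL/DQ/BW subsets discussed above.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn has_avx512_baseline() -> bool {
    // Each macro call performs a runtime CPU feature check.
    std::arch::is_x86_feature_detected!("avx512f")
        && std::arch::is_x86_feature_detected!("avx512vl")
        && std::arch::is_x86_feature_detected!("avx512dq")
        && std::arch::is_x86_feature_detected!("avx512bw")
}

/// On non-x86 targets none of these subsets exist, so the answer is always false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn has_avx512_baseline() -> bool {
    false
}
```

A caller would typically branch on `has_avx512_baseline()` once and fall back to a scalar or narrower SIMD path when it returns `false`.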
We should resolve https://github.com/rust-lang/stdarch/issues/1533 before stabilizing these intrinsics.
We also need to consider how this interacts with AVX10 now. In https://github.com/rust-lang/rust/pull/121088 I made all the +avx512 target features imply +evex512 to restore the status quo, but this means that there is currently no way to support AVX10.N/256. We'll presumably want to figure out some way to support that before avx512 support is stabilized. Possibly by explicitly adding +evex512 to all avx512 intrinsics that use 512-vectors (and having the same requirement for user code).
A dumb question: since this appears to be blocked on some CPU instructions not having a corresponding wrapper function because downstream compilers do not support them yet, why not stabilize it piecemeal? The instructions that are already implemented (provided that they work as advertised) would already help me out a lot. I don't really see why all AVX-512 instruction wrappers need to be stabilized at the same time.
Here is a more updated list of what is missing in stdarch:
# in llvm-project
llvm_512f=$(rg '(?s:static __inline.*?(?P<fn_name>[a-z0-9_]+?)\s*\(|#define (?P<def_name>[a-z0-9_]+)\()' --only-matching --multiline --no-filename -r '$fn_name$def_name' --color=auto clang/lib/Headers/avx512fintrin.h clang/lib/Headers/avx512vlintrin.h | sort)
llvm_512bw=$(rg '(?s:static __inline.*?(?P<fn_name>[a-z0-9_]+?)\s*\(|#define (?P<def_name>[a-z0-9_]+)\()' --only-matching --multiline --no-filename -r '$fn_name$def_name' --color=auto clang/lib/Headers/avx512bwintrin.h | sort)
# in stdarch
stdarch_512f=$(rg 'pub unsafe fn (\w+)' --only-matching -r '$1' --color=auto crates/core_arch/src/x86/avx512f.rs | sort)
stdarch_512bw=$(rg 'pub unsafe fn (\w+)' --only-matching -r '$1' --color=auto crates/core_arch/src/x86/avx512bw.rs | sort)
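# Find names that appear in only one of the two lists. The lists are joined with
# a newline so the last LLVM name is not fused with the first stdarch name.
missing_f=$(printf '%s\n%s\n' "$llvm_512f" "$stdarch_512f" | sort | uniq --unique)
missing_bw=$(printf '%s\n%s\n' "$llvm_512bw" "$stdarch_512bw" | sort | uniq --unique)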
# print things that aren't mentioned at all in stdarch
echo "$missing_f" | xargs -IINAME sh -c 'if ! rg INAME > /dev/null ; then echo INAME; fi'
echo "$missing_bw" | xargs -IINAME sh -c 'if ! rg INAME > /dev/null ; then echo INAME; fi'
The results are:
Not mentioned avx512bw intrinsics:
_store_mask64_kadd_mask32
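The comparison at the heart of the script above is a set difference: names found in the Clang headers that do not appear in stdarch. A minimal Rust sketch of that step follows, assuming both name lists have already been extracted. The function `missing_names` is hypothetical, and the sketch does not model the script's final `rg` pass that drops names mentioned anywhere in the stdarch tree.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: returns the names present in `reference` (e.g. extracted
/// from the Clang headers) but absent from `implemented` (e.g. extracted from
/// stdarch), deduplicated and in sorted order.
pub fn missing_names<'a>(reference: &[&'a str], implemented: &[&str]) -> Vec<&'a str> {
    // Sets remove duplicates and make the membership test cheap.
    let implemented: BTreeSet<&str> = implemented.iter().copied().collect();
    let reference: BTreeSet<&'a str> = reference.iter().copied().collect();

    // Keep only the reference names that stdarch does not define.
    reference
        .into_iter()
        .filter(|name| !implemented.contains(*name))
        .collect()
}
```

Because each name is a separate set element, two adjacent names cannot be merged into one entry, which is a risk when whole lists are concatenated as strings.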
It looks like we're also missing _mm512_fpclass_ps_mask and _mm512_fpclass_pd_mask, which are in the AVX-512DQ extension.
The untracked features "avx512er" and "avx512pf" have been removed. You probably weren't using them. I'm only mentioning them here in case someone gets confused, wonders where they went, and looks here. These were only implemented by Knights Landing, so most AVX512-enabled CPUs didn't have them.
We really need to upgrade the intrinsics list. Intel has since removed all the extgather, logather, etc. intrinsics (so avx512f.rs is almost complete now), and added the new AMX family, VEX variants of AVX512, and some more instruction sets.
Who is in charge of that question on the Rust project side? It seems a lot of people have changes to the intrinsics lists to contribute, but the lists do not seem to have been updated recently.
I am working on a PR to update many aspects of stdarch, including the intrinsics list (rust-lang/stdarch#1594)
awesome 🙏
Generally this is libs team territory - or rather libs-api, I assume, since this is user-visible API. Sadly that team is particularly understaffed. The intrinsics are exposed via the stdarch module, for which @Amanieu seems to be the sole maintainer.
The usual process for API questions is to file an ACP but I do not know whether stdarch also uses that process.
We don't use ACPs for stdarch because we don't invent our own APIs and instead follow existing C APIs for arch-specific intrinsics.
Feature gate:
#![feature(stdarch_x86_avx512)]
This is a tracking issue for the AVX-512 (and related extensions) intrinsics in core::arch.
Public API
This feature covers all of the intrinsics from the following features:
avx512bf16
avx512bitalg
avx512bw
avx512cd
avx512f
avx512ifma
avx512vbmi
avx512vbmi2
avx512vnni
avx512vpopcntdq
gfni
vaes
vpclmulqdq
VEX variants
avxifma
avxneconvert
avxvnni
avxvnniint16
avxvnniint8
Implementation History
Steps
Unresolved Questions
https://github.com/rust-lang/stdarch/issues/1533