rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
94.94k stars 12.24k forks source link

Tracking Issue for AVX512 intrinsics #111137

Open Amanieu opened 1 year ago

Amanieu commented 1 year ago

Feature gate: #![feature(stdarch_x86_avx512)]

This is a tracking issue for the AVX-512 (and related extensions) intrinsics in core::arch.

Public API

This feature covers all of the intrinsics from the following features:

VEX variants

Implementation History

Steps

Unresolved Questions

Jules-Bertholet commented 1 year ago

@rustbot label O-x86

IceTDrinker commented 9 months ago

Hello, what are the guidelines to potentially contribute intrinsics?

Cheers

Amanieu commented 8 months ago

Currently the main blocker for stabilizing AVX-512 intrinsics is that we are still missing some. See these files for the list of missing intrinsics:

There may also be missing intrinsics for some of the other AVX512 subsets, this should be double-checked.

tgross35 commented 7 months ago

It seems like most of the intrinsics that are not yet implemented are labeled not in LLVM. Is stabilization blocked on those, or just the ones labeled need i1?

Amanieu commented 7 months ago

The documents were made quite a few years ago, and should be checked against the equivalent intrinsics in the latest version of Clang.

Regarding the "not in llvm", we can skip these since they are supported by neither Clang nor GCC. It seems these are only supported by icc for Xeon Phi targets.

tarcieri commented 5 months ago

Not sure if this is a good place to ask, but I'm curious if there are any blockers for stabilizing avx512_target_feature, or it just needs a stabilization PR.

I previously asked here without a reply: https://github.com/rust-lang/rust/issues/44839#issuecomment-1883036505

Amanieu commented 5 months ago

Not sure if this is a good place to ask, but I'm curious if there are any blockers for stabilizing avx512_target_feature, or it just needs a stabilization PR.

Yes, this is the right place to ask: essentially this is blocked on the AVX512 baseline intrinsics still being incomplete, see my comment above.

IceTDrinker commented 4 months ago

what is considered baseline ?

I see that e.g. _mm512_cvtt_roundpd_epi64 from AVX512DQ is not available today and I don't see an axv512dq.md file in the core arch dir

Amanieu commented 4 months ago

I would consider F + VL/DQ/BW as the baseline for initial stabilization of AVX512 intrinsics. The MD files may be somewhat out of date and need someone to double-check against the full list of intrinsics.

RalfJung commented 4 months ago

We should resolve https://github.com/rust-lang/stdarch/issues/1533 before stabilizing these intrinsics.

nikic commented 4 months ago

We also need to consider how this interacts with AVX10 now. In https://github.com/rust-lang/rust/pull/121088 I made all the +avx512 target features imply +evex512 to restore the status quo, but this means that there is currently no way to support AVX10.N/256. We'll presumably want to figure out some way to support that before avx512 support is stabilized. Possibly by explicitly adding +evex512 to all avx512 intrinsics that use 512-vectors (and having the same requirement for user code).

AlexanderSchuetz97 commented 3 months ago

A dumb question, since this appears to be blocked on some cpu instructions not having a corresponding wrapper function due to downstream compilers not supporting them yet, why not stabilize it peacemeal? The instructions that are already implemented (provided that they do work as advertised) would already help me out a lot. I dont really see the need why all avx512 instruction wrappers need to be stabilized at the same time.

tgross35 commented 2 months ago

Here is a more updated list of what is missing in stdarch:

# in llvm-project
llvm_512f=$(rg '(?s:static __inline.*?(?P<fn_name>[a-z0-9_]+?)\s*\(|#define (?P<def_name>[a-z0-9_]+)\()' --only-matching --multiline  --no-filename -r '$fn_name$def_name' --color=auto clang/lib/Headers/avx512fintrin.h clang/lib/Headers/avx512vlintrin.h | sort)
llvm_512bw=$(rg '(?s:static __inline.*?(?P<fn_name>[a-z0-9_]+?)\s*\(|#define (?P<def_name>[a-z0-9_]+)\()' --only-matching --multiline  --no-filename -r '$fn_name$def_name' --color=auto clang/lib/Headers/avx512bwintrin.h | sort)

# in stdarch
stdarch_512f=$(rg 'pub unsafe fn (\w+)' --only-matching -r '$1' --color=auto crates/core_arch/src/x86/avx512f.rs | sort)
stdarch_512bw=$(rg 'pub unsafe fn (\w+)' --only-matching -r '$1' --color=auto crates/core_arch/src/x86/avx512bw.rs | sort)

# Find everything only in llvm but not rust
missing_f=$(echo "$llvm_512f$stdarch_512f" | sort | uniq --unique)
missing_bw=$(echo "$llvm_512bw$stdarch_512bw" | sort | uniq --unique)

# print things that aren't mentioned at all in stdarch
echo "$missing_f" | xargs -IINAME sh -c 'if ! rg INAME > /dev/null ; then echo INAME; fi'
echo "$missing_bw" | xargs -IINAME sh -c 'if ! rg INAME > /dev/null ; then echo INAME; fi'

The results are:

Missing avx512f intrinsics ``` _cvtmask16_u32 _cvtu32_mask16 _kandn_mask16 _knot_mask16 _kor_mask16 _kortest_mask16_u8 _kortestc_mask16_u8 _kortestz_mask16_u8 _kshiftli_mask16 _kshiftri_mask16 _kxnor_mask16 _kxor_mask16 _load_mask16 _mm256_and_epi32 _mm256_and_epi64 _mm256_andnot_epi32 _mm256_andnot_epi64 _mm256_cvtepu32_ps _mm256_i32scatter_epi32 _mm256_i32scatter_pd _mm256_i32scatter_ps _mm256_i64scatter_epi32 _mm256_i64scatter_epi64 _mm256_i64scatter_pd _mm256_i64scatter_ps _mm256_mask_cvtepu32_ps _mm256_mask_cvtps_pd _mm256_mask_cvtps_ph _mm256_mask_i32scatter_epi32 _mm256_mask_i32scatter_epi64 _mm256_mask_i32scatter_pd _mm256_mask_i32scatter_ps _mm256_mask_i64scatter_epi32 _mm256_mask_i64scatter_epi64 _mm256_mask_i64scatter_pd _mm256_mask_i64scatter_ps _mm256_maskz_cvtepu32_ps _mm256_maskz_cvtps_pd _mm256_maskz_cvtps_ph _mm256_mmask_i32gather_epi32 _mm256_mmask_i32gather_epi64 _mm256_mmask_i32gather_pd _mm256_mmask_i32gather_ps _mm256_mmask_i64gather_epi32 _mm256_mmask_i64gather_epi64 _mm256_mmask_i64gather_pd _mm256_mmask_i64gather_ps _mm256_rsqrt14_pd _mm256_rsqrt14_ps _mm512_ceil_pd _mm512_ceil_ps _mm512_cvtph_ps _mm512_cvtps_ph _mm512_cvtsd_f64 _mm512_cvtss_f32 _mm512_floor_pd _mm512_floor_ps _mm512_i32logather_epi64 _mm512_i32logather_pd _mm512_i32loscatter_epi64 _mm512_i32loscatter_pd _mm512_kortestz _mm512_mask_ceil_pd _mm512_mask_ceil_ps _mm512_mask_cvtps_ph _mm512_mask_floor_pd _mm512_mask_floor_ps _mm512_mask_i32logather_epi64 _mm512_mask_i32logather_pd _mm512_mask_i32loscatter_epi64 _mm512_mask_i32loscatter_pd _mm512_mask_permutevar_epi32 _mm512_mask_sqrt_ps _mm512_maskz_cvtps_ph _mm512_maskz_sqrt_ps _mm512_max_pd _mm512_max_ps _mm512_min_pd _mm512_min_ps _mm512_permutevar_epi32 _mm512_rcp14_pd _mm512_rcp14_ps _mm512_rsqrt14_pd _mm512_rsqrt14_ps _mm512_set_epi16 _mm512_set_epi8 _mm512_setzero _mm512_setzero_epi32 _mm512_setzero_pd _mm512_setzero_si512 _mm512_sqrt_pd _mm512_sqrt_ps _mm512_stream_load_si512 _mm_abs_epi64 _mm_and_epi32 _mm_and_epi64 _mm_andnot_epi32 _mm_andnot_epi64 _mm_cvt_roundi64_sd _mm_cvt_roundi64_ss _mm_cvt_roundsd_i64 _mm_cvt_roundsd_si64 _mm_cvt_roundsd_u64 _mm_cvt_roundsi64_sd _mm_cvt_roundsi64_ss _mm_cvt_roundss_i64 _mm_cvt_roundss_si64 _mm_cvt_roundss_u64 _mm_cvt_roundu64_sd _mm_cvt_roundu64_ss _mm_cvtepu32_ps _mm_cvti32_sd _mm_cvti32_ss _mm_cvtsd_i32 _mm_cvtsd_u64 _mm_cvtss_i32 _mm_cvtss_u64 _mm_cvtt_roundsd_i64 _mm_cvtt_roundsd_si64 _mm_cvtt_roundsd_u64 _mm_cvtt_roundss_i64 _mm_cvtt_roundss_si64 _mm_cvtt_roundss_u64 _mm_cvttsd_i64 _mm_cvttsd_u64 _mm_cvttss_i64 _mm_cvttss_u64 _mm_cvtu64_sd _mm_cvtu64_ss _mm_i32scatter_epi32 _mm_i32scatter_epi64 _mm_i32scatter_pd _mm_i32scatter_ps _mm_i64scatter_epi32 _mm_i64scatter_epi64 _mm_i64scatter_pd _mm_i64scatter_ps _mm_mask_abs_epi64 _mm_mask_cvtepu32_ps _mm_mask_cvtps_pd _mm_mask_cvtps_ph _mm_mask_i32scatter_epi32 _mm_mask_i32scatter_epi64 _mm_mask_i32scatter_pd _mm_mask_i32scatter_ps _mm_mask_i64scatter_epi32 _mm_mask_i64scatter_epi64 _mm_mask_i64scatter_pd _mm_mask_i64scatter_ps _mm_mask_load_sd _mm_mask_load_ss _mm_mask_min_epi64 _mm_mask_store_sd _mm_mask_store_ss _mm_maskz_abs_epi64 _mm_maskz_cvtepu32_ps _mm_maskz_cvtps_pd _mm_maskz_cvtps_ph _mm_maskz_load_sd _mm_maskz_load_ss _mm_maskz_min_epi64 _mm_min_epi64 _mm_mmask_i32gather_epi32 _mm_mmask_i32gather_epi64 _mm_mmask_i32gather_pd _mm_mmask_i32gather_ps _mm_mmask_i64gather_epi32 _mm_mmask_i64gather_epi64 _mm_mmask_i64gather_pd _mm_mmask_i64gather_ps _mm_rcp14_sd _mm_rcp14_ss _mm_rsqrt14_pd _mm_rsqrt14_ps _mm_rsqrt14_sd _mm_rsqrt14_ss _store_mask16_kand_mask16 ```
Missing avx512bw intrinsics ``` _cvtmask32_u32 _cvtmask64_u64 _cvtu32_mask32 _cvtu64_mask64 _kadd_mask32 _kortest_mask32_u8 _kortest_mask64_u8 _kortestc_mask32_u8 _kortestc_mask64_u8 _kortestz_mask32_u8 _kortestz_mask64_u8 _kshiftli_mask32 _kshiftli_mask64 _kshiftri_mask32 _kshiftri_mask64 _ktest_mask32_u8 _ktest_mask64_u8 _ktestc_mask32_u8 _ktestc_mask64_u8 _ktestz_mask32_u8 _ktestz_mask64_u8 _mm256_cmp_epi16_mask _mm256_cmp_epi8_mask _mm256_cmp_epu16_mask _mm256_cmp_epu8_mask _mm256_cmpeq_epi16_mask _mm256_cmpeq_epi8_mask _mm256_cmpeq_epu16_mask _mm256_cmpeq_epu8_mask _mm256_cmpge_epi16_mask _mm256_cmpge_epi8_mask _mm256_cmpge_epu16_mask _mm256_cmpge_epu8_mask _mm256_cmpgt_epi16_mask _mm256_cmpgt_epi8_mask _mm256_cmpgt_epu16_mask _mm256_cmpgt_epu8_mask _mm256_cmple_epi16_mask _mm256_cmple_epi8_mask _mm256_cmple_epu16_mask _mm256_cmple_epu8_mask _mm256_cmplt_epi16_mask _mm256_cmplt_epi8_mask _mm256_cmplt_epu16_mask _mm256_cmplt_epu8_mask _mm256_cmpneq_epi16_mask _mm256_cmpneq_epi8_mask _mm256_cmpneq_epu16_mask _mm256_cmpneq_epu8_mask _mm256_cvtepi16_epi8 _mm256_cvtsepi16_epi8 _mm256_cvtusepi16_epi8 _mm256_dbsad_epu8 _mm256_loadu_epi16 _mm256_loadu_epi8 _mm256_mask2_permutex2var_epi16 _mm256_mask_abs_epi16 _mm256_mask_abs_epi8 _mm256_mask_add_epi16 _mm256_mask_add_epi8 _mm256_mask_adds_epi16 _mm256_mask_adds_epi8 _mm256_mask_adds_epu16 _mm256_mask_adds_epu8 _mm256_mask_alignr_epi8 _mm256_mask_avg_epu16 _mm256_mask_avg_epu8 _mm256_mask_blend_epi16 _mm256_mask_blend_epi8 _mm256_mask_broadcastb_epi8 _mm256_mask_broadcastw_epi16 _mm256_mask_cmp_epi16_mask _mm256_mask_cmp_epi8_mask _mm256_mask_cmp_epu16_mask _mm256_mask_cmp_epu8_mask _mm256_mask_cmpeq_epi16_mask _mm256_mask_cmpeq_epi8_mask _mm256_mask_cmpeq_epu16_mask _mm256_mask_cmpeq_epu8_mask _mm256_mask_cmpge_epi16_mask _mm256_mask_cmpge_epi8_mask _mm256_mask_cmpge_epu16_mask _mm256_mask_cmpge_epu8_mask _mm256_mask_cmpgt_epi16_mask _mm256_mask_cmpgt_epi8_mask _mm256_mask_cmpgt_epu16_mask _mm256_mask_cmpgt_epu8_mask _mm256_mask_cmple_epi16_mask _mm256_mask_cmple_epi8_mask _mm256_mask_cmple_epu16_mask _mm256_mask_cmple_epu8_mask _mm256_mask_cmplt_epi16_mask _mm256_mask_cmplt_epi8_mask _mm256_mask_cmplt_epu16_mask _mm256_mask_cmplt_epu8_mask _mm256_mask_cmpneq_epi16_mask _mm256_mask_cmpneq_epi8_mask _mm256_mask_cmpneq_epu16_mask _mm256_mask_cmpneq_epu8_mask _mm256_mask_cvtepi16_epi8 _mm256_mask_cvtepi16_storeu_epi8 _mm256_mask_cvtepi8_epi16 _mm256_mask_cvtepu8_epi16 _mm256_mask_cvtsepi16_epi8 _mm256_mask_cvtsepi16_storeu_epi8 _mm256_mask_cvtusepi16_epi8 _mm256_mask_cvtusepi16_storeu_epi8 _mm256_mask_dbsad_epu8 _mm256_mask_loadu_epi16 _mm256_mask_loadu_epi8 _mm256_mask_madd_epi16 _mm256_mask_maddubs_epi16 _mm256_mask_max_epi16 _mm256_mask_max_epi8 _mm256_mask_max_epu16 _mm256_mask_max_epu8 _mm256_mask_min_epi16 _mm256_mask_min_epi8 _mm256_mask_min_epu16 _mm256_mask_min_epu8 _mm256_mask_mov_epi16 _mm256_mask_mov_epi8 _mm256_mask_mulhi_epi16 _mm256_mask_mulhi_epu16 _mm256_mask_mulhrs_epi16 _mm256_mask_mullo_epi16 _mm256_mask_packs_epi16 _mm256_mask_packs_epi32 _mm256_mask_packus_epi16 _mm256_mask_packus_epi32 _mm256_mask_permutex2var_epi16 _mm256_mask_permutexvar_epi16 _mm256_mask_set1_epi16 _mm256_mask_set1_epi8 _mm256_mask_shuffle_epi8 _mm256_mask_shufflehi_epi16 _mm256_mask_shufflelo_epi16 _mm256_mask_sll_epi16 _mm256_mask_slli_epi16 _mm256_mask_sllv_epi16 _mm256_mask_sra_epi16 _mm256_mask_srai_epi16 _mm256_mask_srav_epi16 _mm256_mask_srl_epi16 _mm256_mask_srli_epi16 _mm256_mask_srlv_epi16 _mm256_mask_storeu_epi16 _mm256_mask_storeu_epi8 _mm256_mask_sub_epi16 _mm256_mask_sub_epi8 _mm256_mask_subs_epi16 _mm256_mask_subs_epi8 _mm256_mask_subs_epu16 _mm256_mask_subs_epu8 _mm256_mask_test_epi16_mask _mm256_mask_test_epi8_mask _mm256_mask_testn_epi16_mask _mm256_mask_testn_epi8_mask _mm256_mask_unpackhi_epi16 _mm256_mask_unpackhi_epi8 _mm256_mask_unpacklo_epi16 _mm256_mask_unpacklo_epi8 _mm256_maskz_abs_epi16 _mm256_maskz_abs_epi8 _mm256_maskz_add_epi16 _mm256_maskz_add_epi8 _mm256_maskz_adds_epi16 _mm256_maskz_adds_epi8 _mm256_maskz_adds_epu16 _mm256_maskz_adds_epu8 _mm256_maskz_alignr_epi8 _mm256_maskz_avg_epu16 _mm256_maskz_avg_epu8 _mm256_maskz_broadcastb_epi8 _mm256_maskz_broadcastw_epi16 _mm256_maskz_cvtepi16_epi8 _mm256_maskz_cvtepi8_epi16 _mm256_maskz_cvtepu8_epi16 _mm256_maskz_cvtsepi16_epi8 _mm256_maskz_cvtusepi16_epi8 _mm256_maskz_dbsad_epu8 _mm256_maskz_loadu_epi16 _mm256_maskz_loadu_epi8 _mm256_maskz_madd_epi16 _mm256_maskz_maddubs_epi16 _mm256_maskz_max_epi16 _mm256_maskz_max_epi8 _mm256_maskz_max_epu16 _mm256_maskz_max_epu8 _mm256_maskz_min_epi16 _mm256_maskz_min_epi8 _mm256_maskz_min_epu16 _mm256_maskz_min_epu8 _mm256_maskz_mov_epi16 _mm256_maskz_mov_epi8 _mm256_maskz_mulhi_epi16 _mm256_maskz_mulhi_epu16 _mm256_maskz_mulhrs_epi16 _mm256_maskz_mullo_epi16 _mm256_maskz_packs_epi16 _mm256_maskz_packs_epi32 _mm256_maskz_packus_epi16 _mm256_maskz_packus_epi32 _mm256_maskz_permutex2var_epi16 _mm256_maskz_permutexvar_epi16 _mm256_maskz_set1_epi16 _mm256_maskz_set1_epi8 _mm256_maskz_shuffle_epi8 _mm256_maskz_shufflehi_epi16 _mm256_maskz_shufflelo_epi16 _mm256_maskz_sll_epi16 _mm256_maskz_slli_epi16 _mm256_maskz_sllv_epi16 _mm256_maskz_sra_epi16 _mm256_maskz_srai_epi16 _mm256_maskz_srav_epi16 _mm256_maskz_srl_epi16 _mm256_maskz_srli_epi16 _mm256_maskz_srlv_epi16 _mm256_maskz_sub_epi16 _mm256_maskz_sub_epi8 _mm256_maskz_subs_epi16 _mm256_maskz_subs_epi8 _mm256_maskz_subs_epu16 _mm256_maskz_subs_epu8 _mm256_maskz_unpackhi_epi16 _mm256_maskz_unpackhi_epi8 _mm256_maskz_unpacklo_epi16 _mm256_maskz_unpacklo_epi8 _mm256_movepi16_mask _mm256_movepi8_mask _mm256_movm_epi16 _mm256_movm_epi8 _mm256_permutex2var_epi16 _mm256_permutexvar_epi16 _mm256_sllv_epi16 _mm256_srav_epi16 _mm256_srlv_epi16 _mm256_storeu_epi16 _mm256_storeu_epi8 _mm256_test_epi16_mask _mm256_test_epi8_mask _mm256_testn_epi16_mask _mm256_testn_epi8_mask _mm512_kunpackd _mm512_kunpackw _mm_cmp_epi16_mask _mm_cmp_epi8_mask _mm_cmp_epu16_mask _mm_cmp_epu8_mask _mm_cmpeq_epi16_mask _mm_cmpeq_epi8_mask _mm_cmpeq_epu16_mask _mm_cmpeq_epu8_mask _mm_cmpge_epi16_mask _mm_cmpge_epi8_mask _mm_cmpge_epu16_mask _mm_cmpge_epu8_mask _mm_cmpgt_epi16_mask _mm_cmpgt_epi8_mask _mm_cmpgt_epu16_mask _mm_cmpgt_epu8_mask _mm_cmple_epi16_mask _mm_cmple_epi8_mask _mm_cmple_epu16_mask _mm_cmple_epu8_mask _mm_cmplt_epi16_mask _mm_cmplt_epi8_mask _mm_cmplt_epu16_mask _mm_cmplt_epu8_mask _mm_cmpneq_epi16_mask _mm_cmpneq_epi8_mask _mm_cmpneq_epu16_mask _mm_cmpneq_epu8_mask _mm_cvtepi16_epi8 _mm_cvtsepi16_epi8 _mm_cvtusepi16_epi8 _mm_dbsad_epu8 _mm_loadu_epi16 _mm_loadu_epi8 _mm_mask2_permutex2var_epi16 _mm_mask_abs_epi16 _mm_mask_abs_epi8 _mm_mask_add_epi16 _mm_mask_add_epi8 _mm_mask_adds_epi16 _mm_mask_adds_epi8 _mm_mask_adds_epu16 _mm_mask_adds_epu8 _mm_mask_alignr_epi8 _mm_mask_avg_epu16 _mm_mask_avg_epu8 _mm_mask_blend_epi16 _mm_mask_blend_epi8 _mm_mask_broadcastb_epi8 _mm_mask_broadcastw_epi16 _mm_mask_cmp_epi16_mask _mm_mask_cmp_epi8_mask _mm_mask_cmp_epu16_mask _mm_mask_cmp_epu8_mask _mm_mask_cmpeq_epi16_mask _mm_mask_cmpeq_epi8_mask _mm_mask_cmpeq_epu16_mask _mm_mask_cmpeq_epu8_mask _mm_mask_cmpge_epi16_mask _mm_mask_cmpge_epi8_mask _mm_mask_cmpge_epu16_mask _mm_mask_cmpge_epu8_mask _mm_mask_cmpgt_epi16_mask _mm_mask_cmpgt_epi8_mask _mm_mask_cmpgt_epu16_mask _mm_mask_cmpgt_epu8_mask _mm_mask_cmple_epi16_mask _mm_mask_cmple_epi8_mask _mm_mask_cmple_epu16_mask _mm_mask_cmple_epu8_mask _mm_mask_cmplt_epi16_mask _mm_mask_cmplt_epi8_mask _mm_mask_cmplt_epu16_mask _mm_mask_cmplt_epu8_mask _mm_mask_cmpneq_epi16_mask _mm_mask_cmpneq_epi8_mask _mm_mask_cmpneq_epu16_mask _mm_mask_cmpneq_epu8_mask _mm_mask_cvtepi16_epi8 _mm_mask_cvtepi16_storeu_epi8 _mm_mask_cvtepi8_epi16 _mm_mask_cvtepu8_epi16 _mm_mask_cvtsepi16_epi8 _mm_mask_cvtsepi16_storeu_epi8 _mm_mask_cvtusepi16_epi8 _mm_mask_cvtusepi16_storeu_epi8 _mm_mask_dbsad_epu8 _mm_mask_loadu_epi16 _mm_mask_loadu_epi8 _mm_mask_madd_epi16 _mm_mask_maddubs_epi16 _mm_mask_max_epi16 _mm_mask_max_epi8 _mm_mask_max_epu16 _mm_mask_max_epu8 _mm_mask_min_epi16 _mm_mask_min_epi8 _mm_mask_min_epu16 _mm_mask_min_epu8 _mm_mask_mov_epi16 _mm_mask_mov_epi8 _mm_mask_mulhi_epi16 _mm_mask_mulhi_epu16 _mm_mask_mulhrs_epi16 _mm_mask_mullo_epi16 _mm_mask_packs_epi16 _mm_mask_packs_epi32 _mm_mask_packus_epi16 _mm_mask_packus_epi32 _mm_mask_permutex2var_epi16 _mm_mask_permutexvar_epi16 _mm_mask_set1_epi16 _mm_mask_set1_epi8 _mm_mask_shuffle_epi8 _mm_mask_shufflehi_epi16 _mm_mask_shufflelo_epi16 _mm_mask_sll_epi16 _mm_mask_slli_epi16 _mm_mask_sllv_epi16 _mm_mask_sra_epi16 _mm_mask_srai_epi16 _mm_mask_srav_epi16 _mm_mask_srl_epi16 _mm_mask_srli_epi16 _mm_mask_srlv_epi16 _mm_mask_storeu_epi16 _mm_mask_storeu_epi8 _mm_mask_sub_epi16 _mm_mask_sub_epi8 _mm_mask_subs_epi16 _mm_mask_subs_epi8 _mm_mask_subs_epu16 _mm_mask_subs_epu8 _mm_mask_test_epi16_mask _mm_mask_test_epi8_mask _mm_mask_testn_epi16_mask _mm_mask_testn_epi8_mask _mm_mask_unpackhi_epi16 _mm_mask_unpackhi_epi8 _mm_mask_unpacklo_epi16 _mm_mask_unpacklo_epi8 _mm_maskz_abs_epi16 _mm_maskz_abs_epi8 _mm_maskz_add_epi16 _mm_maskz_add_epi8 _mm_maskz_adds_epi16 _mm_maskz_adds_epi8 _mm_maskz_adds_epu16 _mm_maskz_adds_epu8 _mm_maskz_alignr_epi8 _mm_maskz_avg_epu16 _mm_maskz_avg_epu8 _mm_maskz_broadcastb_epi8 _mm_maskz_broadcastw_epi16 _mm_maskz_cvtepi16_epi8 _mm_maskz_cvtepi8_epi16 _mm_maskz_cvtepu8_epi16 _mm_maskz_cvtsepi16_epi8 _mm_maskz_cvtusepi16_epi8 _mm_maskz_dbsad_epu8 _mm_maskz_loadu_epi16 _mm_maskz_loadu_epi8 _mm_maskz_madd_epi16 _mm_maskz_maddubs_epi16 _mm_maskz_max_epi16 _mm_maskz_max_epi8 _mm_maskz_max_epu16 _mm_maskz_max_epu8 _mm_maskz_min_epi16 _mm_maskz_min_epi8 _mm_maskz_min_epu16 _mm_maskz_min_epu8 _mm_maskz_mov_epi16 _mm_maskz_mov_epi8 _mm_maskz_mulhi_epi16 _mm_maskz_mulhi_epu16 _mm_maskz_mulhrs_epi16 _mm_maskz_mullo_epi16 _mm_maskz_packs_epi16 _mm_maskz_packs_epi32 _mm_maskz_packus_epi16 _mm_maskz_packus_epi32 _mm_maskz_permutex2var_epi16 _mm_maskz_permutexvar_epi16 _mm_maskz_set1_epi16 _mm_maskz_set1_epi8 _mm_maskz_shuffle_epi8 _mm_maskz_shufflehi_epi16 _mm_maskz_shufflelo_epi16 _mm_maskz_sll_epi16 _mm_maskz_slli_epi16 _mm_maskz_sllv_epi16 _mm_maskz_sra_epi16 _mm_maskz_srai_epi16 _mm_maskz_srav_epi16 _mm_maskz_srl_epi16 _mm_maskz_srli_epi16 _mm_maskz_srlv_epi16 _mm_maskz_sub_epi16 _mm_maskz_sub_epi8 _mm_maskz_subs_epi16 _mm_maskz_subs_epi8 _mm_maskz_subs_epu16 _mm_maskz_subs_epu8 _mm_maskz_unpackhi_epi16 _mm_maskz_unpackhi_epi8 _mm_maskz_unpacklo_epi16 _mm_maskz_unpacklo_epi8 _mm_movepi16_mask _mm_movepi8_mask _mm_movm_epi16 _mm_movm_epi8 _mm_permutex2var_epi16 _mm_permutexvar_epi16 _mm_sllv_epi16 _mm_srav_epi16 _mm_srlv_epi16 _mm_storeu_epi16 _mm_storeu_epi8 _mm_test_epi16_mask _mm_test_epi8_mask _mm_testn_epi16_mask _mm_testn_epi8_mask _store_mask64 _store_mask64_kadd_mask32 ```
Not mentioned avx512f intrinsics ``` _cvtmask16_u32 _cvtu32_mask16 _kortest_mask16_u8 _kortestc_mask16_u8 _kortestz_mask16_u8 _kshiftli_mask16 _kshiftri_mask16 _load_mask16 _mm256_and_epi32 _mm256_and_epi64 _mm256_andnot_epi32 _mm256_andnot_epi64 _mm256_cvtepu32_ps _mm256_mask_cvtepu32_ps _mm256_mask_cvtps_pd _mm256_maskz_cvtepu32_ps _mm256_maskz_cvtps_pd _mm256_rsqrt14_pd _mm256_rsqrt14_ps _mm512_ceil_pd _mm512_ceil_ps _mm512_cvtsd_f64 _mm512_cvtss_f32 _mm512_floor_pd _mm512_floor_ps _mm512_mask_ceil_pd _mm512_mask_ceil_ps _mm512_mask_floor_pd _mm512_mask_floor_ps _mm_and_epi32 _mm_and_epi64 _mm_andnot_epi32 _mm_andnot_epi64 _mm_cvtepu32_ps _mm_mask_cvtepu32_ps _mm_mask_cvtps_pd _mm_maskz_cvtepu32_ps _mm_maskz_cvtps_pd _mm_rsqrt14_pd _mm_rsqrt14_ps _store_mask16_kand_mask16 ```

Not mentioned avx512bw intrinsics:

caelunshun commented 1 month ago

It looks like we're also missing _mm512_fpclass_ps_mask and mm512_fpclass_pd_mask, which are in the AVX-512DQ extension.

workingjubilee commented 1 month ago

The untracked features "avx512er" and "avx512pf" have been removed. You probably weren't using them. I'm only mentioning them here in case someone gets confused and wonders where they went and looks here. These were only implemented by Knight's Landing, so most AVX512-enabled CPUs didn't have them.

sayantn commented 2 weeks ago

We really need to upgrade the intrinsics list. Intel has since removed all the extgather, logather etc intrinsics (so avx512f.rs is almost complete now), and added the new AMX family, VEX variants of AVX512, and some more instruction sets.

IceTDrinker commented 2 weeks ago

Who is in "charge" of that question on the rust project side ? It seem a lot of people have changes to the intrinsics lists to contribute but it does not seem like it was updated recently ?

sayantn commented 2 weeks ago

I am working on a PR to update many aspects of stdarch, including the intrinsics list (rust-lang/stdarch#1594)

IceTDrinker commented 2 weeks ago

awesome 🙏

RalfJung commented 2 weeks ago

Generally this is libs team territory - or rather libs-api, I assume, since this is user-visible API. Sadly that team is particularly understaffed. The intrinsics are exposed via the stdarch module, for which @Amanieu seems to be the sole maintainer.

The usual process for API questions is to file an ACP but I do not know whether stdarch also uses that process.

Amanieu commented 1 week ago

We don't use ACPs for stdarch because we don't invent our own APIs and instead follow existing C APIs for arch-specific intrinsics.