llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.78k stars 11.9k forks source link

[Clang] Add __builtin_selectvector and use it for AVX512 intrinsics #91306

Open philnik777 opened 5 months ago

philnik777 commented 5 months ago

This is also very useful for generic code. For example this would allow libc++ to vectorize {min,max,minmax}_element without having to use platform-specific intrinsics. I've done some testing and even at -O0 Clang compiles the code to the expected instructions for architectures where the mask vector has the same bit count as the element vector (i.e. every SIMD ISA except AVX512 that I'm aware of) as long as the comparison operation is visible.

github-actions[bot] commented 5 months ago

:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.

llvmbot commented 5 months ago

@llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang-codegen

Author: Nikolas Klauser (philnik777)

Changes This is also very useful for generic code. For example this would allow libc++ to vectorize `{min,max,minmax}_element` without having to use platform-specific intrinsics. I've done some testing and even at `-O0` Clang compiles the code to the expected instructions for architectures where the mask vector has the same bit count as the element vector (i.e. every SIMD ISA except AVX512 that I'm aware of) as long as the comparison operation is visible. --- Patch is 962.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91306.diff 32 Files Affected: - (modified) clang/docs/LanguageExtensions.rst (+20) - (modified) clang/include/clang/Basic/Builtins.td (+6) - (modified) clang/include/clang/Basic/BuiltinsX86.def (-24) - (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+3-1) - (modified) clang/lib/CodeGen/CGBuiltin.cpp (+6-25) - (modified) clang/lib/Headers/avx512bf16intrin.h (+12-12) - (modified) clang/lib/Headers/avx512bitalgintrin.h (+6-6) - (modified) clang/lib/Headers/avx512bwintrin.h (+386-396) - (modified) clang/lib/Headers/avx512cdintrin.h (+24-24) - (modified) clang/lib/Headers/avx512dqintrin.h (+134-134) - (modified) clang/lib/Headers/avx512fintrin.h (+958-957) - (modified) clang/lib/Headers/avx512fp16intrin.h (+90-85) - (modified) clang/lib/Headers/avx512ifmaintrin.h (+12-12) - (modified) clang/lib/Headers/avx512ifmavlintrin.h (+24-25) - (modified) clang/lib/Headers/avx512vbmi2intrin.h (+84-86) - (modified) clang/lib/Headers/avx512vbmiintrin.h (+21-22) - (modified) clang/lib/Headers/avx512vbmivlintrin.h (+42-43) - (modified) clang/lib/Headers/avx512vlbf16intrin.h (+24-24) - (modified) clang/lib/Headers/avx512vlbitalgintrin.h (+12-12) - (modified) clang/lib/Headers/avx512vlbwintrin.h (+776-788) - (modified) clang/lib/Headers/avx512vlcdintrin.h (+48-48) - (modified) clang/lib/Headers/avx512vldqintrin.h (+190-190) - (modified) clang/lib/Headers/avx512vlfp16intrin.h (+212-223) - (modified) clang/lib/Headers/avx512vlintrin.h (+1798-1982) - (modified) clang/lib/Headers/avx512vlvbmi2intrin.h (+168-168) - (modified) clang/lib/Headers/avx512vlvnniintrin.h (+48-48) - (modified) clang/lib/Headers/avx512vnniintrin.h (+24-24) - (modified) clang/lib/Headers/avx512vpopcntdqintrin.h (+6-4) - (modified) clang/lib/Headers/avx512vpopcntdqvlintrin.h (+12-8) - (modified) clang/lib/Headers/gfniintrin.h (+33-34) - (modified) clang/lib/Sema/SemaChecking.cpp (+56) - (added) clang/test/Sema/builtin-selectvector.c (+18) ``````````diff diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst index 96691b45d63a3..6513676438ffb 100644 --- a/clang/docs/LanguageExtensions.rst +++ b/clang/docs/LanguageExtensions.rst @@ -3019,6 +3019,26 @@ C-style cast applied to each element of the first argument. Query for this feature with ``__has_builtin(__builtin_convertvector)``. +``__builtin_selectvector`` +-------------------------- + +``__builtin_selectvector`` is used to express generic vector element selection. + +**Signature**: + +.. code-block:: c++ + + template + simd_vec __builtin_selectvector(simd_vec lhs, simd_vec rhs, + simd_vec cond) + +**Description**: + +The returned vector is equivalent to +``simd_vec{cond[0] ? rhs[0] : lhs[0], ..., cond[N - 1] ? rhs[N - 1] : lhs[N - 1]}``. + +Query for this feature with ``__has_builtin(__builtin_selectvector)``. + ``__builtin_bitreverse`` ------------------------ diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td index d6ceb450bd106..279330d9b5251 100644 --- a/clang/include/clang/Basic/Builtins.td +++ b/clang/include/clang/Basic/Builtins.td @@ -1176,6 +1176,12 @@ def ConvertVector : Builtin { let Prototype = "void(...)"; } +def SelectVector : Builtin { + let Spellings = ["__builtin_selectvector"]; + let Attributes = [NoThrow, Const, CustomTypeChecking]; + let Prototype = "void(...)"; +} + def AllocaUninitialized : Builtin { let Spellings = ["__builtin_alloca_uninitialized"]; let Attributes = [FunctionWithBuiltinPrefix, NoThrow]; diff --git a/clang/include/clang/Basic/BuiltinsX86.def b/clang/include/clang/Basic/BuiltinsX86.def index eafcc219c1096..2e099b3ab4f05 100644 --- a/clang/include/clang/Basic/BuiltinsX86.def +++ b/clang/include/clang/Basic/BuiltinsX86.def @@ -1973,30 +1973,6 @@ TARGET_BUILTIN(__builtin_ia32_vfcmulcph256_mask, "V8fV8fV8fV8fUc", "ncV:256:", TARGET_BUILTIN(__builtin_ia32_vfcmulcph512_mask, "V16fV16fV16fV16fUsIi", "ncV:512:", "avx512fp16,evex512") // generic select intrinsics -TARGET_BUILTIN(__builtin_ia32_selectb_128, "V16cUsV16cV16c", "ncV:128:", "avx512bw,avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectb_256, "V32cUiV32cV32c", "ncV:256:", "avx512bw,avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectb_512, "V64cUOiV64cV64c", "ncV:512:", "avx512bw,evex512") -TARGET_BUILTIN(__builtin_ia32_selectw_128, "V8sUcV8sV8s", "ncV:128:", "avx512bw,avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectw_256, "V16sUsV16sV16s", "ncV:256:", "avx512bw,avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectw_512, "V32sUiV32sV32s", "ncV:512:", "avx512bw,evex512") -TARGET_BUILTIN(__builtin_ia32_selectd_128, "V4iUcV4iV4i", "ncV:128:", "avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectd_256, "V8iUcV8iV8i", "ncV:256:", "avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectd_512, "V16iUsV16iV16i", "ncV:512:", "avx512f,evex512") -TARGET_BUILTIN(__builtin_ia32_selectph_128, "V8xUcV8xV8x", "ncV:128:", "avx512fp16,avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectph_256, "V16xUsV16xV16x", "ncV:256:", "avx512fp16,avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectph_512, "V32xUiV32xV32x", "ncV:512:", "avx512fp16,evex512") -TARGET_BUILTIN(__builtin_ia32_selectpbf_128, "V8yUcV8yV8y", "ncV:128:", "avx512bf16,avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectpbf_256, "V16yUsV16yV16y", "ncV:256:", "avx512bf16,avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectpbf_512, "V32yUiV32yV32y", "ncV:512:", "avx512bf16,evex512") -TARGET_BUILTIN(__builtin_ia32_selectq_128, "V2OiUcV2OiV2Oi", "ncV:128:", "avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectq_256, "V4OiUcV4OiV4Oi", "ncV:256:", "avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectq_512, "V8OiUcV8OiV8Oi", "ncV:512:", "avx512f,evex512") -TARGET_BUILTIN(__builtin_ia32_selectps_128, "V4fUcV4fV4f", "ncV:128:", "avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectps_256, "V8fUcV8fV8f", "ncV:256:", "avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectps_512, "V16fUsV16fV16f", "ncV:512:", "avx512f,evex512") -TARGET_BUILTIN(__builtin_ia32_selectpd_128, "V2dUcV2dV2d", "ncV:128:", "avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectpd_256, "V4dUcV4dV4d", "ncV:256:", "avx512vl") -TARGET_BUILTIN(__builtin_ia32_selectpd_512, "V8dUcV8dV8d", "ncV:512:", "avx512f,evex512") TARGET_BUILTIN(__builtin_ia32_selectsh_128, "V8xUcV8xV8x", "ncV:128:", "avx512fp16") TARGET_BUILTIN(__builtin_ia32_selectsbf_128, "V8yUcV8yV8y", "ncV:128:", "avx512bf16") TARGET_BUILTIN(__builtin_ia32_selectss_128, "V4fUcV4fV4f", "ncV:128:", "avx512f") diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td index 774d2b53a3825..7c2222fe51203 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -12035,7 +12035,9 @@ def err_builtin_invalid_arg_type: Error < "a floating point type|" "a vector of integers|" "an unsigned integer|" - "an 'int'}1 (was %2)">; + "an 'int'|" + "a vector of bools" + "}1 (was %2)">; def err_builtin_matrix_disabled: Error< "matrix types extension is disabled. Pass -fenable-matrix to enable it">; diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index c7b219dcfcec5..487f9a2099eb9 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -3744,6 +3744,12 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, return RValue::get(Result); } + case Builtin::BI__builtin_selectvector: { + return RValue::get(Builder.CreateSelect(EmitScalarExpr(E->getArg(2)), + EmitScalarExpr(E->getArg(0)), + EmitScalarExpr(E->getArg(1)))); + } + case Builtin::BI__builtin_elementwise_abs: { Value *Result; QualType QT = E->getArg(0)->getType(); @@ -15513,31 +15519,6 @@ Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID, case X86::BI__builtin_ia32_prorvq256: case X86::BI__builtin_ia32_prorvq512: return EmitX86FunnelShift(*this, Ops[0], Ops[0], Ops[1], true); - case X86::BI__builtin_ia32_selectb_128: - case X86::BI__builtin_ia32_selectb_256: - case X86::BI__builtin_ia32_selectb_512: - case X86::BI__builtin_ia32_selectw_128: - case X86::BI__builtin_ia32_selectw_256: - case X86::BI__builtin_ia32_selectw_512: - case X86::BI__builtin_ia32_selectd_128: - case X86::BI__builtin_ia32_selectd_256: - case X86::BI__builtin_ia32_selectd_512: - case X86::BI__builtin_ia32_selectq_128: - case X86::BI__builtin_ia32_selectq_256: - case X86::BI__builtin_ia32_selectq_512: - case X86::BI__builtin_ia32_selectph_128: - case X86::BI__builtin_ia32_selectph_256: - case X86::BI__builtin_ia32_selectph_512: - case X86::BI__builtin_ia32_selectpbf_128: - case X86::BI__builtin_ia32_selectpbf_256: - case X86::BI__builtin_ia32_selectpbf_512: - case X86::BI__builtin_ia32_selectps_128: - case X86::BI__builtin_ia32_selectps_256: - case X86::BI__builtin_ia32_selectps_512: - case X86::BI__builtin_ia32_selectpd_128: - case X86::BI__builtin_ia32_selectpd_256: - case X86::BI__builtin_ia32_selectpd_512: - return EmitX86Select(*this, Ops[0], Ops[1], Ops[2]); case X86::BI__builtin_ia32_selectsh_128: case X86::BI__builtin_ia32_selectsbf_128: case X86::BI__builtin_ia32_selectss_128: diff --git a/clang/lib/Headers/avx512bf16intrin.h b/clang/lib/Headers/avx512bf16intrin.h index b28d2e243f2cb..1c32831a8cc57 100644 --- a/clang/lib/Headers/avx512bf16intrin.h +++ b/clang/lib/Headers/avx512bf16intrin.h @@ -77,9 +77,9 @@ _mm512_cvtne2ps_pbh(__m512 __A, __m512 __B) { /// conversion of __B, and higher 256 bits come from conversion of __A. static __inline__ __m512bh __DEFAULT_FN_ATTRS512 _mm512_mask_cvtne2ps_pbh(__m512bh __W, __mmask32 __U, __m512 __A, __m512 __B) { - return (__m512bh)__builtin_ia32_selectpbf_512((__mmask32)__U, - (__v32bf)_mm512_cvtne2ps_pbh(__A, __B), - (__v32bf)__W); + return (__m512bh)__builtin_selectvector( + (__v32bf)_mm512_cvtne2ps_pbh(__A, __B), (__v32bf)__W, + __builtin_bit_cast(__vecmask32, __U)); } /// Convert Two Packed Single Data to One Packed BF16 Data. @@ -99,9 +99,9 @@ _mm512_mask_cvtne2ps_pbh(__m512bh __W, __mmask32 __U, __m512 __A, __m512 __B) { /// conversion of __B, and higher 256 bits come from conversion of __A. static __inline__ __m512bh __DEFAULT_FN_ATTRS512 _mm512_maskz_cvtne2ps_pbh(__mmask32 __U, __m512 __A, __m512 __B) { - return (__m512bh)__builtin_ia32_selectpbf_512((__mmask32)__U, - (__v32bf)_mm512_cvtne2ps_pbh(__A, __B), - (__v32bf)_mm512_setzero_si512()); + return (__m512bh)__builtin_selectvector( + (__v32bf)_mm512_cvtne2ps_pbh(__A, __B), (__v32bf)_mm512_setzero_si512(), + __builtin_bit_cast(__vecmask32, __U)); } /// Convert Packed Single Data to Packed BF16 Data. @@ -200,9 +200,9 @@ _mm512_dpbf16_ps(__m512 __D, __m512bh __A, __m512bh __B) { /// __A, __B and __D static __inline__ __m512 __DEFAULT_FN_ATTRS512 _mm512_mask_dpbf16_ps(__m512 __D, __mmask16 __U, __m512bh __A, __m512bh __B) { - return (__m512)__builtin_ia32_selectps_512((__mmask16)__U, - (__v16sf)_mm512_dpbf16_ps(__D, __A, __B), - (__v16sf)__D); + return (__m512)__builtin_selectvector( + (__v16sf)_mm512_dpbf16_ps(__D, __A, __B), (__v16sf)__D, + __builtin_bit_cast(__vecmask16, __U)); } /// Dot Product of BF16 Pairs Accumulated into Packed Single Precision. @@ -224,9 +224,9 @@ _mm512_mask_dpbf16_ps(__m512 __D, __mmask16 __U, __m512bh __A, __m512bh __B) { /// __A, __B and __D static __inline__ __m512 __DEFAULT_FN_ATTRS512 _mm512_maskz_dpbf16_ps(__mmask16 __U, __m512 __D, __m512bh __A, __m512bh __B) { - return (__m512)__builtin_ia32_selectps_512((__mmask16)__U, - (__v16sf)_mm512_dpbf16_ps(__D, __A, __B), - (__v16sf)_mm512_setzero_si512()); + return (__m512)__builtin_selectvector( + (__v16sf)_mm512_dpbf16_ps(__D, __A, __B), (__v16sf)_mm512_setzero_si512(), + __builtin_bit_cast(__vecmask16, __U)); } /// Convert Packed BF16 Data to Packed float Data. diff --git a/clang/lib/Headers/avx512bitalgintrin.h b/clang/lib/Headers/avx512bitalgintrin.h index bad265ceb7db2..f4e31c287af18 100644 --- a/clang/lib/Headers/avx512bitalgintrin.h +++ b/clang/lib/Headers/avx512bitalgintrin.h @@ -29,9 +29,9 @@ _mm512_popcnt_epi16(__m512i __A) static __inline__ __m512i __DEFAULT_FN_ATTRS _mm512_mask_popcnt_epi16(__m512i __A, __mmask32 __U, __m512i __B) { - return (__m512i) __builtin_ia32_selectw_512((__mmask32) __U, - (__v32hi) _mm512_popcnt_epi16(__B), - (__v32hi) __A); + return (__m512i)__builtin_selectvector((__v32hi)_mm512_popcnt_epi16(__B), + (__v32hi)__A, + __builtin_bit_cast(__vecmask32, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS @@ -51,9 +51,9 @@ _mm512_popcnt_epi8(__m512i __A) static __inline__ __m512i __DEFAULT_FN_ATTRS _mm512_mask_popcnt_epi8(__m512i __A, __mmask64 __U, __m512i __B) { - return (__m512i) __builtin_ia32_selectb_512((__mmask64) __U, - (__v64qi) _mm512_popcnt_epi8(__B), - (__v64qi) __A); + return (__m512i)__builtin_selectvector((__v64qi)_mm512_popcnt_epi8(__B), + (__v64qi)__A, + __builtin_bit_cast(__vecmask64, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS diff --git a/clang/lib/Headers/avx512bwintrin.h b/clang/lib/Headers/avx512bwintrin.h index c854720de6a65..ba77f979da1f8 100644 --- a/clang/lib/Headers/avx512bwintrin.h +++ b/clang/lib/Headers/avx512bwintrin.h @@ -369,16 +369,16 @@ _mm512_add_epi8 (__m512i __A, __m512i __B) { static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_mask_add_epi8(__m512i __W, __mmask64 __U, __m512i __A, __m512i __B) { - return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U, - (__v64qi)_mm512_add_epi8(__A, __B), - (__v64qi)__W); + return (__m512i)__builtin_selectvector((__v64qi)_mm512_add_epi8(__A, __B), + (__v64qi)__W, + __builtin_bit_cast(__vecmask64, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_maskz_add_epi8(__mmask64 __U, __m512i __A, __m512i __B) { - return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U, - (__v64qi)_mm512_add_epi8(__A, __B), - (__v64qi)_mm512_setzero_si512()); + return (__m512i)__builtin_selectvector((__v64qi)_mm512_add_epi8(__A, __B), + (__v64qi)_mm512_setzero_si512(), + __builtin_bit_cast(__vecmask64, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 @@ -388,16 +388,16 @@ _mm512_sub_epi8 (__m512i __A, __m512i __B) { static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_mask_sub_epi8(__m512i __W, __mmask64 __U, __m512i __A, __m512i __B) { - return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U, - (__v64qi)_mm512_sub_epi8(__A, __B), - (__v64qi)__W); + return (__m512i)__builtin_selectvector((__v64qi)_mm512_sub_epi8(__A, __B), + (__v64qi)__W, + __builtin_bit_cast(__vecmask64, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_maskz_sub_epi8(__mmask64 __U, __m512i __A, __m512i __B) { - return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U, - (__v64qi)_mm512_sub_epi8(__A, __B), - (__v64qi)_mm512_setzero_si512()); + return (__m512i)__builtin_selectvector((__v64qi)_mm512_sub_epi8(__A, __B), + (__v64qi)_mm512_setzero_si512(), + __builtin_bit_cast(__vecmask64, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 @@ -407,16 +407,16 @@ _mm512_add_epi16 (__m512i __A, __m512i __B) { static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_mask_add_epi16(__m512i __W, __mmask32 __U, __m512i __A, __m512i __B) { - return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U, - (__v32hi)_mm512_add_epi16(__A, __B), - (__v32hi)__W); + return (__m512i)__builtin_selectvector((__v32hi)_mm512_add_epi16(__A, __B), + (__v32hi)__W, + __builtin_bit_cast(__vecmask32, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_maskz_add_epi16(__mmask32 __U, __m512i __A, __m512i __B) { - return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U, - (__v32hi)_mm512_add_epi16(__A, __B), - (__v32hi)_mm512_setzero_si512()); + return (__m512i)__builtin_selectvector((__v32hi)_mm512_add_epi16(__A, __B), + (__v32hi)_mm512_setzero_si512(), + __builtin_bit_cast(__vecmask32, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 @@ -426,16 +426,16 @@ _mm512_sub_epi16 (__m512i __A, __m512i __B) { static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_mask_sub_epi16(__m512i __W, __mmask32 __U, __m512i __A, __m512i __B) { - return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U, - (__v32hi)_mm512_sub_epi16(__A, __B), - (__v32hi)__W); + return (__m512i)__builtin_selectvector((__v32hi)_mm512_sub_epi16(__A, __B), + (__v32hi)__W, + __builtin_bit_cast(__vecmask32, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_maskz_sub_epi16(__mmask32 __U, __m512i __A, __m512i __B) { - return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U, - (__v32hi)_mm512_sub_epi16(__A, __B), - (__v32hi)_mm512_setzero_si512()); + return (__m512i)__builtin_selectvector((__v32hi)_mm512_sub_epi16(__A, __B), + (__v32hi)_mm512_setzero_si512(), + __builtin_bit_cast(__vecmask32, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 @@ -445,32 +445,30 @@ _mm512_mullo_epi16 (__m512i __A, __m512i __B) { static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_mask_mullo_epi16(__m512i __W, __mmask32 __U, __m512i __A, __m512i __B) { - return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U, - (__v32hi)_mm512_mullo_epi16(__A, __B), - (__v32hi)__W); + return (__m512i)__builtin_selectvector((__v32hi)_mm512_mullo_epi16(__A, __B), + (__v32hi)__W, + __builtin_bit_cast(__vecmask32, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_maskz_mullo_epi16(__mmask32 __U, __m512i __A, __m512i __B) { - return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U, - (__v32hi)_mm512_mullo_epi16(__A, __B), - (__v32hi)_mm512_setzero_si512()); + return (__m512i)__builtin_selectvector((__v32hi)_mm512_mullo_epi16(__A, __B), + (__v32hi)_mm512_setzero_si512(), + __builtin_bit_cast(__vecmask32, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_mask_blend_epi8 (__mmask64 __U, __m512i __A, __m512i __W) { - return (__m512i) __builtin_ia32_selectb_512 ((__mmask64) __U, - (__v64qi) __W, - (__v64qi) __A); + return (__m512i)__builtin_selectvector((__v64qi)__W, (__v64qi)__A, + __builtin_bit_cast(__vecmask64, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_mask_blend_epi16 (__mmask32 __U, __m512i __A, __m512i __W) { - return (__m512i) __builtin_ia32_selectw_512 ((__mmask32) __U, - (__v32hi) __W, - (__v32hi) __A); + return (__m512i)__builtin_selectvector((__v32hi)__W, (__v32hi)__A, + __builtin_bit_cast(__vecmask32, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 @@ -482,17 +480,17 @@ _mm512_abs_epi8 (__m512i __A) static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_mask_abs_epi8 (__m512i __W, __mmask64 __U, __m512i __A) { - return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U, - (__v64qi)_mm512_abs_epi8(__A), - (__v64qi)__W); + return (__m512i)__builtin_selectvector((__v64qi)_mm512_abs_epi8(__A), + (__v64qi)__W, + __builtin_bit_cast(__vecmask64, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_maskz_abs_epi8 (__mmask64 __U, __m512i __A) { - return (__m512i)__builtin_ia32_selectb_512((__mmask64)__U, - (__v64qi)_mm512_abs_epi8(__A), - (__v64qi)_mm512_setzero_si512()); + return (__m512i)__builtin_selectvector((__v64qi)_mm512_abs_epi8(__A), + (__v64qi)_mm512_setzero_si512(), + __builtin_bit_cast(__vecmask64, __U)); } static __inline__ __m512i __DEFAULT_FN_ATTRS512 @@ -504,17 +502,17 @@ _mm512_abs_epi16 (__m512i __A) static __inline__ __m512i __DEFAULT_FN_ATTRS512 _mm512_mask_abs_epi16 (__m512i __W, __mmask32 __U, __m512i __A) { - return (__m512i)__builtin_ia32_selectw_512((__mmask32)__U,... [truncated] ``````````
philnik777 commented 5 months ago

I don't understand why the order of emitted instructions changes based on how exactly Clang is compiled, but other than that this should be ready. Hopefully someone spots what the problem could be.

efriedma-quic commented 5 months ago

clang already supports ?: with a vector condition; does this add anything new on top of that?

philnik777 commented 1 month ago

clang already supports ?: with a vector condition; does this add anything new on top of that?

This works with bool vectors. I didn't realize you could use the ternary operator, since I only tested with them. I guess we could extend the ternary operator to accept bool vectors as well. Any thoughts?

efriedma-quic commented 1 month ago

You mean, if all three operands are boolean vectors? I'm surprised that doesn't already work.

philnik777 commented 1 month ago

You mean, if all three operands are boolean vectors? I'm surprised that doesn't already work.

No, I mean I have a vector of bools and want to select a value based on that. e.g. declval<simd_vector<bool, 16>>() ? declval<simd_vector<int, 16>>() : declval<simd_vector<int, 16>>().

efriedma-quic commented 1 month ago

The relevant bit of code is:

  // The OpenCL operator with a vector condition is sufficiently
  // different to merit its own checker.
  if ((getLangOpts().OpenCL && Cond.get()->getType()->isVectorType()) ||
      Cond.get()->getType()->isExtVectorType())
    return OpenCLCheckVectorConditional(*this, Cond, LHS, RHS, QuestionLoc);

Maybe makes sense to relax it.