For example: `__m128i _mm_dpbusd_avx_epi32 (__m128i src, __m128i a, __m128i b)`
This takes one <4 x i32> accumulator ("src") and two <16 x i8> multiplication inputs ("a" and "b"), but the clang/llvm builtin and intrinsic declare all three operands as <4 x i32>:
```
TARGET_BUILTIN(__builtin_ia32_vpdpbusd128, "V4iV4iV4iV4i", "ncV:128:", "avx512vl,avx512vnni|avxvnni")

def int_x86_avx512_vpdpbusd_128 :
    ClangBuiltin<"__builtin_ia32_vpdpbusd128">,
    DefaultAttrsIntrinsic<[llvm_v4i32_ty],
                          [llvm_v4i32_ty, llvm_v4i32_ty, llvm_v4i32_ty],
                          [IntrNoMem]>;
```
which means we require hardcoded mappings of the src/dst types for any combines that involve them.
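
To make the type mismatch concrete, here is a minimal scalar sketch of what the 128-bit instruction computes. It assumes the non-saturating form, and the `v4i32` struct and `dpbusd_ref` name are hypothetical helpers for illustration, not anything in clang/llvm. The operands arrive as i32 lanes, matching the builtin signature, but the multiply has to reinterpret each lane of "a" as four unsigned bytes and each lane of "b" as four signed bytes:

```c
#include <stdint.h>

/* Hypothetical helper type: one 128-bit vector viewed as <4 x i32>,
   which is how the builtin/intrinsic signature presents every operand. */
typedef struct { int32_t lane[4]; } v4i32;

/* Scalar reference model (a sketch, not the LLVM implementation) of the
   non-saturating 128-bit dot product: each i32 lane of `a` is treated as
   4 unsigned bytes, each i32 lane of `b` as 4 signed bytes. */
static v4i32 dpbusd_ref(v4i32 src, v4i32 a, v4i32 b) {
    v4i32 dst;
    for (int i = 0; i < 4; ++i) {
        uint32_t ua = (uint32_t)a.lane[i];
        uint32_t ub = (uint32_t)b.lane[i];
        int32_t sum = 0;
        for (int j = 0; j < 4; ++j) {
            int32_t x = (int32_t)(uint8_t)(ua >> (8 * j));          /* u8 from a */
            int32_t y = (int32_t)(int8_t)(uint8_t)(ub >> (8 * j));  /* s8 from b */
            sum += x * y;
        }
        /* Accumulate with wrap-around (no saturation). */
        dst.lane[i] = (int32_t)((uint32_t)src.lane[i] + (uint32_t)sum);
    }
    return dst;
}
```

The byte extraction inside the loop is exactly the information the <4 x i32> signature hides: a combine that wants to reason about the multiply inputs has to know, per intrinsic, that the second and third operands are really <16 x i8>.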