google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

Added platform-specific I32/U32 SumOfMulQuadAccumulate on NEON_BF16 #2247

Closed: johnplatts closed this 3 months ago

johnplatts commented 3 months ago

Reimplemented I32/U32 SumOfMulQuadAccumulate on NEON_BF16 since the vdot[q]_s32 and vdot[q]_u32 intrinsics are guaranteed to be available on the NEON_BF16 target, which requires support for the AArch64 DotProd extension.
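
For readers unfamiliar with the op, here is a minimal scalar sketch of what the I32 case computes, as I understand it: each 32-bit lane of `sum` gains the dot product of the corresponding group of four 8-bit elements of `a` and `b`. This per-lane operation is what `vdotq_s32` performs on a 128-bit vector. The function name `SumOfMulQuadAccumulateRef` and the `std::vector` interface are hypothetical, for illustration only; this is not the Highway implementation or its API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference, not part of Highway. For each 32-bit lane i,
// adds the dot product of the 8-bit elements 4*i .. 4*i+3 of a and b to sum[i].
std::vector<int32_t> SumOfMulQuadAccumulateRef(const std::vector<int8_t>& a,
                                               const std::vector<int8_t>& b,
                                               std::vector<int32_t> sum) {
  // Four 8-bit inputs per 32-bit accumulator lane.
  assert(a.size() == b.size() && a.size() == 4 * sum.size());
  for (size_t i = 0; i < sum.size(); ++i) {
    int32_t acc = 0;  // At most 4 * 128 * 128, so this cannot overflow.
    for (size_t j = 0; j < 4; ++j) {
      acc += static_cast<int32_t>(a[4 * i + j]) *
             static_cast<int32_t>(b[4 * i + j]);
    }
    // Accumulate with wraparound by adding in unsigned arithmetic.
    sum[i] = static_cast<int32_t>(static_cast<uint32_t>(sum[i]) +
                                  static_cast<uint32_t>(acc));
  }
  return sum;
}
```

The U32 case would be the same sketch with `uint8_t` inputs and `uint32_t` accumulators, which is the operation `vdotq_u32` performs.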