Open m13253 opened 4 months ago
I am writing machine learning software that needs to compute “Y = exp(a⋅X)”.
Sample code:
#include <cmath>
#include <cstddef>

void func(float a[]) {
    for (std::size_t i = 0; i != 16; i++) {
        a[i] = std::exp(a[i] * 2.0f);
    }
}
Expected output:
    push    rbx
    mov     rbx, rdi
    vmovups zmm0, ZMMWORD PTR [rdi]
    vaddps  zmm0, zmm0, zmm0
    call    _ZGVeN16v_expf@PLT
    vmovups ZMMWORD PTR [rbx], zmm0
    pop     rbx
    vzeroupper
    ret
Actual output: Shuffles numbers between SIMD registers and GP registers multiple times, but never calls any vectorized math functions. (See https://godbolt.org/z/975T6xbss)
Clang version: 18.1.0
Compilation flags:
clang++ -Ofast -fopenmp -fveclib=libmvec -mprefer-vector-width=512 -march=skylake-avx512
Alternate 1: without * 2.0f.
void func(float a[]) {
    for (std::size_t i = 0; i != 16; i++) {
        a[i] = std::exp(a[i]);
    }
}
Output: Calls the AVX2 math function, instead of the AVX-512 one.
Alternate 2: separate * 2.0f and std::exp.
void func(float a[]) {
    for (std::size_t i = 0; i != 16; i++) {
        a[i] *= 2.0f;
    }
    for (std::size_t i = 0; i != 16; i++) {
        a[i] = std::exp(a[i]);
    }
}
Output: Fails to use any vectorized math functions.
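For anyone reproducing this: the expected output above uses vaddps zmm0, zmm0, zmm0 rather than a multiply, because doubling a float is exact and x * 2.0f gives the same value as x + x. The sketch below is only a rough sanity check, not part of the original report. It writes the loop both ways and compares the results on 16 sample inputs. The names func_mul, func_add and max_rel_diff are hypothetical, and the sample inputs are arbitrary. Whether writing the loop with x + x changes Clang's vectorization decision has not been verified here.

```cpp
#include <algorithm>
#include <cassert>  // only needed for the assert-based usage shown after the block
#include <cmath>
#include <cstddef>

// Loop from the report: multiply by 2, then exponentiate.
void func_mul(float a[], std::size_t n) {
    for (std::size_t i = 0; i != n; i++) {
        a[i] = std::exp(a[i] * 2.0f);
    }
}

// Hypothetical rewrite that mirrors the vaddps in the expected assembly.
void func_add(float a[], std::size_t n) {
    for (std::size_t i = 0; i != n; i++) {
        a[i] = std::exp(a[i] + a[i]);
    }
}

// Largest relative difference between the two variants on 16 sample inputs.
float max_rel_diff() {
    float x[16];
    float y[16];
    for (std::size_t i = 0; i != 16; i++) {
        // Arbitrary inputs in [-2, 1.75], so exp(2x) stays well within float range.
        x[i] = y[i] = 0.25f * static_cast<float>(i) - 2.0f;
    }
    func_mul(x, 16);
    func_add(y, 16);
    float worst = 0.0f;
    for (std::size_t i = 0; i != 16; i++) {
        // y[i] is a value of exp(), so it is strictly positive here.
        worst = std::max(worst, std::fabs(x[i] - y[i]) / y[i]);
    }
    return worst;
}
```

A check such as assert(max_rel_diff() <= 1e-6f); can be compiled under the same flags as the report to confirm that a given build still computes the same values, independently of which code path the compiler picked.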
@llvm/issue-subscribers-backend-x86
Author: Star Brilliant (m13253)