Open m13253 opened 5 months ago
@llvm/issue-subscribers-backend-x86
Author: Star Brilliant (m13253)
There isn't a 16-wide exp(float) in libmvec (or at least LLVM isn't aware of it), which is why your vectorized loop gets expanded into straight-line scalar code. If you remove "-mprefer-vector-width=512", you'll get two calls to the 8-wide exp(float).
In my glibc 2.40+r16+gaa533d58ff-2, there is a 16-wide exp(float):
$ objdump -T /usr/lib/libmvec.so.1 | grep '_ZGV.*expf\?$'
0000000000008560 g DF .text 000000000000003d GLIBC_2.22 _ZGVcN8v_expf
00000000000062a0 g iD .text 0000000000000025 GLIBC_2.22 _ZGVbN2v_exp
0000000000007a70 g iD .text 0000000000000025 GLIBC_2.22 _ZGVbN4v_expf
0000000000006d90 g DF .text 000000000000003d GLIBC_2.22 _ZGVcN4v_exp
0000000000007f80 g iD .text 000000000000002e GLIBC_2.22 _ZGVdN8v_expf
0000000000008d00 g iD .text 0000000000000049 GLIBC_2.22 _ZGVeN16v_expf (← This one)
00000000000074c0 g iD .text 0000000000000049 GLIBC_2.22 _ZGVeN8v_exp
00000000000067b0 g iD .text 000000000000002e GLIBC_2.22 _ZGVdN4v_exp
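For anyone reading the symbol names in the listing above, here is a minimal C++17 sketch that decodes them. It assumes the x86 vector function ABI naming scheme: `_ZGV`, an ISA letter, a mask letter, the lane count, the parameter kinds, `_`, then the scalar name. The `decode` helper and the `VectorVariant` struct are hypothetical names for illustration, not part of glibc or LLVM, and only the four ISA letters that appear in the listing are handled.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Decoded fields of a vector-variant symbol such as _ZGVeN16v_expf.
// (Hypothetical helper type, for illustration only.)
struct VectorVariant {
    std::string isa;     // instruction set implied by the ISA letter
    bool masked;         // 'M' = masked variant, 'N' = unmasked
    unsigned lanes;      // elements processed per call
    std::string params;  // parameter kinds, e.g. "v" = one vector argument
    std::string scalar;  // scalar function this variant vectorizes
};

std::optional<VectorVariant> decode(const std::string& sym) {
    const std::string prefix = "_ZGV";
    if (sym.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
    std::size_t i = prefix.size();
    if (i + 2 > sym.size()) return std::nullopt;  // need ISA and mask letters

    VectorVariant v;
    switch (sym[i++]) {
        case 'b': v.isa = "SSE"; break;
        case 'c': v.isa = "AVX"; break;
        case 'd': v.isa = "AVX2"; break;
        case 'e': v.isa = "AVX-512"; break;
        default: return std::nullopt;  // letters not seen in the listing
    }

    const char mask = sym[i++];
    if (mask != 'N' && mask != 'M') return std::nullopt;
    v.masked = (mask == 'M');

    // Lane count: one or more decimal digits.
    const std::size_t digits_begin = i;
    unsigned lanes = 0;
    while (i < sym.size() && std::isdigit(static_cast<unsigned char>(sym[i]))) {
        lanes = lanes * 10 + static_cast<unsigned>(sym[i] - '0');
        ++i;
    }
    if (i == digits_begin) return std::nullopt;
    v.lanes = lanes;

    // Parameter kinds run up to the next underscore; the rest is the scalar name.
    const std::size_t underscore = sym.find('_', i);
    if (underscore == std::string::npos || underscore == i ||
        underscore + 1 >= sym.size()) {
        return std::nullopt;
    }
    v.params = sym.substr(i, underscore - i);
    v.scalar = sym.substr(underscore + 1);
    return v;
}
```

Under this scheme, `_ZGVeN16v_expf` reads as the unmasked AVX-512 variant of `expf` that takes one vector argument and processes 16 floats per call, while `_ZGVdN8v_expf` is the 8-wide AVX2 variant.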
Alternatively I tried:
#include <cstddef>

extern "C" {
#pragma omp declare simd simdlen(16)
float expf(float x);
}

void func(float a[]) {
    #pragma omp simd
    for (std::size_t i = 0; i != 16; i++) {
        a[i] = expf(a[i] * 2.0f);
    }
}
The compiler doesn’t generate any SIMD calls for this version of the code.
Do you have any more ideas to solve this issue?
I am writing machine learning software that needs to compute “Y = exp(a⋅X)”.
Sample code:
Expected output:
Actual output: Shuffles numbers between SIMD registers and GP registers multiple times, but never calls any vectorized math functions. (See https://godbolt.org/z/975T6xbss)
Clang version: 18.1.0
Compilation flags:
clang++ -Ofast -fopenmp -fveclib=libmvec -mprefer-vector-width=512 -march=skylake-avx512
Alternate 1: without * 2.0f. Output: Calls the AVX2 math function instead of the AVX-512 one.
Alternate 2: separate * 2.0f and std::exp. Output: Fails to use any vectorized math functions.
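For reference, here is one possible reading of the two alternates as code. This is a sketch only: the actual variants live behind the Godbolt link, so the function names `alt1` and `alt2`, the loop shape, and the choice to split Alternate 2 into two loops are all assumptions. The `#pragma omp simd` line is left out so the snippet builds without -fopenmp.

```cpp
#include <cmath>
#include <cstddef>

// Alternate 1 (assumed shape): call std::exp directly, without "* 2.0f".
void alt1(float a[]) {
    for (std::size_t i = 0; i != 16; i++) {
        a[i] = std::exp(a[i]);
    }
}

// Alternate 2 (assumed shape): do the scaling and the exponential in
// separate loops instead of a single fused expression.
void alt2(float a[]) {
    for (std::size_t i = 0; i != 16; i++) {
        a[i] *= 2.0f;
    }
    for (std::size_t i = 0; i != 16; i++) {
        a[i] = std::exp(a[i]);
    }
}
```

Both functions compute the same values as the scalar math would; the difference reported above is only in which vectorized libmvec routine, if any, the compiler picks for each shape.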