llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Math function vectorization failure with AVX-512 #94419

Open m13253 opened 4 months ago

m13253 commented 4 months ago

I am writing machine learning software that needs to compute “Y = exp(a⋅X)”.

Sample code:

#include <cmath>
#include <cstddef>

void func(float a[]) {
    for(std::size_t i = 0; i != 16; i++) {
        a[i] = std::exp(a[i] * 2.0f);
    }
}
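The loop computes Y = exp(a⋅X) with a = 2 over 16 floats. Below is a minimal sketch of a correctness check that any vectorized build of func should pass. It compares func against a double-precision reference. The helper name max_rel_error_vs_reference and the 1e-6 tolerance are illustrative choices and are not part of the report.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// The loop from the report, unchanged: a[i] = exp(2 * a[i]) for 16 floats, in place.
void func(float a[]) {
    for (std::size_t i = 0; i != 16; i++) {
        a[i] = std::exp(a[i] * 2.0f);
    }
}

// Hypothetical helper (not from the report): runs func on a copy of x and
// returns the largest relative error against a double-precision exp(2 * x).
double max_rel_error_vs_reference(const float x[16]) {
    float y[16];
    for (std::size_t i = 0; i != 16; i++) y[i] = x[i];
    func(y);
    double worst = 0.0;
    for (std::size_t i = 0; i != 16; i++) {
        // exp is always positive, so dividing by ref is safe for moderate inputs.
        const double ref = std::exp(2.0 * static_cast<double>(x[i]));
        const double err = std::fabs(static_cast<double>(y[i]) - ref) / ref;
        if (err > worst) worst = err;
    }
    return worst;
}
```

The 1e-6 bound is meant to leave room for vector math libraries that trade a few ULP of accuracy for speed. For comparison, one single-precision ULP near 1.0 is about 1.2e-7.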

Expected output:

push    rbx
mov     rbx, rdi
vmovups zmm0, ZMMWORD PTR [rdi]
vaddps  zmm0, zmm0, zmm0
call    _ZGVeN16v_expf@PLT
vmovups ZMMWORD PTR [rbx], zmm0
pop     rbx
vzeroupper
ret
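The expected listing implements a[i] * 2.0f as vaddps zmm0, zmm0, zmm0, which computes x + x. This rewrite is exact in IEEE-754 arithmetic, because doubling and self-addition round identically and overflow to the same infinity. It therefore does not depend on -Ofast. The following sketch checks that the two forms give bit-identical results for a few representative inputs. The helper doubling_matches_add is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: returns true if x * 2.0f and x + x have identical bit patterns.
// Bits are compared rather than values, so a -0.0f vs 0.0f mismatch would be caught.
bool doubling_matches_add(float x) {
    const float mul = x * 2.0f;
    const float add = x + x;
    std::uint32_t bits_mul, bits_add;
    std::memcpy(&bits_mul, &mul, sizeof bits_mul);
    std::memcpy(&bits_add, &add, sizeof bits_add);
    return bits_mul == bits_add;
}
```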

Actual output: the generated code moves values back and forth between SIMD and general-purpose registers several times, but never calls a vectorized math function. (See https://godbolt.org/z/975T6xbss)

Clang version: 18.1.0

Compilation flags: clang++ -Ofast -fopenmp -fveclib=libmvec -mprefer-vector-width=512 -march=skylake-avx512


Alternative 1: the same loop without * 2.0f.

void func(float a[]) {
    for(std::size_t i = 0; i != 16; i++) {
        a[i] = std::exp(a[i]);
    }
}

Output: calls the AVX2 version of the vectorized exp function instead of the AVX-512 one.
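The symbol names that libmvec exports follow the x86-64 Vector Function ABI pattern _ZGV<isa><mask><vlen><params>_<name>. The ISA letters are b for SSE, c for AVX, d for AVX2, and e for AVX-512. The mask letter is N for unmasked and M for masked. The params part uses v for each vector parameter. Under this pattern, the AVX-512 call in the expected output, _ZGVeN16v_expf, processes 16 floats. An AVX2 call handles 8 floats per 256-bit register and would use a d-ISA name such as _ZGVdN8v_expf. The sketch below builds these names. The helpers vector_abi_name and lanes are hypothetical and are not a libmvec or LLVM API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a Vector Function ABI symbol name for a function
// whose parameters are all vectors ('v'), following _ZGV<isa><mask><vlen><params>_<name>.
// x86-64 ISA letters: 'b' = SSE, 'c' = AVX, 'd' = AVX2, 'e' = AVX-512.
std::string vector_abi_name(char isa, bool masked, int vlen,
                            int num_vector_params, const std::string& scalar_name) {
    std::string name = "_ZGV";
    name += isa;
    name += masked ? 'M' : 'N';
    name += std::to_string(vlen);
    name.append(static_cast<std::string::size_type>(num_vector_params), 'v');
    name += '_';
    name += scalar_name;
    return name;
}

// Lanes per call for a register width and element size,
// e.g. 512-bit registers / 32-bit floats = 16 lanes.
int lanes(int register_bits, int element_bits) {
    return register_bits / element_bits;
}
```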

Alternative 2: * 2.0f and std::exp split into separate loops.

void func(float a[]) {
    for(std::size_t i = 0; i != 16; i++) {
        a[i] *= 2.0f;
    }
    for(std::size_t i = 0; i != 16; i++) {
        a[i] = std::exp(a[i]);
    }
}
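In both versions, each element is processed independently, and the product is rounded to float before exp is applied. The split form should therefore produce exactly the same values as the fused loop, at least in a default build without -Ofast or -fveclib, where both use the same scalar expf. The sketch below compares the two forms bitwise. The names func_fused, func_split, and fused_matches_split are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>

// Fused form from the original sample.
void func_fused(float a[]) {
    for (std::size_t i = 0; i != 16; i++) {
        a[i] = std::exp(a[i] * 2.0f);
    }
}

// Split form from Alternative 2.
void func_split(float a[]) {
    for (std::size_t i = 0; i != 16; i++) {
        a[i] *= 2.0f;
    }
    for (std::size_t i = 0; i != 16; i++) {
        a[i] = std::exp(a[i]);
    }
}

// Runs both forms on copies of x and reports whether the outputs are bit-identical.
bool fused_matches_split(const float x[16]) {
    float fused[16], split[16];
    std::memcpy(fused, x, sizeof fused);
    std::memcpy(split, x, sizeof split);
    func_fused(fused);
    func_split(split);
    return std::memcmp(fused, split, sizeof fused) == 0;
}
```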

Output: Fails to use any vectorized math functions.

llvmbot commented 4 months ago

@llvm/issue-subscribers-backend-x86

Author: Star Brilliant (m13253)
