llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.6k stars 11.82k forks source link

Incorrect codegen for #pragma omp simd loop at -O1 #47969

Open b1e213d1-25a4-466d-a8fa-acab6241b133 opened 3 years ago

b1e213d1-25a4-466d-a8fa-acab6241b133 commented 3 years ago
Bugzilla Link 48625
Version trunk
OS MacOS X
CC @everton1984,@zygoloid

Extended Description

The following program leads to incorrect codegen on x86-64 when compiled with '-fopenmp-simd -O1': only every 4th (SSE) or 8th (AVX) element is summed for large n. Changing the optimization level or enabling either FIX1 or FIX2 in the code avoids the problem. The problem occurs in both C and C++ mode and seemingly for any x86-64 uarch. Example of incorrect ASM: https://godbolt.org/z/17n8bP.

#ifdef __cplusplus
#define restrict __restrict
#endif

#define FIX1 0
#define FIX2 0

void sum(int n, const float* A,
#if FIX1
    float* restrict value)
#else
    float* value_)
#endif
{
#if !FIX1 && !FIX2
    float* restrict value = (float* restrict)value_;
#endif

#if FIX2
    float tmp = *value_;
    float* value = &tmp;
#endif

    #pragma omp simd
    for (int i = 0;i < n;i++) *value += A[i];

#if FIX2
    *value_ = tmp;
#endif
}
b1e213d1-25a4-466d-a8fa-acab6241b133 commented 3 years ago

The local variable 'value' is restrict-qualified, and hence should be assumed not to alias, right?

What if instead of a separate variable, the result should be stored in A[n]? Then one couldn't use restrict in the function signature, and a restrict-qualified local variable would have to be used. I just checked this and the same bug exists in that case.

Yes, a temporary value fixes the bug, but AFAIK the original code should be valid C99.

everton1984 commented 3 years ago

I believe this is not a bug as both value_ and A can be an alias to the same memory location. You need to either use 'restrict' or a temporary variable to make sure that doesn't happen.