llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.57k stars 11.81k forks source link

[clang][OpenMP] omp scan code gen does not initialize allocas #87466

Open Dinistro opened 6 months ago

Dinistro commented 6 months ago

We stumbled over an issue in the code gen for omp scan. For a simple example like the following, the "inscan variable" is not properly initialized.

int curr_scan = 0;
#pragma omp parallel shared(thread_data, scan, curr_scan)
{
    int n = omp_get_num_threads();
    #pragma omp for reduction(inscan, + : curr_scan)
    for (int i = 0; i < n; ++i) {
        scan[i] = curr_scan;
        #pragma omp scan exclusive(curr_scan)
        curr_scan += thread_data[i];
    }
}

For a properly initialized thread_data, scan is filled up with random data. The generated IR shows that there are a few basic blocks in the outlined function that are unreachable, which seems suspicious.

Full example with code, IR, and a CFG visualization: https://godbolt.org/z/rna6K8T6Y

Dinistro commented 6 months ago

Pinging @alexey-bataev and @jdoerfert, as they were involved in the revision that originally introduced this feature: https://reviews.llvm.org/D79948#change-6nyECRH02yni

alexey-bataev commented 6 months ago

Looks like there is a compiler crash during codegen

Dinistro commented 6 months ago

Looks like there is a compiler crash during codegen

This is the case for clang17, but it passes for me with clang18. NVM, my local build also explodes now. I wonder what the difference to the compiler explorer version is.

llvmbot commented 6 months ago

@llvm/issue-subscribers-clang-codegen

Author: Christian Ulmann (Dinistro)

We stumbled over an issue in the code gen for `omp scan`. For a simple example like the following, the "inscan variable" is not properly initialized. ``` int curr_scan = 0; #pragma omp parallel shared(thread_data, scan, curr_scan) { int n = omp_get_num_threads(); #pragma omp for reduction(inscan, + : curr_scan) for (int i = 0; i < n; ++i) { scan[i] = curr_scan; #pragma omp scan exclusive(curr_scan) curr_scan += thread_data[i]; } } ``` For a properly initialized `thread_data`, `scan` is filled up with random data. The generated IR shows that there are a few basic blocks in the outlined function that are unreachable, which seems suspicious. Full example with code, IR, and a CFG visualization: https://godbolt.org/z/rna6K8T6Y
nikola-tesic-ns commented 1 month ago

IMO, the problem is in generation of declarations for OMP scan temporary variables. The temporary buffer declaration is tied to worksharing for region and it should be tied to omp parallel region. This makes the buffer thread private and it should be shared. I've tried to move those decls out of the parallel region, but wasn't successful, as OMP Codegen for scan temporary vars is tied to worksharing region. More details on discourse. @alexey-bataev do you have any idea, how hard is to fix this? Thanks!