```cpp
#include <array>

int main() {
#pragma omp target teams distribute parallel for
  for (int i = 0; i < 1; ++i) {
    [](int) {
      std::array<std::array<float, 1>, 1> transform = {};
      transform[0][0] = 0;
      for (int l = 0; l < 1; l++) {
      }
    }(i);
  }
  return 0;
}
```
```shell
> clang++ -std=c++17 -O2 -fopenmp=libomp --offload-arch=sm_60 --cuda-path=$CUDA_PATH this_file.cpp
fatal error: error in backend: Cannot select: 0xa26b5e0: i64,ch = stacksave 0xaa2e0a0
In function: __omp_offloading_26_35250d8_main_l3
clang++: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 18.0.0 (git@github.com:tom91136/llvm-project.git 96adadf8f7227f6543537056f27f98cb18bbe8ce)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/tom/software/llvm-ompt/96adadf8f722/bin
```
If we change `std::array<std::array<float, 1>, 1> transform = {};` to `std::array<std::array<float, 1>, 2> transform = {};`, the issue goes away and the binary seems to be OK.
The empty `for` loop in the lambda and the placement of `transform[0][0] = 0;` both play a role here.
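
For comparison, the variant that avoids the crash looks roughly like the sketch below. The only intended difference from the reproducer is the outer array extent (2 instead of 1). The wrapper function name `run_two_element_variant` is hypothetical and not part of the original file; it is there only so the snippet can be called from elsewhere, and it has not been checked whether moving the region out of `main` changes the behaviour.

```cpp
#include <array>

// Hypothetical wrapper (not in the original reproducer) around the same target region.
int run_two_element_variant() {
#pragma omp target teams distribute parallel for
  for (int i = 0; i < 1; ++i) {
    [](int) {
      // Outer extent is 2 instead of 1; per the report above, this avoids the backend error.
      std::array<std::array<float, 1>, 2> transform = {};
      transform[0][0] = 0;
      // The empty loop is kept on purpose, since it plays a role in triggering the failure.
      for (int l = 0; l < 1; l++) {
      }
    }(i);
  }
  return 0;
}
```

Built without `-fopenmp`, the pragma is ignored and the loop simply runs on the host, so this is mainly useful for diffing against the failing version when compiled with the same offload flags as above.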
This reproducer is reduced from https://github.com/UoB-HPC/miniBUDE/blob/v2/src/std-indices/fasten.hpp; using a different constant (>= 8 in our case) for the `transform` array doesn't trigger the problem.
The issue is also reproducible with CUDA 11.8 when targeting sm_35.
Only NVPTX is affected; targeting gfx906 works as expected.
Diagnostics: [diag.zip](https://github.com/llvm/llvm-project/files/12777430/diag.zip)
Possibly related to https://github.com/llvm/llvm-project/issues/55361.