llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[OpenMP] Linking OpenMP target offloading fails with optimization enabled #84028

Open jprotze opened 4 months ago

jprotze commented 4 months ago

An OpenMP hello world program fails to link when optimization is turned on:

#include <omp.h>
#include <stdio.h>
int main(int argc, char** argv){
  printf("Devices: %i\n", omp_get_num_devices());
  int a[10]={0};
  for (int i=0; i<= omp_get_num_devices(); i++)
  #pragma omp target device(i) map(tofrom: a[:10])
  {
    printf("Hello from device %i, is_initial_device=%i\n", i, omp_is_initial_device());
    a[i]++;
  }
  printf("%i, %i, %i, %i\n", a[0], a[1], a[2], a[3]);
  return 0;
}

Building like:

clang -fopenmp -fopenmp-targets=nvptx64 omp_hello_device.c -O3

This fails with

nvlink error   : Size doesn't match for '__omp_rtl_device_environment' in '/tmp/omp_hello_device-1f1427-nvptx64-nvidia-cuda-sm_70-1a1851.cubin', first specified in '/tmp/a-d3fd98.cubin'
nvlink fatal   : merge_elf failed
clang: error: fatbinary command failed with exit code 1 (use -v to see invocation)
clang-linker-wrapper: error: 'clang' failed
clang: error: linker command failed with exit code 1 (use -v to see invocation)

I tried a fairly recent build from main (f7c2e5fa05) and a release 17 build (6009708b). Both show this issue. I also tried several of our systems with different GPUs and different CUDA versions (11.6/11.8/12.1.1). The failure is consistent with clang 17 or newer. A clang/16.0.6 build links successfully at any optimization level.

@jhuber6 have you seen something like this before?
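
A note on the reproducer's loop bound: `i <= omp_get_num_devices()` is presumably deliberate, because since OpenMP 5.0 the device number equal to `omp_get_num_devices()` denotes the host (initial) device, so the last iteration runs the target region on the host. The following is a minimal sketch of that pattern, not the reporter's code: the helper name `touch_devices`, the clamp to the array size, and the fallback stub for builds without `-fopenmp` are all assumptions added for illustration.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stub (an assumption for illustration) so the sketch also
 * builds without -fopenmp; the pragma below is then ignored and the
 * loop body simply runs on the host. */
static int omp_get_num_devices(void) { return 0; }
#endif

/* Hypothetical helper: launches one target region per device number in
 * [0, omp_get_num_devices()], clamped so that a[i] stays inside the
 * n-element array. Returns the number of regions launched. */
int touch_devices(int *a, int n) {
  assert(n > 0);
  /* Device number omp_get_num_devices() refers to the host device. */
  int last = omp_get_num_devices();
  if (last >= n)
    last = n - 1; /* more device numbers than array slots: stop early */
  for (int i = 0; i <= last; i++) {
#pragma omp target device(i) map(tofrom: a[:n])
    {
      a[i]++;
    }
  }
  return last + 1;
}
```

Called from `main` with a zero-initialized `int a[10]`, this leaves `a[i] == 1` for every device number that was visited. The clamp only matters on machines exposing ten or more devices, where the original `a[i]++` would index past the end of the array.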

jhuber6 commented 4 months ago

Unfortunately, I can't reproduce this with your test and compiler invocation, and I don't know exactly what's going on in your example. The non-LTO build of NVPTX targets should use -mlink-builtin-bitcode to resolve the OpenMP runtime per-TU. This is silently broken for anything that shares state between files, but it should produce only a single .cubin, so I'm struggling to figure out where the second one is coming from.