intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
Other
1.23k stars 736 forks source link

`-D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__=1` is passed as argument to clang-18 binary when targeting AMD GPUs #12010

Closed mdessole closed 4 weeks ago

mdessole commented 11 months ago

Bug Description

When compiling a simple example trying to target my AMD GPU, I get the following error:

clang++ vector_add.cpp -o vector_add -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90c
In file included from <built-in>:875:
<command line>:1:9: error: macro name must be an identifier
    1 | #define -D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__ 1

When I try a different target, e.g. the NVIDIA GPU or the CPU, everything works fine.

To Reproduce

I installed the dpc++ compiler from source, targeting both my NVIDIA and AMD GPUs as follows:

python ./buildbot/configure.py  --cuda --hip --cmake-opt="-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.2/ -DSYCL_BUILD_PI_HIP_ROCM_DIR=/opt/rocm-5.7.0" 

In what follows tou can find the output of the compilation with the -### flag, where you can see that -D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__=1 is passed as the argument to clang-18 binary.

clang version 18.0.0 (https://github.com/intel/llvm e5dd11182ea596e1523faddb260202a055f083bb)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/mdessole/Projects/OneAPI/build/bin
 "/home/mdessole/Projects/OneAPI/build/bin/clang-18" "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu" "-fsycl-is-device" "-fdeclare-spirv-builtins" "-Wno-sycl-strict" "-O2" "-fsycl-int-header=/tmp/vector_add-header-4af914.h" "-fsycl-int-footer=/tmp/vector_add-footer-a35078.h" "-sycl-std=2020" "-fsycl-unique-prefix=uid205cb051ac83eaea" "-D" "-D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__=1" "-emit-llvm-bc" "-emit-llvm-uselists" "-dumpdir" "vector_add-" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "vector_add.cpp" "-fsycl-use-main-file-name" "-full-main-file-name" "vector_add.cpp" "-mrelocation-model" "pic" "-pic-level" "2" "-fhalf-no-semantic-interposition" "-mframe-pointer=all" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-aux-target-cpu" "x86-64" "-fcuda-is-device" "-mllvm" "-amdgpu-internalize-symbols" "-fcuda-allow-variadic-functions" "-fvisibility=hidden" "-fapply-global-visibility-to-externs" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include/sycl" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include/sycl/stl_wrappers" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include" "-mlink-builtin-bitcode" "/home/mdessole/Projects/OneAPI/build/lib/clang/18/../../clc/remangled-l64-signed_char.libspirv-amdgcn-amd-amdhsa.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/hip.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/ocml.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/ockl.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_daz_opt_off.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_unsafe_math_off.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_finite_only_off.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_wavefrontsize64_on.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_isa_version_90c.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_abi_version_400.bc" "-target-cpu" "gfx90c" "-debugger-tuning=gdb" "-fdebug-compilation-dir=/home/mdessole/Projects/ROOT/parallelVec" "-resource-dir" "/home/mdessole/Projects/OneAPI/build/lib/clang/18" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include/sycl" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include/sycl/stl_wrappers" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/backward" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/backward" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/lib/clang/18/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/lib/clang/18/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-std=c++17" "-fdeprecated-macro" "-ferror-limit" "19" "-fgpu-rdc" "-fgnuc-version=4.2.1" "-fcxx-exceptions" "-fexceptions" "-faddrsig" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o" "/tmp/vector_add-gfx90c-6e68ec.bc" "-x" "c++" "vector_add.cpp"
 "/home/mdessole/Projects/OneAPI/build/bin/llvm-link" "/tmp/vector_add-gfx90c-6e68ec.bc" "-o" "/tmp/vector_add-gfx90c-49caa6.bc" "--suppress-warnings"
 "/home/mdessole/Projects/OneAPI/build/bin/sycl-post-link" "-split=auto" "-symbols" "-emit-exported-symbols" "-lower-esimd" "-O2" "-spec-const=emulation" "-device-globals" "-o" "/tmp/vector_add-gfx90c-63a4d8.bc" "/tmp/vector_add-gfx90c-49caa6.bc"
 "/home/mdessole/Projects/OneAPI/build/bin/file-table-tform" "-extract=Code" "-drop_titles" "-o" "/tmp/vector_add-gfx90c-f22eb3.bc" "/tmp/vector_add-gfx90c-63a4d8.bc"
 "/home/mdessole/Projects/OneAPI/build/bin/llvm-foreach" "--out-ext=o" "--in-file-list=/tmp/vector_add-gfx90c-f22eb3.bc" "--in-replace=/tmp/vector_add-gfx90c-f22eb3.bc" "--out-file-list=/tmp/vector_add-gfx90c-26676e.o" "--out-replace=/tmp/vector_add-gfx90c-26676e.o" "--" "/home/mdessole/Projects/OneAPI/build/bin/clang-18" "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu" "-fsycl-is-device" "-fdeclare-spirv-builtins" "-Wno-sycl-strict" "-O2" "-fsycl-int-header=/tmp/vector_add-header-4af914.h" "-fsycl-int-footer=/tmp/vector_add-footer-a35078.h" "-sycl-std=2020" "-fsycl-unique-prefix=uid205cb051ac83eaea" "-D" "-D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__=1" "-emit-llvm-bc" "-dumpdir" "vector_add-" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "vector_add.cpp" "-fsycl-use-main-file-name" "-full-main-file-name" "vector_add.cpp" "-mrelocation-model" "pic" "-pic-level" "2" "-fhalf-no-semantic-interposition" "-mframe-pointer=all" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-aux-target-cpu" "x86-64" "-fcuda-is-device" "-mllvm" "-amdgpu-internalize-symbols" "-fcuda-allow-variadic-functions" "-fvisibility=hidden" "-fapply-global-visibility-to-externs" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include/sycl" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include/sycl/stl_wrappers" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include" "-mlink-builtin-bitcode" "/home/mdessole/Projects/OneAPI/build/lib/clang/18/../../clc/remangled-l64-signed_char.libspirv-amdgcn-amd-amdhsa.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/hip.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/ocml.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/ockl.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_daz_opt_off.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_unsafe_math_off.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_finite_only_off.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_wavefrontsize64_on.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_isa_version_90c.bc" "-mlink-builtin-bitcode" "/opt/rocm/amdgcn/bitcode/oclc_abi_version_400.bc" "-target-cpu" "gfx90c" "-debugger-tuning=gdb" "-fdebug-compilation-dir=/home/mdessole/Projects/ROOT/parallelVec" "-resource-dir" "/home/mdessole/Projects/OneAPI/build/lib/clang/18" "-ferror-limit" "19" "-fgpu-rdc" "-fgnuc-version=4.2.1" "-faddrsig" "-o" "/tmp/vector_add-gfx90c-26676e.o" "-x" "ir" "/tmp/vector_add-gfx90c-f22eb3.bc"
 "/home/mdessole/Projects/OneAPI/build/bin/llvm-foreach" "--out-ext=out" "--in-file-list=/tmp/vector_add-gfx90c-26676e.o" "--in-replace=/tmp/vector_add-gfx90c-26676e.o" "--out-file-list=/tmp/vector_add-gfx90c-fcc64b.out" "--out-replace=/tmp/vector_add-gfx90c-fcc64b.out" "--" "/home/mdessole/Projects/OneAPI/build/bin/lld" "-flavor" "gnu" "-m" "elf64_amdgpu" "--no-undefined" "-shared" "-plugin-opt=-amdgpu-internalize-symbols" "-plugin-opt=mcpu=gfx90c" "--whole-archive" "-o" "/tmp/vector_add-gfx90c-fcc64b.out" "/tmp/vector_add-gfx90c-26676e.o" "--no-whole-archive"
 "/home/mdessole/Projects/OneAPI/build/bin/llvm-foreach" "--out-ext=hipfb" "--in-file-list=/tmp/vector_add-gfx90c-fcc64b.out" "--in-replace=/tmp/vector_add-gfx90c-fcc64b.out" "--out-file-list=/tmp/vector_add-gfx90c-294e3f.hipfb" "--out-replace=/tmp/vector_add-gfx90c-294e3f.hipfb" "--" "/home/mdessole/Projects/OneAPI/build/bin/clang-offload-bundler" "-type=o" "-bundle-align=4096" "-targets=host-x86_64-unknown-linux,hipv4-amdgcn-amd-amdhsa--gfx90c" "-input=/dev/null" "-input=/tmp/vector_add-gfx90c-fcc64b.out" "-output=/tmp/vector_add-gfx90c-294e3f.hipfb"
 "/home/mdessole/Projects/OneAPI/build/bin/file-table-tform" "-replace=Code,Code" "-o" "/tmp/vector_add-gfx90c-592fef.table" "/tmp/vector_add-gfx90c-63a4d8.bc" "/tmp/vector_add-gfx90c-294e3f.hipfb"
 "/home/mdessole/Projects/OneAPI/build/bin/clang-offload-wrapper" "-o=/tmp/wrapper-cd3349.bc" "-host=x86_64-unknown-linux-gnu" "-compile-opts=--offload-arch=gfx90c" "-target=amdgcn" "-kind=sycl" "-batch" "/tmp/vector_add-gfx90c-592fef.table"
 "/home/mdessole/Projects/OneAPI/build/bin/llc" "-filetype=obj" "-o" "/tmp/vector_add-wrapper-gfx90c-7d19bd.o" "/tmp/wrapper-cd3349.bc" "-relocation-model=pic"
 "/home/mdessole/Projects/OneAPI/build/bin/append-file" "vector_add.cpp" "--append=/tmp/vector_add-footer-a35078.h" "--orig-filename=vector_add.cpp" "--output=/tmp/vector_add-145da9.cpp" "--use-include"
 "/home/mdessole/Projects/OneAPI/build/bin/clang-18" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-sycl-std=2020" "-fsycl-unique-prefix=uid205cb051ac83eaea" "-include" "/tmp/vector_add-header-4af914.h" "-dependency-filter" "/tmp/vector_add-header-4af914.h" "-fsycl-enable-int-header-diags" "-fsycl-is-host" "-D" "__SYCL_NATIVE_CPU__" "-D" "-D" "-D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__=1" "-emit-obj" "-mrelax-all" "-dumpdir" "vector_add-" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "vector_add.cpp" "-fsycl-use-main-file-name" "-full-main-file-name" "vector_add.cpp" "-mrelocation-model" "static" "-mframe-pointer=all" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-tune-cpu" "generic" "--dependent-lib=sycl-devicelib-host" "-debugger-tuning=gdb" "-fdebug-compilation-dir=/home/mdessole/Projects/ROOT/parallelVec" "-fcoverage-compilation-dir=/home/mdessole/Projects/ROOT/parallelVec" "-resource-dir" "/home/mdessole/Projects/OneAPI/build/lib/clang/18" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include/sycl" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include/sycl/stl_wrappers" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/bin/../include" "-iquote" "/home/mdessole/Projects/ROOT/parallelVec" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/backward" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/backward" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/lib/clang/18/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-internal-isystem" "/home/mdessole/Projects/OneAPI/build/lib/clang/18/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-std=c++17" "-fdeprecated-macro" "-ferror-limit" "19" "-fgpu-rdc" "-fgnuc-version=4.2.1" "-fcxx-exceptions" "-fexceptions" "-faddrsig" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o" "/tmp/vector_add-8c2da5.o" "-x" "c++" "/tmp/vector_add-145da9.cpp"
 "/usr/bin/ld" "-z" "relro" "--hash-style=gnu" "--eh-frame-hdr" "-m" "elf_x86_64" "-dynamic-linker" "/lib64/ld-linux-x86-64.so.2" "-o" "vector_add" "/lib/x86_64-linux-gnu/crt1.o" "/lib/x86_64-linux-gnu/crti.o" "/usr/lib/gcc/x86_64-linux-gnu/12/crtbegin.o" "-L/usr/lib/gcc/x86_64-linux-gnu/12" "-L/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib64" "-L/lib/x86_64-linux-gnu" "-L/lib/../lib64" "-L/usr/lib/x86_64-linux-gnu" "-L/usr/lib/../lib64" "-L/home/mdessole/Projects/OneAPI/build/bin/../lib" "-L/lib" "-L/usr/lib" "/tmp/vector_add-8c2da5.o" "/tmp/vector_add-wrapper-gfx90c-7d19bd.o" "-lstdc++" "-lm" "-lgcc_s" "-lgcc" "-lsycl" "-lsycl-devicelib-host" "-lc" "-lgcc_s" "-lgcc" "/usr/lib/gcc/x86_64-linux-gnu/12/crtend.o" "/lib/x86_64-linux-gnu/crtn.o"

The sycl-ls command correctly identifies the two GPUs:

[opencl:cpu:0] Intel(R) OpenCL, AMD Ryzen 7 5700G with Radeon Graphics          OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:acc:1] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.10.0.17_160000]
[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 8.6 [CUDA 12.2]
[ext_oneapi_hip:gpu:0] AMD HIP BACKEND, AMD Radeon Graphics gfx90c:xnack- [HIP 50731.92]

Environment (please complete the following information):

al42and commented 11 months ago

Duplicate of #8112?

jinz2014 commented 11 months ago

@al42and Thank you for your suggestion.

jinz2014 commented 11 months ago

@mdessole If the results of comparing HIP and SYCL programs are available on gfx90c, thanks for letting people know.

mdessole commented 10 months ago

Sorry for answering so late.

Update

I reinstalled from source including changes in [SYCL] Gracefully handle unknown device, using rocm-5.4.3 instead of rocm-5.7.0, and clang++ compiles the example without raising any error. However, when I try to run the executable, the process just hangs forever. If I try to print out the device that holds the sycl queue before executing the kernel, I correctly get AMD Radeon Graphics Do you have any idea of what is happening, or can you suggest how to investigate it?

The code

I include the code snippet I'm trying to execute - a very simple example - for completeness.

#include <iostream>
#include <sycl/sycl.hpp>

class vector_addition;

int main(int, char**) {

   sycl::float4 a = { 1.0, 2.0, 3.0, 4.0 };
   sycl::float4 b = { 4.0, 3.0, 2.0, 1.0 };
   sycl::float4 c = { 0.0, 0.0, 0.0, 0.0 };

   sycl::default_selector device_selector;
   sycl::queue queue(device_selector);
   std::cout << "Running on "
             << queue.get_device().get_info<sycl::info::device::name>()
             << "\n";
   {
      sycl::buffer<sycl::float4, 1> a_sycl(&a, sycl::range<1>(1));
      sycl::buffer<sycl::float4, 1> b_sycl(&b, sycl::range<1>(1));
      sycl::buffer<sycl::float4, 1> c_sycl(&c, sycl::range<1>(1));

      queue.submit([&] (sycl::handler& cgh) {
         auto a_acc = a_sycl.get_access<sycl::access::mode::read>(cgh);
         auto b_acc = b_sycl.get_access<sycl::access::mode::read>(cgh);
         auto c_acc = c_sycl.get_access<sycl::access::mode::discard_write>(cgh);

         cgh.single_task<class vector_addition>([=] () {
         c_acc[0] = a_acc[0] + b_acc[0];
         });
      });
   }
   std::cout << "  A { " << a.x() << ", " << a.y() << ", " << a.z() << ", " << a.w() << " }\n"
        << "+ B { " << b.x() << ", " << b.y() << ", " << b.z() << ", " << b.w() << " }\n"
        << "------------------\n"
        << "= C { " << c.x() << ", " << c.y() << ", " << c.z() << ", " << c.w() << " }"
        << std::endl;

   return 0;
jinz2014 commented 10 months ago

@mdessole Would you please build the SYCL compiler from the latest source again ?

JackAKirk commented 4 weeks ago

gfx90c is not officially supported by rocm, and hence not officially supported by dpc++ amd backend. See https://github.com/ROCm/ROCm/issues/1743 for some discussion. It seems that things don't work well for this gpu. Closing.