openucx / ucc

Unified Collective Communication Library
https://openucx.github.io/ucc/
BSD 3-Clause "New" or "Revised" License

Building with ROCm/HIP fails on a system without GPU #969

Open lahwaacz opened 2 months ago

lahwaacz commented 2 months ago

The cuda_lt.sh script contains a --offload-arch=native flag for amdclang:

https://github.com/openucx/ucc/blob/c1734db1b2bc9ffeba5d17b3e81e1a9425dee100/cuda_lt.sh#L31

This should select the native architecture of the GPU present in the build system. However, if the build system does not have any GPU, the command fails:

$ /opt/rocm/lib/llvm/bin/amdclang -c -x hip -target x86_64-unknown-linux-gnu --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx941 --offload-arch=gfx942 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=native ec_rocm_executor_kernel.cu -I/usr/include/ -D__HIP_PLATFORM_AMD__ -I/opt/rocm/include/hip -I/opt/rocm/include -I/opt/rocm/llvm/include -I/opt/rocm/include/hsa -I/opt/rocm/include -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src/components/ec/rocm -fPIC -O3 -o ./.libs/ec_rocm_executor_kernel.o
/opt/rocm/lib/llvm/bin/amdclang -c -x hip -target x86_64-unknown-linux-gnu --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx941 --offload-arch=gfx942 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=native ec_rocm_reduce.cu -I/usr/include/ -D__HIP_PLATFORM_AMD__ -I/opt/rocm/include/hip -I/opt/rocm/include -I/opt/rocm/llvm/include -I/opt/rocm/include/hsa -I/opt/rocm/include -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src/components/ec/rocm -fPIC -O3 -o ./.libs/ec_rocm_reduce.o
clang: error: cannot determine amdgcn architecture: /opt/rocm/lib/llvm/bin/amdgpu-arch: ; consider passing it via '--offload-arch'
clang: error: cannot determine amdgcn architecture: /opt/rocm/lib/llvm/bin/amdgpu-arch: ; consider passing it via '--offload-arch'

edgargabriel commented 2 months ago

@lahwaacz thank you for the bug report, we will look into this. The UCC CI checker runs through exactly the same scenario (i.e. compiling UCC with the ROCm stack installed but without an AMD GPU available). However, because of an issue that we faced with clang-tidy versions newer than clang-12, we pinned the ROCm version in the UCC CI to ROCm 5.7.1, which still uses hipcc to compile the kernels instead of the new clang --offload-arch=... approach that we use with ROCm > 6.0.

For now, I think your best options are either to compile on a platform with an AMD GPU present or to edit cuda_lt.sh and remove the --offload-arch=native argument.
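The second workaround could be scripted roughly as follows. This is only a sketch: `strip_native_arch` is a hypothetical helper name, and the demo file is a made-up stand-in, since the exact contents of cuda_lt.sh are not reproduced in this thread.

```shell
#!/bin/sh
# Hypothetical helper: remove the "--offload-arch=native" token from a script in place.
# A backup copy with the suffix ".bak" is kept next to the original.
strip_native_arch() {
    # Drop the flag together with the space before it, so no double space remains.
    sed -i.bak 's/ --offload-arch=native//g' "$1"
}

# Demonstration on a throwaway file standing in for cuda_lt.sh (contents are made up).
demo=$(mktemp)
printf '%s\n' 'cmd="amdclang -x hip --offload-arch=gfx90a --offload-arch=native $@"' > "$demo"
strip_native_arch "$demo"
cat "$demo"
```

In a packaging recipe, the helper would be pointed at the real cuda_lt.sh in the unpacked source tree before running the build.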

lahwaacz commented 2 months ago

@edgargabriel Thanks, I've patched it for the Arch Linux package: https://gitlab.archlinux.org/archlinux/packaging/packages/openucc/-/commit/f5618b46d08fa2c41f218366871d17133145cde9#9b9baac1eb9b72790eef5540a1685306fc43fd6c_50_42

romintomasetti commented 1 month ago

Hi @edgargabriel !

We also encountered the same issue (cannot determine amdgcn architecture) because we build ucc inside a docker build step, where devices such as GPUs are not exposed.

It would be nice if we could provide, at the configuration step, the list of architectures that we want to compile for. For now, many ROCm architectures are listed in cuda_lt.sh, but we only need a few of them (not to mention those that are not listed). An option like --offload-arch=A,B,C would be welcome. For backward compatibility, it could default to what's already in cuda_lt.sh if not provided. The same applies to the enabled CUDA architectures: it would be nice if only a chosen subset could be passed when compiling ucc, which would help us reduce the compile time and size.
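To make the request concrete, here is a minimal sketch of how a comma-separated list could be expanded into per-architecture flags, with a fallback to a default list. The names `DEFAULT_ARCHS` and `offload_arch_flags` are hypothetical and not part of UCC; the default list simply mirrors the explicit architectures visible in the failing command earlier in this thread, without `native`.

```shell
#!/bin/sh
# ASSUMPTION: default list copied from the compile command in the bug report, minus "native".
DEFAULT_ARCHS="gfx908,gfx90a,gfx940,gfx941,gfx942,gfx1030,gfx1100,gfx1101,gfx1102"

# Hypothetical helper: turn "A,B,C" into "--offload-arch=A --offload-arch=B --offload-arch=C".
# The body runs in a subshell "( ... )" so the IFS change does not leak to the caller.
offload_arch_flags() (
    IFS=,
    flags=""
    # Unquoted on purpose: the list is split on commas; the default is used if $1 is empty.
    for arch in ${1:-$DEFAULT_ARCHS}; do
        flags="${flags:+$flags }--offload-arch=$arch"
    done
    printf '%s\n' "$flags"
)

# Only the two architectures we actually deploy on.
offload_arch_flags "gfx90a,gfx942"
```

Calling the helper without an argument yields the full default list, which is the backward-compatible behaviour described above.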

Note that we circumvented the problem by patching cuda_lt.sh to remove the native offloading.

edgargabriel commented 1 month ago

@romintomasetti thank you, it is on our list, and we definitely plan to have it fixed for the next release. I think the fix is not entirely trivial since cuda_lt.sh is used by both the cuda and rocm components, so it might require a deeper rework of that part of the code.
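One possible shape for a guard, shown purely as an illustration and not as the planned fix: emit the native flag only when a detection command actually reports a GPU. The function name `native_arch_flag` is hypothetical, and the sketch assumes the detector (for example the amdgpu-arch binary named in the error message) prints one architecture per line and either prints nothing or exits non-zero when no GPU is visible.

```shell
#!/bin/sh
# Hypothetical guard: print "--offload-arch=native" only if the detector finds a GPU.
# ASSUMPTION: the detector prints nothing or exits non-zero when no GPU is visible.
native_arch_flag() {
    detector=$1
    # Require both a successful exit status and non-empty output.
    if out=$("$detector" 2>/dev/null) && [ -n "$out" ]; then
        printf '%s\n' "--offload-arch=native"
    fi
}

# Stand-in detector for the demonstration; a real build would pass the path to amdgpu-arch.
fake_detector() { echo gfx90a; }
native_arch_flag fake_detector
```

On a GPU-less build host the function prints nothing, so the compile command would fall back to the explicitly listed architectures.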

romintomasetti commented 4 weeks ago

Also, please note that there is already the option --with-nvcc-gencode for CUDA: https://github.com/openucx/ucc/blob/1522ccff5e107451d747b1085b3f84714a6c2eea/config/m4/cuda.m4#L109-L118
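For illustration, a value for that option could be assembled from a list of compute capabilities as sketched below. `gencode_flags` is a hypothetical helper, and the capabilities 80 and 90 are example inputs; whether the option expects exactly this string format should be checked against the linked cuda.m4.

```shell
#!/bin/sh
# Hypothetical helper: build nvcc "-gencode" flags from a list of compute capabilities.
gencode_flags() {
    flags=""
    for cc in "$@"; do
        flags="${flags:+$flags }-gencode=arch=compute_${cc},code=sm_${cc}"
    done
    printf '%s\n' "$flags"
}

# Example: restrict the build to two architectures.
gencode_flags 80 90
```

The result would then be passed at configure time, e.g. `./configure --with-nvcc-gencode="$(gencode_flags 80 90)"`, alongside the usual CUDA options.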