xuhuisheng / rocm-build

build scripts for ROCm
Apache License 2.0

28.rccl.sh fails to build for navi10 #44

Open TyraVex opened 1 year ago

TyraVex commented 1 year ago

Environment

Hardware  Description
GPU       RX 5700
CPU       Ryzen 5 3600

Software  Version
OS        Ubuntu 20.04.6 LTS
ROCm      5.4.x
Python    3.8.10

What is the expected behavior

Build rccl for navi10

What actually happens


|====|
|SLOW|
|====|
/home/tyra/rocm/rocm-build/build/rccl /home/tyra/rocm/rocm-build/build/rccl
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11")
-- hip::amdhip64 is SHARED_LIBRARY
-- HIP compiler: clang
-- HIP runtime: rocclr
-- Found rocm_smi at /opt/rocm/include
RPM version 4.14.2.1
-- rocm-cmake: Set license file to /home/tyra/rocm/ROCm/rccl/LICENSE.txt.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/tyra/rocm/rocm-build/build/rccl
[1/4] Updating git_version.cpp if necessary
-- Updating git_version.cpp
[2/4] Building CXX object CMakeFiles/rccl.dir/git_version.cpp.o
Warning: The --hipcc-func-supp option has been deprecated and will be removed in the future.
[3/4] Linking CXX shared library librccl.so.1.0.50400
FAILED: librccl.so.1.0.50400 
: && /opt/rocm/bin/hipcc -fPIC -O3 -DNDEBUG   -shared -Wl,-soname,librccl.so.1 -o librccl.so.1.0.50400 CMakeFiles/rccl.dir/src/collectives/device/all_reduce.cpp.o CMakeFiles/rccl.dir/src/collectives/device/all_gather.cpp.o CMakeFiles/rccl.dir/src/collectives/device/alltoall_pivot.cpp.o CMakeFiles/rccl.dir/src/collectives/device/reduce.cpp.o CMakeFiles/rccl.dir/src/collectives/device/broadcast.cpp.o CMakeFiles/rccl.dir/src/collectives/device/reduce_scatter.cpp.o CMakeFiles/rccl.dir/src/collectives/device/sendrecv.cpp.o CMakeFiles/rccl.dir/src/collectives/device/onerank_reduce.cpp.o CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o CMakeFiles/rccl.dir/src/init.cc.o CMakeFiles/rccl.dir/src/graph/trees.cc.o CMakeFiles/rccl.dir/src/graph/rings.cc.o CMakeFiles/rccl.dir/src/graph/paths.cc.o CMakeFiles/rccl.dir/src/graph/search.cc.o CMakeFiles/rccl.dir/src/graph/connect.cc.o CMakeFiles/rccl.dir/src/graph/tuning.cc.o CMakeFiles/rccl.dir/src/graph/topo.cc.o CMakeFiles/rccl.dir/src/graph/xml.cc.o CMakeFiles/rccl.dir/src/graph/rome_models.cc.o CMakeFiles/rccl.dir/src/collectives/all_reduce_api.cc.o CMakeFiles/rccl.dir/src/collectives/all_gather_api.cc.o CMakeFiles/rccl.dir/src/collectives/reduce_api.cc.o CMakeFiles/rccl.dir/src/collectives/broadcast_api.cc.o CMakeFiles/rccl.dir/src/collectives/reduce_scatter_api.cc.o CMakeFiles/rccl.dir/src/collectives/sendrecv_api.cc.o CMakeFiles/rccl.dir/src/collectives/gather_api.cc.o CMakeFiles/rccl.dir/src/collectives/scatter_api.cc.o CMakeFiles/rccl.dir/src/collectives/all_to_all_api.cc.o CMakeFiles/rccl.dir/src/collectives/all_to_allv_api.cc.o CMakeFiles/rccl.dir/src/channel.cc.o CMakeFiles/rccl.dir/src/misc/argcheck.cc.o CMakeFiles/rccl.dir/src/misc/nvmlwrap_stub.cc.o CMakeFiles/rccl.dir/src/misc/utils.cc.o CMakeFiles/rccl.dir/src/misc/ibvwrap.cc.o CMakeFiles/rccl.dir/src/misc/rocm_smi_wrap.cc.o CMakeFiles/rccl.dir/src/misc/profiler.cc.o CMakeFiles/rccl.dir/src/misc/npkit.cc.o CMakeFiles/rccl.dir/src/misc/shmutils.cc.o 
CMakeFiles/rccl.dir/src/misc/signals.cc.o CMakeFiles/rccl.dir/src/misc/socket.cc.o CMakeFiles/rccl.dir/src/misc/param.cc.o CMakeFiles/rccl.dir/src/misc/rocmwrap.cc.o CMakeFiles/rccl.dir/src/misc/strongstream.cc.o CMakeFiles/rccl.dir/src/transport/coll_net.cc.o CMakeFiles/rccl.dir/src/transport/net.cc.o CMakeFiles/rccl.dir/src/transport/net_ib.cc.o CMakeFiles/rccl.dir/src/transport/net_socket.cc.o CMakeFiles/rccl.dir/src/transport/p2p.cc.o CMakeFiles/rccl.dir/src/transport/shm.cc.o CMakeFiles/rccl.dir/src/transport.cc.o CMakeFiles/rccl.dir/src/debug.cc.o CMakeFiles/rccl.dir/src/group.cc.o CMakeFiles/rccl.dir/src/bootstrap.cc.o CMakeFiles/rccl.dir/src/proxy.cc.o CMakeFiles/rccl.dir/src/net.cc.o CMakeFiles/rccl.dir/src/enqueue.cc.o CMakeFiles/rccl.dir/git_version.cpp.o  --amdgpu-target=gfx1010  -fgpu-rdc  -parallel-jobs=8  -ldl  -lrocm_smi64  -L/opt/rocm/lib  /opt/rocm/lib/libamdhip64.so.5.4.50100  --hip-link  --offload-arch=gfx1010  /opt/rocm/llvm/lib/clang/15.0.0/lib/linux/libclang_rt.builtins-x86_64.a && :
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: ld-temp.o <inline asm>:1:2: instruction not supported on this GPU
        buffer_wbinvl1_vol
        ^

lld: error: too many errors emitted, stopping now (use --error-limit=0 to see all errors)
clang-15: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
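
Since every linker error above names the same inline-assembly instruction, one way to narrow the problem down is to find which RCCL source files contain it. The following is only a sketch: `find_instruction` is a hypothetical helper name, and the example path is taken from the license-file line in the log, so adjust it to your checkout.

```shell
#!/bin/bash
# List every source line that mentions a given instruction mnemonic.
# Arguments: $1 = instruction name, $2 = source tree to search.
find_instruction() {
  local mnemonic="$1" tree="$2"
  # -r: recurse, -n: print line numbers, -F: treat the mnemonic as a literal string
  grep -rnF -- "$mnemonic" "$tree"
}

# Example (path assumed from the log above):
# find_instruction buffer_wbinvl1_vol /home/tyra/rocm/ROCm/rccl
```

Each hit is printed as `file:line:text`, which shows where a target-specific guard or patch would have to go.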

How to reproduce

====== CONFIG ======

export ROCM_INSTALL_DIR=/opt/rocm
export ROCM_MAJOR_VERSION=5
export ROCM_MINOR_VERSION=4
export ROCM_PATCH_VERSION=0
export ROCM_LIBPATCH_VERSION=50400
export CPACK_DEBIAN_PACKAGE_RELEASE=72~20.04
export ROCM_PKGTYPE=DEB
export ROCM_GIT_DIR=/home/tyra/rocm/ROCm
export ROCM_BUILD_DIR=/home/tyra/rocm/rocm-build/build
export ROCM_PATCH_DIR=/home/tyra/rocm/rocm-build/patch
export AMDGPU_TARGETS="gfx1010"
export CMAKE_DIR=/home/tyra/rocm/cmake
export PATH=$ROCM_INSTALL_DIR/bin:$ROCM_INSTALL_DIR/llvm/bin:$ROCM_INSTALL_DIR/hip/bin:$CMAKE_DIR/bin:$PATH

====================

Build script I use

#!/bin/bash

if [ "$EUID" -ne 0 ]; then sudo bash "$0" "$@"; exit; fi
[ "$1" = clean ] && sudo rm -rf rocm-build/ venv/ ROCm/ cmake/ repo && exit

for package in "git" "git-lfs" "python3" "python3-venv" "python-is-python3" "wget"; do
  if ! dpkg -s "$package" &> /dev/null; then
    echo "Installing $package ..."
    apt install -y "$package"
  fi
done

[ -d rocm-build ] || git clone "https://github.com/xuhuisheng/rocm-build"
[ -d venv ] || python3 -m venv venv --system-site-packages
# Group the commands so that chmod runs only after a successful download
[ -x repo ] || { wget --progress=bar:force "https://storage.googleapis.com/git-repo-downloads/repo" && chmod +x repo; }

if [ ! -d cmake ]; then
  wget "https://cmake.org/files/v3.18/cmake-3.18.6-Linux-x86_64.tar.gz"
  tar -xf cmake-3.18.6-Linux-x86_64.tar.gz
  mv cmake-3.18.6-Linux-x86_64 cmake
  rm -rf cmake-3.18.6-Linux-x86_64.tar.gz
fi

if [ ! -d ROCm ]; then
  mkdir -p ROCm
  cd ROCm
  git config --global user.email "$USER@mail.com"
  git config --global user.name "$USER"
  git config --global color.ui false
  ../repo init -u "https://github.com/RadeonOpenCompute/ROCm.git" -b roc-5.4.x
  ../repo sync
  cd ..
fi

cd rocm-build
config=$(cat "env.sh" | sed "s:/home/work/local/cmake-3.18.6-Linux-x86_64:$(readlink -f ../cmake):g" | sed "s:/home/work:$(readlink -f ..):g" | sed "s:gfx803:gfx1010:g")
echo -e "\n====== CONFIG ======\n\n\e[34m$(tail -n+3 <<< "$config")\e[0m\n\n===================="
echo "$config" > .config; source .config
source ../venv/bin/activate
progress_file=".progress"

if [ -f "$progress_file" ]; then
  if [ "$1" = "startover" ]; then
    rm "$progress_file"
    checkpoint_index=0
  else
    checkpoint_index=$(<"$progress_file")
  fi
else
  checkpoint_index=0
fi

readarray -t scripts <<< $(ls -1 | sort -n | grep .sh | tail -n+3)
scripts=(00.rocm-core.sh "${scripts[@]}")

for i in "${!scripts[@]}"; do
  [ $i -lt $checkpoint_index ] && continue
  line="${scripts[$i]}"
  script_name="${line##*/}"
  navi_script="navi10/$script_name"
  [ -f "$navi_script" ] && scripts[$i]="$navi_script"
done

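for i in "${!scripts[@]}"; do
  [ "$i" -lt "$checkpoint_index" ] && continue
  # Keep the directory part so that a navi10/ override selected above is the script that actually runs
  script="${scripts[$i]}"
  cd "$(dirname "$script")"
  read -p $'\e[32m\n\n'"Press ENTER to execute $script"$'\e[0m '
  echo; echo; bash "${script##*/}"
  cd - >/dev/null
  # Record progress from the rocm-build directory, where the checkpoint file is read
  echo "$i" > "$progress_file"
done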

echo -e "\n\n\e[34m====== Finished ======\e[0m\n\n"

Any ideas on flags I could use or modify? I couldn't find any relevant Google results for these errors.

serhii-nakon commented 1 year ago

Hello, I have exactly the same issue: https://github.com/xuhuisheng/rocm-build/issues/45

serhii-nakon commented 1 year ago

For gfx1012, I fixed this issue with the attached patch (try replacing gfx1012 with gfx1010). Apply the patch inside the ROCm/rccl directory.

rccl_patch.zip
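
A rough sketch of the steps described above, assuming the substitution is meant to be made in the patch text itself. The function name `retarget_patch` and the file name `rccl.patch` are hypothetical (the contents of the zip are not shown here), and the strip level that `git apply` needs may differ for the real patch.

```shell
#!/bin/bash
# Copy a patch file, replacing one GPU target name with another.
# Arguments: $1 = input patch, $2 = output patch, $3 = old target, $4 = new target.
retarget_patch() {
  local src="$1" dst="$2" old="$3" new="$4"
  sed "s/${old}/${new}/g" "$src" > "$dst"
}

# Hypothetical usage; "rccl.patch" is an assumed name for the file inside rccl_patch.zip:
# unzip rccl_patch.zip
# retarget_patch rccl.patch rccl-gfx1010.patch gfx1012 gfx1010
# cd ROCm/rccl
# git apply --check ../../rccl-gfx1010.patch && git apply ../../rccl-gfx1010.patch
```

Running `git apply --check` first reports whether the patch applies cleanly without modifying the tree.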