Open richardfoltyn opened 2 months ago
Sorry, I missed that this had been assigned to me. Is this still a problem?
Hi,
sorry for taking a while to get back on this.
I realized I can work around the missing device files by creating this symlink:
ls -la /usr/amdgcn
lrwxrwxrwx. 1 root root 19 Sep 29 12:42 /usr/amdgcn -> lib/clang/17/amdgcn
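For reference, a symlink like the one listed above could be created roughly as follows. This is only a sketch: `link_device_libs` is a hypothetical helper (not part of ROCm or JAX), and the clang major version 17 is taken from the listing above. It first runs against a scratch directory so nothing under /usr is touched.

```shell
# Sketch: expose the clang-bundled device libraries at <prefix>/amdgcn.
# link_device_libs is a hypothetical helper, not part of ROCm or JAX.
link_device_libs() {
    prefix="$1"
    clang_major="$2"
    target="lib/clang/${clang_major}/amdgcn"
    # Refuse to create a link to a directory that does not exist.
    if [ ! -d "${prefix}/${target}" ]; then
        echo "missing ${prefix}/${target}" >&2
        return 1
    fi
    # Relative link target, matching the `ls -la` output above.
    ln -s "${target}" "${prefix}/amdgcn"
}

# Try it on a scratch tree that mimics the Fedora layout.
scratch="$(mktemp -d)"
mkdir -p "${scratch}/lib/clang/17/amdgcn/bitcode"
link_device_libs "${scratch}" 17
readlink "${scratch}/amdgcn"
```

On the real system the equivalent call would be `link_device_libs /usr 17`, run as root.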
However, now the build fails at a later stage:
BUILD_DIR=~/build/rocm
mkdir -p ${BUILD_DIR}
cd ${BUILD_DIR}
git clone -b rocm-jaxlib-v0.4.31 https://github.com/ROCm/jax.git
git clone -b rocm-jaxlib-v0.4.31 https://github.com/ROCm/xla.git
cd jax
python3 ./build/build.py --clang_path=/usr/bin/clang-17 --enable_rocm --rocm_amdgpu_targets=gfx1100 --build_gpu_plugin --gpu_plugin_rocm_version=60 --bazel_options=--override_repository=xla=${BUILD_DIR}/xla --rocm_path=/usr --enable_mkl_dnn=false
I also tried the main branch with the default XLA, but that fails with a different error that seems to be related to building zlib (see build-jax-main.log).
Thanks!
@richardfoltyn thanks for notifying us of the issue.
Actually, we are working on a clang patch. Meanwhile, can you try compiling like this:
rm -rf dist; python3.11 -m pip uninstall jax jaxlib jax-rocm60-pjrt jax-rocm60-plugin -y; python3.11 ./build/build.py --use_clang=false --enable_rocm --build_gpu_plugin --gpu_plugin_rocm_version=60 --rocm_amdgpu_targets=[gfxXXX] --bazel_options=--override_repository=xla=[xla_dir] --rocm_path=/opt/rocm-6.2.1/ && python3.11 setup.py develop --user && python3.11 -m pip install dist/*.whl
Hi @Ruturaj4 ,
It does not seem to make a difference whether I use clang or not. Running the command you suggested (adapted for the Fedora 40 setup),
BUILD_DIR=~/build/rocm
mkdir -p ${BUILD_DIR}
cd ${BUILD_DIR}
git clone -b rocm-jaxlib-v0.4.33 https://github.com/ROCm/jax.git
git clone -b rocm-jaxlib-v0.4.33 https://github.com/ROCm/xla.git
cd jax
python3.11 ./build/build.py --use_clang=false --enable_rocm --build_gpu_plugin --gpu_plugin_rocm_version=60 --rocm_amdgpu_targets=gfx1100 --bazel_options=--override_repository=xla=/home/richard/build/rocm/xla --rocm_path=/usr
produces the same error, see jax-build-no-clang.txt
I know it works on the Ubuntu-based Docker container that AMD/ROCm provide, so it's probably something specific to Fedora.
As you probably know, Fedora ships ROCm-6.1.2 directly in their repos and everything is installed right into /usr as opposed to /opt/rocm-6.x.y
@richardfoltyn Hmm, I don't have a Fedora container to reproduce this error. Can you add the include path lib/clang/17/include here:
https://github.com/openxla/xla/blob/9e28b002070276a852de6b5508224d35d2547d51/third_party/tsl/third_party/gpus/rocm_configure.bzl#L210
and check whether it compiles?
Description
I'm trying to build jaxlib-0.4.30 from source on Fedora 40 using ROCm 6.0.2 that comes in their standard repositories.
Fedora dumps all ROCm libraries/headers directly into /usr, and these seem to be found correctly. However, the build fails because the ROCm device libraries are not found, which I suspect is the stuff installed in /usr/lib/clang/17/amdgcn/bitcode.
Running
I get the following error:
Is there some way to specify --rocm-device-lib-path for the build? I am unfortunately completely unfamiliar with bazel and don't even know where to start looking.
Thanks!
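As a starting point for anyone hitting the same problem, the sketch below looks for a directory that actually contains the device bitcode. `find_device_libs` is a hypothetical helper, and the two layouts it checks (`<root>/amdgcn/bitcode` and `<root>/lib/clang/<N>/amdgcn/bitcode`) are assumptions based on the paths mentioned in this issue. The example runs against a scratch directory with a placeholder `.bc` file.

```shell
# find_device_libs is a hypothetical helper: print the first directory under
# the given roots that contains AMDGPU bitcode (*.bc) files.
find_device_libs() {
    for root in "$@"; do
        # Assumed layouts: <root>/amdgcn/bitcode, or the clang resource
        # directory <root>/lib/clang/<N>/amdgcn/bitcode used by Fedora.
        for dir in "${root}"/amdgcn/bitcode "${root}"/lib/clang/*/amdgcn/bitcode; do
            if ls "${dir}"/*.bc > /dev/null 2>&1; then
                echo "${dir}"
                return 0
            fi
        done
    done
    return 1
}

# Scratch tree mimicking the Fedora layout; the .bc file is an empty placeholder.
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/lib/clang/17/amdgcn/bitcode"
touch "${demo_root}/lib/clang/17/amdgcn/bitcode/ocml.bc"
find_device_libs "${demo_root}"
```

On a real system the call would be something like `find_device_libs /usr /opt/rocm`. clang itself accepts `--rocm-device-lib-path=<dir>` for such a directory; whether and how the JAX bazel build forwards that option is not established in this thread.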
System info (python version, jaxlib version, accelerator, etc.)