benkirk opened this issue 2 months ago (status: Open)
Unfortunately, you have to specify the CUDA version and cuDNN version explicitly; Clang does not detect them automatically. If you are using this kind of setup, it may be better to use JAX Toolbox instead: https://github.com/NVIDIA/JAX-Toolbox
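A rough sketch of what specifying the versions explicitly might look like, assuming the HERMETIC_CUDA_VERSION and HERMETIC_CUDNN_VERSION repo variables from XLA's hermetic CUDA setup and jaxlib's build/build.py entry point; the version strings here are placeholders, not values from this thread:

python build/build.py \
  --bazel_options=--repo_env=HERMETIC_CUDA_VERSION="12.2.1" \
  --bazel_options=--repo_env=HERMETIC_CUDNN_VERSION="9.2.0"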
Hi @benkirk I'm going to update JAX docs with the link to XLA instructions.
From your command, I see that you provided environment variables:
--bazel_options=--repo_env=LOCAL_CUDA_PATH="${CUDA_HOME}" \
--bazel_options=--repo_env=LOCAL_CUDNN_PATH="${NCAR_ROOT_CUDNN}" \
--bazel_options=--repo_env=LOCAL_NCCL_PATH="${PREFIX}"
Could you please share the values of ${CUDA_HOME}, ${NCAR_ROOT_CUDNN} and ${PREFIX} here?
The problem is here: https://github.com/openxla/xla/issues/16877
This is how I avoided a lot of problems: https://github.com/dusty-nv/jetson-containers/pull/626
Also, this is necessary: https://github.com/NVIDIA/JAX-Toolbox/blob/main/.github/container/install-cudnn.sh and this: https://github.com/NVIDIA/JAX-Toolbox/blob/main/.github/container/build-jax.sh
ln -s /usr/local/cuda/lib64 /usr/local/cuda/lib
I've updated the script so that it does not download the files:
#!/bin/bash
set -e

CUDNN_MAJOR_VERSION=9
CUDA_MAJOR_VERSION=12.2
prefix=/opt/nvidia/cudnn
arch=$(uname -m)-linux-gnu
cuda_base_path="/usr/local/cuda-${CUDA_MAJOR_VERSION}"

# Check whether the versioned CUDA path exists
if [[ -d "${cuda_base_path}" ]]; then
    cuda_lib_path="${cuda_base_path}/lib64"
    output_path="${cuda_base_path}/lib"
else
    cuda_lib_path="/usr/local/cuda/lib64"
    output_path="/usr/local/cuda/lib"
fi

# Create the lib -> lib64 symlink for CUDA
sudo ln -s "${cuda_lib_path}" "${output_path}"

# Relink the cuDNN files into the CUDA tree
for cudnn_file in $(dpkg -L libcudnn${CUDNN_MAJOR_VERSION} libcudnn${CUDNN_MAJOR_VERSION}-dev | sort -u); do
    if [[ -f "${cudnn_file}" || -h "${cudnn_file}" ]]; then
        # Strip the /usr/ prefix and normalize arch-qualified include/lib paths
        nosysprefix="${cudnn_file#"/usr/"}"
        noarchinclude="${nosysprefix/#"include/${arch}"/include}"
        noverheader="${noarchinclude/%"_v${CUDNN_MAJOR_VERSION}.h"/.h}"
        noarchlib="${noverheader/#"lib/${arch}"/lib}"
        # Link under cuda_base_path if it exists, otherwise /usr/local/cuda/lib64
        if [[ -d "${cuda_base_path}" ]]; then
            link_name="${cuda_base_path}/${noarchlib}"
        else
            link_name="/usr/local/cuda/lib64/${noarchlib}"
        fi
        link_dir=$(dirname "${link_name}")
        mkdir -p "${link_dir}"
        ln -s "${cudnn_file}" "${link_name}"
    fi
done
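For context, a hedged usage sketch: after running the script, the relinked tree could be handed to the build via the LOCAL_* repo variables quoted earlier in this thread. The paths below are illustrative, not from the script itself, and assume cuDNN was linked into the CUDA tree:

python build/build.py \
  --bazel_options=--repo_env=LOCAL_CUDA_PATH="/usr/local/cuda-12.2" \
  --bazel_options=--repo_env=LOCAL_CUDNN_PATH="/usr/local/cuda-12.2"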
Thank you both; in my case:
--bazel_options=--repo_env=LOCAL_CUDA_PATH="/glade/u/apps/common/23.08/spack/opt/spack/cuda/12.2.1" \
--bazel_options=--repo_env=LOCAL_CUDNN_PATH="/glade/u/apps/common/23.08/spack/opt/spack/cudnn/9.2.0.82-12" \
--bazel_options=--repo_env=LOCAL_NCCL_PATH="<my_conda_build_prefix>"
I'll attempt providing the version strings on the command line as well and follow the XLA instructions.
Building from source without a container definitely wasn't my first choice, but we do need a site-provided NCCL on this machine: it has a proprietary vendor network (Slingshot 11) that needs some care and feeding.
Yeah, but that does not work: as I mentioned before, CUDA needs lib rather than lib64, and the cuDNN files need to be renamed while maintaining a certain structure. It's very tricky. In the 0.4.31 release this was easier with cuda_path etc., but now JAX uses XLA's hermetic CUDA, which runs everything automatically...
@benkirk You don't need to build JAX from source to use a custom NCCL. We'll use whichever libnccl.so we find in your LD_LIBRARY_PATH.
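A minimal sketch of that override, assuming a hypothetical site install prefix SITE_NCCL_PREFIX (not a path from this thread):

# Prepend the site NCCL to the dynamic-loader search path (prefix is hypothetical)
export SITE_NCCL_PREFIX=/opt/site/nccl
export LD_LIBRARY_PATH="${SITE_NCCL_PREFIX}/lib:${LD_LIBRARY_PATH}"
# JAX should now pick up the site libnccl.so instead of the bundled one
python -c 'import jax; print(jax.devices())'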
Thanks @hawkinsp, I've got my NCCL injected properly with jax[cuda12]==0.4.31 from PyPI; I had a few issues trying jax[cuda12_local]==0.4.31. I'll revisit that as an alternative parallel path.
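For reference, a hedged sketch of the two install paths being compared here, with the version pin taken from the comment above:

# Wheels that bundle the CUDA libraries from PyPI
pip install "jax[cuda12]==0.4.31"
# Wheels that expect a locally installed CUDA toolkit
pip install "jax[cuda12_local]==0.4.31"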
Hi @johnnynunez, I understand your concerns; I tried to address them in the comment here.
Description
I'm attempting to build jaxlib with a local CUDA, cuDNN, and NCCL. I'm running into (different) issues with either gcc or clang. Any ideas?
Build command:
clang error:
gcc error:
System info (python version, jaxlib version, accelerator, etc.)