rapidsai / ucxx


11.x builds failing on system without `nvcc` #297

Open charlesbluca opened 1 month ago

charlesbluca commented 1 month ago

When attempting to build UCXX with the CUDA 11.8 conda environment on a system without nvcc pre-installed (i.e., with all CTK components installed through conda), I get the following error at build configuration:

```
CMake Error at /home/charlesb/miniforge3/envs/ucxx-cuda-118/lib/cmake/rmm/rmm-targets.cmake:61 (set_target_properties):
  The link interface of target "rmm::rmm" contains:

    CUDA::cudart

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /home/charlesb/miniforge3/envs/ucxx-cuda-118/lib/cmake/rmm/rmm-config.cmake:75 (include)
  build/cmake/CPM_0.40.0.cmake:249 (find_package)
  build/cmake/CPM_0.40.0.cmake:303 (cpm_find_package)
  build/_deps/rapids-cmake-src/rapids-cmake/cpm/find.cmake:189 (CPMFindPackage)
  build/_deps/rapids-cmake-src/rapids-cmake/cpm/rmm.cmake:75 (rapids_cpm_find)
  cmake/thirdparty/get_rmm.cmake:20 (rapids_cpm_rmm)
  cmake/thirdparty/get_rmm.cmake:24 (find_and_configure_rmm)
  CMakeLists.txt:112 (include)
```
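
For context, the `CUDA::cudart` imported target is created by CMake's FindCUDAToolkit module, so this error generally means an earlier `find_package(CUDAToolkit)` call failed. A minimal probe (entirely illustrative; assumes CMake >= 3.17 on PATH) can confirm whether CMake is able to locate the toolkit at all:

```bash
# Hypothetical probe: configure a tiny project that does nothing but look for
# the CUDA Toolkit, reproducing the same find logic in isolation.
mkdir -p /tmp/ctk-probe && cd /tmp/ctk-probe
cat > CMakeLists.txt <<'EOF'
cmake_minimum_required(VERSION 3.17)
project(ctk-probe LANGUAGES CXX)
find_package(CUDAToolkit REQUIRED)
message(STATUS "Found CUDA Toolkit ${CUDAToolkit_VERSION} in ${CUDAToolkit_LIBRARY_DIR}")
EOF
cmake -S . -B build  # fails with the same root cause if the toolkit can't be found
```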

This was somewhat confusing, as the conda install itself had raised a warning message implying that having libcudart in the conda environment should be sufficient:

```
To enable CUDA support, UCX requires the CUDA Runtime library (libcudart).
The library can be installed with the appropriate command below:

* For CUDA 11, run:    conda install cudatoolkit cuda-version=11
* For CUDA 12, run:    conda install cuda-cudart cuda-version=12
```

And indeed, both the package and the library are present in the environment:

```
→ conda list cuda
# packages in environment at /home/charlesb/miniforge3/envs/ucxx-cuda-118:
#
# Name                    Version                   Build  Channel
cuda-version              11.8                 h70ddcb2_3    conda-forge
cudatoolkit               11.8.0              h4ba93d1_13    conda-forge

→ find $CONDA_PREFIX -name "libcudart.so*"
/home/charlesb/miniforge3/envs/ucxx-cuda-118/lib/libcudart.so
/home/charlesb/miniforge3/envs/ucxx-cuda-118/lib/libcudart.so.11.0
/home/charlesb/miniforge3/envs/ucxx-cuda-118/lib/libcudart.so.11.8.89
```

I saw that these failures were coming up during the configuration of RMM, so I tried building that with its accompanying 11.8 conda environment and got a somewhat clearer error: CMake was unable to find a CUDA Toolkit installation on my system (the nvcc binary was missing):

```
/home/charlesb/miniforge3/envs/rmm-cuda-118/bin/nvcc: line 9: /bin/nvcc: No such file or directory
...
CMake Error at /home/charlesb/miniforge3/envs/rmm-cuda-118/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
  Could NOT find CUDAToolkit (missing: CUDAToolkit_INCLUDE_DIRECTORIES)
Call Stack (most recent call first):
  /home/charlesb/miniforge3/envs/rmm-cuda-118/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
  /home/charlesb/miniforge3/envs/rmm-cuda-118/share/cmake-3.30/Modules/FindCUDAToolkit.cmake:1048 (find_package_handle_standard_args)
  build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:125 (find_package)
  CMakeLists.txt:62 (rapids_find_package)
```

I was unable to reproduce this with the 12.5 environment, which does pull in a conda installation of nvcc.

About to do a system installation of CTK to see if that unblocks things.

pentschev commented 1 month ago

As far as I remember, you need to install `nvcc_linux-64=11.8` - could you check if that works?
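
For reference, the corresponding install command would be something along these lines (channel assumed to be conda-forge, matching the rest of the thread):

```bash
conda install -n ucxx-cuda-118 -c conda-forge nvcc_linux-64=11.8
```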

charlesbluca commented 1 month ago

Installing that, it seems like the resulting `nvcc` binary is just a wrapper around what I assume should be a system installation of nvcc?

```
→ conda list nvcc
# packages in environment at /home/charlesb/miniforge3/envs/ucxx-cuda-118:
#
# Name                    Version                   Build  Channel
nvcc_linux-64             11.8                h9852d18_24    conda-forge

→ nvcc
/home/charlesb/miniforge3/envs/ucxx-cuda-118/bin/nvcc: line 9: /bin/nvcc: No such file or directory

→ cat /home/charlesb/miniforge3/envs/ucxx-cuda-118/bin/nvcc
#!/bin/bash

for arg in "${@}" ; do
  case ${arg} in -ccbin)
    # If -ccbin argument is already provided, don't add an additional one.
    exec "${CUDA_HOME}/bin/nvcc" "${@}"
  esac
done
exec "${CUDA_HOME}/bin/nvcc" -ccbin "${CXX}" "${@}"
```

pentschev commented 1 month ago

IIRC, with CUDA 11.x `CUDA_HOME` is redefined during `conda activate`. Can you check if deactivating and reactivating your environment changes the behavior?

pentschev commented 1 month ago

By "redefined" I mean it should be redefined to $CONDA_PREFIX.

charlesbluca commented 1 month ago

Ah, thanks for that tip - this highlights what I assume is the underlying issue here: we aren't able to locate CUDA_HOME during environment activation:

```
→ conda activate ucxx-cuda-118
Cannot determine CUDA_HOME: cuda-gdb not in PATH
```

This warning specifically starts popping up once `nvcc_linux-64` is installed in the environment.

pentschev commented 1 month ago

I had this discussion with @robertmaynard in the past; his answer was:

> The conda nvcc script uses cuda-gdb to determine the CUDA install location if CUDA_HOME hasn't been explicitly set beforehand, so if the machine doesn't have cuda-gdb the conda activation scripts will fail to set up CUDA_HOME, which you will need to do manually.

So yeah, I think you need a system install of CTK for CUDA 11.x to be able to compile.
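
A minimal sketch of that manual setup, assuming a system toolkit at /usr/local/cuda-11.8 (the path is illustrative):

```bash
# Setting CUDA_HOME before activation lets the conda activation script skip
# its cuda-gdb based detection (per the quote above):
export CUDA_HOME=/usr/local/cuda-11.8
conda deactivate
conda activate ucxx-cuda-118
nvcc --version  # the conda wrapper now delegates to ${CUDA_HOME}/bin/nvcc
```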

charlesbluca commented 1 month ago

Thanks @pentschev, I installed CTK 12.5 on my system (seemingly the oldest version available for Ubuntu 24.04 right now), and that unblocked builds.

Moving forward, can or should we explicitly encode a CTK dependency similar to what RMM is doing in its CMakeLists.txt?

https://github.com/rapidsai/rmm/blob/c494395e58288cac16321ce90e9b15f3508ae89a/CMakeLists.txt#L62-L65
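
For reference, the linked lines amount to requiring the toolkit up front at configure time, roughly as follows (paraphrased from RMM's CMakeLists.txt; treat as approximate):

```cmake
rapids_find_package(
  CUDAToolkit REQUIRED
  BUILD_EXPORT_SET rmm-exports
  INSTALL_EXPORT_SET rmm-exports
)
```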

Or is that too brittle a solution, with general documentation recommending a system install of CTK for 11.x builds making more sense?

charlesbluca commented 1 month ago

Also worth noting that there seems to be more required here than just properly setting CUDA_HOME, as even manually setting it to the CONDA_PREFIX above, which I can see contains libcudart, raises the same failures.

robertmaynard commented 1 month ago

> Also worth noting that there seems to be more required here than just properly setting CUDA_HOME, as even manually setting it to the CONDA_PREFIX above, which I can see contains libcudart, raises the same failures.

Can you try setting the environment variable `CUDA_PATH`? That is what CMake uses (not `CUDA_HOME`).

charlesbluca commented 1 month ago

Thanks for the tip - looks like that's still failing. For reference, the command I'm working with:

```
$ CUDA_PATH=/home/charlesb/miniforge3/envs/ucxx-cuda-118 ./build.sh
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/charlesb/miniforge3/envs/ucxx-cuda-118/bin/x86_64-conda-linux-gnu-cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/charlesb/miniforge3/envs/ucxx-cuda-118/bin/x86_64-conda-linux-gnu-c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- CPM: Using local package rmm@24.12.0
-- Configuring done (2.1s)
CMake Error at /home/charlesb/miniforge3/envs/ucxx-cuda-118/lib/cmake/rmm/rmm-targets.cmake:61 (set_target_properties):
  The link interface of target "rmm::rmm" contains:

    CUDA::cudart

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /home/charlesb/miniforge3/envs/ucxx-cuda-118/lib/cmake/rmm/rmm-config.cmake:75 (include)
  build/cmake/CPM_0.40.0.cmake:249 (find_package)
  build/cmake/CPM_0.40.0.cmake:303 (cpm_find_package)
  build/_deps/rapids-cmake-src/rapids-cmake/cpm/find.cmake:189 (CPMFindPackage)
  build/_deps/rapids-cmake-src/rapids-cmake/cpm/rmm.cmake:75 (rapids_cpm_find)
  cmake/thirdparty/get_rmm.cmake:20 (rapids_cpm_rmm)
  cmake/thirdparty/get_rmm.cmake:24 (find_and_configure_rmm)
  CMakeLists.txt:112 (include)

-- Generating done (0.0s)
CMake Generate step failed.  Build files cannot be regenerated correctly.
```

Output of conda list:

```
# packages in environment at /home/charlesb/miniforge3/envs/ucxx-cuda-118:
#
# Name                    Version                   Build  Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
attrs 24.2.0 pyh71513ae_0 conda-forge
autoconf 2.71 pl5321h2b4cb7a_1 conda-forge
automake 1.17 pl5321ha770c72_0 conda-forge
aws-c-auth 0.7.31 h57bd9a3_0 conda-forge
aws-c-cal 0.7.4 hfd43aa1_1 conda-forge
aws-c-common 0.9.28 hb9d3cd8_0 conda-forge
aws-c-compression 0.2.19 h756ea98_1 conda-forge
aws-c-event-stream 0.4.3 h29ce20c_2 conda-forge
aws-c-http 0.8.10 h5e77a74_0 conda-forge
aws-c-io 0.14.18 h4e6ae90_11 conda-forge
aws-c-mqtt 0.10.6 h02abb05_0 conda-forge
aws-c-s3 0.6.6 h834ce55_0 conda-forge
aws-c-sdkutils 0.1.19 h756ea98_3 conda-forge
aws-checksums 0.1.20 h756ea98_0 conda-forge
aws-crt-cpp 0.28.3 h469002c_5 conda-forge
aws-sdk-cpp 1.11.407 h9f1560d_0 conda-forge
azure-core-cpp 1.13.0 h935415a_0 conda-forge
azure-identity-cpp 1.9.0 hd126650_0 conda-forge
azure-storage-blobs-cpp 12.13.0 h1d30c4a_0 conda-forge
azure-storage-common-cpp 12.8.0 ha3822c6_0 conda-forge
azure-storage-files-datalake-cpp 12.12.0 h0f25b8a_0 conda-forge
binutils 2.43 h4852527_1 conda-forge
binutils_impl_linux-64 2.43 h4bf12b8_1 conda-forge
binutils_linux-64 2.43 h4852527_1 conda-forge
bokeh 3.5.2 pyhd8ed1ab_0 conda-forge
brotli-python 1.1.0 py312h2ec8cdc_2 conda-forge
bzip2 1.0.8 h4bc722e_7 conda-forge
c-ares 1.33.1 heb4867d_0 conda-forge
c-compiler 1.8.0 h2b85faf_0 conda-forge
ca-certificates 2024.8.30 hbcca054_0 conda-forge
cachetools 5.5.0 pyhd8ed1ab_0 conda-forge
cffi 1.17.1 py312h06ac9bb_0 conda-forge
cfgv 3.3.1 pyhd8ed1ab_0 conda-forge
click 8.1.7 unix_pyh707e725_0 conda-forge
cloudpickle 3.0.0 pyhd8ed1ab_0 conda-forge
cmake 3.30.4 hf9cb763_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
contourpy 1.3.0 py312h68727a3_2 conda-forge
cubinlinker 0.3.0 py312hbe86355_1 rapidsai
cuda-python 11.8.3 py312h32b3722_2 conda-forge
cuda-version 11.8 h70ddcb2_3 conda-forge
cudatoolkit 11.8.0 h4ba93d1_13 conda-forge
cudf 24.12.00a150 cuda11_py312_241007_gfcff2b6ef7_150 rapidsai-nightly
cupy 13.3.0 py312h8e83189_0 conda-forge
cupy-core 13.3.0 py312h53955ab_0 conda-forge
cxx-compiler 1.8.0 h1a2810e_0 conda-forge
cython 3.0.11 py312h8fd2918_3 conda-forge
cytoolz 1.0.0 py312h66e93f0_0 conda-forge
dask 2024.9.0 pyhd8ed1ab_0 conda-forge
dask-core 2024.9.0 pyhd8ed1ab_0 conda-forge
dask-cuda 24.12.00a2 py312_241007_gfe16796_2 rapidsai-nightly
dask-cudf 24.12.00a150 cuda11_py312_241007_gfcff2b6ef7_150 rapidsai-nightly
dask-expr 1.1.14 pyhd8ed1ab_0 conda-forge
distlib 0.3.8 pyhd8ed1ab_0 conda-forge
distributed 2024.9.0 pyhd8ed1ab_0 conda-forge
dlpack 0.8 h59595ed_3 conda-forge
doxygen 1.9.1 hb166930_1 conda-forge
exceptiongroup 1.2.2 pyhd8ed1ab_0 conda-forge
fastrlock 0.8.2 py312h30efb56_2 conda-forge
filelock 3.16.1 pyhd8ed1ab_0 conda-forge
fmt 11.0.2 h434a139_0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
fsspec 2024.9.0 pyhff2d567_0 conda-forge
gcc 13.3.0 h9576a4e_1 conda-forge
gcc_impl_linux-64 13.3.0 hfea6d02_1 conda-forge
gcc_linux-64 13.3.0 hc28eda2_4 conda-forge
gflags 2.2.2 h5888daf_1005 conda-forge
glog 0.7.1 hbabe93e_0 conda-forge
gxx 13.3.0 h9576a4e_1 conda-forge
gxx_impl_linux-64 13.3.0 hdbfa832_1 conda-forge
gxx_linux-64 13.3.0 h6834431_4 conda-forge
h2 4.1.0 pyhd8ed1ab_0 conda-forge
hpack 4.0.0 pyh9f0ad1d_0 conda-forge
hyperframe 6.0.1 pyhd8ed1ab_0 conda-forge
icu 75.1 he02047a_0 conda-forge
identify 2.6.1 pyhd8ed1ab_0 conda-forge
importlib-metadata 8.5.0 pyha770c72_0 conda-forge
importlib-resources 6.4.5 pyhd8ed1ab_0 conda-forge
importlib_metadata 8.5.0 hd8ed1ab_0 conda-forge
importlib_resources 6.4.5 pyhd8ed1ab_0 conda-forge
iniconfig 2.0.0 pyhd8ed1ab_0 conda-forge
jinja2 3.1.4 pyhd8ed1ab_0 conda-forge
jsonschema 4.23.0 pyhd8ed1ab_0 conda-forge
jsonschema-specifications 2023.12.1 pyhd8ed1ab_0 conda-forge
kernel-headers_linux-64 3.10.0 he073ed8_17 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.3 h659f571_0 conda-forge
lcms2 2.16 hb7c19ff_0 conda-forge
ld_impl_linux-64 2.43 h712a8e2_1 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libabseil 20240722.0 cxx17_h5888daf_1 conda-forge
libarrow 17.0.0 h364f349_19_cpu conda-forge
libarrow-acero 17.0.0 h5888daf_19_cpu conda-forge
libarrow-dataset 17.0.0 h5888daf_19_cpu conda-forge
libarrow-substrait 17.0.0 he882d9a_19_cpu conda-forge
libblas 3.9.0 24_linux64_openblas conda-forge
libbrotlicommon 1.1.0 hb9d3cd8_2 conda-forge
libbrotlidec 1.1.0 hb9d3cd8_2 conda-forge
libbrotlienc 1.1.0 hb9d3cd8_2 conda-forge
libcblas 3.9.0 24_linux64_openblas conda-forge
libcrc32c 1.1.2 h9c3ff4c_0 conda-forge
libcudf 24.12.00a150 cuda11_241007_gfcff2b6ef7_150 rapidsai-nightly
libcufile 1.4.0.31 0 nvidia
libcufile-dev 1.4.0.31 0 nvidia
libcurl 8.10.1 hbbe4b11_0 conda-forge
libdeflate 1.22 hb9d3cd8_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libevent 2.1.12 hf998b51_1 conda-forge
libexpat 2.6.3 h5888daf_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc 14.1.0 h77fa898_1 conda-forge
libgcc-devel_linux-64 13.3.0 h84ea5a7_101 conda-forge
libgcc-ng 14.1.0 h69a702a_1 conda-forge
libgfortran 14.1.0 h69a702a_1 conda-forge
libgfortran-ng 14.1.0 h69a702a_1 conda-forge
libgfortran5 14.1.0 hc5f4f2c_1 conda-forge
libgomp 14.1.0 h77fa898_1 conda-forge
libgoogle-cloud 2.29.0 h438788a_1 conda-forge
libgoogle-cloud-storage 2.29.0 h0121fbd_1 conda-forge
libgrpc 1.65.5 hf5c653b_0 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
libkvikio 24.12.00a cuda11_241007_ge64c363_20 rapidsai-nightly
liblapack 3.9.0 24_linux64_openblas conda-forge
libllvm14 14.0.6 hcd5def8_4 conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnl 3.10.0 h4bc722e_0 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.27 pthreads_hac2b453_1 conda-forge
libparquet 17.0.0 h6bd9018_19_cpu conda-forge
libpng 1.6.44 hadc24fc_0 conda-forge
libprotobuf 5.27.5 h5b01275_2 conda-forge
libre2-11 2023.09.01 hbbce691_3 conda-forge
librmm 24.12.00a9 cuda11_241007_gc494395e_9 rapidsai-nightly
libsanitizer 13.3.0 heb74ff8_1 conda-forge
libsqlite 3.46.1 hadc24fc_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx 14.1.0 hc0a3c3a_1 conda-forge
libstdcxx-devel_linux-64 13.3.0 h84ea5a7_101 conda-forge
libstdcxx-ng 14.1.0 h4852527_1 conda-forge
libthrift 0.21.0 h0e7cc3e_0 conda-forge
libtiff 4.7.0 he137b08_1 conda-forge
libtool 2.4.7 he02047a_1 conda-forge
libutf8proc 2.8.0 h166bdaf_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libuv 1.49.0 hb9d3cd8_0 conda-forge
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcb 1.17.0 h8a09558_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.12.7 he7c6b58_4 conda-forge
libzlib 1.3.1 hb9d3cd8_2 conda-forge
llvmlite 0.43.0 py312h374181b_1 conda-forge
locket 1.0.0 pyhd8ed1ab_0 conda-forge
lz4 4.3.3 py312hb3f7f12_1 conda-forge
lz4-c 1.9.4 hcb278e6_0 conda-forge
m4 1.4.18 h516909a_1001 conda-forge
markdown-it-py 3.0.0 pyhd8ed1ab_0 conda-forge
markupsafe 2.1.5 py312h66e93f0_1 conda-forge
mdurl 0.1.2 pyhd8ed1ab_0 conda-forge
msgpack-python 1.1.0 py312h68727a3_0 conda-forge
ncurses 6.5 he02047a_1 conda-forge
ninja 1.12.1 h297d8ca_0 conda-forge
nodeenv 1.9.1 pyhd8ed1ab_0 conda-forge
numba 0.60.0 py312h83e6fd3_0 conda-forge
numba-cuda 0.0.15 pyh267e887_1 conda-forge
numpy 2.0.2 py312h58c1407_0 conda-forge
nvcomp 4.0.1 hee583db_0 conda-forge
nvtx 0.2.10 py312h66e93f0_2 conda-forge
openjpeg 2.5.2 h488ebb8_0 conda-forge
openssl 3.3.2 hb9d3cd8_0 conda-forge
orc 2.0.2 h690cf93_1 conda-forge
packaging 24.1 pyhd8ed1ab_0 conda-forge
pandas 2.2.3 py312hf9745cd_1 conda-forge
partd 1.4.2 pyhd8ed1ab_0 conda-forge
pathspec 0.12.1 pyhd8ed1ab_0 conda-forge
perl 5.32.1 7_hd590300_perl5 conda-forge
pillow 10.4.0 py312h56024de_1 conda-forge
pip 24.2 pyh8b19718_1 conda-forge
pkg-config 0.29.2 h4bc722e_1009 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_1 conda-forge
platformdirs 4.3.6 pyhd8ed1ab_0 conda-forge
pluggy 1.5.0 pyhd8ed1ab_0 conda-forge
pre-commit 4.0.0 pyha770c72_0 conda-forge
psutil 6.0.0 py312h66e93f0_1 conda-forge
pthread-stubs 0.4 hb9d3cd8_1002 conda-forge
ptxcompiler 0.8.1 py312h32b3722_4 conda-forge
pyarrow 17.0.0 py312h9cebb41_1 conda-forge
pyarrow-core 17.0.0 py312h9cafe31_1_cpu conda-forge
pyarrow-hotfix 0.6 pyhd8ed1ab_0 conda-forge
pycparser 2.22 pyhd8ed1ab_0 conda-forge
pygments 2.18.0 pyhd8ed1ab_0 conda-forge
pylibcudf 24.12.00a150 cuda11_py312_241007_gfcff2b6ef7_150 rapidsai-nightly
pynvml 11.4.1 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
pytest 7.4.4 pyhd8ed1ab_0 conda-forge
pytest-asyncio 0.23.8 pyhd8ed1ab_0 conda-forge
pytest-rerunfailures 14.0 pyhd8ed1ab_0 conda-forge
python 3.12.7 hc5c86c4_0_cpython conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python-tzdata 2024.2 pyhd8ed1ab_0 conda-forge
python_abi 3.12 5_cp312 conda-forge
pytz 2024.1 pyhd8ed1ab_0 conda-forge
pyyaml 6.0.2 py312h66e93f0_1 conda-forge
rapids-build-backend 0.3.2 py_0 rapidsai
rapids-dask-dependency 24.12.00a6 py_0 rapidsai-nightly
rapids-dependency-file-generator 1.15.0 py_0 rapidsai
rdma-core 54.0 h5888daf_0 conda-forge
re2 2023.09.01 h77b4e00_3 conda-forge
readline 8.2 h8228510_1 conda-forge
referencing 0.35.1 pyhd8ed1ab_0 conda-forge
rhash 1.4.4 hd590300_0 conda-forge
rich 13.9.2 pyhd8ed1ab_0 conda-forge
rmm 24.12.00a9 cuda11_py312_241007_gc494395e_9 rapidsai-nightly
rpds-py 0.20.0 py312h12e396e_1 conda-forge
s2n 1.5.4 h1380c3d_0 conda-forge
scikit-build-core 0.10.7 pyh4afc917_0 conda-forge
setuptools 75.1.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
snappy 1.2.1 ha2e4443_0 conda-forge
sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
spdlog 1.14.1 hed91bc2_1 conda-forge
sysroot_linux-64 2.17 h4a8ded7_17 conda-forge
tblib 3.0.0 pyhd8ed1ab_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
tomli 2.0.2 pyhd8ed1ab_0 conda-forge
tomlkit 0.13.2 pyha770c72_0 conda-forge
toolz 1.0.0 pyhd8ed1ab_0 conda-forge
tornado 6.4.1 py312h66e93f0_1 conda-forge
typing-extensions 4.12.2 hd8ed1ab_0 conda-forge
typing_extensions 4.12.2 pyha770c72_0 conda-forge
tzdata 2024b hc8b5060_0 conda-forge
ucx 1.17.0 h0104b51_3 conda-forge
ukkonen 1.0.1 py312h68727a3_5 conda-forge
urllib3 2.2.3 pyhd8ed1ab_0 conda-forge
virtualenv 20.26.6 pyhd8ed1ab_0 conda-forge
wheel 0.44.0 pyhd8ed1ab_0 conda-forge
xorg-libxau 1.0.11 hb9d3cd8_1 conda-forge
xorg-libxdmcp 1.1.5 hb9d3cd8_0 conda-forge
xyzservices 2024.9.0 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
zict 3.0.0 pyhd8ed1ab_0 conda-forge
zipp 3.20.2 pyhd8ed1ab_0 conda-forge
zstandard 0.23.0 py312hef9b889_1 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge
```
robertmaynard commented 1 month ago

I would need a full trace log from CMake to see what is exactly going wrong.

IIRC the command line would be:

```
CUDA_PATH=/home/charlesb/miniforge3/envs/ucxx-cuda-118 ./build.sh --cmake-args=\"--trace\" > log
```

charlesbluca commented 1 month ago

Here's a log with CMake traces enabled:

failure.log

robertmaynard commented 1 month ago

Some clarification: the cuda-gdb detection logic is what conda uses to find a local install of CUDA 11.x.

CMake uses different logic for finding nvcc and for extracting the rest of the CUDA Toolkit libraries and headers. @charlesbluca, in the trace you provided, FindCUDAToolkit is failing since it can't find nvcc or the sentinel version files inside the CUDA Toolkit.

I think the primary issue is that CUDA_PATH needs to point not to your conda env, but to the local install of the CUDA Toolkit, e.g. /usr/local/cuda-11.8/.
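
In other words, something along these lines (the exact toolkit path depends on where the system CTK is installed):

```bash
CUDA_PATH=/usr/local/cuda-11.8 ./build.sh
```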

pentschev commented 1 month ago

@charlesbluca do you think there's still anything we should do in UCXX for better UX?