rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] fail to download rapids_cpm_generate_pinned_versions nvcomp version 4.0.1.0 from developer site #16772

Open · pxLi opened this issue 1 week ago

pxLi commented 1 week ago

Describe the bug

I think the nvcomp project version in its CMake package config does not match the package published on the developer site (4.0.1.0 vs. 4.0.1): target/libcudf-install/lib64/cmake/nvcomp/nvcomp-config.cmake (found version "4.0.1.0"). The https://github.com/NVIDIA/spark-rapids-jni build relies on https://github.com/NVIDIA/spark-rapids-jni/blob/branch-24.10/thirdparty/cudf-pins/add_dependency_pins.cmake#L26-L27, which applies the dependency versions pinned from the cudf build:

    include(${rapids-cmake-dir}/cpm/package_override.cmake)
    rapids_cpm_package_override(${CMAKE_CURRENT_FUNCTION_LIST_DIR}/versions.json)
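The precedence this relies on can be pictured with a simplified model (a sketch, not rapids-cmake's actual code): entries from the file passed to `rapids_cpm_package_override` win over rapids-cmake's default pins, package by package. The helper `apply_override` and its field-by-field merge are assumptions for illustration; the two nvcomp versions are the ones cited in this report.

```python
# Hypothetical, simplified model of how an override file replaces default pins.
# This is NOT rapids-cmake's implementation; it only illustrates precedence.


def apply_override(defaults: dict, overrides: dict) -> dict:
    """Return a new pin table in which override entries win per package."""
    # Copy each default entry so the caller's dictionaries are not mutated.
    merged = {name: dict(entry) for name, entry in defaults.items()}
    for name, entry in overrides.items():
        # Assumption: fields present in the override replace the defaults.
        merged.setdefault(name, {}).update(entry)
    return merged


# Default nvcomp pin in rapids-cmake, as cited in this report.
defaults = {"nvcomp": {"version": "4.0.1"}}
# Pin generated into spark-rapids-jni's versions.json, as cited in this report.
overrides = {"nvcomp": {"version": "4.0.1.0"}}

pins = apply_override(defaults, overrides)
print(pins["nvcomp"]["version"])  # 4.0.1.0
```

Under this model the generated 4.0.1.0 pin silently replaces the working 4.0.1 default.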

As a result, the pinned nvcomp version became 4.0.1.0: https://github.com/NVIDIA/spark-rapids-jni/blob/branch-24.10/thirdparty/cudf-pins/versions.json#L131

    include(${rapids-cmake-dir}/cpm/generate_pinned_versions.cmake)
    rapids_cpm_generate_pinned_versions(OUTPUT ${CMAKE_CURRENT_FUNCTION_LIST_DIR}/versions.json)

We run the call above whenever we update the cudf submodule ref and pin its dependency versions. The generated nvcomp entry in versions.json (excerpt):

      {
        "11" : "11.x",
        "12" : "12.x"
      },
      "version" : "4.0.1.0"
    },
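The pinned string 4.0.1.0 and the published 4.0.1 name the same release under CMake's version-comparison rules, where omitted components of major[.minor[.patch[.tweak]]] are treated as zero. As text, however, they differ, and the text is what ends up in the download path. A minimal Python sketch of that distinction; `version_equal` is a hypothetical helper that mimics CMake's `VERSION_EQUAL`, not a CMake or rapids-cmake API.

```python
# Hypothetical helper mimicking CMake's VERSION_EQUAL semantics: versions are
# compared component-wise as integers, with omitted components treated as zero.


def _components(version: str) -> list:
    """Split 'major[.minor[.patch[.tweak]]]' into four integers."""
    parts = [int(p) for p in version.split(".")]
    return parts + [0] * (4 - len(parts))  # pad missing components with zeros


def version_equal(a: str, b: str) -> bool:
    return _components(a) == _components(b)


pinned = "4.0.1.0"   # value recorded in the generated versions.json
published = "4.0.1"  # spelling used by rapids-cmake, per this report

print(version_equal(pinned, published))  # True
print(pinned == published)               # False
```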
With that pin, the build tries to download a 4.0.1.0 archive and the download fails:

    13:32:10  [INFO]      [exec] -- Downloading...
    13:32:10  [INFO]      [exec]    dst='/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-867-cuda11/target/libcudf/cmake-build/_deps/nvcomp_proprietary_binary-subbuild/nvcomp_proprietary_binary-populate-prefix/src/nvcomp-linux-x86_64-4.0.1.0-cuda11.x.tar.gz'
    13:32:10  [INFO]      [exec]    timeout='none'
    13:32:10  [INFO]      [exec]    inactivity timeout='none'
    13:32:10  [INFO]      [exec] -- Using src='https://developer.download.nvidia.com/compute/nvcomp/4.0.1.0/local_installers/nvcomp-linux-x86_64-4.0.1.0-cuda11.x.tar.gz'
    13:32:10  [INFO]      [exec] -- [download 0% complete]
    13:32:10  [INFO]      [exec] CMake Error at nvcomp_proprietary_binary-subbuild/nvcomp_proprietary_binary-populate-prefix/src/nvcomp_proprietary_binary-populate-stamp/download-nvcomp_proprietary_binary-populate.cmake:170 (message):
    13:32:10  [INFO]      [exec]   Each download failed!
    13:32:10  [INFO]      [exec] 
    13:32:10  [INFO]      [exec]     error: downloading 'https://developer.download.nvidia.com/compute/nvcomp/4.0.1.0/local_installers/nvcomp-linux-x86_64-4.0.1.0-cuda11.x.tar.gz' failed

The 4.0.1 package (without the trailing .0) does exist on the developer site.
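The failing URL in the download log follows a recognizable pattern. The pinned version appears twice, and the CUDA major version is mapped to a suffix such as `11.x`, as in the versions.json excerpt. A minimal sketch, assuming a template inferred from that single logged URL (the real download logic in cudf/rapids-cmake may build it differently). `URL_TEMPLATE`, `CUDA_MAPPING`, and `nvcomp_url` are hypothetical names.

```python
# Template inferred from the failing URL in the download log (an assumption:
# the actual download logic lives in cudf/rapids-cmake and may differ).
URL_TEMPLATE = (
    "https://developer.download.nvidia.com/compute/nvcomp/{version}/"
    "local_installers/nvcomp-linux-x86_64-{version}-cuda{cuda}.tar.gz"
)

# CUDA major version -> archive suffix, as in the versions.json excerpt.
CUDA_MAPPING = {"11": "11.x", "12": "12.x"}


def nvcomp_url(version: str, cuda_major: str) -> str:
    """Build the archive URL for an nvcomp version and CUDA major version."""
    return URL_TEMPLATE.format(version=version, cuda=CUDA_MAPPING[cuda_major])


print(nvcomp_url("4.0.1.0", "11"))  # the URL that failed in the log
print(nvcomp_url("4.0.1", "11"))    # same template with 4.0.1 (assumed to exist)
```

Only the version segment differs between the two results, which is consistent with the failure in the log.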


pxLi commented 1 week ago

The generated version should either be available on the developer site or match the one used by rapids-cmake (4.0.1, with no trailing 0). cc @vuule, can you help take a look? Thanks.
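One possible direction, sketched only as an illustration: before the version is written into versions.json, normalize a four-part version whose tweak component is zero back to the three-part spelling that rapids-cmake uses (4.0.1). `normalize_version` is a hypothetical helper. Where such a step would live (rapids-cmake, cudf, or the spark-rapids-jni pin scripts) is left open.

```python
# Hypothetical normalization: drop a zero tweak component so that a version
# reported as major.minor.patch.0 matches the three-part spelling used by
# rapids-cmake (assumption based on this report: 4.0.1.0 vs. 4.0.1).


def normalize_version(version: str) -> str:
    parts = version.split(".")
    if len(parts) == 4 and int(parts[3]) == 0:
        parts = parts[:3]  # keep major.minor.patch only
    return ".".join(parts)


print(normalize_version("4.0.1.0"))  # 4.0.1
print(normalize_version("4.0.1"))    # 4.0.1
```

A nonzero tweak component (for example 4.0.1.2) is left untouched, because it would name a different release.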