vyasr closed this issue 4 months ago.
https://github.com/conda-forge/cuda-feedstock/issues/13 is tracking the rollout of CUDA 12.2 to conda-forge.
We will need to resolve a blocking issue, since RAPIDS requires gcc 11 and that conflicts with cuda-nvcc
which requires gcc 12. I have filed a PR with a fix.
Both of those issues are resolved.

I should add that the gcc constraint only affected `cuda-nvcc` (usually used in dev environments). `cuda-nvcc_{{ target_platform }}`, which is what is used by `{{ compiler("cuda") }}`, did not have this issue.
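For background, the mapping from `{{ compiler("cuda") }}` to `cuda-nvcc_{{ target_platform }}` comes from conda-build's variant configuration; a minimal sketch (the version value here is illustrative, not the actual conda-forge pinning):

```yaml
# conda_build_config.yaml (sketch; values are illustrative)
cuda_compiler:
  - cuda-nvcc
cuda_compiler_version:
  - "12.2"
# With this config, {{ compiler("cuda") }} in meta.yaml expands to
# "cuda-nvcc_<target_platform> 12.2", i.e. the cross-compilation-aware
# cuda-nvcc_{{ target_platform }} package rather than plain cuda-nvcc.
```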
We merged pynvjitlink's conda package builds yesterday (https://github.com/rapidsai/pynvjitlink/pull/33).
I have started a PR to release pynvjitlink 0.1.7 (https://github.com/rapidsai/pynvjitlink/pull/42), which will be needed to build and upload the conda packages.
James has also submitted PRs adding conda & wheel support to RAPIDS projects.
A full listing of the PRs with current status is in this comment: https://github.com/rapidsai/build-planning/issues/7#issuecomment-1887577440
We will need to modify conda recipes to enforce CUDA Minor Version Compatibility (MVC). This stems from a discussion I started here: https://github.com/rapidsai/raft/pull/2092#issuecomment-1900991184
There are three (two?) basic issues.
1. `compiler('cuda')` run-exports

rmm example commit: https://github.com/rapidsai/rmm/pull/1419/commits/ff8ea2d4672069e2a6087ed45c59807caf3ed0b4

The `compiler('cuda')` package has a strong run-export of `cuda-version`. We discussed this and decided this is good and intentional behavior for the `cuda-nvcc` compiler package, because not all CUDA software obeys the rules for Minor Version Compatibility. However, RAPIDS does. We need to ignore this run-export, because it will prevent packages built with CUDA 12.2 from being installed with `cuda-version=12.0` or similar.

Concretely, this means updating the existing sections that ignore run exports from the CUDA 11 compiler package, and adding the CUDA 12 compiler package as shown in the rmm example.
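A minimal sketch of what this could look like in a recipe's `meta.yaml` (the exact `cuda-version` floor and surrounding selector logic are assumptions; each repo's recipe will differ, and the rmm commit above is the authoritative example):

```yaml
build:
  ignore_run_exports_from:
    # Ignoring the compiler package drops its strong run-export of
    # cuda-version (the CUDA 11 recipes already have an analogous entry).
    - {{ compiler('cuda') }}
requirements:
  build:
    - {{ compiler('cuda') }}
  run:
    # Re-add a MVC-friendly constraint by hand in place of the
    # exported max pin, so cuda-version=12.0 installs still solve.
    - cuda-version >=12.0,<13.0a0
```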
2. Run-exports from `-dev` packages in `host`

rmm example commit: https://github.com/rapidsai/rmm/pull/1419/commits/135c25916dd2b56b48c159375aada38f8a099ace

Initially, when we created CUDA 12 conda packages for RAPIDS, we relied on putting `-dev` packages in the `host` environment so that their `run_exports` would add the non-dev package to the `run` dependencies. For example, we added `cuda-cudart-dev` to rmm's `host` section, and it created a run dependency on `cuda-cudart`. This was fine for CUDA 12.0, but this strategy is not compatible with CUDA Minor Version Compatibility. If we build with CUDA 12.2, the `cuda-cudart-dev` package will export the subpackage `cuda-cudart` from the same recipe with a max pin. This means that recipes built with CUDA 12.2's `cuda-cudart-dev` cannot be installed with `cuda-version=12.0`.
This rule applies to all `-dev` libraries, including `cuda-cudart` and math libraries. For some of these libraries, we do not need them in `host` to build the RAPIDS package. That is true for rmm's usage of `cuda-cudart-dev` in `host` (the compiler itself also adds `cuda-cudart-dev` to the `build` dependencies, but the runtime library is not a strong run-export, so it doesn't cross from `build` to `run`). For other cases, like libcuml's usage of math libraries, we will need to add `ignore_run_exports_from` and list those dev libraries in libcuml's recipe to ensure MVC works as intended.
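For the libcuml-style case, a sketch of the recipe change (the package name here is an illustrative example of a math library, not taken from the actual libcuml recipe):

```yaml
build:
  ignore_run_exports_from:
    # Drop the max-pinned run-export created by building against,
    # e.g., CUDA 12.2's -dev package.
    - libcublas-dev
requirements:
  host:
    - libcublas-dev   # still needed at build time for headers/libs
  run:
    # Depend on the runtime package directly, so the only effective
    # CUDA constraint comes from cuda-version and MVC keeps working.
    - libcublas
```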
3. Pin `cuda-version==${RAPIDS_CUDA_VERSION%.*}` in all conda commands?

This one is more questionable, and may require no action. We saw a problem in cucim's CI where cupy 13 caused cucim to need an explicit `cuda-version` specification while installing cucim/libcucim. My hope was that adding `cuda-version==${RAPIDS_CUDA_VERSION%.*}` to the installation commands in CI would cause an error during the solve if either of the problems described above in (1) and (2) were encountered. I tried this here while playing with rmm, and found that pinning `cuda-version` didn't actually cause a solve error -- it just forced a fallback to the latest `nvidia` channel CUDA packages (CI logs). This was attempted before I applied the changes for (1) and (2), so I would have expected it to fail in CI. It succeeded, but with the wrong channels/packages:
```
  Package        Version      Build                       Channel           Size
  ───────────────────────────────────────────────────────────────────────────────
  Install:
  ───────────────────────────────────────────────────────────────────────────────
  + cuda-cudart  12.3.101     0                           nvidia           214kB
  + gtest        1.14.0       h2a328a1_1                  conda-forge      394kB
  + fmt          10.2.1       h2a328a1_0                  conda-forge      190kB
  + gmock        1.14.0       h8af1aa0_1                  conda-forge        7kB
  + spdlog       1.12.0       h6b8df57_2                  conda-forge      183kB
  + librmm       24.04.00a20  cuda12_240130_gd4f8aa23_20  /tmp/cpp_channel   2MB
  + librmm-tests 24.04.00a20  cuda12_240130_gd4f8aa23_20  /tmp/cpp_channel   4MB
```
This isn't desirable, so I don't think pinning `cuda-version` will do anything to help us enforce that CI test jobs are using the versions we intend. Therefore, I propose that we act on items (1) and (2) and skip (3) for now.
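For reference, the `${RAPIDS_CUDA_VERSION%.*}` expansion in the proposed pin trims the patch component of the CUDA version; a quick sketch (the example version value is assumed):

```shell
# RAPIDS_CUDA_VERSION is assumed to hold a full "major.minor.patch" string.
RAPIDS_CUDA_VERSION="12.0.1"

# "%.*" removes the shortest trailing ".<suffix>", leaving "major.minor".
CUDA_MAJOR_MINOR="${RAPIDS_CUDA_VERSION%.*}"
echo "${CUDA_MAJOR_MINOR}"    # prints 12.0

# This is the value that would be passed to conda, e.g.:
#   conda install librmm "cuda-version==${CUDA_MAJOR_MINOR}"
```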
I think this can be closed, since conda support is complete. Final tasks are being tracked here: https://github.com/rapidsai/build-planning/issues/7#issuecomment-1957227696
We would like to start publishing conda packages that support versions of CUDA newer than CUDA 12.0. At the moment, this is blocked on efforts to get the CTK on conda-forge updated to a sufficiently new version. As of this writing, we are updating the conda-forge CTK to 12.1.1. Our plan is to continue the conda-forge update process, and whatever the latest version of the CTK available via conda-forge is on Jan 8, 2024, we will use that version for building RAPIDS 24.02 packages.
Assuming that #7 is completed before this, the main tasks will be to:
Step 2 above will likely involve making updates to dependency files in various RAPIDS repos.
This issue will be filled out more and updated once the conda-forge updates are completed and the version finalized.