rapidsai / build-planning

Tracking for RAPIDS-wide build tasks
https://github.com/rapidsai

Only build conda packages once for libraries without direct CUDA dependency #67

Open jameslamb opened 6 months ago

jameslamb commented 6 months ago

Description

Created from @bdice's suggestion at https://github.com/rapidsai/cudf/pull/15245#discussion_r1617517646.

Some RAPIDS libraries do not have a direct CUDA dependency, but we're still doing multiple conda builds (one per CUDA major version) for them.

Those projects' conda packages should be built only once, and should not declare CUDA dependencies they don't need.

See "Notes" for a concrete example.

Benefits of this work

Acceptance Criteria

For every RAPIDS library that doesn't have a CUDA dependency, the following should be true for their conda packages:

Approach

Look through the RAPIDS projects for libraries meeting these criteria:

Add them to a task list here.

For each of those, as described in https://github.com/rapidsai/cudf/pull/15245#discussion_r1617517646, modify them as follows:

Notes

Related to #43, which describes changing the workflows for pure-Python packages to only build against one Python version.
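To make the relationship between this issue and #43 concrete, here is a minimal sketch of how the two changes would shrink a conda build matrix. The function name, its flags, and the Python version list are hypothetical and chosen only for illustration; they are not taken from any RAPIDS CI configuration.

```python
from itertools import product


def conda_build_matrix(cuda_majors, python_versions, needs_cuda, pure_python):
    """Enumerate (cuda_major, python_version) build variants for one package.

    Hypothetical helper: a package with no direct CUDA dependency gets no CUDA
    axis at all (this issue), and a pure-Python package is built against a
    single Python version (#43).
    """
    cudas = cuda_majors if needs_cuda else [None]
    pythons = python_versions[:1] if pure_python else python_versions
    return list(product(cudas, pythons))


# Illustrative values only; the real supported versions may differ.
cuda_majors = ["11", "12"]
python_versions = ["3.10", "3.11", "3.12"]

# One build per CUDA major version and per Python version.
current = conda_build_matrix(cuda_majors, python_versions, needs_cuda=True, pure_python=False)

# A single build once both changes are applied.
proposed = conda_build_matrix(cuda_majors, python_versions, needs_cuda=False, pure_python=True)

print(len(current), len(proposed))
```

With these illustrative inputs, the matrix drops from six variants to one.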

Example: custreamz

Consider custreamz.

Look at these packages on the rapidsai-nightly channel: https://anaconda.org/rapidsai-nightly/custreamz/files?version=24.08.00a66.

Those cuda11_* and cuda12_* packages differ only by their dependency on cuda-version.

But custreamz shouldn't need a cuda-version dependency... it only contains Python code and only interacts with cudf and cudf_kafka via their Python APIs.
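One way to check a claim like this is to compare the dependency lists of the two package variants and confirm that they match once cuda-version is ignored. The following is a rough sketch; the helper name and the dependency lists are hypothetical and are not copied from the real custreamz packages.

```python
def differs_only_by(depends_a, depends_b, ignored_name="cuda-version"):
    """Return True if two conda 'depends' lists match once `ignored_name` is dropped."""

    def strip(depends):
        # A conda match spec starts with the package name, optionally followed
        # by space-separated version and build constraints.
        return sorted(d for d in depends if d.split()[0] != ignored_name)

    return strip(depends_a) == strip(depends_b)


# Hypothetical dependency lists, for illustration only.
cuda11_depends = [
    "cudf 24.08.*",
    "cudf_kafka 24.08.*",
    "cuda-version >=11,<12.0a0",
    "python >=3.9",
]
cuda12_depends = [
    "cudf 24.08.*",
    "cudf_kafka 24.08.*",
    "cuda-version >=12,<13.0a0",
    "python >=3.9",
]

print(differs_only_by(cuda11_depends, cuda12_depends))
```

If a check like this passes for every pair of variants, the per-CUDA builds are redundant and a single package could replace them.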

The set of changes to address this issue for custreamz should be, roughly:

What about wheels?

This issue is about conda packages only.

Wheels have a different set of concerns, namely that they use suffixed package names to convey CUDA major version support, and that suffix affects everything in the dependency tree.

For example, dask-cudf depends on cudf, and so we publish wheels with names like dask-cudf-cu11 (depending on cudf-cu11) and dask-cudf-cu12 (depending on cudf-cu12).
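The way the suffix propagates through the dependency tree can be sketched as follows. This is only an illustration of the naming convention described above; the function is hypothetical and is not how the RAPIDS build tooling actually implements it.

```python
def suffixed_wheel(name, rapids_deps, cuda_major):
    """Return the suffixed wheel name and its suffixed RAPIDS dependencies.

    Hypothetical helper: the same -cu{major} suffix is applied to the package
    and to each of its CUDA-aware dependencies.
    """
    suffix = f"-cu{cuda_major}"
    return name + suffix, [dep + suffix for dep in rapids_deps]


for cuda_major in (11, 12):
    wheel, deps = suffixed_wheel("dask-cudf", ["cudf"], cuda_major)
    print(wheel, "depends on", deps)
```

Because the suffix has to match at every level, a package like dask-cudf still needs one wheel per CUDA major version even though it contains no CUDA code itself.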

### Tasks
- [ ] custreamz
- [ ] dask-cudf
- [ ] dask-cuda

bdice commented 6 months ago

There are some things we'd need to improve in how we handle CI artifacts for packages without a direct CUDA dependency, so that they can be tested on CUDA runners with either CUDA 11 or CUDA 12. cuGraph's CI scripts describe this:

https://github.com/rapidsai/cugraph/blob/9503f31add68ab0bda3982fa069da8f756a187a5/ci/build_python.sh#L37-L40

vyasr commented 6 months ago

I expect the set of packages to be changed here to be a strict subset of those in #43. The exception is packages like dask-cudf that have a transitive CUDA dependency via one of their dependencies (because, as you noted, wheels need to maintain that dependency so that the dependency tree is CUDA-aware even if the current package is not).