rapidsai / build-planning

Tracking for RAPIDS-wide build tasks
https://github.com/rapidsai

Pin RAPIDS nightly version in conda installs #106

Closed (jameslamb closed this issue 1 week ago)

jameslamb commented 1 month ago

Description

See https://github.com/rapidsai/build-planning/issues/14#issuecomment-2391613094 in the parent issue for context.

In short, for any CI scripts doing this (pseudocode):

conda env create -n test --file env.yaml

LOCAL_CHANNEL=$(download-ci-artifacts)

conda install \
    -c "${LOCAL_CHANNEL}" \
    somepackage

That conda install should instead pin to a specific version (e.g. 24.10), like this:

conda install \
    -c "${LOCAL_CHANNEL}" \
    "somepackage=${RAPIDS_VERSION}"

This reduces the risk of issues like those we saw in cugraph near the end of the 24.10 release cycle, where 24.10 builds were silently getting 24.08 or 24.12 nightlies (https://github.com/rapidsai/cugraph/pull/4690).
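Why this pin helps: in a conda match spec, a single `=` is a fuzzy (prefix) match, so `somepackage=24.10` behaves like `somepackage 24.10.*`. It accepts any 24.10 build, including nightlies such as `24.10.00a31`, and rejects 24.08 and 24.12 packages. The Bash sketch below is a deliberately simplified emulation of that behavior, meant only to illustrate the idea. `fuzzy_match` is a hypothetical helper, not conda's real version-matching algorithm, which compares parsed version components rather than strings.

```shell
# Simplified emulation of conda's fuzzy "=" match (hypothetical helper, NOT conda's algorithm).
# "pkg=24.10" behaves roughly like "pkg 24.10.*": the candidate must equal the pin,
# or start with the pin followed by a "." separator.
fuzzy_match() {
  local pin="$1" candidate="$2"
  [[ "${candidate}" == "${pin}" || "${candidate}" == "${pin}."* ]]
}

# Candidate versions a CI job might see across nightly channels.
for candidate in 24.08.00a12 24.10.00a31 24.10.00 24.12.00a3; do
  if fuzzy_match "24.10" "${candidate}"; then
    echo "${candidate}: accepted"
  else
    echo "${candidate}: rejected"
  fi
done
```

The trailing `.` in the prefix check matters. Without it, a pin like `24.1` would also accept `24.10.*` packages.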

Benefits of this work

Improves release confidence in RAPIDS libraries.

Reduces the risk of packages from a different RAPIDS release being pulled into CI, which can lead to incorrect packages and docs, compatibility issues that go unnoticed, and wasted developer time and energy investigating hard-to-understand bugs or CI failures. (related: #22)

Acceptance Criteria

Approach

Make changes similar to those seen in https://github.com/rapidsai/cugraph/pull/4690 across RAPIDS.

Notes

These updates can be done in any order, since each one is self-contained within a single repo.

### Updates
- [x] cucim (https://github.com/rapidsai/cucim/pull/791)
- [x] cudf (https://github.com/rapidsai/cudf/pull/17013, https://github.com/rapidsai/cudf/pull/17042)
- [x] cugraph (https://github.com/rapidsai/cugraph/pull/4690)
- [ ] cugraph-docs (https://github.com/rapidsai/cugraph-docs/pull/46, blocked by https://github.com/rapidsai/cugraph/pull/4662)
- [x] cugraph-gnn (https://github.com/rapidsai/cugraph-gnn/pull/58, https://github.com/rapidsai/cugraph-gnn/pull/59)
- [x] cugraph-ops (https://github.com/rapidsai/cugraph-ops/pull/699)
- [x] cuml (https://github.com/rapidsai/cuml/pull/6103, https://github.com/rapidsai/cuml/pull/6104)
- [x] cuopt (https://github.com/rapidsai/cuopt/pull/2038)
- [x] cuspatial (https://github.com/rapidsai/cuspatial/pull/1469)
- [x] cuvs (https://github.com/rapidsai/cuvs/pull/406)
- [x] cuxfilter (https://github.com/rapidsai/cuxfilter/pull/639)
- [x] dask-cuda (https://github.com/rapidsai/dask-cuda/pull/1395)
- [x] kvikio (https://github.com/rapidsai/kvikio/pull/495)
- [x] private repos (none that need updates)
- [x] pynvjitlink (https://github.com/rapidsai/pynvjitlink/pull/106)
- [x] raft (https://github.com/rapidsai/raft/pull/2467)
- [x] rmm (https://github.com/rapidsai/rmm/pull/1696, https://github.com/rapidsai/rmm/pull/1703)
- [x] ucx-py (https://github.com/rapidsai/ucx-py/pull/1082)
- [x] ucxx (https://github.com/rapidsai/ucxx/pull/298)
- [x] wholegraph (https://github.com/rapidsai/wholegraph/pull/228)

jameslamb commented 1 month ago

I started a small batch of these:

Will see how reviews go on those before continuing with the rest of the projects.

jameslamb commented 1 month ago

Going through these, I found this in cucim:

RAPIDS_VERSION_NUMBER=$(rapids-generate-version)

# ... omitted ...

rapids-mamba-retry install \
  --channel "${CPP_CHANNEL}" \
  --channel "${PYTHON_CHANNEL}" \
  "libcucim=${RAPIDS_VERSION_NUMBER}" \
  "cucim=${RAPIDS_VERSION_NUMBER}"

(build link)

I should have thought of that before! That's even better than using rapids-version-major-minor like I had been doing. It'll ensure CI looks for something like libcucim=24.12.00a31, which would prevent cases like "silently fell back to an earlier 24.12.* nightly", like we saw at the beginning of the Python 3.12 migration in the 24.10 release: https://github.com/rapidsai/integration/pull/719#discussion_r1761816933
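The two pinning styles differ only in how much of the version string they keep. The Bash sketch below starts from a full nightly version like the `24.12.00a31` example above and trims it with parameter expansion into a `{major}.{minor}` pin and a `{major}.{minor}.{patch}` pin. The helper names are hypothetical. The sketch assumes versions always look like `MAJOR.MINOR.PATCH` with an optional `a<N>` alpha suffix. It is not how `rapids-version` or `rapids-version-major-minor` are actually implemented.

```shell
# Hypothetical helpers; assumes versions look like MAJOR.MINOR.PATCH[aN], e.g. 24.12.00a31.
major_minor() {
  local v="$1"
  # Remove the shortest suffix starting at the last ".": 24.12.00a31 -> 24.12
  echo "${v%.*}"
}

major_minor_patch() {
  local v="$1"
  # Remove everything from the first "a" (the alpha suffix), if any: 24.12.00a31 -> 24.12.00
  echo "${v%%a*}"
}

full="24.12.00a31"
echo "cudf=$(major_minor "${full}")"        # {major}.{minor} pin
echo "cudf=$(major_minor_patch "${full}")"  # {major}.{minor}.{patch} pin
echo "cudf=${full}"                         # exact pin on the full generated version
```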

jameslamb commented 1 month ago

Thinking more about https://github.com/rapidsai/build-planning/issues/106#issuecomment-2402728240... I don't think rapids-generate-version is safe to use like that.

That's because it generates a version like:

cudf=24.12.00a17

Here the 17 means "17 commits from the latest tag to this PR". Running that once at build time and again later at test time could produce two different values if more commits are merged between those runs (entirely possible in a highly active repo like cudf).
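This sketch shows why recomputing the version is a problem. If the version is derived from a commit count, the test job can end up pinning to a version the build job never produced. Here, `simulated_generate_version` is a hypothetical stand-in that appends `a<commits>` to a base version. It is an assumption for illustration, not the real `rapids-generate-version`. In the sketch, two more commits land between the build and test jobs. The exact pin then no longer matches the built artifact, while the `{major}.{minor}.{patch}` part stays the same.

```shell
# Hypothetical stand-in for rapids-generate-version: base version plus "a<N>",
# where N is the number of commits since the latest tag.
simulated_generate_version() {
  local base="$1" commits_since_tag="$2"
  echo "${base}a${commits_since_tag}"
}

base="24.12.00"
built_version="$(simulated_generate_version "${base}" 17)"  # computed in the build job
test_pin="$(simulated_generate_version "${base}" 19)"       # recomputed later in the test job

if [[ "${built_version}" == "${test_pin}" ]]; then
  echo "exact pin cudf=${test_pin}: matches the built artifact"
else
  echo "exact pin cudf=${test_pin}: does not match the built artifact (${built_version})"
fi

# Dropping the alpha suffix gives the same {major}.{minor}.{patch} value in both jobs.
echo "build job: cudf=${built_version%%a*}, test job: cudf=${test_pin%%a*}"
```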

I'm going to use rapids-version to get {major}.{minor}.{patch} and finish this up. It's at least a cheap way to avoid the worst case here (e.g. using 24.08 packages when you wanted 24.12). Strict channel priority (#84) and combining the installs (#22) in the future will give us stronger guarantees.
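One way to apply the `{major}.{minor}.{patch}` pin consistently is to build every package spec from a single version variable and pass all of them to one install call, in the style of the cucim snippet above. The Bash sketch below only prints the command instead of running it. The package list, the channel paths, and the hard-coded `RAPIDS_VERSION` value are illustrative assumptions. In a real CI script, the version would come from `rapids-version` and the channels from the downloaded CI artifacts.

```shell
# Illustrative values (assumptions): in CI the version would come from rapids-version
# and the channels from the downloaded CI artifacts.
RAPIDS_VERSION="24.12.00"
CPP_CHANNEL="/tmp/cpp_channel"
PYTHON_CHANNEL="/tmp/python_channel"
packages=(libcudf cudf dask-cudf)

# Pin every package to the same version so none can resolve from another release.
pinned_specs=()
for pkg in "${packages[@]}"; do
  pinned_specs+=("${pkg}=${RAPIDS_VERSION}")
done

# Dry run: print the single combined install command instead of executing it.
install_cmd=(rapids-mamba-retry install
  --channel "${CPP_CHANNEL}"
  --channel "${PYTHON_CHANNEL}"
  "${pinned_specs[@]}")
printf '%s ' "${install_cmd[@]}"
echo
```

Because every spec is derived from the same variable, the pin is set in one place, and adding a package cannot accidentally leave it unpinned.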

jameslamb commented 1 week ago

The only remaining item here was cugraph-docs, but development on that has been stalled for months:

And it isn't planned to be completed in 24.12. Since there's already an open PR (https://github.com/rapidsai/cugraph-docs/pull/46) tracking this CI-script change for that repo, I don't think it's worth keeping this issue open on this board just for that currently inactive repo.

I'm closing this, so we can focus on the other remaining 24.12 things.