rapidsai / build-planning

Tracking for RAPIDS-wide build tasks
https://github.com/rapidsai
0 stars 4 forks source link

Ensure cached packages installed in CI test phase #14

Open jakirkham opened 10 months ago

jakirkham commented 10 months ago

Recently we ran into an issue on a project (cuCIM) where older packages of the project (libcucim & cucim from 23.12) were installed instead of the most recent packages from the PR (24.02.00a*). This made the installation look successful. However old issues that had been fixed in the development branch (branch-24.02) were not getting picked up

This was ultimately caused by a solver issue. However we were not able to ascertain that until we pinned the packages installed in the test phase to the latest version. Then the solver issue became clear and we could work to resolve that

Think we should take a closer look at this issue and come up with a way to guarantee the cached packages are picked up as opposed to some other packages. Attempted to do this more directly by using the <channel>::<package> syntax, but this didn't work well with file based channels. Maybe there is a better way to do this

jameslamb commented 1 month ago

@jakirkham I'm a bit confused by what specifically "cached" means in your description, but I came here to report an issue that felt like the conda equivalent of #69 because that's how @vyasr described this issue in https://github.com/rapidsai/build-planning/issues/69#issuecomment-2146202627.

In cugraph 24.10, near the end of burndown, we observed a situation where 24.12 packages were getting pulled in on CI jobs targeting branch-24.10 (https://github.com/rapidsai/cugraph/pull/4690).

It seems like the root cause was this pattern, that's used across RAPIDS CI scripts:

# create an environment
rapids-dependency-file-generator \
  --output conda \
  --file-key test_notebooks \
  --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION}" | tee env.yaml

rapids-mamba-retry env create --yes -f env.yaml -n test

# download packages built in earlier CI jobs
CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
PYTHON_CHANNEL=$(rapids-download-conda-from-s3 python)

# install those packages
rapids-mamba-retry install \
  --channel "${CPP_CHANNEL}" \
  --channel "${PYTHON_CHANNEL}" \
  libcugraph pylibcugraph cugraph

Adding those locally-downloaded files with --channel doesn't force conda install to use the packages... it's free to choose other packages to satisfy constraints like "cugraph". Including those from other RAPIDS releases!

It would be safer for CI purposes to do something like this for that last install:

RAPIDS_VERSION="$(rapids-version-major-minor)"

rapids-mamba-retry install \
  --channel "${CPP_CHANNEL}" \
  --channel "${PYTHON_CHANNEL}" \
  "libcugraph=${RAPIDS_VERSION}.*" \
  "pylibcugraph=${RAPIDS_VERSION}.*" \
  "cugraph=${RAPIDS_VERSION}.*"

That would prevent the solver from ever falling back to older releases (e.g. using 24.08 packages on the 24.10 branch) picking up newer releases (e.g. using 24.12 packages on the 24.10 branch).

I think it's worth making changes like that across RAPIDS in 24.12, what do you think?

cc @bdice @msarahan @KyleFromNVIDIA

jakirkham commented 1 month ago

Meaning the packages from S3

TBH this is something that would be solved by strict channel priority

jameslamb commented 1 month ago

Got it, ok.

I don't think the particular issue I just noted about would be solved by strict channel priority. It's a case where conda chose 24.12.* packages from the rapidsai-nightly channel when we wanted to use 24.10.* packages from the rapidsai-nightly channel.

jakirkham commented 1 month ago

Hmm...am not following

With your use case, it sounds like cuGraph packages should be in ${CPP_CHANNEL} and ${PYTHON_CHANNEL} correct?

If we added those channels higher in the channel order than rapidsai-nightly and enabled strict channel priority, it would only grab the cuGraph packages from ${CPP_CHANNEL} and ${PYTHON_CHANNEL} and ignore rapidsai-nightly. This is what we want, right?

What am I missing?

jameslamb commented 1 month ago

🙈 you're absolutely right John, the 24.10.* are coming from those local channels, not rapidsai-nightly, and strict channel priority would solve this. Sorry, I was not thinking about this the right way. Thanks for correcting me.

We might still want to consider something like the script changes I mentioned above, since they're a relatively small amount of effort compared to enforcing strict channel priority.

jakirkham commented 1 month ago

All good. Just wanted to make sure I wasn't missing something

Agree that would be a good improvement. We did the same thing in cuCIM: https://github.com/rapidsai/cucim/pull/680

Another improvement that has been helpful has been this suggestion by Charles ( https://github.com/rapidsai/build-planning/issues/22 ). It is good both for overall runtime and reliability

vyasr commented 1 month ago

The reason that I likened this issue to #69 is that fundamentally --find-links is analogous to adding a conda channel because it gives package manager an extra place to pull dependencies from. The main difference between pip and conda in this regard is that pip has no concept of channel priority, so with pip you had to introduce constraints in order to force installation of the desired file. With conda, as John pointed out the channel priority should solve the problem. In the interim, though, the suggestion that you proposed would work and would be the closest direct port possible of the pip constraints approach that you've added.

jameslamb commented 1 month ago

Yep yep makes sense, thanks for saying you also support the idea. I'm gonna plan on doing this (adding the constraints in scripts) next week, on 24.12. Seems like a few hours of effort, well worth it in my opinion to minimize pain like what we're experiencing in cugraph for 24.10 and trying to fix in https://github.com/rapidsai/cugraph/pull/4690

jameslamb commented 1 month ago

I created a sub-issue to track the work of updating those scripts: https://github.com/rapidsai/build-planning/issues/106.