rapidsai / ucx-wheels


support post-release versions, publish v1.15.0.post1 #5

Closed · jameslamb closed this 5 months ago

jameslamb commented 5 months ago

Contributes to https://github.com/rapidsai/build-planning/issues/57.

libucx.load_library() (defined here) tries to pre-load libcuda.so and libnvidia-ml.so so that, if someone attempts to use the libraries from this wheel on a system without a GPU, they get an informative error instead of a cryptic one from the linker.

Some of the projects using these wheels, like ucxx and ucx-py, are expected to be usable on systems without a GPU. See https://github.com/rapidsai/ucx-py/pull/1041#discussion_r1594729142.
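For context on why that matters: without a change here, each downstream project would have to guard the pre-load itself, along the lines of the sketch below (the exception type caught is an assumption for illustration, not necessarily the exact one libucx raises).

import libucx

try:
    # attempts to pre-load libcuda.so / libnvidia-ml.so and raises an
    # informative error if they are missing
    libucx.load_library()
except Exception:
    # CPU-only machine: the CUDA driver libraries are not present,
    # so carry on and let UCX run without them
    pass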

To avoid those projects needing to wrap every use of these wheels in try-catch blocks, this proposes changing that behavior and publishing the result as post-release versions of the existing wheels, starting with v1.15.0.post1.

Notes for Reviewers

I propose starting with 1.15.0.post1 right away, since that's the version that ucx-py will use. Assuming downstream testing goes well, the sequence of PRs would be (see the note on post-release ordering after this list):

  1. this one
  2. another changing the version to 1.14.0.post1
  3. another changing the version to 1.16.0.post1
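For anyone not familiar with post-releases: under PEP 440, X.Y.Z.postN sorts immediately after X.Y.Z and before the next release, so pip will pick the .post1 wheel over the original while still respecting an upper bound like <1.16. A quick sanity check with the packaging library (purely illustrative, not part of this PR):

from packaging.version import Version

# a post-release is newer than its base release...
assert Version("1.15.0.post1") > Version("1.15.0")
# ...but still older than the next release
assert Version("1.15.0.post1") < Version("1.16.0")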
jameslamb commented 5 months ago

> Once this merges, @pentschev or @jameslamb could you test installing a ucxx wheel along with this specific version of libucx (the ucxx wheels will allow using 1.15.0 at runtime right now, I believe) to ensure that things work the way we want on both CPU-only and GPU-enabled machines?

Yes I can do this.

jameslamb commented 5 months ago

CI is stuck waiting for arm64 runners. Once those run and (hopefully) pass, I'll merge this and test with ucxx + these new wheels.

jameslamb commented 5 months ago

@vyasr @pentschev the CI pipeline from this PR has been stuck waiting for a runner for 2+ hours (build link), so the new wheels aren't up on the nightly index yet (https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/libucx-cu12/).

To get around that, I did a minimal test of ucxx by downloading the new wheels from S3 + running https://github.com/rapidsai/ucxx/blob/branch-0.38/python/examples/basic.py.

#!/bin/bash

set -e -u -o pipefail

# check if there's a GPU attached
nvidia-smi || true

# download ucx wheel with the fixes
rm -f ./ucx-wheels-cu12.tar.gz
wget \
    -O ucx-wheels-cu12.tar.gz \
    https://downloads.rapids.ai/ci/ucx-wheels/branch/main/a0bf56f/ucx-wheels_wheel_cpp_ucx_cu12_x86_64.tar.gz

mkdir -p /tmp/delete-me/ucx-wheels/
tar \
    -xvzf ./ucx-wheels-cu12.tar.gz \
    -C /tmp/delete-me/ucx-wheels/

# install it
pip install /tmp/delete-me/ucx-wheels/*.whl

# Install the latest `ucxx` wheel (the `>=0.0.0a0` specifier lets pip
# consider pre-release/nightly builds when resolving this constraint).
pip install 'ucxx-cu12==0.38.*,>=0.0.0a0'

# try importing ucxx and libucx
python -c "import ucxx._lib.libucxx as ucx_api"
python -c "import libucx; libucx.load_library()"

# try running the example
python ./python/examples/basic.py

Ran it both with and without a GPU visible to the container:

# with GPU
docker run \
    --rm \
    --gpus 1 \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:cuda12.2.2-ubuntu22.04-py3.10 \
    bash ./test.sh

# no GPU
docker run \
    --rm \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:cuda12.2.2-ubuntu22.04-py3.10 \
    bash ./test.sh

It succeeded in both cases (after applying the modifications from https://github.com/rapidsai/ucxx/pull/229).

I think that's enough evidence to move forward with publishing the other versions (1.14.0.post1 and 1.16.0.post1). But to be sure, tomorrow I'll also test the 1.15.0.post1 wheels in CI for https://github.com/rapidsai/ucx-py/pull/1041.

jameslamb commented 5 months ago

these wheels are working with ucx-py builds in both GPU and non-GPU environments 🎉

https://github.com/rapidsai/ucx-py/pull/1041

vyasr commented 5 months ago

Awesome!