jameslamb closed this 6 months ago
Once this merges, @pentschev or @jameslamb could you test installing a ucxx wheel along with this specific version of libucx (I believe the ucxx wheels currently allow using 1.15.0 at runtime) to ensure that things work the way we want on both CPU-only and GPU-enabled machines?
Yes I can do this.
CI is stuck waiting for arm64 runners. Once those run and (hopefully) pass, I'll merge this and test with ucxx + these new wheels.
@vyasr @pentschev the CI pipeline from this PR has been stuck waiting for a runner for 2+ hours (build link), so the new wheels aren't up on the nightly index yet (https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/libucx-cu12/).
To get around that, I did a minimal test of ucxx by downloading the new wheels from S3 and running https://github.com/rapidsai/ucxx/blob/branch-0.38/python/examples/basic.py.
#!/bin/bash
set -e -u -o pipefail
# check if there's a GPU attached
nvidia-smi || true
# download ucx wheel with the fixes
rm -f ./ucx-wheels-cu12.tar.gz
wget \
-O ucx-wheels-cu12.tar.gz \
https://downloads.rapids.ai/ci/ucx-wheels/branch/main/a0bf56f/ucx-wheels_wheel_cpp_ucx_cu12_x86_64.tar.gz
mkdir -p /tmp/delete-me/ucx-wheels/
tar \
-xvzf ./ucx-wheels-cu12.tar.gz \
-C /tmp/delete-me/ucx-wheels/
# install it
pip install /tmp/delete-me/ucx-wheels/*.whl
# Install the latest `ucxx` wheel.
pip install 'ucxx-cu12==0.38.*,>=0.0.0a0'
# try importing ucxx and libucx
python -c "import ucxx._lib.libucxx as ucx_api"
python -c "import libucx; libucx.load_library()"
# try running the example (assumes this script is run from the root of a ucxx checkout)
python ./python/examples/basic.py
Ran it with and without a GPU visible to the processes.
# with GPU
docker run \
--rm \
--gpus 1 \
-v "$(pwd)":/opt/work \
-w /opt/work \
-it rapidsai/citestwheel:cuda12.2.2-ubuntu22.04-py3.10 \
bash ./test.sh
# no GPU
docker run \
--rm \
-v "$(pwd)":/opt/work \
-w /opt/work \
-it rapidsai/citestwheel:cuda12.2.2-ubuntu22.04-py3.10 \
bash ./test.sh
Saw it succeed (after applying the modifications from https://github.com/rapidsai/ucxx/pull/229) on both.
I think that's enough evidence to move forward with publishing the other versions (1.14.0.post1 and 1.16.0.post1). But to be sure, tomorrow I'll try with the 1.15.0.post1 wheels in CI for https://github.com/rapidsai/ucx-py/pull/1041.
These wheels are working with ucx-py builds in both GPU and non-GPU environments 🎉
Awesome!
Contributes to https://github.com/rapidsai/build-planning/issues/57.

`libucx.load_library()` (defined here) tries to pre-load `libcuda.so` and `libnvidia-ml.so`, to raise an informative error (instead of a cryptic one from a linker) if someone attempts to use the libraries from this wheel on a system without a GPU. Some of the projects using these wheels, like `ucxx` and `ucx-py`, are expected to be usable on systems without a GPU. See https://github.com/rapidsai/ucx-py/pull/1041#discussion_r1594729142.

To avoid those libraries needing to try-catch these errors, this proposes the following:

`v1.15.0.post1`
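The pre-loading behavior described above can be sketched roughly like this. This is a minimal illustration assuming a `ctypes`-based loader; the function name, parameters, and error message here are hypothetical, not libucx's actual implementation:

```python
# Hypothetical sketch (NOT the actual libucx code) of pre-loading driver
# libraries so that a missing GPU driver produces a clear Python error
# instead of a cryptic linker failure later on.
import ctypes


def preload_libraries(libnames=("libcuda.so", "libnvidia-ml.so")):
    """Try to dlopen each library, raising an informative error on failure."""
    for libname in libnames:
        try:
            # RTLD_GLOBAL makes the symbols available to libraries
            # loaded afterwards (e.g. libucx itself).
            ctypes.CDLL(libname, mode=ctypes.RTLD_GLOBAL)
        except OSError as err:
            raise RuntimeError(
                f"Failed to load {libname}. This wheel expects NVIDIA driver "
                "libraries to be present; is a GPU driver installed?"
            ) from err
```

A CPU-only consumer like `ucx-py` would have to wrap such a call in try-except, which is exactly what this change seeks to avoid.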
Notes for Reviewers
Proposing starting with `1.15.0.post1` right away, since that's the version that `ucx-py` will use. I'm proposing the following sequence of PRs here (assuming downstream testing goes well):

`1.14.0.post1`
`1.16.0.post1`