Describe the bug

We recently observed a situation where `raft-dask` nightly wheels were being published with duplicated dependencies:

* `pylibraft-cu12==24.8.*,>=0.0.0a0` AND `pylibraft==24.8.*,>=0.0.0a0`
* `ucx-py-cu12==0.39.*` AND `ucx-py==0.39.*`

The unsuffixed ones are a mistake, fixed in #2347. However... that was only caught by `cugraph`'s CI (build link). It should have been caught here in `raft`'s CI, probably here:

https://github.com/rapidsai/raft/blob/8ef71de26b01458f02f36ad96c1b3017cf985cc5/ci/test_wheel_raft_dask.sh#L14

Steps/Code to reproduce bug

Trying to reproduce a very recent CI build that passed despite using wheels that suffer from the issue fixed in #2347 (build link).

Ran a container mimicking what was used in that CI run, and then the following code mirroring ci/test_wheel_raft_dask.sh (code link), with a bit of extra debugging added.
Setup mimicking what happens in CI:
Checked if there was extra `pip` configuration setup in the image.
```shell
pip config list
```
Just one: an extra index URL pointing at the RAPIDS nightly wheel index.
```text
# :env:.extra-index-url='https://pypi.anaconda.org/rapidsai-wheels-nightly/simple'
```
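For anyone reproducing this outside the image, the same setting can be applied through pip's environment-variable configuration; a minimal sketch, with the URL taken from the `pip config list` output above:

```shell
# pip reads extra indexes from PIP_EXTRA_INDEX_URL; this mirrors the
# :env:.extra-index-url entry shown above (RAPIDS nightly wheel index)
export PIP_EXTRA_INDEX_URL='https://pypi.anaconda.org/rapidsai-wheels-nightly/simple'
```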
Checked the version of `pip`.
```shell
pip --version
# 23.0.1
```
Installed `pkginfo` to inspect the wheels.
```shell
pip install pkginfo
```
Downloaded wheels from the same CI run and put them in separate directories.
```shell
mkdir -p ./dist
RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
# git ref (entered in interactive prompt): 04186e4
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-s3 ./dist
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-s3 ./local-pylibraft-dep
```
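As a quick sanity check (not part of the CI script), it's worth confirming that each directory holds exactly one wheel for the expected package:

```shell
# list the downloaded artifacts; each directory should contain a single
# manylinux wheel for the matching package
ls -l ./dist ./local-pylibraft-dep
```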
Inspected them to confirm that:

* both wheels' `name` fields have the `-cu12` suffix
* the `raft_dask` wheel depends on both `pylibraft-cu12` and `pylibraft`

They do.
```shell
# raft-dask
pkginfo \
--field=name \
--field=version \
--field=requires_dist \
./dist/raft_dask_cu12-*cp39*.whl
# name: raft-dask-cu12
# version: 24.8.0a20
# requires_dist: ['dask-cuda==24.8.*,>=0.0.0a0', 'distributed-ucxx-cu12==0.39.*', 'joblib>=0.11', 'numba>=0.57', 'numpy<2.0a0,>=1.23', 'pylibraft-cu12==24.8.*,>=0.0.0a0', 'pylibraft==24.8.*,>=0.0.0a0', 'rapids-dask-dependency==24.8.*,>=0.0.0a0', 'ucx-py-cu12==0.39.*', 'ucx-py==0.39.*', 'pytest-cov; extra == "test"', 'pytest==7.*; extra == "test"']
# pylibraft
pkginfo \
--field=name \
--field=version \
--field=requires_dist \
./local-pylibraft-dep/pylibraft_cu12-*cp39*.whl
# name: pylibraft-cu12
# version: 24.8.0a20
# requires_dist: ['cuda-python<13.0a0,>=12.0', 'numpy<2.0a0,>=1.23', 'rmm-cu12==24.8.*,>=0.0.0a0', 'cupy-cuda12x>=12.0.0; extra == "test"', 'pytest-cov; extra == "test"', 'pytest==7.*; extra == "test"', 'scikit-learn; extra == "test"', 'scipy; extra == "test"']
```
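The same metadata can also be read without `pkginfo`, since a wheel's `METADATA` file is plain text inside the zip archive; a minimal sketch, assuming a single `raft_dask` wheel in `./dist`:

```shell
# wheel metadata lives at <name>-<version>.dist-info/METADATA inside the archive
unzip -p ./dist/raft_dask_cu12-*cp39*.whl '*.dist-info/METADATA' \
    | grep -E '^(Name|Version|Requires-Dist):'
```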
Installed the `pylibraft` wheel, just as the test script does.
```shell
python -m pip -v install --no-deps ./local-pylibraft-dep/pylibraft*.whl
```
That worked as expected.
```text
Processing /local-pylibraft-dep/pylibraft_cu12-24.8.0a20-cp39-cp39-manylinux_2_28_x86_64.whl
Installing collected packages: pylibraft-cu12
Successfully installed pylibraft-cu12-24.8.0a20
```
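Before attempting the `raft_dask` install, it can also help to confirm exactly what is present in the environment; a small check (not in the original test script):

```shell
# confirm that only the suffixed package is installed at this point
pip list | grep -i pylibraft
# expected: pylibraft-cu12   24.8.0a20
```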
With that set up (a `raft_dask-cu12` wheel in `./dist` and `pylibraft-cu12` already installed), I ran the install just as the test script does. Just like we observed in CI: `pylibraft` (unsuffixed) is not mentioned in the logs, but all other dependencies are.

HOWEVER... this alternative form fails in the expected way.

```shell
python -m pip -v install ./dist/*.whl
```

```text
ERROR: Could not find a version that satisfies the requirement ucx-py==0.39.* (from raft-dask-cu12) (from versions: 0.0.1.post1)
ERROR: No matching distribution found for ucx-py==0.39.*
```
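One way CI could have caught this, independent of pip's resolution behavior, would be to assert directly on the wheel's metadata; a hypothetical guard, using the package names from the inspection above:

```shell
# hypothetical CI guard: fail if the raft_dask wheel declares an unsuffixed
# dependency alongside its -cu12 form; the quotes in pkginfo's list output
# let us anchor on the start of each requirement name
deps="$(pkginfo --field=requires_dist ./dist/raft_dask_cu12-*cp39*.whl)"
for pkg in pylibraft ucx-py; do
    if echo "${deps}" | grep -qE "'${pkg}[=<>]"; then
        echo "ERROR: unsuffixed dependency '${pkg}' found in raft_dask wheel" >&2
        exit 1
    fi
done
```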
Expected behavior
I expected CI to fail because the constraints `pylibraft==24.8.*` and `ucx-py==0.39.*` are not satisfiable (no matching versions of those packages exist on the configured indexes).
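This can be checked directly against the configured indexes; a quick sketch (`pip index` is marked experimental, but is available in pip 23.0.1):

```shell
# list the versions each index offers; per the error above, ucx-py only has
# the 0.0.1.post1 placeholder, so ucx-py==0.39.* cannot be satisfied
pip index versions ucx-py
```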
Environment details (please complete the following information):
`nvidia-smi` output:
```text
Fri May 31 12:06:47 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-SXM2-32GB On | 00000000:06:00.0 Off | 0 |
| N/A 33C P0 55W / 300W | 341MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2-32GB On | 00000000:07:00.0 Off | 0 |
| N/A 33C P0 42W / 300W | 3MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2-32GB On | 00000000:0A:00.0 Off | 0 |
| N/A 31C P0 42W / 300W | 3MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2-32GB On | 00000000:0B:00.0 Off | 0 |
| N/A 29C P0 41W / 300W | 3MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2-32GB On | 00000000:85:00.0 Off | 0 |
| N/A 31C P0 41W / 300W | 3MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2-32GB On | 00000000:86:00.0 Off | 0 |
| N/A 30C P0 42W / 300W | 3MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2-32GB On | 00000000:89:00.0 Off | 0 |
| N/A 34C P0 43W / 300W | 3MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2-32GB On | 00000000:8A:00.0 Off | 0 |
| N/A 30C P0 43W / 300W | 3MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```
Additional context
The particular unsatisfiable-dependency issue was likely introduced by recent changes adding `rapids-build-backend` (#2331, for https://github.com/rapidsai/build-planning/issues/31). But in theory this could just as easily happen with some other, unrelated dependency issue, like a typo of the form `joblibbbbb`.
I am actively investigating this (along with @bdice and @nv-rliu). Just posting for documentation purposes.