rapidsai / raft

RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks that make it easier to write high-performance applications.
https://docs.rapids.ai/api/raft/stable/
Apache License 2.0

[DO NOT MERGE] Test CCCL version bump #2358

Closed. sleeepyjack closed this PR 2 weeks ago.

sleeepyjack commented 3 weeks ago

This PR tests an upcoming CCCL version bump (https://github.com/rapidsai/rapids-cmake/pull/631) and should not be merged.

sleeepyjack commented 3 weeks ago

One of the tests is running into an error: https://github.com/rapidsai/raft/actions/runs/9475150045/job/26109283214?pr=2358#step:7:824

We ran into a similar issue yesterday with cuco when bumping the CCCL version to 2.5.0 (see NVIDIA/cuCollections#504). It turned out the culprit was a sticky CUDA error that resurfaced during a downstream Thrust/CUB call.
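For readers unfamiliar with sticky errors, here is a minimal, hypothetical CUDA sketch of the failure mode mentioned in that comment. It is not taken from RAFT, cuco, or the failing test: the kernel `write_through_null` and both helper functions are made up for illustration, and it assumes a GPU on which a device-side write through a null pointer raises `cudaErrorIllegalAddress`, an error the CUDA runtime treats as sticky. A plain `cudaMalloc`/`cudaMemset`/`cudaDeviceSynchronize` sequence stands in for the downstream Thrust/CUB call.

```cuda
#include <cuda_runtime.h>

// Hypothetical faulty kernel: writes through a null device pointer, which
// typically raises cudaErrorIllegalAddress (a sticky error) on the device.
__global__ void write_through_null(int* p) { p[threadIdx.x] = 42; }

// Launches the faulty kernel without synchronizing, then runs unrelated,
// otherwise-valid CUDA work and returns the first error that work reports.
cudaError_t error_seen_by_later_call() {
  // Kernel launches are asynchronous, so the fault is not reported here.
  write_through_null<<<1, 1>>>(nullptr);

  // Stand-in for a downstream library call (e.g. Thrust/CUB).
  int* buf = nullptr;
  cudaError_t err = cudaMalloc(&buf, sizeof(int));
  if (err == cudaSuccess) err = cudaMemset(buf, 0, sizeof(int));
  if (err == cudaSuccess) err = cudaDeviceSynchronize();  // fault surfaces here at the latest
  if (buf != nullptr) (void)cudaFree(buf);
  return err;
}

// Clearing the per-thread "last error" does not repair the context after a
// sticky error: later CUDA work keeps returning the same error code.
cudaError_t error_after_reset() {
  (void)cudaGetLastError();
  return cudaDeviceSynchronize();
}
```

On such a GPU, calling `error_seen_by_later_call()` and then `error_after_reset()` would typically return `cudaErrorIllegalAddress` both times. The error is reported by whichever CUDA call happens to run next, not by the kernel that caused it, which is why synchronizing and checking errors right after each suspect launch is a common way to narrow down the real source.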

I'm not very familiar with this codebase. Could someone help investigate this issue?

bdice commented 3 weeks ago

I can reproduce the issue locally in a devcontainer by running build-raft-cpp && ~/raft/cpp/build/latest/gtests/CORE_TEST -V. I won't have much time to look deeper today, but at least we have a reproducer outside of CI.

sleeepyjack commented 2 weeks ago

Closing this PR since all checks have passed.