Closed jameslamb closed 1 month ago
Contributes to https://github.com/rapidsai/build-planning/issues/102
Fixes #217
Temporarily added a CUDA 11.4.3 test job to CI here (the same specs as the failing nightly), by pointing at the branch from https://github.com/rapidsai/shared-workflows/pull/246.
Observed the exact same failures with CUDA 11.4 reported in https://github.com/rapidsai/build-planning/issues/102.
...
+ nccl 2.10.3.1 hcad2f07_0 rapidsai-nightly 125MB
...
./WHOLEGRAPH_CSR_WEIGHTED_SAMPLE_WITHOUT_REPLACEMENT_TEST: symbol lookup error: /opt/conda/envs/test/bin/gtests/libwholegraph/../../../lib/libwholegraph.so: undefined symbol: ncclCommSplit
sh -c exec "$0" ./WHOLEMEMORY_HANDLE_TEST
./WHOLEMEMORY_HANDLE_TEST: symbol lookup error: /opt/conda/envs/test/bin/gtests/libwholegraph/../../../lib/libwholegraph.so: undefined symbol: ncclCommSplit
sh -c exec "$0" ./GRAPH_APPEND_UNIQUE_TEST
(build link)
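For anyone who wants to reproduce the diagnosis locally, here is a minimal sketch of how one might check whether a shared library exports a given symbol. The helper name `exports_symbol` is hypothetical and not part of this repository; it only relies on Python's standard `ctypes` module, and the library path you pass in (for example, wherever `libnccl.so` lives in your environment) is an assumption on your side.

```python
import ctypes


def exports_symbol(library, symbol):
    """Return True if the shared library `library` exports `symbol`.

    `library` is a path or soname passed to ctypes.CDLL. Passing None
    asks the dynamic loader for the symbols already visible in the
    current process (POSIX only).
    """
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        # The library could not be loaded at all.
        return False
    # ctypes raises AttributeError on lookup of a missing symbol,
    # which hasattr() turns into False.
    return hasattr(handle, symbol)
```

Calling `exports_symbol("libnccl.so.2", "ncclCommSplit")` in the test environment would then be expected to return `False` with the older NCCL build, matching the `undefined symbol` error in the log above. The soname `libnccl.so.2` is an assumption here, so adjust it to whatever your environment actually installs.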
Pushed a commit adding a floor of nccl>=2.18.1.1. Saw all tests pass with CUDA 11.4 😁
...
+ nccl 2.22.3.1 hee583db_1 conda-forge 131MB
...
(various log messages showing all tests passed)
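As a rough illustration of what the `nccl>=2.18.1.1` floor does to the two builds seen in the logs, the comparison can be sketched in plain Python. The helpers `parse_version` and `meets_floor` are hypothetical names for this sketch only. The conda solver applies its own version-matching rules, so this covers only the simple dotted-integer case.

```python
def parse_version(text):
    """Split a dotted version such as '2.18.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_floor(installed, floor="2.18.1.1"):
    """Return True if `installed` is at least `floor`.

    Versions are compared component by component as integers, so
    2.9 sorts below 2.18 (a plain string comparison would not).
    """
    return parse_version(installed) >= parse_version(floor)


# The two NCCL builds that appear in the CI logs of this PR.
for version in ("2.10.3.1", "2.22.3.1"):
    print(version, meets_floor(version))
```

With the floor in place, the `2.10.3.1` build that hit the `ncclCommSplit` lookup error no longer satisfies the constraint, while `2.22.3.1` does.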
Thanks!
@linhu-nv could you please review here?
/merge