rapidsai / raft

RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high-performance applications.
https://docs.rapids.ai/api/raft/stable/
Apache License 2.0

[BUG] IVF-PQ index creation crashes on aarch64 for wiki_all_1M benchmark #2324

Open mfoerste4 opened 1 month ago

mfoerste4 commented 1 month ago

Describe the bug

IVF-PQ build of wiki_all_1M fails on Grace/H200 with:

`CUDA Exception: Warp Illegal Address
Thread 1 "RAFT_IVF_PQ_ANN" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 39271, block (821,0,0), thread (160,0,0), device 0, sm 0, warp 15, lane 0]
0x00004002dba2b8b0 in raft::neighbors::ivf_pq::detail::process_and_fill_codes_kernel<256u, 8u, long><<<(8192,1,1),(256,1,1)>>> () at /home/scratch.mfoerster_gpu/raft_ws/raft/cpp/include/raft/neighbors/detail/ivf_pq_build.cuh:1164 in _ZN4raft9neighbors6ivf_pq6detail14encode_vectorsILj32ElEclElj inlined from ivf_pq_codepacking.cuh:166
1164 auto t = in_vectors(i, j, k) - pq_centers(partition_ix, k, l);`

Steps/Code to reproduce bug

The IVF-PQ index build fails both standalone and within CAGRA.

RAFT_IVF_PQ_ANN_BENCH --build --force --data_prefix=<datasets> --benchmark_filter=raft_ivf_pq.d64-nlist16K wiki_all_1M.json

Expected behavior

The benchmark finishes without crashing.

Environment details:

mfoerste4 commented 1 month ago

I just checked, and this does not reproduce on other ARM CPUs (Altra system).