GPU crashes when running "D_packed @ Q.to(dtype=D_packed.dtype).T" with no error message

stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)

MIT License


After digging into the code, it looks like these lines https://github.com/stanford-futuredata/ColBERT/blob/862edcf5ec35fd377ecb8575d753bbefdda463e6/colbert/indexing/codecs/decompress_residuals.cu#L42-L50

restrict the C++ function decompress_residuals_cuda to GPU device 0 only. decompress_residuals_cuda crashes when it runs on any other GPU.

After changing the device setting to .device(torch::kCUDA, residuals.device().index()), the crash no longer occurs.

Should we update it to .device(torch::kCUDA, residuals.device().index())? This would also allow inference to run on multiple GPUs, which should improve inference throughput.

I'm wondering whether this is a bug or an intentional design choice.


GPU crashes when running "D_packed @ Q.to(dtype=D_packed.dtype).T" with no error message #348