stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License
2.95k stars 377 forks

Indexing: handle cpu & single-gpu without using multiprocessing & dist. data parallel #290

Anmol6 closed 8 months ago

bclavie commented 8 months ago

I think after the latest batch of fixes it looks good now! Thank you so much for doing this!

fblissjr commented 8 months ago

This fixed a problem I was having downstream in RAGatouille, where indexing always ran in distributed mode on WSL2 with a single GPU. Thank you for the PR!

https://github.com/bclavie/RAGatouille/issues/30#issuecomment-1889669431

Edit: it looks like the trainer still always forces distributed torch. The collection indexer change did fix indexing, though. Definitely the right direction.
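
For readers following along, the idea in the PR title (skip multiprocessing and distributed data parallel on CPU or a single GPU) can be sketched roughly as below. This is only an illustration and not ColBERT's actual code: the function name `choose_launch_mode` and the mode labels it returns are hypothetical, and the device count is passed in as a plain integer rather than queried from torch.

```python
def choose_launch_mode(num_gpus: int) -> str:
    """Hypothetical helper: decide how an indexing job should be launched.

    `num_gpus` is supplied by the caller; in a real setup it might come
    from something like torch.cuda.device_count() (an assumption here).
    """
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    if num_gpus == 0:
        # No GPU: run in the current process on CPU, no process group needed.
        return "cpu-in-process"
    if num_gpus == 1:
        # One GPU: run in the current process, no multiprocessing or DDP.
        return "single-gpu-in-process"
    # Several GPUs: spawning workers and using distributed mode makes sense.
    return "distributed"


if __name__ == "__main__":
    for n in (0, 1, 4):
        print(n, choose_launch_mode(n))
```

The point of the sketch is only that the distributed path becomes a special case for multi-GPU machines instead of the default, which is the behavior described above for indexing but apparently not yet for the trainer.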