stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License

can't load full index into memory #42

Closed JamesDeAntonis closed 3 years ago

JamesDeAntonis commented 3 years ago

What's the easiest way to use ColBERT without loading the full index into memory? We are building an index off of the wiki_dpr dataset (and eventually more), so we have about 21 million passages and counting. The full index is about 630 GB on disk, and we have 230 GB of memory to work with (hopefully not needing nearly the full 230). I understand that FAISS allows for this type of search (only metadata gets loaded into memory while the actual vectors stay on disk), so I'm curious whether the current repo supports this.
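For context, here's a back-of-envelope check on the 630 GB figure. ColBERT stores one vector per token, so the index size is roughly `num_passages * avg_tokens * dim * bytes_per_value`. Everything below except the 21M passage count is an assumption (average passage length, dim 128, fp16 storage):

```python
# Back-of-envelope index size estimate. Only num_passages comes from this
# thread; avg_tokens, dim, and bytes_each are assumed values for illustration.
num_passages = 21_000_000
avg_tokens   = 120   # assumed average passage length in tokens
dim          = 128   # assumed per-token embedding dimension (paper default)
bytes_each   = 2     # assumed fp16 storage

index_bytes = num_passages * avg_tokens * dim * bytes_each
print(f"{index_bytes / 1e9:.0f} GB")  # prints "645 GB"
```

Under these assumptions the estimate (~645 GB) lands in the ballpark of the observed 630 GB, which is why the index can't simply be loaded into 230 GB of RAM.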

When running the retrieval script in the README, I run into memory issues once I start building IndexPart here. Should I be doing something with the index_part param? Any insight would be greatly appreciated.

Thanks!

(UPDATE: fyi, the retrieve script runs properly on a tiny dev subset of the dataset)

okhat commented 3 years ago

You just need to use batch-mode retrieval and ranking!

Just keep in mind it's two steps, not one. There are some instructions in the README. Let me know if you face issues using them.

Batch retrieval loads only the compressed FAISS index and retrieves the initial (unordered) set of passages. Batch re-ranking streams over the index one part at a time, so it uses a tiny fraction of memory at any point.
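For anyone landing here later, the two-step flow can be sketched like this. The module names (`colbert.retrieve`, `colbert.rerank`) come from this thread; every flag below is an assumption modeled on ColBERT v1's argument style and may differ in your checkout:

```python
# Sketch of the two-step batch pipeline described above. Flags and file names
# are hypothetical placeholders, not the exact ColBERT v1 CLI.
import shlex

# Step 1: batch retrieval -- loads only the compressed FAISS index and emits
# an unordered top-k candidate set per query.
retrieve_cmd = (
    "python -m colbert.retrieve "
    "--queries queries.dev.tsv --checkpoint colbert.dnn "
    "--index_root ./indexes --index_name wiki.index"
)

# Step 2: batch re-ranking -- streams the index one part at a time (low,
# roughly constant memory); it consumes the top-k output of step 1.
rerank_cmd = (
    "python -m colbert.rerank "
    "--queries queries.dev.tsv --checkpoint colbert.dnn "
    "--index_root ./indexes --index_name wiki.index "
    "--topk unordered.tsv"
)

for cmd in (retrieve_cmd, rerank_cmd):
    print(shlex.split(cmd)[2])  # prints the module invoked at each step
```

The key design point is that neither step ever materializes the full 630 GB index in RAM: step 1 touches only the compressed FAISS structure, and step 2 reads index parts sequentially.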

JamesDeAntonis commented 3 years ago

Very cool!

By two-step, are you referring to how the second (re-ranking) step of end-to-end retrieval isn't implemented yet, as suggested here?

okhat commented 3 years ago

The second step is implemented. You just need to run two scripts: colbert.retrieve, then colbert.rerank (giving the latter the top-k output of the first).

What isn't implemented is two steps from one script, which would be nice to have eventually. But this shouldn't affect your goals above!

JamesDeAntonis commented 3 years ago

Yeah, to clarify: I meant that we can't do end-to-end fully in one shot; instead, we currently have to call retrieve and then rerank (I think that's what you said).

okhat commented 3 years ago

Precisely! Give it a run. It should be really fast and smooth, I hope :D

JamesDeAntonis commented 3 years ago

This seems to be working properly!

I am also having some pains from trying to use Hugging Face's model. The paper says the output dimension used is 128, and that is the default in this repo, but the HF pretrained model uses 768. I plan to use 128 because I don't have the space for 768, so I'll probably nix Hugging Face entirely, outside of how it's used in this repo.
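The space concern follows directly from the fact that index size scales linearly with the per-token embedding dimension, everything else equal. A quick sketch (assuming the 630 GB figure corresponds to a dim-128 index; that mapping is my assumption, not confirmed in the thread):

```python
# Linear scaling of index size with embedding dimension (illustrative only;
# assumes the 630 GB index from this thread was built at dim 128).
size_at_128 = 630  # GB
size_at_768 = size_at_128 * (768 / 128)  # 6x larger
print(f"{size_at_768:.0f} GB")  # prints "3780 GB"
```

At dim 768, the same collection would need roughly 6x the disk (and memory per index part), which is why dropping to 128 is the practical choice here.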

Do you have the dim 128 model saved anywhere, as used in the paper?

okhat commented 3 years ago

Not sure if we already corresponded about this by email, but as I mentioned to some other folks, I'm happy to share a checkpoint if you reach out by email!