stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License

Focusing retrieval on list of document ids with doc_ids parameter doesn't work #323

Open MartinV279 opened 3 months ago

I have been trying to do something similar to metadata filtering with ColBERT. According to the function description of RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0").search(), retrieval can be restricted to a list of documents passed in doc_ids.

The code I am using is the following (for testing purposes):

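from ragatouille import RAGPretrainedModel

# index_name, collection, and document_id are defined earlier (omitted here)
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
index_path = RAG.index(index_name=index_name, collection=collection, document_ids=document_id)

# Attempt to restrict the search to the first 20 document ids
p = RAG.encode(document_id[:20])
out_list = RAG.search("How do I change a password?", doc_ids=p)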

However, the results (out_list) still contain documents that are not in the list provided in variable p. Is there another way of doing this correctly, or is the feature not fully implemented yet? I haven't been able to find any examples or more detailed documentation on this. Thanks in advance!
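
For reference, the behaviour I am after could be approximated by filtering the results on the client side after the search. The sketch below is only an illustration. It assumes each search result is a dict with a document_id key, which is an assumption about the result format. The filter_by_doc_ids helper and the sample data are hypothetical, not part of the library.

```python
from typing import Iterable


def filter_by_doc_ids(
    results: list[dict],
    allowed_ids: Iterable[str],
    key: str = "document_id",
) -> list[dict]:
    """Keep only the results whose document id is in allowed_ids.

    Assumes each result is a dict that stores its document id under `key`.
    The original ranking order of the results is preserved.
    """
    allowed = set(allowed_ids)  # set gives O(1) membership checks
    return [r for r in results if r.get(key) in allowed]


# Hypothetical search output, made up for illustration only.
sample_results = [
    {"content": "Open Settings to reset your password.", "rank": 1, "document_id": "doc-3"},
    {"content": "Billing is handled monthly.", "rank": 2, "document_id": "doc-42"},
    {"content": "Passwords must be at least 12 characters.", "rank": 3, "document_id": "doc-7"},
]

allowed = ["doc-3", "doc-7"]
filtered = filter_by_doc_ids(sample_results, allowed)
print([r["document_id"] for r in filtered])
```

The drawback is that filtering after retrieval can leave fewer results than requested, so more results would have to be fetched up front. That is why restricting the search itself would be preferable.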