weizhepei / InstructRAG

InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
https://weizhepei.com/instruct-rag-page
MIT License

Request for Retrieval Code #3

Closed: hzby closed this issue 3 weeks ago

hzby commented 2 months ago

Thank you for your excellent work. It seems that the current repository does not contain the code to retrieve relevant documents using a query. Could you please provide this portion of the code to complete it?

wangpuzhou123 commented 1 month ago

Thank you. I would also like to ask about this.

weizhepei commented 1 month ago

Thank you for your interest in our work. We have provided retrieved documents along with the queries for all datasets used in this work to facilitate easier reproduction. You can find them in our dataset folder.

To perform retrieval on your own corpus, the easiest way is to use Pyserini with prebuilt indexes. Below are some code snippets for sparse retrieval (e.g., BM25) and dense retrieval (e.g., DPR) for your reference.

- Sparse Retrieval

```python
# Sparse Retriever (BM25)
from pyserini.search.lucene import LuceneSearcher

# Use Wikipedia dump as the retrieval source
searcher = LuceneSearcher.from_prebuilt_index('wikipedia-dpr')
# Retrieve documents relevant to the given query
hits = searcher.search('who got the first nobel prize in physics')
# Present retrieved document and relevance score
print(f'doc: {searcher.doc(hits[0].docid).raw()}\nscore: {hits[0].score}')
```


- Dense Retrieval

```python
# Dense Retriever (DPR)
from pyserini.search.faiss import FaissSearcher, DprQueryEncoder

# Load query encoder
encoder = DprQueryEncoder("facebook/dpr-question_encoder-single-nq-base")
# Use Wikipedia dump as the retrieval source
searcher = FaissSearcher.from_prebuilt_index('wikipedia-dpr-100w.dpr-single-nq', encoder)
# Retrieve documents relevant to the given query
hits = searcher.search('who got the first nobel prize in physics')
# Present retrieved document and relevance score
print(f'doc: {searcher.doc(hits[0].docid).raw()}\nscore: {hits[0].score}')
```
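
The snippets above print only the top hit for a single query. As a minimal sketch (not part of the repository), collecting the top-k hits for several queries could look like the following. `collect_top_k`, `StubSearcher`, `StubHit`, and `StubDoc` are hypothetical names introduced here for illustration; the stub only mimics the `search(query, k=...)` and `doc(docid).raw()` calls used above, with a toy word-overlap score, so the example runs without downloading an index.

```python
from dataclasses import dataclass


@dataclass
class StubHit:
    """Hypothetical hit object carrying the two fields used above."""
    docid: str
    score: float


class StubDoc:
    """Hypothetical document wrapper exposing raw(), as in the snippets above."""

    def __init__(self, text):
        self._text = text

    def raw(self):
        return self._text


class StubSearcher:
    """Hypothetical stand-in for a searcher; scores by word overlap, not BM25 or DPR."""

    def __init__(self, corpus):
        self.corpus = corpus  # maps docid -> document text

    def search(self, query, k=10):
        terms = set(query.lower().split())
        scored = [
            StubHit(docid, float(len(terms & set(text.lower().split()))))
            for docid, text in self.corpus.items()
        ]
        scored.sort(key=lambda hit: hit.score, reverse=True)
        return scored[:k]

    def doc(self, docid):
        return StubDoc(self.corpus[docid])


def collect_top_k(searcher, queries, k=5):
    """Return {query: [{'docid', 'score', 'text'}, ...]} with up to k hits per query."""
    results = {}
    for query in queries:
        hits = searcher.search(query, k=k)
        results[query] = [
            {"docid": hit.docid, "score": hit.score, "text": searcher.doc(hit.docid).raw()}
            for hit in hits
        ]
    return results


# Toy corpus standing in for the real retrieval source
corpus = {
    "d1": "Wilhelm Röntgen received the first Nobel Prize in Physics in 1901",
    "d2": "The Nobel Prize in Chemistry is awarded annually",
    "d3": "Pyserini is a toolkit for reproducible information retrieval",
}
query = "first nobel prize in physics"
retrieved = collect_top_k(StubSearcher(corpus), [query], k=2)
for record in retrieved[query]:
    print(record["docid"], record["score"])
```

Under the assumption that a real searcher accepts the same `search(query, k=...)` call, it could be passed to `collect_top_k` in place of `StubSearcher`.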

Please let me know if you have further questions!