manisnesan opened this issue 1 year ago
Outline
- Download the SciFact dataset: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip
- Load it with BEIR's GenericDataLoader and explore the corpus, queries, and qrels.
- Use pyserini to get a baseline with Lucene. This requires converting the BEIR corpus to the Pyserini format {id: str, contents: str, title: str},
and id is the unique identifier for each document and is a required field. Build the Lucene index with:

```shell
python -m pyserini.index.lucene \
  -collection JsonCollection \
  -input {save_dir}/scifact/corpus \
  -index {save_dir}/indexes/scifact_corpus_jsonl \
  -fields title contents \
  -generator DefaultLuceneDocumentGenerator \
  -threads 8 \
  -storePositions -storeDocvectors -storeRaw
```
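The conversion mentioned above can be sketched as follows. This is a minimal sketch: the `convert_beir_corpus` helper name and the output path are illustrative, and it assumes the standard BEIR `corpus.jsonl` layout with `_id`, `title`, and `text` fields.

```python
import json
from pathlib import Path


def convert_beir_corpus(beir_corpus_path: str, out_path: str) -> int:
    """Rewrite a BEIR corpus.jsonl into Pyserini's JsonCollection format.

    BEIR lines look like {"_id": ..., "title": ..., "text": ...};
    Pyserini expects {"id": ..., "contents": ..., "title": ...},
    where "id" is the required unique document identifier.
    Returns the number of documents written.
    """
    n = 0
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(beir_corpus_path) as src, open(out_path, "w") as dst:
        for line in src:
            doc = json.loads(line)
            dst.write(json.dumps({
                "id": doc["_id"],              # required unique identifier
                "contents": doc.get("text", ""),
                "title": doc.get("title", ""),
            }) + "\n")
            n += 1
    return n
```

The output file can then be placed under `{save_dir}/scifact/corpus` for the indexing command above.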
- Evaluate with EvaluateRetrieval, which uses the pytrec_eval module to calculate standard retrieval measures such as NDCG, MAP, R@k, and P@k.

Reference results:
```python
({'NDCG@1': 0.54, 'NDCG@3': 0.61219, 'NDCG@5': 0.64201, 'NDCG@10': 0.6647},
 {'MAP@1': 0.51928, 'MAP@3': 0.58719, 'MAP@5': 0.60563, 'MAP@10': 0.61597},
 {'Recall@1': 0.51928, 'Recall@3': 0.66233, 'Recall@5': 0.73189, 'Recall@10': 0.79978},
 {'P@1': 0.54, 'P@3': 0.23556, 'P@5': 0.16, 'P@10': 0.08833})
```
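As a sanity check on what those numbers mean, NDCG@k with binary relevance judgments can be sketched in a few lines of plain Python. This is an illustrative re-implementation of the measure, not the pytrec_eval code itself.

```python
import math


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@k for a single query with binary relevance.

    ranked_ids: document ids in retrieved order.
    relevant_ids: set of ids judged relevant in the qrels.
    """
    # DCG: each relevant hit at rank i (0-based) contributes 1/log2(i+2).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc_id in enumerate(ranked_ids[:k])
              if doc_id in relevant_ids)
    # Ideal DCG: all relevant documents packed at the top of the ranking.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Averaging this value over all queries gives per-k figures like the NDCG@10 above.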
- Use one of the smallest BEIR datasets to set up an end-to-end evaluation.
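For the end-to-end run, the loading step expects the unzipped BEIR layout (`corpus.jsonl`, `queries.jsonl`, `qrels/test.tsv`). A minimal stand-in for what GenericDataLoader returns, assuming that layout, looks like this (`load_beir_folder` is a hypothetical helper name):

```python
import csv
import json
import os


def load_beir_folder(data_dir: str, split: str = "test"):
    """Minimal stand-in for BEIR's GenericDataLoader(...).load(split=...).

    Assumes the standard BEIR layout:
      corpus.jsonl       lines: {"_id": ..., "title": ..., "text": ...}
      queries.jsonl      lines: {"_id": ..., "text": ...}
      qrels/<split>.tsv  tab-separated: query-id, corpus-id, score (with header)
    Returns (corpus, queries, qrels) dicts keyed by id.
    """
    corpus, queries, qrels = {}, {}, {}
    with open(os.path.join(data_dir, "corpus.jsonl")) as f:
        for line in f:
            doc = json.loads(line)
            corpus[doc["_id"]] = {"title": doc.get("title", ""),
                                  "text": doc.get("text", "")}
    with open(os.path.join(data_dir, "queries.jsonl")) as f:
        for line in f:
            q = json.loads(line)
            queries[q["_id"]] = q["text"]
    with open(os.path.join(data_dir, "qrels", f"{split}.tsv")) as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the header row
        for qid, doc_id, score in reader:
            qrels.setdefault(qid, {})[doc_id] = int(score)
    return corpus, queries, qrels
```

Inspecting the three returned dicts is a quick way to do the "explore the corpus, queries, qrels" step.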
Tools
Experiments
Books