A Dataset for Evaluating Retrieval-Augmented Generation Across Documents
MultiHop-RAG is a QA dataset for evaluating retrieval and reasoning across documents, including document metadata, in RAG pipelines. It contains 2556 queries, with the evidence for each query distributed across 2 to 4 documents. The queries also involve document metadata, reflecting complex scenarios commonly found in real-world RAG applications.
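To make the "evidence distributed across 2 to 4 documents" property concrete, here is a minimal sketch that counts the distinct evidence documents behind one query. The record below is made-up toy data, and the field names (`query`, `answer`, `evidence_list`, `title`, `published_at`, `fact`) are assumptions for illustration; check the actual dataset files for the real schema.

```python
from typing import Dict, Set

# Toy record with made-up content. Field names are assumptions for
# illustration, not the confirmed dataset schema.
toy_record = {
    "query": "Did outlet A report on company X before outlet B did?",
    "answer": "Yes",
    "evidence_list": [
        {"title": "Outlet A on company X", "published_at": "2023-10-01",
         "fact": "placeholder fact from document 1"},
        {"title": "Outlet B on company X", "published_at": "2023-10-05",
         "fact": "placeholder fact from document 2"},
    ],
}


def evidence_documents(record: Dict) -> Set[str]:
    """Return the distinct document titles that carry evidence for a query."""
    return {item["title"] for item in record["evidence_list"]}


def is_multi_hop(record: Dict, min_docs: int = 2, max_docs: int = 4) -> bool:
    """Check that the evidence spans between min_docs and max_docs documents."""
    return min_docs <= len(evidence_documents(record)) <= max_docs


if __name__ == "__main__":
    print(sorted(evidence_documents(toy_record)))
    print(is_multi_hop(toy_record))
```

A check like this can serve as a sanity test after loading the data: a query whose evidence comes from a single document would not require multi-hop retrieval.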
Paper Link (Accepted by COLM 2024): MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Hugging Face dataloader
1. For Retrieval
Try 'simple_retrieval.py', a sample use case that demonstrates retrieval with this dataset.
pip install llama-index==0.9.40
# test simple retrieval and save results
python simple_retrieval.py --retriever BAAI/llm-embedder
# test simple retrieval with rerank and save results
python simple_retrieval.py --retriever BAAI/llm-embedder --rerank
2. For QA
Try 'qa_llama.py', a sample use case that demonstrates question answering with Llama on this dataset.
python qa_llama.py
1. For Retrieval: 'retrieval_evaluate.py'
2. For QA: 'qa_evaluate.py'
python retrieval_evaluate.py --file {saved_file_path}
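For readers who want a feel for what scoring retrieval against gold evidence involves, the sketch below computes two commonly used ranking metrics, Hit@k and MRR@k. It is an illustration only: the metrics, the function names, and the input layout (`retrieved` as a ranked list of document IDs, `gold` as a set of evidence document IDs) are assumptions, not a description of what 'retrieval_evaluate.py' actually does.

```python
from typing import Dict, List, Set


def hit_at_k(retrieved: List[str], gold: Set[str], k: int) -> float:
    """1.0 if any gold document appears in the top-k results, else 0.0."""
    return 1.0 if any(doc in gold for doc in retrieved[:k]) else 0.0


def reciprocal_rank_at_k(retrieved: List[str], gold: Set[str], k: int) -> float:
    """1/rank of the first gold document within the top-k, or 0.0 if none."""
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in gold:
            return 1.0 / rank
    return 0.0


def evaluate(results: List[Dict], k: int = 10) -> Dict[str, float]:
    """Average Hit@k and MRR@k over all queries (hypothetical input layout)."""
    if not results:
        return {"hit@k": 0.0, "mrr@k": 0.0}
    n = len(results)
    hits = sum(hit_at_k(r["retrieved"], r["gold"], k) for r in results)
    rr = sum(reciprocal_rank_at_k(r["retrieved"], r["gold"], k) for r in results)
    return {"hit@k": hits / n, "mrr@k": rr / n}


if __name__ == "__main__":
    # Made-up toy results: document IDs are placeholders.
    toy_results = [
        {"retrieved": ["d3", "d1", "d7"], "gold": {"d1", "d2"}},
        {"retrieved": ["d9", "d8"], "gold": {"d4"}},
    ]
    print(evaluate(toy_results, k=3))
```

For multi-hop queries, a stricter variant would require all gold documents to appear in the top-k rather than any one of them; which definition applies depends on the evaluation script.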
For research purposes, we have open-sourced part of the code used to construct the dataset. The code is not yet well organized; we plan to tidy it up in the future.
Just for reference: pipeline/
@misc{tang2024multihoprag,
title={MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries},
author={Yixuan Tang and Yi Yang},
year={2024},
eprint={2401.15391},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
MultiHop-RAG is licensed under ODC-BY.