Title
Query Expansion by Prompting Large Language Models
Team Name
IRFighters
Email
202101222@daiict.ac.in
Team Member 1 Name
Priyesh Tandel
Team Member 1 Id
202101222
Team Member 2 Name
Keertivardhan Goyal
Team Member 2 Id
202103007
Team Member 3 Name
Yash Mashru
Team Member 3 Id
202103045
Team Member 4 Name
Sanchit Satija
Team Member 4 Id
202103054
Category
Reproducibility
Problem Statement
Users often issue short queries. Because traditional retrieval methods such as BM25 rely on exact term matching, they can fail to retrieve relevant documents that use different vocabulary, which lowers recall. To address this, we will use an LLM for query expansion with the different prompting methods described in the paper below. Specifically, we investigate the effectiveness of query expansion with Flan-T5-Small over BM25 alone; an implementation of BM25 is available through PyTerrier.
The paper also reports results for a much larger LLM, Flan-UL2 (20B parameters), but that model is unlikely to run on our hardware, so we will use Flan-T5-Small (60M parameters).
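As a minimal sketch of the expansion step, the helpers below build a zero-shot "Q2D"-style prompt and combine the LLM output with the original query. The exact prompt wording is our approximation of the paper's template, and the generation call to Flan-T5-Small is left out; the repetition of the original query (the paper uses 5 copies) keeps the original terms dominant in BM25 scoring.

```python
def build_q2d_prompt(query: str) -> str:
    # Zero-shot Query2Doc-style prompt; wording approximates the
    # paper's Q2D template and is not an exact copy.
    return f"Write a passage that answers the following query: {query}"


def expand_query(query: str, llm_output: str, repeats: int = 5) -> str:
    # Concatenate several copies of the original query with the LLM's
    # generated text, so expansion terms add recall without drowning
    # out the original query terms in BM25 term weighting.
    return " ".join([query] * repeats + [llm_output])
```

In the actual pipeline, `llm_output` would come from running Flan-T5-Small (e.g. via Hugging Face Transformers) on the prompt, and the expanded string would be fed to PyTerrier's BM25 retriever in place of the raw query.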
Evaluation Strategy
For evaluation we will use the metrics below (also used in the paper):
Recall
MRR@10
NDCG@10
We will evaluate and compare BM25 alone, BM25 with PRF (pseudo-relevance feedback), and BM25 with LLM-based query expansion. The paper reports several variations of these setups.
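To make the comparison concrete, here is a small sketch of the three evaluation metrics (binary or graded relevance, standard textbook definitions; in practice PyTerrier/ir_measures would compute these for us):

```python
import math


def mrr_at_k(ranked_ids, relevant_ids, k=10):
    # Reciprocal rank of the first relevant document in the top k.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevance, k=10):
    # `relevance` maps doc_id -> graded relevance label.
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, relevant_ids, k=1000):
    # Fraction of all relevant documents retrieved in the top k.
    hits = sum(1 for d in ranked_ids[:k] if d in relevant_ids)
    return hits / len(relevant_ids) if relevant_ids else 0.0
```

These functions operate on a ranked list of document ids per query; averaging them over all queries gives the reported scores.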
Dataset
We will be using the following datasets:
1) MS MARCO
2) BEIR
Resources
Query Expansion by Prompting Large Language Models: https://arxiv.org/pdf/2305.03653