Title
Query Expansion and Ranking Evaluation
Team Name
Retrievers
Email
202318002@daiict.ac.in
Team Member 1 Name
Bhavik Manwani
Team Member 1 Id
202318002
Team Member 2 Name
Manthan Solanki
Team Member 2 Id
202318002
Team Member 3 Name
Anmol Poonia
Team Member 3 Id
202318009
Team Member 4 Name
Kanishk Pareek
Team Member 4 Id
202101134
Category
Evaluation Track Problem
Problem Statement
The problem is to improve the retrieval performance of an information retrieval system by implementing and evaluating query expansion techniques. The primary challenge is to broaden query coverage without introducing noise, and to evaluate the effectiveness of these expansions under a graded relevance scheme. The goal is to determine whether query expansion improves the ranking of relevant documents, and hence user satisfaction.
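To illustrate the trade-off between coverage and noise, here is a minimal query-expansion sketch. The synonym table and the query are hypothetical stand-ins; a real system would draw candidates from a lexical resource such as WordNet or from embedding similarity.

```python
# Hypothetical synonym table standing in for a real lexical resource
# (e.g. WordNet or an embedding model).
SYNONYMS = {
    "covid": ["coronavirus", "sars-cov-2"],
    "vaccine": ["vaccination", "immunization"],
}

def expand_query(query, max_expansions=2):
    """Append up to max_expansions synonyms per query term.

    Capping the number of added terms per word is one simple way to
    limit the noise that expansion can introduce.
    """
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, [])[:max_expansions])
    return " ".join(expanded)

print(expand_query("covid vaccine trials"))
# -> "covid vaccine trials coronavirus sars-cov-2 vaccination immunization"
```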
Evaluation Strategy
This project will evaluate the impact of query expansion techniques by comparing the performance of the baseline system (no expansion) against the same system with expanded queries. We will implement synonym expansion and contextual expansion to broaden query coverage, and rank results using BM25 and TF-IDF weighting. The evaluation will focus on Normalized Discounted Cumulative Gain (NDCG), which is well-suited to graded relevance judgments (2, 1, 0, -1), and Mean Average Precision (MAP) to measure overall retrieval performance. Traditional metrics such as precision, recall, and F1 score will also be reported.
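The NDCG computation over the graded judgments above can be sketched as follows. Note that how the -1 (harmful) grade contributes to the gain is a design choice not fixed by the strategy above: the sketch uses linear gain and keeps the penalty negative, but clipping negative grades to zero is another common option.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with the standard log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains, k=None):
    """NDCG over graded judgments (here 2, 1, 0, -1).

    Uses linear gain, so the -1 grade actively penalizes rankings that
    place harmful documents near the top; clipping to 0 instead is an
    equally defensible choice.
    """
    gains = ranked_gains[:k] if k else ranked_gains
    ideal = sorted(ranked_gains, reverse=True)[:len(gains)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical grades of the top-5 documents returned for one query.
print(ndcg([2, 0, 1, -1, 2], k=5))
```

Comparing this score between the baseline and the expanded-query run, averaged over all test queries, directly answers whether expansion improved the ranking.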
Dataset
https://huggingface.co/datasets/nreimers/trec-covid
Resources
Title: Learning to rank query expansion terms for COVID-19 scholarly search
Link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10174726/