parth126 / IT550

Project Proposals for the IT-550 Course (Autumn 2024)

FAQ Retrieval System Using Weighted Edit Distance for Noisy Queries #28

Open dhruvi-m opened 1 month ago

dhruvi-m commented 1 month ago

Title

FAQ Retrieval System Using Weighted Edit Distance for Noisy Queries

Team Name

DPRS

Email

202318003@daiict.ac.in

Team Member 1 Name

Dhruvi Mehta

Team Member 1 Id

202318003

Team Member 2 Name

Prachi Mehta

Team Member 2 Id

202318008

Team Member 3 Name

Riya Dave

Team Member 3 Id

202318011

Team Member 4 Name

Satyam Maravaniya

Team Member 4 Id

202318026

Category

Evaluation Track Problem

Problem Statement

In customer support systems, users often submit queries that contain typos, misspellings, or variations in phrasing, which makes it difficult for automated systems to match these noisy queries to the relevant Frequently Asked Questions (FAQs). This leads to a poor user experience and increased manual intervention.

The objective of this project is to develop a FAQ retrieval system capable of handling noisy user queries by using a weighted edit distance approach. The system will compare user-submitted queries against a predefined set of FAQs and return the most relevant match despite errors or variations in the input. The solution aims to improve retrieval accuracy, enhance system robustness, and reduce reliance on manual corrections by efficiently matching noisy queries to their corresponding FAQs.
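To make the matching step concrete, here is a minimal sketch of a weighted edit distance and a nearest-FAQ lookup. The weights are illustrative assumptions only (a vowel-for-vowel substitution costs 0.5, every other edit costs 1.0) and are not the weighting scheme from the referenced paper; the names `sub_cost`, `weighted_edit_distance`, and `best_faq` are hypothetical.

```python
from typing import Callable, Sequence

VOWELS = set("aeiou")


def sub_cost(a: str, b: str) -> float:
    """Assumed substitution weights: vowel confusions are cheaper than other swaps."""
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return 0.5  # illustrative value, not taken from the paper
    return 1.0


def weighted_edit_distance(
    s: str,
    t: str,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
    sub: Callable[[str, str], float] = sub_cost,
) -> float:
    """Dynamic-programming edit distance with per-operation weights."""
    # prev[j] holds the cost of turning the processed prefix of s into t[:j].
    prev = [j * ins_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        curr = [i * del_cost]
        for j, b in enumerate(t, 1):
            curr.append(
                min(
                    prev[j] + del_cost,       # delete a from s
                    curr[j - 1] + ins_cost,   # insert b into s
                    prev[j - 1] + sub(a, b),  # substitute a with b (or match)
                )
            )
        prev = curr
    return prev[-1]


def best_faq(query: str, faqs: Sequence[str]) -> str:
    """Return the FAQ question closest to the query under the weighted distance."""
    q = query.lower()
    return min(faqs, key=lambda f: weighted_edit_distance(q, f.lower()))
```

With this sketch, `best_faq("how do i resett my pasword", faqs)` would pick the FAQ whose text needs the cheapest sequence of edits. Comparing whole strings character by character is the simplest option; a word-level comparison is a possible alternative.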

Evaluation Strategy

Accuracy of Retrieval (FAQ Match): the fraction of noisy queries for which the FAQ retrieval system returns the correct FAQ.
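The accuracy measure described above could be computed along these lines. This is a sketch under the assumption that each noisy query has exactly one correct FAQ, identified by a label; the function name `retrieval_accuracy` is hypothetical.

```python
from typing import Hashable, Sequence


def retrieval_accuracy(
    predicted: Sequence[Hashable], gold: Sequence[Hashable]
) -> float:
    """Fraction of queries whose top retrieved FAQ equals the correct FAQ."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0  # assumption: an empty evaluation set scores zero
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```

For example, if the system returns the correct FAQ for three out of four noisy queries, the accuracy is 0.75.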

Dataset

https://www.kaggle.com/datasets/soumikrakshit/yahoo-answers-dataset

Resources

Paper: "Weighted Edit Distance based FAQ Retrieval using Noisy Queries," published in the proceedings of the FIRE 2013 conference. https://dl.acm.org/doi/10.1145/2701336.2701644

parth126 commented 1 month ago

Please add some details about the dataset. Although it's hosted on Kaggle, I could not find any details. Did you analyze the data? Is it relevant for noisy retrieval?

dhruvi-m commented 1 month ago

The Yahoo Answers dataset contains 1.4 million training samples and 60,000 test samples across 10 categories, with each sample including a question title, question content, and best answer. Given its size and real-world, user-generated content, the dataset is highly relevant for noisy retrieval: it contains spelling mistakes, grammatical errors, and variations in phrasing, which makes it well suited to developing systems that handle noisy queries.

parth126 commented 1 month ago

Sounds good. I am approving this.