Open KavishaMadani opened 1 month ago
@KavishaMadani There are two proposals from your team. Please close the one that is not relevant
Sir, I have closed the proposal titled 'Fashion Product Retrieval Using Semantic Search and Natural Language Generation'.
Ok. Marking as approved.
Title
Enhancing Information Retrieval Using Topic Modeling
Team Name
InfoSphere
Email
202318007@daiict.ac.in
Team Member 1 Name
Kavisha Madani
Team Member 1 Id
202318007
Team Member 2 Name
Vishaka Nair
Team Member 2 Id
202318041
Team Member 3 Name
Srushti Bhagchandani
Team Member 3 Id
202318047
Team Member 4 Name
Shubham Gupta
Team Member 4 Id
202318052
Category
Optimizing an existing system
Problem Statement
This project aims to enhance information retrieval by integrating topic modeling, specifically Latent Dirichlet Allocation (LDA) and possibly a Fast Deterministic CUR-based approach, with Retrieval-Augmented Generation (RAG). We will train an LDA model on a diverse document corpus so that each document is associated with a topic distribution. For each incoming query, we will infer its topic distribution and use it to augment the query with relevant keywords. The retrieval mechanism will prioritize documents that align thematically with the query by combining traditional similarity metrics with topic similarity. The goal is for generated responses to be both contextually relevant and aligned with the query's underlying topics.
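One way to picture the combined ranking step is the sketch below. It assumes the topic distributions have already been inferred by a trained LDA model and that term vectors (for example TF-IDF) are available. The linear interpolation with weight `alpha`, the use of cosine similarity for both parts, and the function names `cosine`, `combined_score`, and `rank` are illustrative assumptions, not something the proposal specifies.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_score(query_terms, doc_terms, query_topics, doc_topics, alpha=0.5):
    """Blend term similarity and topic similarity.

    alpha is a hypothetical weight: 1.0 uses term similarity only,
    0.0 uses topic similarity only.
    """
    term_sim = cosine(query_terms, doc_terms)
    topic_sim = cosine(query_topics, doc_topics)
    return alpha * term_sim + (1 - alpha) * topic_sim


def rank(query_terms, query_topics, docs, alpha=0.5):
    """Return (doc_id, score) pairs sorted by descending combined score.

    docs maps a document id to a (term_vector, topic_distribution) pair.
    """
    scored = [
        (doc_id, combined_score(query_terms, terms, query_topics, topics, alpha))
        for doc_id, (terms, topics) in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: both documents match the query terms equally,
# but only d1 shares the query's dominant topic.
docs = {
    "d1": ([1.0, 0.0], [0.9, 0.1]),
    "d2": ([1.0, 0.0], [0.1, 0.9]),
}
ranking = rank([1.0, 0.0], [0.9, 0.1], docs)
```

In this toy case the term similarity cannot separate the two documents, so the topic component decides the order and `d1` is ranked first. The value of `alpha` would need to be tuned on held-out queries.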
Evaluation Strategy
Precision, Recall, and F1-score: These will measure the relevance of retrieved documents.
Query Response Time: This will assess the efficiency of the proposed model, with an aim to make the retrieval process at least 10% faster than the baseline.
Topic Relevance Score: This will evaluate how well the documents retrieved match the query’s thematic topics using topic coherence scores.
Human Feedback Evaluation: Using a small sample, human evaluators will assess the accuracy and relevance of responses generated by the RAG system.
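For reference, the set-based precision, recall, and F1 computation for a single query can be sketched as follows. The function name `precision_recall_f1` and the document ids are hypothetical, and the sketch assumes binary relevance judgments are available for each query.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall, and F1 for one query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)

    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Example: 4 documents retrieved, 3 relevant overall, 2 in common.
p, r, f1 = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents are retrieved (recall 2/3). Averaging these per-query values over the query set gives system-level figures that can be compared against the baseline.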
Dataset
https://www.kaggle.com/datasets/thedevastator/uncovering-financial-insights-with-the-reuters-2?select=ModApte_train.csv
https://www.kaggle.com/code/pranjalsoni17/topic-modelling-using-lda
Resources
Paper Title - Latent Dirichlet Allocation, by David M. Blei, Andrew Y. Ng, and Michael I. Jordan
Paper Link - https://www.researchgate.net/publication/221620547_Latent_Dirichlet_Allocation
Paper Title - Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, by Lewis et al.
Paper Link - https://arxiv.org/abs/2005.11401
Paper Title - Fast Deterministic CUR Matrix Decomposition with Accuracy Assurance, by Yasutoshi Ida, Sekitoshi Kanai, Yasuhiro Fujiwara, Tomoharu Iwata, Koh Takeuchi, Hisashi Kashima
Paper Link - https://proceedings.mlr.press/v119/ida20a.html