Title
Latent Semantic Indexing (LSI) Implementation for Efficient Document Retrieval
Team Name
Learners
Email
202318021@daiict.ac.in
Team Member 1 Name
Bhavsar Vishva
Team Member 1 Id
202318019
Team Member 2 Name
Malkan Kandarp
Team Member 2 Id
202318021
Team Member 3 Name
Srinibas Masanta
Team Member 3 Id
202318054
Team Member 4 Name
Ghotra Jaspreet Kaur
Team Member 4 Id
202318058
Category
Optimizing an existing system
Problem Statement
This project aims to develop a Latent Semantic Indexing (LSI) model for document retrieval that uncovers latent relationships between words and documents beyond simple keyword matching. The approach consists of preprocessing the text data, computing Term Frequency-Inverse Document Frequency (TF-IDF) weights, and applying Singular Value Decomposition (SVD) to reduce the dimensionality of the document-term matrix. Reconstructing the matrix from only the largest singular values yields a compressed, low-rank version that retains the essential semantic information. The goal is to build an efficient search system that uses this reduced matrix to retrieve the documents most relevant to a query.
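The pipeline described above (preprocessing, TF-IDF weighting, truncated SVD, retrieval) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the toy corpus, the function names (`tokenize`, `build_tfidf`, `lsi_fit`, `search`), the smoothed IDF variant, the term-by-document orientation, and the choice `k=2` are all hypothetical, and only NumPy is assumed to be available.

```python
import math
from collections import Counter

import numpy as np

# Toy corpus (hypothetical; the proposal does not name a dataset).
docs = [
    "the cat sat on the mat",
    "a cat and a kitten played on the mat",
    "stock markets fell as investors sold shares",
    "investors bought shares when markets rose",
]


def tokenize(text):
    # Minimal preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def build_tfidf(docs):
    """Return (vocab, idf, A), where A is a terms x documents TF-IDF matrix."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed IDF (one common variant, chosen here for illustration).
    idf = np.array([math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab])
    A = np.zeros((len(vocab), n_docs))
    for j, doc in enumerate(tokenized):
        for t, count in Counter(doc).items():
            A[index[t], j] = count * idf[index[t]]
    return vocab, idf, A


def lsi_fit(A, k):
    """Truncated SVD: keep only the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def search(query, vocab, idf, U_k, s_k, Vt_k):
    """Rank documents by cosine similarity to the query in the latent space."""
    index = {t: i for i, t in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for t, count in Counter(tokenize(query)).items():
        if t in index:  # terms outside the vocabulary are ignored
            q[index[t]] = count * idf[index[t]]
    q_k = (U_k.T @ q) / s_k   # fold the query into the k-dimensional space
    doc_vecs = Vt_k.T         # one k-dimensional row per document
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_k)
    sims = (doc_vecs @ q_k) / (norms + 1e-12)  # epsilon avoids division by zero
    ranking = [int(j) for j in np.argsort(-sims)]
    return ranking, sims


vocab, idf, A = build_tfidf(docs)
U_k, s_k, Vt_k = lsi_fit(A, k=2)
ranking, sims = search("kitten", vocab, idf, U_k, s_k, Vt_k)
for j in ranking:
    print(f"{sims[j]:.3f}  {docs[j]}")
```

In this toy corpus the query "kitten" should rank both cat documents above the finance documents, even though only one of them contains the word; that is the kind of latent relationship the model is meant to capture. For a real collection, a sparse matrix and a truncated solver would likely be needed instead of the dense `np.linalg.svd` call used here.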
Evaluation Strategy
To evaluate the effectiveness of our Latent Semantic Indexing (LSI) model, we will use the Frobenius norm of the difference between the original document-term matrix and the matrix reconstructed after Singular Value Decomposition (SVD). A smaller norm indicates that the reconstruction preserves more of the original matrix.
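The evaluation described above can be sketched as a small routine that reports the Frobenius reconstruction error for each rank k. For a truncated SVD, this error equals the square root of the sum of the squared discarded singular values, so the sketch also returns the singular values for cross-checking. The function name `reconstruction_errors` and the random stand-in matrix are assumptions for illustration only; in the project, `A` would be the TF-IDF document-term matrix.

```python
import numpy as np


def reconstruction_errors(A):
    """Return the singular values of A and, for each rank k = 1..r,
    a tuple (k, absolute Frobenius error, relative Frobenius error)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    total = np.linalg.norm(A, "fro")
    results = []
    for k in range(1, len(s) + 1):
        # Rank-k reconstruction from the k largest singular values.
        A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
        err = np.linalg.norm(A - A_k, "fro")
        results.append((k, err, err / total))
    return s, results


# Stand-in matrix (hypothetical); replace with the real document-term matrix.
rng = np.random.default_rng(0)
A = rng.random((8, 5))

s, results = reconstruction_errors(A)
for k, err, rel in results:
    print(f"k={k}  error={err:.4f}  relative={rel:.4f}")
```

The relative error is often easier to interpret than the absolute one when choosing k, since it does not depend on the scale of the matrix. Note that a low reconstruction error measures compression fidelity only; it does not by itself measure retrieval quality.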
Possible alternates
a. Pick a simple mathematical concept and explore its applications at scale (e.g., SVD on a large dataset).
This will build an understanding of the concepts, how to code them, and how to make approximations in practice.
b. Pick a simple problem and explore coding it from scratch (again, for large datasets).
This will build an understanding of how to code neural nets and language models.
Dataset
None
Resources
None