parth126 / IT550

Project Proposals for the IT-550 Course (Autumn 2024)

Latent Semantic Indexing (LSI) Implementation for Efficient Document Retrieval #17

Open 202318021 opened 2 months ago

202318021 commented 2 months ago

Title

Latent Semantic Indexing (LSI) Implementation for Efficient Document Retrieval

Team Name

Learners

Email

202318021@daiict.ac.in

Team Member 1 Name

Bhavsar Vishva

Team Member 1 Id

202318019

Team Member 2 Name

Malkan Kandarp

Team Member 2 Id

202318021

Team Member 3 Name

Srinibas Masanta

Team Member 3 Id

202318054

Team Member 4 Name

Ghotra Jaspreet Kaur

Team Member 4 Id

202318058

Category

Optimizing an existing system

Problem Statement

This project aims to develop a Latent Semantic Indexing (LSI) model for improved document retrieval, uncovering latent relationships between words and documents that go beyond simple keyword matching. The approach consists of preprocessing the text data, computing Term Frequency-Inverse Document Frequency (TF-IDF) weights, and applying Singular Value Decomposition (SVD) to reduce the dimensionality of the document-term matrix. By reconstructing the matrix from only the k largest singular values and their corresponding singular vectors, we will obtain a compressed, low-rank version that retains the essential semantic information. The goal is to build an efficient search system that uses this reduced representation to retrieve the documents most relevant to a query.
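The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the team's implementation: preprocessing is reduced to lowercasing and keeping alphabetic tokens, the IDF uses a smoothed variant `ln((1 + N) / (1 + df)) + 1` so that terms occurring in every document are not zeroed out, and the names `build_tfidf`, `vectorize_query`, and `LSIIndex` as well as the toy corpus are hypothetical. Documents are represented by the rows of U_k Σ_k (equivalently A V_k), and a query is folded into the same k-dimensional space by multiplying its TF-IDF vector by V_k before ranking by cosine similarity.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Assumed minimal preprocessing: lowercase and keep alphabetic tokens only
    return re.findall(r"[a-z]+", text.lower())


def build_tfidf(docs):
    """Return (term index, idf vector, documents x terms TF-IDF matrix)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            tf[i, index[term]] = count
    df = np.count_nonzero(tf, axis=0)
    # Smoothed IDF (an assumption; other weighting schemes are equally valid)
    idf = np.log((1 + len(docs)) / (1 + df)) + 1
    return index, idf, tf * idf


def vectorize_query(query, index, idf):
    q = np.zeros(len(index))
    for term, count in Counter(tokenize(query)).items():
        if term in index:  # out-of-vocabulary terms are ignored
            q[index[term]] = count
    return q * idf


class LSIIndex:
    """Hypothetical LSI search index built on a truncated SVD A = U S V^T."""

    def __init__(self, docs, k):
        self.index, self.idf, A = build_tfidf(docs)
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        self.k = min(k, len(s))
        self.Vk = Vt[: self.k].T                      # terms x k
        self.doc_vecs = U[:, : self.k] * s[: self.k]  # documents x k, equals A @ Vk

    def search(self, query, top_n=3):
        # Fold the query into the latent space the same way documents are: q @ V_k
        q = vectorize_query(query, self.index, self.idf) @ self.Vk
        denom = np.linalg.norm(self.doc_vecs, axis=1) * np.linalg.norm(q)
        scores = (self.doc_vecs @ q) / np.where(denom == 0, 1.0, denom)  # cosine similarity
        order = np.argsort(-scores)[:top_n]
        return [(int(i), float(scores[i])) for i in order]


docs = [
    "the cat sat on the mat",
    "a dog chased the cat",
    "stock markets fell sharply today",
    "investors worry as markets fall",
]
lsi = LSIIndex(docs, k=2)
print(lsi.search("cat and dog", top_n=2))
```

In this toy corpus the two pet-related documents share no terms with the two finance-related ones, so with k = 2 each latent dimension captures one of the two topics and the query "cat and dog" ranks documents 0 and 1 first. Folding queries through V_k avoids recomputing the SVD at search time.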

Evaluation Strategy

To evaluate the effectiveness of our Latent Semantic Indexing (LSI) model, we will use the Frobenius norm to measure the difference between the original document-term matrix and the reconstructed matrix after Singular Value Decomposition (SVD).
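The evaluation metric can be computed directly with NumPy. The sketch below assumes a random 50 × 200 matrix as a stand-in for a real TF-IDF matrix (no dataset is specified yet), and the helper names `rank_k_reconstruction` and `frobenius_error` are hypothetical. By the Eckart–Young theorem, the truncated SVD is the best rank-k approximation in the Frobenius norm, and its error equals the square root of the sum of the discarded squared singular values, so the measured and predicted errors should agree.

```python
import numpy as np


def rank_k_reconstruction(A, k):
    """Truncated SVD reconstruction A_k = U_k S_k V_k^T; also returns all singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k], s


def frobenius_error(A, k):
    A_k, s = rank_k_reconstruction(A, k)
    err = np.linalg.norm(A - A_k, ord="fro")
    # Eckart-Young: the optimal rank-k error is sqrt of the sum of the discarded sigma_i^2
    predicted = np.sqrt(np.sum(s[k:] ** 2))
    relative = err / np.linalg.norm(A, ord="fro")
    return err, predicted, relative


rng = np.random.default_rng(0)
A = rng.random((50, 200))  # hypothetical stand-in for a documents x terms TF-IDF matrix
for k in (5, 10, 25, 50):
    err, predicted, relative = frobenius_error(A, k)
    print(f"k={k:2d}  error={err:.4f}  predicted={predicted:.4f}  relative={relative:.3f}")
```

Because this error is fully determined by the singular values, it measures how faithfully A_k approximates A rather than how relevant the retrieved documents are.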

Dataset

None

Resources

None

parth126 commented 2 months ago

Suggested steps:

  1. Research existing approximation algorithms, along with their strengths and challenges
  2. Make it work for a decent-sized matrix (it need not be related to text)
  3. Use the implementation from (2) for applications such as clustering, search, etc.
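For steps (1) and (2), one widely used approximation algorithm is randomized SVD, based on the randomized range finder of Halko, Martinsson, and Tropp. The sketch below is a minimal NumPy version under assumptions: the function name `rsvd_sketch`, the oversampling of 10 columns, and the two power iterations are illustrative choices, and the test matrix is a synthetic rank-20 signal plus small noise rather than text data. Library alternatives worth comparing against include `sklearn.utils.extmath.randomized_svd` and `scipy.sparse.linalg.svds`.

```python
import numpy as np


def rsvd_sketch(A, k, oversample=10, n_iter=2, seed=0):
    """Approximate rank-k SVD with a randomized range finder (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # 1. Sample the range of A using a Gaussian test matrix with k + oversample columns
    omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ omega)
    # 2. Power iterations, re-orthonormalizing after every product for numerical stability
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # 3. Exact SVD of the small projected matrix B = Q^T A, then lift back: U = Q U_B
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]


# Synthetic "decent-sized" test matrix (not text): rank-20 signal plus small Gaussian noise
rng = np.random.default_rng(42)
A = rng.standard_normal((2000, 20)) @ rng.standard_normal((20, 1000))
A += 1e-3 * rng.standard_normal((2000, 1000))

U, s, Vt = rsvd_sketch(A, k=20)
s_exact = np.linalg.svd(A, compute_uv=False)[:20]
print("max relative singular-value error:", np.max(np.abs(s - s_exact) / s_exact))
print("relative reconstruction error:", np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```

The expensive work is a handful of matrix products with A plus an SVD of a small (k + oversample) × n matrix, which is what makes the method attractive for large matrices. Each power iteration costs two extra passes over A but improves accuracy when the singular values decay slowly. The resulting factors can then feed step (3), for example as the latent document vectors used for search or clustering.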