Faster Maximum Inner Product Search in High Dimensions [Reproducibility]

ExhoParth commented 2 months ago

Title

Faster Maximum Inner Product Search in High Dimensions [Reproducibility]

Team Name

VectorForge

Email

202311033@daiict.ac.in

Team Member 1 Name

Parth Borad

Team Member 1 Id

202311033

Team Member 2 Name

Sarthak Vadher

Team Member 2 Id

202311040

Team Member 3 Name

Dhairya Dave

Team Member 3 Id

202311011

Team Member 4 Name

Viraj Prajapati

Team Member 4 Id

202311069

Problem Statement

In the era of large-scale recommendation systems, document retrieval, and question-answering, efficient and accurate search techniques are essential. The Maximum Inner Product Search (MIPS) problem, a key operation in vector database systems, is used to find the item that maximizes the inner product with a given query. This operation is computationally expensive in high-dimensional vector spaces, which necessitates more efficient methods.

In 2024, a group of researchers presented a novel approach for 'Faster Maximum Inner Product Search in High Dimensions' at the International Conference on Machine Learning (ICML). Their work introduces a new framework to reduce computational cost and improve search speed. This project aims to reproduce the methodology outlined in the paper and implement our own version to gain deeper insights, which will ultimately help in designing efficient document retrieval systems from vector database.

Evaluation Strategy

The evaluation focuses on the trade-off between accuracy and speed, comparing the proposed algorithm against eight baseline MIPS algorithms: LSH-MIPS, H2-ALSH-MIPS, NEQ-MIPS, PCA-MIPS, BoundedME, Greedy-MIPS, HNSW-MIPS, and NAPG-MIPS. Additionally, the sample complexity of the algorithms is analyzed.

Dataset

Synthetic Dataset
Real-world Datasets
Netflix Price Data
MovieLens 20M Dataset
Additional Datasets
Sift-1M
CryptoPairs

Resources

Paper: Faster Maximum Inner Product Search in High Dimensions ICML 2024 MIPS Wikipedia MIPS Introduction Blog Demonstrating the Application of MIPS Y Combinator Hacker News Forum Understanding the approximate nearest neighbor (ANN) algorithm Faiss: A library for efficient similarity search

parth126 commented 2 months ago

Problem too broad and ambitious for a class project
The problem could be narrowed down to optimizing a Vector retrieval setup by some approximate matching algorithms

parth126 commented 2 months ago

The proposal is good enough for acceptance.

parth126 / IT550