Faster Maximum Inner Product Search in High Dimensions [Reproducibility]
Team Name
VectorForge
Email
202311033@daiict.ac.in
Team Member 1 Name
Parth Borad
Team Member 1 Id
202311033
Team Member 2 Name
Sarthak Vadher
Team Member 2 Id
202311040
Team Member 3 Name
Dhairya Dave
Team Member 3 Id
202311011
Team Member 4 Name
Viraj Prajapati
Team Member 4 Id
202311069
Category
Reproducibility
Problem Statement
In the era of large-scale recommendation systems, document retrieval, and question-answering, efficient and accurate search techniques are essential. The Maximum Inner Product Search (MIPS) problem, a key operation in vector database systems, is used to find the item that maximizes the inner product with a given query. This operation is computationally expensive in high-dimensional vector spaces, which necessitates more efficient methods.
In 2024, a group of researchers presented a novel approach for 'Faster Maximum Inner Product Search in High Dimensions' at the International Conference on Machine Learning (ICML). Their work introduces a new framework to reduce computational cost and improve search speed. This project aims to reproduce the methodology outlined in the paper and implement our own version to gain deeper insights, which will ultimately help in designing efficient document retrieval systems from vector database.
Evaluation Strategy
The evaluation focuses on the trade-off between accuracy and speed, comparing the proposed algorithm against eight baseline MIPS algorithms: LSH-MIPS, H2-ALSH-MIPS, NEQ-MIPS, PCA-MIPS, BoundedME, Greedy-MIPS, HNSW-MIPS, and NAPG-MIPS. Additionally, the sample complexity of the algorithms is analyzed.
Title
Faster Maximum Inner Product Search in High Dimensions [Reproducibility]
Team Name
VectorForge
Email
202311033@daiict.ac.in
Team Member 1 Name
Parth Borad
Team Member 1 Id
202311033
Team Member 2 Name
Sarthak Vadher
Team Member 2 Id
202311040
Team Member 3 Name
Dhairya Dave
Team Member 3 Id
202311011
Team Member 4 Name
Viraj Prajapati
Team Member 4 Id
202311069
Category
Reproducibility
Problem Statement
In the era of large-scale recommendation systems, document retrieval, and question-answering, efficient and accurate search techniques are essential. The Maximum Inner Product Search (MIPS) problem, a key operation in vector database systems, is used to find the item that maximizes the inner product with a given query. This operation is computationally expensive in high-dimensional vector spaces, which necessitates more efficient methods.
In 2024, a group of researchers presented a novel approach for 'Faster Maximum Inner Product Search in High Dimensions' at the International Conference on Machine Learning (ICML). Their work introduces a new framework to reduce computational cost and improve search speed. This project aims to reproduce the methodology outlined in the paper and implement our own version to gain deeper insights, which will ultimately help in designing efficient document retrieval systems from vector database.
Evaluation Strategy
The evaluation focuses on the trade-off between accuracy and speed, comparing the proposed algorithm against eight baseline MIPS algorithms: LSH-MIPS, H2-ALSH-MIPS, NEQ-MIPS, PCA-MIPS, BoundedME, Greedy-MIPS, HNSW-MIPS, and NAPG-MIPS. Additionally, the sample complexity of the algorithms is analyzed.
Dataset
Real-world Datasets
Additional Datasets
Resources
Paper: Faster Maximum Inner Product Search in High Dimensions ICML 2024 MIPS Wikipedia MIPS Introduction Blog Demonstrating the Application of MIPS Y Combinator Hacker News Forum Understanding the approximate nearest neighbor (ANN) algorithm Faiss: A library for efficient similarity search