glookka opened 8 months ago
In Slack we discussed how much RAM it takes now (without any quantization).
I ran a test with 550,000 records. Here are the resulting .spknn file sizes (which match RAM usage) for different parameters:
All runs use `vector float_vector knn_type='hnsw' knn_dims='384' hnsw_similarity='COSINE'`, varying only the parameters below:

| hnsw_m | hnsw_ef_construction | fast_fetch | .spknn size |
|--------|----------------------|------------|-------------|
| 50     | 600                  | default    | 968 MB      |
| 10     | 600                  | default    | 725 MB      |
| 10     | 600                  | 0          | 683 MB      |
| 10     | 64                   | 0          | 355 MB      |
| 10     | 32                   | 0          | 355 MB      |
An interesting post from Cohere about using Faiss' IVFPQ: https://www.linkedin.com/posts/reimersnils_%3F%3F%3F%3F%3F%3F%3F%3F-%3F%3F%3F%3F%3F%3F-%3F%3F-%3F%3F%3F-ugcPost-7214251130984296448-fDs2/?utm_source=share&utm_medium=member_desktop

Some info on quantization methods: https://neuml.hashnode.dev/all-about-vector-quantization
Currently, Manticore uses an HNSW index over float vectors for its KNN search implementation. That can lead to excessive memory consumption, as the whole HNSW index must be loaded into RAM. One way to improve this is to quantize the float vectors into word/byte vectors and build the HNSW index over those instead. This would only affect memory consumption (and KNN search accuracy), since the original float vectors would still be stored.