manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

Vector quantization for KNN search #1809

Open glookka opened 8 months ago

glookka commented 8 months ago

Currently, Manticore uses an HNSW index over floats for its KNN search implementation. This can lead to excessive memory consumption, as all HNSW indexes must be loaded into RAM. One way to improve this is to quantize the float vectors into word/byte vectors and build the HNSW index over those. This will only affect memory consumption (and KNN search accuracy), as the original float vectors will still be stored.
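To make the idea concrete, here is a minimal sketch of one common approach, scalar quantization of float vectors into byte vectors. It is not Manticore's implementation: the helper names (`fit_range`, `quantize`, `dequantize`), the single global min/max range, and the random test data are all assumptions made for illustration.

```python
import random

def fit_range(vectors):
    """Return the global (min, max) over all components of all vectors."""
    lo = min(min(v) for v in vectors)
    hi = max(max(v) for v in vectors)
    return lo, hi

def quantize(vec, lo, hi):
    """Map each float in [lo, hi] to an integer code in 0..255 (one byte)."""
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return bytes(min(255, max(0, round((x - lo) * scale))) for x in vec)

def dequantize(codes, lo, hi):
    """Approximate the original floats from the byte codes."""
    step = (hi - lo) / 255.0
    return [lo + c * step for c in codes]

random.seed(0)
dims = 384  # same dimensionality as the vectors discussed in this issue
vectors = [[random.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(100)]

lo, hi = fit_range(vectors)
codes = [quantize(v, lo, hi) for v in vectors]

float_bytes = dims * 4      # 4 bytes per float32 component
byte_bytes = len(codes[0])  # 1 byte per quantized component

# Worst-case reconstruction error over all components (the accuracy cost).
max_err = max(
    abs(a - b)
    for v, c in zip(vectors, codes)
    for a, b in zip(v, dequantize(c, lo, hi))
)

print(f"per-vector size: {float_bytes} bytes as float32, {byte_bytes} bytes quantized")
print(f"max reconstruction error: {max_err:.6f}")
```

In this sketch the vector data shrinks by a factor of 4, and the reconstruction error per component is bounded by half a quantization step. Per-dimension ranges or product quantization would trade memory against accuracy differently.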

sanikolaev commented 8 months ago

In Slack we discussed how much RAM it takes now (without any quantization):

I ran a test with 550,000 records.
Here are the results for the .spknn file size (which equals the RAM usage) with different parameters.
vector float_vector knn_type='hnsw' knn_dims='384' hnsw_similarity='COSINE' hnsw_m='50' hnsw_ef_construction='600'
968MB

vector float_vector knn_type='hnsw' knn_dims='384' hnsw_similarity='COSINE' hnsw_m='10' hnsw_ef_construction='600'
725MB

vector float_vector knn_type='hnsw' knn_dims='384' hnsw_similarity='COSINE' hnsw_m='10' hnsw_ef_construction='600' fast_fetch='0'
683MB

vector float_vector knn_type='hnsw' knn_dims='384' hnsw_similarity='COSINE' hnsw_m='10' hnsw_ef_construction='64' fast_fetch='0'
355MB

vector float_vector knn_type='hnsw' knn_dims='384' hnsw_similarity='COSINE' hnsw_m='10' hnsw_ef_construction='32' fast_fetch='0'
355MB
sanikolaev commented 3 months ago

Interesting post from Cohere about using Faiss' IVFPQ: https://www.linkedin.com/posts/reimersnils_%3F%3F%3F%3F%3F%3F%3F%3F-%3F%3F%3F%3F%3F%3F-%3F%3F-%3F%3F%3F-ugcPost-7214251130984296448-fDs2/?utm_source=share&utm_medium=member_desktop

glookka commented 3 months ago

Some info on quantization methods: https://neuml.hashnode.dev/all-about-vector-quantization